add new classifier result

author: Christian Krinitsin <mail@krinitsin.com> 2025-06-03 12:04:13 +0000
committer: Christian Krinitsin <mail@krinitsin.com> 2025-06-03 12:04:13 +0000
commit: 256709d2eb3fd80d768a99964be5caa61effa2a0 (patch)
tree: 05b2352fba70923126836a64b6a0de43902e976a /results/classifier/004/other
parent: 2ab14fa96a6c5484b5e4ba8337551bb8dcc79cc5 (diff)
39 files changed, 32381 insertions, 0 deletions
diff --git a/results/classifier/004/other/02364653 b/results/classifier/004/other/02364653
new file mode 100644
index 00000000..5b8062a6
--- /dev/null
+++ b/results/classifier/004/other/02364653
@@ -0,0 +1,371 @@
+other: 0.956
+graphic: 0.948
+semantic: 0.942
+assembly: 0.936
+device: 0.928
+instruction: 0.927
+boot: 0.925
+socket: 0.924
+vnc: 0.922
+mistranslation: 0.912
+KVM: 0.911
+network: 0.881
+
+[Qemu-devel] [BUG] Inappropriate size of target_sigset_t
+
+Hello, Peter, Laurent,
+
+While working on another problem yesterday, I think I discovered a 
+long-standing bug in QEMU Linux user mode: our target_sigset_t structure is 
+eight times smaller as it should be!
+
+In this code segment from syscalls_def.h:
+
+#ifdef TARGET_MIPS
+#define TARGET_NSIG        128
+#else
+#define TARGET_NSIG        64
+#endif
+#define TARGET_NSIG_BPW    TARGET_ABI_BITS
+#define TARGET_NSIG_WORDS  (TARGET_NSIG / TARGET_NSIG_BPW)
+
+typedef struct {
+    abi_ulong sig[TARGET_NSIG_WORDS];
+} target_sigset_t;
+
+... TARGET_ABI_BITS should be replaced by eight times smaller constant (in 
+fact, semantically, we need TARGET_ABI_BYTES, but it is not defined) (what is 
+needed is actually "a byte per signal" in target_sigset_t, and we allow "a bit 
+per signal").
+
+All this probably sounds to you like something impossible, since this code is 
+in QEMU "since forever", but I checked everything, and the bug seems real. I 
+wish you can prove me wrong.
+
+I just wanted to let you know about this, given the sensitive timing of current 
+softfreeze, and the fact that I won't be able to do more investigation on this 
+in coming weeks, since I am busy with other tasks, but perhaps you can analyze 
+and do something which you consider appropriate.
+
+Yours,
+Aleksandar
+
+On 03/07/2019 at 21:46, Aleksandar Markovic wrote:
+>
+Hello, Peter, Laurent,
+>
+>
+While working on another problem yesterday, I think I discovered a
+>
+long-standing bug in QEMU Linux user mode: our target_sigset_t structure is
+>
+eight times smaller as it should be!
+>
+>
+In this code segment from syscalls_def.h:
+>
+>
+#ifdef TARGET_MIPS
+>
+#define TARGET_NSIG      128
+>
+#else
+>
+#define TARGET_NSIG      64
+>
+#endif
+>
+#define TARGET_NSIG_BPW          TARGET_ABI_BITS
+>
+#define TARGET_NSIG_WORDS  (TARGET_NSIG / TARGET_NSIG_BPW)
+>
+>
+typedef struct {
+>
+abi_ulong sig[TARGET_NSIG_WORDS];
+>
+} target_sigset_t;
+>
+>
+... TARGET_ABI_BITS should be replaced by eight times smaller constant (in
+>
+fact, semantically, we need TARGET_ABI_BYTES, but it is not defined) (what is
+>
+needed is actually "a byte per signal" in target_sigset_t, and we allow "a
+>
+bit per signal").
+TARGET_NSIG is divided by TARGET_ABI_BITS which gives you the number of
+abi_ulong words we need in target_sigset_t.
+
+>
+All this probably sounds to you like something impossible, since this code is
+>
+in QEMU "since forever", but I checked everything, and the bug seems real. I
+>
+wish you can prove me wrong.
+>
+>
+I just wanted to let you know about this, given the sensitive timing of
+>
+current softfreeze, and the fact that I won't be able to do more
+>
+investigation on this in coming weeks, since I am busy with other tasks, but
+>
+perhaps you can analyze and do something which you consider appropriate.
+If I compare with kernel, it looks good:
+
+In Linux:
+
+  arch/mips/include/uapi/asm/signal.h
+
+  #define _NSIG           128
+  #define _NSIG_BPW       (sizeof(unsigned long) * 8)
+  #define _NSIG_WORDS     (_NSIG / _NSIG_BPW)
+
+  typedef struct {
+          unsigned long sig[_NSIG_WORDS];
+  } sigset_t;
+
+_NSIG_BPW is 8 * 8 = 64 on MIPS64 or 4 * 8 = 32 on MIPS
+
+In QEMU:
+
+TARGET_NSIG_BPW is TARGET_ABI_BITS which is  TARGET_LONG_BITS which is
+64 on MIPS64 and 32 on MIPS.
+
+I think there is no problem.
+
+Thanks,
+Laurent
+
+From: Laurent Vivier <address@hidden>
+>
+If I compare with kernel, it looks good:
+>
+...
+>
+I think there is no problem.
+Sure, thanks for such fast response - again, I am glad if you are right. 
+However, for some reason, glibc (and musl too) define sigset_t differently than 
+kernel. Please take a look. I am not sure if this is covered fine in our code.
+
+Yours,
+Aleksandar
+
+>
+Thanks,
+>
+Laurent
+
+On Wed, 3 Jul 2019 at 21:20, Aleksandar Markovic <address@hidden> wrote:
+>
+>
+From: Laurent Vivier <address@hidden>
+>
+> If I compare with kernel, it looks good:
+>
+> ...
+>
+> I think there is no problem.
+>
+>
+Sure, thanks for such fast response - again, I am glad if you are right.
+>
+However, for some reason, glibc (and musl too) define sigset_t differently
+>
+than kernel. Please take a look. I am not sure if this is covered fine in our
+>
+code.
+Yeah, the libc definitions of sigset_t don't match the
+kernel ones (this is for obscure historical reasons IIRC).
+We're providing implementations of the target
+syscall interface, so our target_sigset_t should be the
+target kernel's version (and the target libc's version doesn't
+matter to us). On the other hand we will be using the
+host libc version, I think, so a little caution is required
+and it's possible we have some bugs in our code.
+
+thanks
+-- PMM
+
+>
+From: Peter Maydell <address@hidden>
+>
+>
+On Wed, 3 Jul 2019 at 21:20, Aleksandar Markovic <address@hidden> wrote:
+>
+>
+>
+> From: Laurent Vivier <address@hidden>
+>
+> > If I compare with kernel, it looks good:
+>
+> > ...
+>
+> > I think there is no problem.
+>
+>
+>
+> Sure, thanks for such fast response - again, I am glad if you are right.
+>
+> However, for some reason, glibc (and musl too) define sigset_t differently
+>
+> than kernel. Please take a look. I am not sure if this is covered fine in
+>
+> our code.
+>
+>
+Yeah, the libc definitions of sigset_t don't match the
+>
+kernel ones (this is for obscure historical reasons IIRC).
+>
+We're providing implementations of the target
+>
+syscall interface, so our target_sigset_t should be the
+>
+target kernel's version (and the target libc's version doesn't
+>
+matter to us). On the other hand we will be using the
+>
+host libc version, I think, so a little caution is required
+>
+and it's possible we have some bugs in our code.
+OK, I gather that this is not something that requires our immediate attention 
+(for 4.1), but we can analyze it later on.
+
+Thanks for response!!
+
+Sincerely,
+Aleksandar
+
+>
+thanks
+>
+-- PMM
+
+On 03/07/2019 at 22:28, Peter Maydell wrote:
+>
+On Wed, 3 Jul 2019 at 21:20, Aleksandar Markovic <address@hidden> wrote:
+>
+>
+>
+> From: Laurent Vivier <address@hidden>
+>
+>> If I compare with kernel, it looks good:
+>
+>> ...
+>
+>> I think there is no problem.
+>
+>
+>
+> Sure, thanks for such fast response - again, I am glad if you are right.
+>
+> However, for some reason, glibc (and musl too) define sigset_t differently
+>
+> than kernel. Please take a look. I am not sure if this is covered fine in
+>
+> our code.
+>
+>
+Yeah, the libc definitions of sigset_t don't match the
+>
+kernel ones (this is for obscure historical reasons IIRC).
+>
+We're providing implementations of the target
+>
+syscall interface, so our target_sigset_t should be the
+>
+target kernel's version (and the target libc's version doesn't
+>
+matter to us). On the other hand we will be using the
+>
+host libc version, I think, so a little caution is required
+>
+and it's possible we have some bugs in our code.
+It's why we need host_to_target_sigset_internal() and
+target_to_host_sigset_internal() that translates bits and bytes between
+guest kernel interface and host libc interface.
+
+void host_to_target_sigset_internal(target_sigset_t *d,
+                                    const sigset_t *s)
+{
+    int i;
+    target_sigemptyset(d);
+    for (i = 1; i <= TARGET_NSIG; i++) {
+        if (sigismember(s, i)) {
+            target_sigaddset(d, host_to_target_signal(i));
+        }
+    }
+}
+
+void target_to_host_sigset_internal(sigset_t *d,
+                                    const target_sigset_t *s)
+{
+    int i;
+    sigemptyset(d);
+    for (i = 1; i <= TARGET_NSIG; i++) {
+        if (target_sigismember(s, i)) {
+            sigaddset(d, target_to_host_signal(i));
+        }
+    }
+}
+
+Thanks,
+Laurent
+
+Hi Aleksandar,
+
+On Wed, Jul 3, 2019 at 12:48 PM Aleksandar Markovic
+<address@hidden> wrote:
+>
+#define TARGET_NSIG_BPW    TARGET_ABI_BITS
+>
+#define TARGET_NSIG_WORDS  (TARGET_NSIG / TARGET_NSIG_BPW)
+>
+>
+typedef struct {
+>
+abi_ulong sig[TARGET_NSIG_WORDS];
+>
+} target_sigset_t;
+>
+>
+... TARGET_ABI_BITS should be replaced by eight times smaller constant (in
+>
+fact,
+>
+semantically, we need TARGET_ABI_BYTES, but it is not defined) (what is needed
+>
+is actually "a byte per signal" in target_sigset_t, and we allow "a bit per
+>
+signal").
+Why do we need a byte per target signal, if the functions in linux-user/signal.c
+operate with bits?
+
+-- 
+Thanks.
+-- Max
+
+>
+Why do we need a byte per target signal, if the functions in
+>
+linux-user/signal.c
+>
+operate with bits?
+Max,
+
+I did not base my findings on code analysis, but on dumping size/offsets of 
+elements of some structures, as they are emulated in QEMU, and in real systems. 
+So, I can't really answer your question.
+
+Yours,
+Aleksandar
+
+>
+--
+>
+Thanks.
+>
+-- Max
+
diff --git a/results/classifier/004/other/02572177 b/results/classifier/004/other/02572177
new file mode 100644
index 00000000..10871dae
--- /dev/null
+++ b/results/classifier/004/other/02572177
@@ -0,0 +1,429 @@
+other: 0.869
+instruction: 0.794
+device: 0.791
+semantic: 0.770
+assembly: 0.756
+graphic: 0.747
+socket: 0.742
+network: 0.708
+vnc: 0.706
+mistranslation: 0.693
+KVM: 0.669
+boot: 0.658
+
+[Qemu-devel] Reply: Re:  [BUG]COLO failover hang
+
+hi.
+
+
+I test the git qemu master have the same problem.
+
+
+(gdb) bt
+
+
+#0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, niov=1, 
+fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
+
+
+#1  0x00007f658e4aa0c2 in qio_channel_read (address@hidden, address@hidden "", 
+address@hidden, address@hidden) at io/channel.c:114
+
+
+#2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>, 
+buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at 
+migration/qemu-file-channel.c:78
+
+
+#3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at 
+migration/qemu-file.c:295
+
+
+#4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, address@hidden) at 
+migration/qemu-file.c:555
+
+
+#5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at 
+migration/qemu-file.c:568
+
+
+#6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at 
+migration/qemu-file.c:648
+
+
+#7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, 
+address@hidden) at migration/colo.c:244
+
+
+#8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized out>, 
+address@hidden, address@hidden)
+
+
+    at migration/colo.c:264
+
+
+#9  0x00007f658e3e740e in colo_process_incoming_thread (opaque=0x7f658eb30360 
+<mis_current.31286>) at migration/colo.c:577
+
+
+#10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
+
+
+#11 0x00007f65881983ed in clone () from /lib64/libc.so.6
+
+
+(gdb) p ioc->name
+
+
+$2 = 0x7f658ff7d5c0 "migration-socket-incoming"
+
+
+(gdb) p ioc->features          Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
+
+
+$3 = 0
+
+
+
+
+
+(gdb) bt
+
+
+#0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90, condition=G_IO_IN, 
+opaque=0x7fdcceeafa90) at migration/socket.c:137
+
+
+#1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at 
+gmain.c:3054
+
+
+#2  g_main_context_dispatch (context=<optimized out>, address@hidden) at 
+gmain.c:3630
+
+
+#3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
+
+
+#4  os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:258
+
+
+#5  main_loop_wait (address@hidden) at util/main-loop.c:506
+
+
+#6  0x00007fdccb526187 in main_loop () at vl.c:1898
+
+
+#7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at 
+vl.c:4709
+
+
+(gdb) p ioc->features
+
+
+$1 = 6
+
+
+(gdb) p ioc->name
+
+
+$2 = 0x7fdcce1b1ab0 "migration-socket-listener"
+
+
+
+
+
+May be socket_accept_incoming_migration should call 
+qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
+
+
+
+
+
+thank you.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Original Mail
+
+
+
+From: address@hidden
+To: 王广10165992 address@hidden
+Cc: address@hidden address@hidden
+Date: 2017-03-16 14:46
+Subject: Re: [Qemu-devel] COLO failover hang
+
+
+
+
+
+
+
+On 03/15/2017 05:06 PM, wangguang wrote:
+>   am testing QEMU COLO feature described here [QEMU
+> Wiki](
+http://wiki.qemu-project.org/Features/COLO
+).
+>
+> When the Primary Node panic,the Secondary Node qemu hang.
+> hang at recvmsg in qio_channel_socket_readv.
+> And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
+> "x-colo-lost-heartbeat" } in Secondary VM's
+> monitor,the  Secondary Node qemu still hang at recvmsg .
+>
+> I found that the colo in qemu is not complete yet.
+> Do the colo have any plan for development?
+
+Yes, We are developing. You can see some of patch we pushing.
+
+> Has anyone ever run it successfully? Any help is appreciated!
+
+In our internal version can run it successfully,
+The failover detail you can ask Zhanghailiang for help.
+Next time if you have some question about COLO,
+please cc me and zhanghailiang address@hidden
+
+
+Thanks
+Zhang Chen
+
+
+>
+>
+>
+> centos7.2+qemu2.7.50
+> (gdb) bt
+> #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
+> #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
+> iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at
+> io/channel-socket.c:497
+> #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
+> address@hidden "", address@hidden,
+> address@hidden) at io/channel.c:97
+> #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
+> buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
+> migration/qemu-file-channel.c:78
+> #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
+> migration/qemu-file.c:257
+> #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
+> address@hidden) at migration/qemu-file.c:510
+> #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
+> migration/qemu-file.c:523
+> #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
+> migration/qemu-file.c:603
+> #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
+> address@hidden) at migration/colo.c:215
+> #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
+> checkpoint_request=<synthetic pointer>, f=<optimized out>) at
+> migration/colo.c:546
+> #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
+> migration/colo.c:649
+> #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
+> #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
+>
+>
+>
+>
+>
+> --
+> View this message in context:
+http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
+> Sent from the Developer mailing list archive at Nabble.com.
+>
+>
+>
+>
+
+-- 
+Thanks
+Zhang Chen
+
+Hi,Wang.
+
+You can test this branch:
+https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
+and please follow wiki ensure your own configuration correctly.
+http://wiki.qemu-project.org/Features/COLO
+Thanks
+
+Zhang Chen
+
+
+On 03/21/2017 03:27 PM, address@hidden wrote:
+hi.
+
+I test the git qemu master have the same problem.
+
+(gdb) bt
+#0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
+niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
+#1  0x00007f658e4aa0c2 in qio_channel_read
+(address@hidden, address@hidden "",
+address@hidden, address@hidden) at io/channel.c:114
+#2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
+buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
+migration/qemu-file-channel.c:78
+#3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
+migration/qemu-file.c:295
+#4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
+address@hidden) at migration/qemu-file.c:555
+#5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
+migration/qemu-file.c:568
+#6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
+migration/qemu-file.c:648
+#7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
+address@hidden) at migration/colo.c:244
+#8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
+out>, address@hidden,
+address@hidden)
+at migration/colo.c:264
+#9  0x00007f658e3e740e in colo_process_incoming_thread
+(opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
+#10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
+
+#11 0x00007f65881983ed in clone () from /lib64/libc.so.6
+
+(gdb) p ioc->name
+
+$2 = 0x7f658ff7d5c0 "migration-socket-incoming"
+
+(gdb) p ioc->features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
+
+$3 = 0
+
+
+(gdb) bt
+#0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
+condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
+#1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
+gmain.c:3054
+#2  g_main_context_dispatch (context=<optimized out>,
+address@hidden) at gmain.c:3630
+#3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
+#4  os_host_main_loop_wait (timeout=<optimized out>) at
+util/main-loop.c:258
+#5  main_loop_wait (address@hidden) at
+util/main-loop.c:506
+#6  0x00007fdccb526187 in main_loop () at vl.c:1898
+#7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
+out>) at vl.c:4709
+(gdb) p ioc->features
+
+$1 = 6
+
+(gdb) p ioc->name
+
+$2 = 0x7fdcce1b1ab0 "migration-socket-listener"
+May be socket_accept_incoming_migration should
+call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
+thank you.
+
+
+
+
+
+Original Mail
+address@hidden;
+*To:* 王广10165992;address@hidden;
+address@hidden;address@hidden;
+*Date:* 2017-03-16 14:46
+*Subject:* *Re: [Qemu-devel] COLO failover hang*
+
+
+
+
+On 03/15/2017 05:06 PM, wangguang wrote:
+>   am testing QEMU COLO feature described here [QEMU
+> Wiki](
+http://wiki.qemu-project.org/Features/COLO
+).
+>
+> When the Primary Node panic,the Secondary Node qemu hang.
+> hang at recvmsg in qio_channel_socket_readv.
+> And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
+> "x-colo-lost-heartbeat" } in Secondary VM's
+> monitor,the  Secondary Node qemu still hang at recvmsg .
+>
+> I found that the colo in qemu is not complete yet.
+> Do the colo have any plan for development?
+
+Yes, We are developing. You can see some of patch we pushing.
+
+> Has anyone ever run it successfully? Any help is appreciated!
+
+In our internal version can run it successfully,
+The failover detail you can ask Zhanghailiang for help.
+Next time if you have some question about COLO,
+please cc me and zhanghailiang address@hidden
+
+
+Thanks
+Zhang Chen
+
+
+>
+>
+>
+> centos7.2+qemu2.7.50
+> (gdb) bt
+> #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
+> #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
+> iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at
+> io/channel-socket.c:497
+> #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
+> address@hidden "", address@hidden,
+> address@hidden) at io/channel.c:97
+> #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
+> buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
+> migration/qemu-file-channel.c:78
+> #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
+> migration/qemu-file.c:257
+> #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
+> address@hidden) at migration/qemu-file.c:510
+> #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
+> migration/qemu-file.c:523
+> #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
+> migration/qemu-file.c:603
+> #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
+> address@hidden) at migration/colo.c:215
+> #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
+> checkpoint_request=<synthetic pointer>, f=<optimized out>) at
+> migration/colo.c:546
+> #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
+> migration/colo.c:649
+> #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
+> #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
+>
+>
+>
+>
+>
+> --
+> View this message in context:
+http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
+> Sent from the Developer mailing list archive at Nabble.com.
+>
+>
+>
+>
+
+--
+Thanks
+Zhang Chen
+--
+Thanks
+Zhang Chen
+
diff --git a/results/classifier/004/other/12869209 b/results/classifier/004/other/12869209
new file mode 100644
index 00000000..6e8310c0
--- /dev/null
+++ b/results/classifier/004/other/12869209
@@ -0,0 +1,96 @@
+other: 0.964
+device: 0.951
+assembly: 0.937
+mistranslation: 0.935
+instruction: 0.919
+socket: 0.906
+semantic: 0.891
+vnc: 0.885
+graphic: 0.879
+network: 0.858
+KVM: 0.857
+boot: 0.837
+
+[BUG FIX][PATCH v3 0/3] vhost-user-blk: fix bug on device disconnection during initialization
+
+This is a series fixing a bug in
+          vhost-user-blk.
+Is there any chance for it to be considered for the next rc?
+Thanks!
+Denis
+On 29.03.2021 16:44, Denis Plotnikov
+      wrote:
+ping!
+On 25.03.2021 18:12, Denis Plotnikov
+        wrote:
+v3:
+  * 0003: a new patch added fixing the problem on vm shutdown
+    I stumbled on this bug after v2 sending.
+  * 0001: gramma fixing (Raphael)
+  * 0002: commit message fixing (Raphael)
+
+v2:
+  * split the initial patch into two (Raphael)
+  * rename init to realized (Raphael)
+  * remove unrelated comment (Raphael)
+
+When the vhost-user-blk device lose the connection to the daemon during
+the initialization phase it kills qemu because of the assert in the code.
+The series fixes the bug.
+
+0001 is preparation for the fix
+0002 fixes the bug, patch description has the full motivation for the series
+0003 (added in v3) fix bug on vm shutdown
+
+Denis Plotnikov (3):
+  vhost-user-blk: use different event handlers on initialization
+  vhost-user-blk: perform immediate cleanup if disconnect on
+    initialization
+  vhost-user-blk: add immediate cleanup on shutdown
+
+ hw/block/vhost-user-blk.c | 79 ++++++++++++++++++++++++---------------
+ 1 file changed, 48 insertions(+), 31 deletions(-)
+
+On 01.04.2021 14:21, Denis Plotnikov wrote:
+This is a series fixing a bug in vhost-user-blk.
+More specifically, it's not just a bug but crasher.
+
+Valentine
+Is there any chance for it to be considered for the next rc?
+
+Thanks!
+
+Denis
+
+On 29.03.2021 16:44, Denis Plotnikov wrote:
+ping!
+
+On 25.03.2021 18:12, Denis Plotnikov wrote:
+v3:
+   * 0003: a new patch added fixing the problem on vm shutdown
+     I stumbled on this bug after v2 sending.
+   * 0001: gramma fixing (Raphael)
+   * 0002: commit message fixing (Raphael)
+
+v2:
+   * split the initial patch into two (Raphael)
+   * rename init to realized (Raphael)
+   * remove unrelated comment (Raphael)
+
+When the vhost-user-blk device lose the connection to the daemon during
+the initialization phase it kills qemu because of the assert in the code.
+The series fixes the bug.
+
+0001 is preparation for the fix
+0002 fixes the bug, patch description has the full motivation for the series
+0003 (added in v3) fix bug on vm shutdown
+
+Denis Plotnikov (3):
+   vhost-user-blk: use different event handlers on initialization
+   vhost-user-blk: perform immediate cleanup if disconnect on
+     initialization
+   vhost-user-blk: add immediate cleanup on shutdown
+
+  hw/block/vhost-user-blk.c | 79 ++++++++++++++++++++++++---------------
+  1 file changed, 48 insertions(+), 31 deletions(-)
+
diff --git a/results/classifier/004/other/13442371 b/results/classifier/004/other/13442371
new file mode 100644
index 00000000..189f6cbb
--- /dev/null
+++ b/results/classifier/004/other/13442371
@@ -0,0 +1,377 @@
+other: 0.886
+device: 0.883
+assembly: 0.877
+KVM: 0.872
+instruction: 0.861
+mistranslation: 0.859
+vnc: 0.858
+graphic: 0.850
+semantic: 0.850
+socket: 0.831
+boot: 0.815
+network: 0.811
+
+[Qemu-devel] [BUG] nanoMIPS support problem related to extract2 support for i386 TCG target
+
+Hello, Richard, Peter, and others.
+
+As a part of activities before 4.1 release, I tested nanoMIPS support
+in QEMU (which was officially fully integrated in 4.0, is currently
+limited to system mode only, and was tested in a similar fashion right
+prior to 4.0).
+
+This support appears to be broken now. Following command line works in
+4.0, but results in kernel panic for the current tip of the tree:
+
+~/Build/qemu-test-revert-c6fb8c0cf704/mipsel-softmmu/qemu-system-mipsel
+-cpu I7200 -kernel generic_nano32r6el_page4k -M malta -serial stdio -m
+1G -hda nanomips32r6_le_sf_2017.05-03-59-gf5595d6.ext4 -append
+"mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda"
+
+(kernel and rootfs image files used in this commend line can be
+downloaded from the locations mentioned in our user guide)
+
+The quick bisect points to the commit:
+
+commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab
+Author: Richard Henderson <address@hidden>
+Date:   Mon Feb 25 11:42:35 2019 -0800
+
+    tcg/i386: Support INDEX_op_extract2_{i32,i64}
+
+    Signed-off-by: Richard Henderson <address@hidden>
+
+Please advise on further actions.
+
+Yours,
+Aleksandar
+
+On Fri, Jul 12, 2019 at 8:09 PM Aleksandar Markovic
+<address@hidden> wrote:
+>
+>
+Hello, Richard, Peter, and others.
+>
+>
+As a part of activities before 4.1 release, I tested nanoMIPS support
+>
+in QEMU (which was officially fully integrated in 4.0, is currently
+>
+limited to system mode only, and was tested in a similar fashion right
+>
+prior to 4.0).
+>
+>
+This support appears to be broken now. Following command line works in
+>
+4.0, but results in kernel panic for the current tip of the tree:
+>
+>
+~/Build/qemu-test-revert-c6fb8c0cf704/mipsel-softmmu/qemu-system-mipsel
+>
+-cpu I7200 -kernel generic_nano32r6el_page4k -M malta -serial stdio -m
+>
+1G -hda nanomips32r6_le_sf_2017.05-03-59-gf5595d6.ext4 -append
+>
+"mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda"
+>
+>
+(kernel and rootfs image files used in this commend line can be
+>
+downloaded from the locations mentioned in our user guide)
+>
+>
+The quick bisect points to the commit:
+>
+>
+commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab
+>
+Author: Richard Henderson <address@hidden>
+>
+Date:   Mon Feb 25 11:42:35 2019 -0800
+>
+>
+tcg/i386: Support INDEX_op_extract2_{i32,i64}
+>
+>
+Signed-off-by: Richard Henderson <address@hidden>
+>
+>
+Please advise on further actions.
+>
+Just to add a data point:
+
+If the following change is applied:
+
+diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
+index 928e8b8..b6a4cf2 100644
+--- a/tcg/i386/tcg-target.h
++++ b/tcg/i386/tcg-target.h
+@@ -124,7 +124,7 @@ extern bool have_avx2;
+ #define TCG_TARGET_HAS_deposit_i32      1
+ #define TCG_TARGET_HAS_extract_i32      1
+ #define TCG_TARGET_HAS_sextract_i32     1
+-#define TCG_TARGET_HAS_extract2_i32     1
++#define TCG_TARGET_HAS_extract2_i32     0
+ #define TCG_TARGET_HAS_movcond_i32      1
+ #define TCG_TARGET_HAS_add2_i32         1
+ #define TCG_TARGET_HAS_sub2_i32         1
+@@ -163,7 +163,7 @@ extern bool have_avx2;
+ #define TCG_TARGET_HAS_deposit_i64      1
+ #define TCG_TARGET_HAS_extract_i64      1
+ #define TCG_TARGET_HAS_sextract_i64     0
+-#define TCG_TARGET_HAS_extract2_i64     1
++#define TCG_TARGET_HAS_extract2_i64     0
+ #define TCG_TARGET_HAS_movcond_i64      1
+ #define TCG_TARGET_HAS_add2_i64         1
+ #define TCG_TARGET_HAS_sub2_i64         1
+
+... the problem disappears.
+
+
+>
+Yours,
+>
+Aleksandar
+
+On Fri, Jul 12, 2019 at 8:19 PM Aleksandar Markovic
+<address@hidden> wrote:
+>
+>
+On Fri, Jul 12, 2019 at 8:09 PM Aleksandar Markovic
+>
+<address@hidden> wrote:
+>
+>
+>
+> Hello, Richard, Peter, and others.
+>
+>
+>
+> As a part of activities before 4.1 release, I tested nanoMIPS support
+>
+> in QEMU (which was officially fully integrated in 4.0, is currently
+>
+> limited to system mode only, and was tested in a similar fashion right
+>
+> prior to 4.0).
+>
+>
+>
+> This support appears to be broken now. Following command line works in
+>
+> 4.0, but results in kernel panic for the current tip of the tree:
+>
+>
+>
+> ~/Build/qemu-test-revert-c6fb8c0cf704/mipsel-softmmu/qemu-system-mipsel
+>
+> -cpu I7200 -kernel generic_nano32r6el_page4k -M malta -serial stdio -m
+>
+> 1G -hda nanomips32r6_le_sf_2017.05-03-59-gf5595d6.ext4 -append
+>
+> "mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda"
+>
+>
+>
+> (kernel and rootfs image files used in this commend line can be
+>
+> downloaded from the locations mentioned in our user guide)
+>
+>
+>
+> The quick bisect points to the commit:
+>
+>
+>
+> commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab
+>
+> Author: Richard Henderson <address@hidden>
+>
+> Date:   Mon Feb 25 11:42:35 2019 -0800
+>
+>
+>
+>     tcg/i386: Support INDEX_op_extract2_{i32,i64}
+>
+>
+>
+>     Signed-off-by: Richard Henderson <address@hidden>
+>
+>
+>
+> Please advise on further actions.
+>
+>
+>
+>
+Just to add a data point:
+>
+>
+If the following change is applied:
+>
+>
+diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
+>
+index 928e8b8..b6a4cf2 100644
+>
+--- a/tcg/i386/tcg-target.h
+>
++++ b/tcg/i386/tcg-target.h
+>
+@@ -124,7 +124,7 @@ extern bool have_avx2;
+>
+#define TCG_TARGET_HAS_deposit_i32      1
+>
+#define TCG_TARGET_HAS_extract_i32      1
+>
+#define TCG_TARGET_HAS_sextract_i32     1
+>
+-#define TCG_TARGET_HAS_extract2_i32     1
+>
++#define TCG_TARGET_HAS_extract2_i32     0
+>
+#define TCG_TARGET_HAS_movcond_i32      1
+>
+#define TCG_TARGET_HAS_add2_i32         1
+>
+#define TCG_TARGET_HAS_sub2_i32         1
+>
+@@ -163,7 +163,7 @@ extern bool have_avx2;
+>
+#define TCG_TARGET_HAS_deposit_i64      1
+>
+#define TCG_TARGET_HAS_extract_i64      1
+>
+#define TCG_TARGET_HAS_sextract_i64     0
+>
+-#define TCG_TARGET_HAS_extract2_i64     1
+>
++#define TCG_TARGET_HAS_extract2_i64     0
+>
+#define TCG_TARGET_HAS_movcond_i64      1
+>
+#define TCG_TARGET_HAS_add2_i64         1
+>
+#define TCG_TARGET_HAS_sub2_i64         1
+>
+>
+... the problem disappears.
+>
+It looks like the problem is in this code segment of tcg_gen_deposit_i32():
+
+        if (ofs == 0) {
+            tcg_gen_extract2_i32(ret, arg1, arg2, len);
+            tcg_gen_rotli_i32(ret, ret, len);
+            goto done;
+        }
+
+)
+
+If that code segment is deleted altogether (which effectively forces
+usage of "fallback" part of tcg_gen_deposit_i32()), the problem also
+vanishes (without changes from my previous mail).
+
+>
+>
+> Yours,
+>
+> Aleksandar
+
+Aleksandar Markovic <address@hidden> writes:
+
+>
+Hello, Richard, Peter, and others.
+>
+>
+As a part of activities before 4.1 release, I tested nanoMIPS support
+>
+in QEMU (which was officially fully integrated in 4.0, is currently
+>
+limited to system mode only, and was tested in a similar fashion right
+>
+prior to 4.0).
+>
+>
+This support appears to be broken now. Following command line works in
+>
+4.0, but results in kernel panic for the current tip of the tree:
+>
+>
+~/Build/qemu-test-revert-c6fb8c0cf704/mipsel-softmmu/qemu-system-mipsel
+>
+-cpu I7200 -kernel generic_nano32r6el_page4k -M malta -serial stdio -m
+>
+1G -hda nanomips32r6_le_sf_2017.05-03-59-gf5595d6.ext4 -append
+>
+"mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda"
+>
+>
+(kernel and rootfs image files used in this command line can be
+>
+downloaded from the locations mentioned in our user guide)
+>
+>
+The quick bisect points to the commit:
+>
+>
+commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab
+>
+Author: Richard Henderson <address@hidden>
+>
+Date:   Mon Feb 25 11:42:35 2019 -0800
+>
+>
+tcg/i386: Support INDEX_op_extract2_{i32,i64}
+>
+>
+Signed-off-by: Richard Henderson <address@hidden>
+>
+>
+Please advise on further actions.
+Please see the fix:
+
+  Subject: [PATCH for-4.1] tcg: Fix constant folding of INDEX_op_extract2_i32
+  Date: Tue,  9 Jul 2019 14:19:00 +0200
+  Message-Id: <address@hidden>
+
+>
+>
+Yours,
+>
+Aleksandar
+--
+Alex Bennée
+
+On Sat, Jul 13, 2019 at 9:21 AM Alex Bennée <address@hidden> wrote:
+>
+>
+Please see the fix:
+>
+>
+Subject: [PATCH for-4.1] tcg: Fix constant folding of INDEX_op_extract2_i32
+>
+Date: Tue,  9 Jul 2019 14:19:00 +0200
+>
+Message-Id: <address@hidden>
+>
+Thanks, this fixed the behavior.
+
+Sincerely,
+Aleksandar
+
+>
+>
+>
+>
+> Yours,
+>
+> Aleksandar
+>
+>
+>
+--
+>
+Alex Bennée
+>
+
diff --git a/results/classifier/004/other/16056596 b/results/classifier/004/other/16056596
new file mode 100644
index 00000000..c08ed7aa
--- /dev/null
+++ b/results/classifier/004/other/16056596
@@ -0,0 +1,106 @@
+other: 0.980
+semantic: 0.979
+assembly: 0.975
+instruction: 0.975
+device: 0.973
+boot: 0.971
+graphic: 0.970
+mistranslation: 0.961
+socket: 0.952
+vnc: 0.946
+network: 0.940
+KVM: 0.934
+
+[BUG][powerpc] KVM Guest Boot Failure and Hang at "Booting Linux via __start()"
+
+Bug Description:
+Encountering a boot failure when launching a KVM guest with
+'qemu-system-ppc64'. The guest hangs at boot, and the QEMU monitor
+crashes.
+Reproduction Steps:
+# qemu-system-ppc64 --version
+QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f)
+Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers
+# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine
+pseries,accel=kvm \
+-m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \
+  -device virtio-scsi-pci,id=scsi \
+-drive
+file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2
+\
+-device scsi-hd,drive=drive0,bus=scsi.0 \
+  -netdev bridge,id=net0,br=virbr0 \
+  -device virtio-net-pci,netdev=net0 \
+  -serial pty \
+  -device virtio-balloon-pci \
+  -cpu host
+QEMU 9.2.50 monitor - type 'help' for more information
+char device redirected to /dev/pts/2 (label serial0)
+(qemu)
+(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but
+unavailable: IRQ_XIVE capability must be present for KVM
+Falling back to kernel-irqchip=off
+** Qemu Hang
+
+(In another ssh session)
+# screen /dev/pts/2
+Preparing to boot Linux version 6.10.4-200.fc40.ppc64le
+(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801
+(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11
+15:20:17 UTC 2024
+Detected machine type: 0000000000000101
+command line:
+BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le
+root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M
+Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)
+Calling ibm,client-architecture-support... done
+memory layout at init:
+  memory_limit : 0000000000000000 (16 MB aligned)
+  alloc_bottom : 0000000008200000
+  alloc_top    : 0000000030000000
+  alloc_top_hi : 0000000800000000
+  rmo_top      : 0000000030000000
+  ram_top      : 0000000800000000
+instantiating rtas at 0x000000002fff0000... done
+prom_hold_cpus: skipped
+copying OF device tree...
+Building dt strings...
+Building dt structure...
+Device tree strings 0x0000000008210000 -> 0x0000000008210bd0
+Device tree struct  0x0000000008220000 -> 0x0000000008230000
+Quiescing Open Firmware ...
+Booting Linux via __start() @ 0x0000000000440000 ...
+** Guest Console Hang
+
+
+Git Bisect:
+Performing git bisect points to the following patch:
+# git bisect bad
+e8291ec16da80566c121c68d9112be458954d90b is the first bad commit
+commit e8291ec16da80566c121c68d9112be458954d90b (HEAD)
+Author: Nicholas Piggin <npiggin@gmail.com>
+Date:   Thu Dec 19 13:40:31 2024 +1000
+
+    target/ppc: fix timebase register reset state
+
+    (H)DEC and PURR get reset before icount does, which
+    causes them to be skewed and not match the init state.
+    This can cause replay to not match the recorded trace
+    exactly. For DEC and HDEC this is usually not noticeable
+    since they tend to get programmed before affecting the
+    target machine. PURR has been observed to cause replay
+    bugs when running Linux.
+
+    Fix this by resetting using a time of 0.
+
+    Message-ID: <20241219034035.1826173-2-npiggin@gmail.com>
+    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
+
+ hw/ppc/ppc.c | 11 ++++++++---
+ 1 file changed, 8 insertions(+), 3 deletions(-)
+
+
+Reverting the patch helps boot the guest.
+Thanks,
+Misbah Anjum N
+
diff --git a/results/classifier/004/other/16201167 b/results/classifier/004/other/16201167
new file mode 100644
index 00000000..f2ac5def
--- /dev/null
+++ b/results/classifier/004/other/16201167
@@ -0,0 +1,108 @@
+other: 0.954
+mistranslation: 0.947
+vnc: 0.946
+graphic: 0.937
+semantic: 0.933
+KVM: 0.928
+instruction: 0.922
+device: 0.911
+assembly: 0.910
+socket: 0.892
+network: 0.864
+boot: 0.845
+
+[BUG] Qemu abort with error "kvm_mem_ioeventfd_add: error adding ioeventfd: File exists (17)"
+
+Hi list,
+
+When I ran some tests in my virtual domain with live-attached virtio devices, I
+got a coredump file from Qemu.
+
+The error printed by qemu is "kvm_mem_ioeventfd_add: error adding ioeventfd:
+File exists (17)".
+The call trace in the coredump file is shown below:
+#0  0x0000ffff89acecc8 in ?? () from /usr/lib64/libc.so.6
+#1  0x0000ffff89a8acbc in raise () from /usr/lib64/libc.so.6
+#2  0x0000ffff89a78d2c in abort () from /usr/lib64/libc.so.6
+#3  0x0000aaaabd7ccf1c in kvm_mem_ioeventfd_add (listener=<optimized out>, 
+section=<optimized out>, match_data=<optimized out>, data=<optimized out>, 
+e=<optimized out>) at ../accel/kvm/kvm-all.c:1607
+#4  0x0000aaaabd6e0304 in address_space_add_del_ioeventfds (fds_old_nb=164, 
+fds_old=0xffff5c80a1d0, fds_new_nb=160, fds_new=0xffff5c565080, 
+as=0xaaaabdfa8810 <address_space_memory>)
+    at ../softmmu/memory.c:795
+#5  address_space_update_ioeventfds (as=0xaaaabdfa8810 <address_space_memory>) 
+at ../softmmu/memory.c:856
+#6  0x0000aaaabd6e24d8 in memory_region_commit () at ../softmmu/memory.c:1113
+#7  0x0000aaaabd6e25c4 in memory_region_transaction_commit () at 
+../softmmu/memory.c:1144
+#8  0x0000aaaabd394eb4 in pci_bridge_update_mappings 
+(br=br@entry=0xaaaae755f7c0) at ../hw/pci/pci_bridge.c:248
+#9  0x0000aaaabd394f4c in pci_bridge_write_config (d=0xaaaae755f7c0, 
+address=44, val=<optimized out>, len=4) at ../hw/pci/pci_bridge.c:272
+#10 0x0000aaaabd39a928 in rp_write_config (d=0xaaaae755f7c0, address=44, 
+val=128, len=4) at ../hw/pci-bridge/pcie_root_port.c:39
+#11 0x0000aaaabd6df328 in memory_region_write_accessor (mr=0xaaaae63898d0, 
+addr=65580, value=<optimized out>, size=4, shift=<optimized out>, 
+mask=<optimized out>, attrs=...) at ../softmmu/memory.c:494
+#12 0x0000aaaabd6dcb6c in access_with_adjusted_size (addr=addr@entry=65580, 
+value=value@entry=0xffff817adc78, size=size@entry=4, access_size_min=<optimized 
+out>, access_size_max=<optimized out>,
+    access_fn=access_fn@entry=0xaaaabd6df284 <memory_region_write_accessor>, 
+mr=mr@entry=0xaaaae63898d0, attrs=attrs@entry=...) at ../softmmu/memory.c:556
+#13 0x0000aaaabd6e0dc8 in memory_region_dispatch_write 
+(mr=mr@entry=0xaaaae63898d0, addr=65580, data=<optimized out>, op=MO_32, 
+attrs=attrs@entry=...) at ../softmmu/memory.c:1534
+#14 0x0000aaaabd6d0574 in flatview_write_continue (fv=fv@entry=0xffff5c02da00, 
+addr=addr@entry=275146407980, attrs=attrs@entry=..., 
+ptr=ptr@entry=0xffff8aa8c028, len=len@entry=4,
+    addr1=<optimized out>, l=<optimized out>, mr=mr@entry=0xaaaae63898d0) at 
+/usr/src/debug/qemu-6.2.0-226.aarch64/include/qemu/host-utils.h:165
+#15 0x0000aaaabd6d4584 in flatview_write (len=4, buf=0xffff8aa8c028, attrs=..., 
+addr=275146407980, fv=0xffff5c02da00) at ../softmmu/physmem.c:3375
+#16 address_space_write (as=<optimized out>, addr=275146407980, attrs=..., 
+buf=buf@entry=0xffff8aa8c028, len=4) at ../softmmu/physmem.c:3467
+#17 0x0000aaaabd6d462c in address_space_rw (as=<optimized out>, addr=<optimized 
+out>, attrs=..., attrs@entry=..., buf=buf@entry=0xffff8aa8c028, len=<optimized 
+out>, is_write=<optimized out>)
+    at ../softmmu/physmem.c:3477
+#18 0x0000aaaabd7cf6e8 in kvm_cpu_exec (cpu=cpu@entry=0xaaaae625dfd0) at 
+../accel/kvm/kvm-all.c:2970
+#19 0x0000aaaabd7d09bc in kvm_vcpu_thread_fn (arg=arg@entry=0xaaaae625dfd0) at 
+../accel/kvm/kvm-accel-ops.c:49
+#20 0x0000aaaabd94ccd8 in qemu_thread_start (args=<optimized out>) at 
+../util/qemu-thread-posix.c:559
+
+
+By printing more info from the coredump file, I found that the addresses of
+fds_old[146] and fds_new[146] are the same, but fds_old[146] belonged to a
+live-attached virtio-scsi device while fds_new[146] was owned by another
+live-attached virtio-net device.
+The reason for the address conflict was then found in the vm's console log. Just
+before qemu aborted, the guest kernel crashed and kdump.service booted the
+dump-capture kernel, which re-allocated addresses for the devices.
+Because those virtio devices were live-attached after the vm was created,
+different addresses may have been assigned to them in the dump-capture kernel:
+
+the initial kernel booting log:
+[    1.663297] pci 0000:00:02.1: BAR 14: assigned [mem 0x11900000-0x11afffff]
+[    1.664560] pci 0000:00:02.1: BAR 15: assigned [mem 
+0x8001800000-0x80019fffff 64bit pref]
+
+the dump-capture kernel booting log:
+[    1.845211] pci 0000:00:02.0: BAR 14: assigned [mem 0x11900000-0x11bfffff]
+[    1.846542] pci 0000:00:02.0: BAR 15: assigned [mem 
+0x8001800000-0x8001afffff 64bit pref]
+
+
+I think directly aborting the qemu process may not be the best choice in this
+case, because it interrupts kdump.service, which then fails to generate a
+memory dump of the crashed guest kernel.
+Perhaps, IMO, the error could simply be ignored in this case, letting kdump
+reboot the system after the memory dump finishes, but I failed to find a
+suitable condition to check for in the code.
+
+Is there any solution for this problem? I hope I can get some help here.
+
+Hao
+
diff --git a/results/classifier/004/other/16228234 b/results/classifier/004/other/16228234
new file mode 100644
index 00000000..dcb69d04
--- /dev/null
+++ b/results/classifier/004/other/16228234
@@ -0,0 +1,1852 @@
+other: 0.535
+mistranslation: 0.518
+KVM: 0.445
+instruction: 0.442
+network: 0.440
+device: 0.439
+assembly: 0.435
+vnc: 0.420
+semantic: 0.411
+graphic: 0.408
+boot: 0.402
+socket: 0.401
+
+[Qemu-devel] [Bug?] BQL about live migration
+
+Hello Juan & Dave,
+
+We hit a bug in our test:
+A network error occurs while migrating a guest; libvirt then rolls back the
+migration, which causes a qemu coredump.
+qemu log:
+2017-03-01T12:54:33.904949+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|:
+ {"timestamp": {"seconds": 1488344073, "microseconds": 904914}, "event": "STOP"}
+2017-03-01T12:54:37.522500+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|:
+ qmp_cmd_name: migrate_cancel
+2017-03-01T12:54:37.522607+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|:
+ {"timestamp": {"seconds": 1488344077, "microseconds": 522556}, "event": 
+"MIGRATION", "data": {"status": "cancelling"}}
+2017-03-01T12:54:37.524671+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|:
+ qmp_cmd_name: cont
+2017-03-01T12:54:37.524733+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
+ virtio-balloon device status is 7 that means DRIVER OK
+2017-03-01T12:54:37.525434+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
+ virtio-net device status is 7 that means DRIVER OK
+2017-03-01T12:54:37.525484+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
+ virtio-blk device status is 7 that means DRIVER OK
+2017-03-01T12:54:37.525562+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
+ virtio-serial device status is 7 that means DRIVER OK
+2017-03-01T12:54:37.527653+08:00|info|qemu[17672]|[17672]|vm_start[981]|: 
+vm_state-notify:3ms
+2017-03-01T12:54:37.528523+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|:
+ {"timestamp": {"seconds": 1488344077, "microseconds": 527699}, "event": 
+"RESUME"}
+2017-03-01T12:54:37.530680+08:00|info|qemu[17672]|[33614]|migration_bitmap_sync[720]|:
+ this iteration cycle takes 3s, new dirtied data:0MB
+2017-03-01T12:54:37.530909+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|:
+ {"timestamp": {"seconds": 1488344077, "microseconds": 530733}, "event": 
+"MIGRATION_PASS", "data": {"pass": 3}}
+2017-03-01T04:54:37.530997Z qemu-kvm: socket_writev_buffer: Got err=32 for 
+(131583/18446744073709551615)
+qemu-kvm: /home/abuild/rpmbuild/BUILD/qemu-kvm-2.6.0/hw/net/virtio_net.c:1519: 
+virtio_net_save: Assertion `!n->vhost_started' failed.
+2017-03-01 12:54:43.028: shutting down
+
+
+From the qemu log, qemu received and processed the migrate_cancel/cont qmp
+commands after the guest had been stopped and had entered the last round of
+migration. The migration thread then tried to save device state while the
+guest was running (started by the cont command), causing the assert and coredump.
+This is because in the last iteration we call cpu_synchronize_all_states() to
+synchronize vcpu states; this call releases qemu_global_mutex and waits
+for do_kvm_cpu_synchronize_state() to be executed on the target vcpu:
+(gdb) bt
+#0  0x00007f763d1046d5 in pthread_cond_wait@@GLIBC_2.3.2 () from 
+/lib64/libpthread.so.0
+#1  0x00007f7643e51d7f in qemu_cond_wait (cond=0x7f764445eca0 <qemu_work_cond>, 
+mutex=0x7f764445eba0 <qemu_global_mutex>) at util/qemu-thread-posix.c:132
+#2  0x00007f7643a2e154 in run_on_cpu (cpu=0x7f7644e06d80, func=0x7f7643a46413 
+<do_kvm_cpu_synchronize_state>, data=0x7f7644e06d80) at 
+/mnt/public/yanghy/qemu-kvm/cpus.c:995
+#3  0x00007f7643a46487 in kvm_cpu_synchronize_state (cpu=0x7f7644e06d80) at 
+/mnt/public/yanghy/qemu-kvm/kvm-all.c:1805
+#4  0x00007f7643a2c700 in cpu_synchronize_state (cpu=0x7f7644e06d80) at 
+/mnt/public/yanghy/qemu-kvm/include/sysemu/kvm.h:457
+#5  0x00007f7643a2db0c in cpu_synchronize_all_states () at 
+/mnt/public/yanghy/qemu-kvm/cpus.c:766
+#6  0x00007f7643a67b5b in qemu_savevm_state_complete_precopy (f=0x7f76462f2d30, 
+iterable_only=false) at /mnt/public/yanghy/qemu-kvm/migration/savevm.c:1051
+#7  0x00007f7643d121e9 in migration_completion (s=0x7f76443e78c0 
+<current_migration.37571>, current_active_state=4, 
+old_vm_running=0x7f74343fda00, start_time=0x7f74343fda08) at 
+migration/migration.c:1753
+#8  0x00007f7643d126c5 in migration_thread (opaque=0x7f76443e78c0 
+<current_migration.37571>) at migration/migration.c:1922
+#9  0x00007f763d100dc5 in start_thread () from /lib64/libpthread.so.0
+#10 0x00007f763ce2e71d in clone () from /lib64/libc.so.6
+(gdb) p iothread_locked
+$1 = true
+
+Then the qemu main thread gets to run; it won't block because the migration
+thread released the qemu_global_mutex:
+(gdb) thr 1
+[Switching to thread 1 (Thread 0x7fe298e08bc0 (LWP 30767))]
+#0  os_host_main_loop_wait (timeout=931565) at main-loop.c:270
+270                 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout %d\n", 
+timeout);
+(gdb) p iothread_locked
+$2 = true
+(gdb) l 268
+263
+264         ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len, 
+timeout);
+265
+266
+267         if (timeout) {
+268             qemu_mutex_lock_iothread();
+269             if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
+270                 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout %d\n", 
+timeout);
+271             }
+272         }
+(gdb)
+
+So, although we hold the iothread lock in the stop & copy phase of migration, we
+can't guarantee that it stays locked all through the stop & copy phase.
+Any thoughts on how to solve this problem?
+
+
+Thanks,
+-Gonglei
+
+On Fri, 03/03 09:29, Gonglei (Arei) wrote:
+>
+Hello Juan & Dave,
+>
+>
+We hit a bug in our test:
+>
+Network error occurs when migrating a guest, libvirt then rollback the
+>
+migration, causes qemu coredump
+>
+qemu log:
+>
+2017-03-01T12:54:33.904949+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|:
+>
+{"timestamp": {"seconds": 1488344073, "microseconds": 904914}, "event":
+>
+"STOP"}
+>
+2017-03-01T12:54:37.522500+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|:
+>
+qmp_cmd_name: migrate_cancel
+>
+2017-03-01T12:54:37.522607+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|:
+>
+{"timestamp": {"seconds": 1488344077, "microseconds": 522556}, "event":
+>
+"MIGRATION", "data": {"status": "cancelling"}}
+>
+2017-03-01T12:54:37.524671+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|:
+>
+qmp_cmd_name: cont
+>
+2017-03-01T12:54:37.524733+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
+>
+virtio-balloon device status is 7 that means DRIVER OK
+>
+2017-03-01T12:54:37.525434+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
+>
+virtio-net device status is 7 that means DRIVER OK
+>
+2017-03-01T12:54:37.525484+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
+>
+virtio-blk device status is 7 that means DRIVER OK
+>
+2017-03-01T12:54:37.525562+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
+>
+virtio-serial device status is 7 that means DRIVER OK
+>
+2017-03-01T12:54:37.527653+08:00|info|qemu[17672]|[17672]|vm_start[981]|:
+>
+vm_state-notify:3ms
+>
+2017-03-01T12:54:37.528523+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|:
+>
+{"timestamp": {"seconds": 1488344077, "microseconds": 527699}, "event":
+>
+"RESUME"}
+>
+2017-03-01T12:54:37.530680+08:00|info|qemu[17672]|[33614]|migration_bitmap_sync[720]|:
+>
+this iteration cycle takes 3s, new dirtied data:0MB
+>
+2017-03-01T12:54:37.530909+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|:
+>
+{"timestamp": {"seconds": 1488344077, "microseconds": 530733}, "event":
+>
+"MIGRATION_PASS", "data": {"pass": 3}}
+>
+2017-03-01T04:54:37.530997Z qemu-kvm: socket_writev_buffer: Got err=32 for
+>
+(131583/18446744073709551615)
+>
+qemu-kvm:
+>
+/home/abuild/rpmbuild/BUILD/qemu-kvm-2.6.0/hw/net/virtio_net.c:1519:
+>
+virtio_net_save: Assertion `!n->vhost_started' failed.
+>
+2017-03-01 12:54:43.028: shutting down
+>
+>
+From qemu log, qemu received and processed migrate_cancel/cont qmp commands
+>
+after guest been stopped and entered the last round of migration. Then
+>
+migration thread try to save device state when guest is running(started by
+>
+cont command), causes assert and coredump.
+>
+This is because in last iter, we call cpu_synchronize_all_states() to
+>
+synchronize vcpu states, this call will release qemu_global_mutex and wait
+>
+for do_kvm_cpu_synchronize_state() to be executed on target vcpu:
+>
+(gdb) bt
+>
+#0  0x00007f763d1046d5 in pthread_cond_wait@@GLIBC_2.3.2 () from
+>
+/lib64/libpthread.so.0
+>
+#1  0x00007f7643e51d7f in qemu_cond_wait (cond=0x7f764445eca0
+>
+<qemu_work_cond>, mutex=0x7f764445eba0 <qemu_global_mutex>) at
+>
+util/qemu-thread-posix.c:132
+>
+#2  0x00007f7643a2e154 in run_on_cpu (cpu=0x7f7644e06d80, func=0x7f7643a46413
+>
+<do_kvm_cpu_synchronize_state>, data=0x7f7644e06d80) at
+>
+/mnt/public/yanghy/qemu-kvm/cpus.c:995
+>
+#3  0x00007f7643a46487 in kvm_cpu_synchronize_state (cpu=0x7f7644e06d80) at
+>
+/mnt/public/yanghy/qemu-kvm/kvm-all.c:1805
+>
+#4  0x00007f7643a2c700 in cpu_synchronize_state (cpu=0x7f7644e06d80) at
+>
+/mnt/public/yanghy/qemu-kvm/include/sysemu/kvm.h:457
+>
+#5  0x00007f7643a2db0c in cpu_synchronize_all_states () at
+>
+/mnt/public/yanghy/qemu-kvm/cpus.c:766
+>
+#6  0x00007f7643a67b5b in qemu_savevm_state_complete_precopy
+>
+(f=0x7f76462f2d30, iterable_only=false) at
+>
+/mnt/public/yanghy/qemu-kvm/migration/savevm.c:1051
+>
+#7  0x00007f7643d121e9 in migration_completion (s=0x7f76443e78c0
+>
+<current_migration.37571>, current_active_state=4,
+>
+old_vm_running=0x7f74343fda00, start_time=0x7f74343fda08) at
+>
+migration/migration.c:1753
+>
+#8  0x00007f7643d126c5 in migration_thread (opaque=0x7f76443e78c0
+>
+<current_migration.37571>) at migration/migration.c:1922
+>
+#9  0x00007f763d100dc5 in start_thread () from /lib64/libpthread.so.0
+>
+#10 0x00007f763ce2e71d in clone () from /lib64/libc.so.6
+>
+(gdb) p iothread_locked
+>
+$1 = true
+>
+>
+and then, qemu main thread been executed, it won't block because migration
+>
+thread released the qemu_global_mutex:
+>
+(gdb) thr 1
+>
+[Switching to thread 1 (Thread 0x7fe298e08bc0 (LWP 30767))]
+>
+#0  os_host_main_loop_wait (timeout=931565) at main-loop.c:270
+>
+270                 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout
+>
+%d\n", timeout);
+>
+(gdb) p iothread_locked
+>
+$2 = true
+>
+(gdb) l 268
+>
+263
+>
+264         ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len,
+>
+timeout);
+>
+265
+>
+266
+>
+267         if (timeout) {
+>
+268             qemu_mutex_lock_iothread();
+>
+269             if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
+>
+270                 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout
+>
+%d\n", timeout);
+>
+271             }
+>
+272         }
+>
+(gdb)
+>
+>
+So, although we've hold iothread_lock in stop&copy phase of migration, we
+>
+can't guarantee the iothread been locked all through the stop & copy phase,
+>
+any thoughts on how to solve this problem?
+Could you post a backtrace of the assertion?
+
+Fam
+
+On 2017/3/3 18:42, Fam Zheng wrote:
+>
+On Fri, 03/03 09:29, Gonglei (Arei) wrote:
+>
+> Hello Juan & Dave,
+>
+>
+>
+> We hit a bug in our test:
+>
+> Network error occurs when migrating a guest, libvirt then rollback the
+>
+> migration, causes qemu coredump
+>
+> qemu log:
+>
+> 2017-03-01T12:54:33.904949+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|:
+>
+>  {"timestamp": {"seconds": 1488344073, "microseconds": 904914}, "event":
+>
+> "STOP"}
+>
+> 2017-03-01T12:54:37.522500+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|:
+>
+>  qmp_cmd_name: migrate_cancel
+>
+> 2017-03-01T12:54:37.522607+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|:
+>
+>  {"timestamp": {"seconds": 1488344077, "microseconds": 522556}, "event":
+>
+> "MIGRATION", "data": {"status": "cancelling"}}
+>
+> 2017-03-01T12:54:37.524671+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|:
+>
+>  qmp_cmd_name: cont
+>
+> 2017-03-01T12:54:37.524733+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
+>
+>  virtio-balloon device status is 7 that means DRIVER OK
+>
+> 2017-03-01T12:54:37.525434+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
+>
+>  virtio-net device status is 7 that means DRIVER OK
+>
+> 2017-03-01T12:54:37.525484+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
+>
+>  virtio-blk device status is 7 that means DRIVER OK
+>
+> 2017-03-01T12:54:37.525562+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
+>
+>  virtio-serial device status is 7 that means DRIVER OK
+>
+> 2017-03-01T12:54:37.527653+08:00|info|qemu[17672]|[17672]|vm_start[981]|:
+>
+> vm_state-notify:3ms
+>
+> 2017-03-01T12:54:37.528523+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|:
+>
+>  {"timestamp": {"seconds": 1488344077, "microseconds": 527699}, "event":
+>
+> "RESUME"}
+>
+> 2017-03-01T12:54:37.530680+08:00|info|qemu[17672]|[33614]|migration_bitmap_sync[720]|:
+>
+>  this iteration cycle takes 3s, new dirtied data:0MB
+>
+> 2017-03-01T12:54:37.530909+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|:
+>
+>  {"timestamp": {"seconds": 1488344077, "microseconds": 530733}, "event":
+>
+> "MIGRATION_PASS", "data": {"pass": 3}}
+>
+> 2017-03-01T04:54:37.530997Z qemu-kvm: socket_writev_buffer: Got err=32 for
+>
+> (131583/18446744073709551615)
+>
+> qemu-kvm:
+>
+> /home/abuild/rpmbuild/BUILD/qemu-kvm-2.6.0/hw/net/virtio_net.c:1519:
+>
+> virtio_net_save: Assertion `!n->vhost_started' failed.
+>
+> 2017-03-01 12:54:43.028: shutting down
+>
+>
+>
+> From qemu log, qemu received and processed migrate_cancel/cont qmp commands
+>
+> after guest been stopped and entered the last round of migration. Then
+>
+> migration thread try to save device state when guest is running(started by
+>
+> cont command), causes assert and coredump.
+>
+> This is because in last iter, we call cpu_synchronize_all_states() to
+>
+> synchronize vcpu states, this call will release qemu_global_mutex and wait
+>
+> for do_kvm_cpu_synchronize_state() to be executed on target vcpu:
+>
+> (gdb) bt
+>
+> #0  0x00007f763d1046d5 in pthread_cond_wait@@GLIBC_2.3.2 () from
+>
+> /lib64/libpthread.so.0
+>
+> #1  0x00007f7643e51d7f in qemu_cond_wait (cond=0x7f764445eca0
+>
+> <qemu_work_cond>, mutex=0x7f764445eba0 <qemu_global_mutex>) at
+>
+> util/qemu-thread-posix.c:132
+>
+> #2  0x00007f7643a2e154 in run_on_cpu (cpu=0x7f7644e06d80,
+>
+> func=0x7f7643a46413 <do_kvm_cpu_synchronize_state>, data=0x7f7644e06d80) at
+>
+> /mnt/public/yanghy/qemu-kvm/cpus.c:995
+>
+> #3  0x00007f7643a46487 in kvm_cpu_synchronize_state (cpu=0x7f7644e06d80) at
+>
+> /mnt/public/yanghy/qemu-kvm/kvm-all.c:1805
+>
+> #4  0x00007f7643a2c700 in cpu_synchronize_state (cpu=0x7f7644e06d80) at
+>
+> /mnt/public/yanghy/qemu-kvm/include/sysemu/kvm.h:457
+>
+> #5  0x00007f7643a2db0c in cpu_synchronize_all_states () at
+>
+> /mnt/public/yanghy/qemu-kvm/cpus.c:766
+>
+> #6  0x00007f7643a67b5b in qemu_savevm_state_complete_precopy
+>
+> (f=0x7f76462f2d30, iterable_only=false) at
+>
+> /mnt/public/yanghy/qemu-kvm/migration/savevm.c:1051
+>
+> #7  0x00007f7643d121e9 in migration_completion (s=0x7f76443e78c0
+>
+> <current_migration.37571>, current_active_state=4,
+>
+> old_vm_running=0x7f74343fda00, start_time=0x7f74343fda08) at
+>
+> migration/migration.c:1753
+>
+> #8  0x00007f7643d126c5 in migration_thread (opaque=0x7f76443e78c0
+>
+> <current_migration.37571>) at migration/migration.c:1922
+>
+> #9  0x00007f763d100dc5 in start_thread () from /lib64/libpthread.so.0
+>
+> #10 0x00007f763ce2e71d in clone () from /lib64/libc.so.6
+>
+> (gdb) p iothread_locked
+>
+> $1 = true
+>
+>
+>
+> and then, qemu main thread been executed, it won't block because migration
+>
+> thread released the qemu_global_mutex:
+>
+> (gdb) thr 1
+>
+> [Switching to thread 1 (Thread 0x7fe298e08bc0 (LWP 30767))]
+>
+> #0  os_host_main_loop_wait (timeout=931565) at main-loop.c:270
+>
+> 270                 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout
+>
+> %d\n", timeout);
+>
+> (gdb) p iothread_locked
+>
+> $2 = true
+>
+> (gdb) l 268
+>
+> 263
+>
+> 264         ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len,
+>
+> timeout);
+>
+> 265
+>
+> 266
+>
+> 267         if (timeout) {
+>
+> 268             qemu_mutex_lock_iothread();
+>
+> 269             if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
+>
+> 270                 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout
+>
+> %d\n", timeout);
+>
+> 271             }
+>
+> 272         }
+>
+> (gdb)
+>
+>
+>
+> So, although we've hold iothread_lock in stop&copy phase of migration, we
+>
+> can't guarantee the iothread been locked all through the stop & copy phase,
+>
+> any thoughts on how to solve this problem?
+>
+>
+Could you post a backtrace of the assertion?
+#0  0x00007f97b1fbe5d7 in raise () from /usr/lib64/libc.so.6
+#1  0x00007f97b1fbfcc8 in abort () from /usr/lib64/libc.so.6
+#2  0x00007f97b1fb7546 in __assert_fail_base () from /usr/lib64/libc.so.6
+#3  0x00007f97b1fb75f2 in __assert_fail () from /usr/lib64/libc.so.6
+#4  0x000000000049fd19 in virtio_net_save (f=0x7f97a8ca44d0, 
+opaque=0x7f97a86e9018) at /usr/src/debug/qemu-kvm-2.6.0/hw/
+#5  0x000000000047e380 in vmstate_save_old_style (address@hidden, 
+address@hidden, se=0x7f9
+#6  0x000000000047fb93 in vmstate_save (address@hidden, address@hidden, 
+address@hidden
+#7  0x0000000000481ad2 in qemu_savevm_state_complete_precopy (f=0x7f97a8ca44d0, 
+address@hidden)
+#8  0x00000000006c6b60 in migration_completion (address@hidden 
+<current_migration.38312>, current_active_state=curre
+    address@hidden) at migration/migration.c:1761
+#9  0x00000000006c71db in migration_thread (address@hidden 
+<current_migration.38312>) at migration/migrati
+
+>
+>
+Fam
+>
+--
+Thanks,
+Yang
+
+* Gonglei (Arei) (address@hidden) wrote:
+>
+Hello Juan & Dave,
+cc'ing in pbonzini since it's magic involving cpu_synchronize_all_states()
+
+>
+We hit a bug in our test:
+>
+Network error occurs when migrating a guest, libvirt then rollback the
+>
+migration, causes qemu coredump
+>
+qemu log:
+>
+2017-03-01T12:54:33.904949+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|:
+>
+{"timestamp": {"seconds": 1488344073, "microseconds": 904914}, "event":
+>
+"STOP"}
+>
+2017-03-01T12:54:37.522500+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|:
+>
+qmp_cmd_name: migrate_cancel
+>
+2017-03-01T12:54:37.522607+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|:
+>
+{"timestamp": {"seconds": 1488344077, "microseconds": 522556}, "event":
+>
+"MIGRATION", "data": {"status": "cancelling"}}
+>
+2017-03-01T12:54:37.524671+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|:
+>
+qmp_cmd_name: cont
+>
+2017-03-01T12:54:37.524733+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
+>
+virtio-balloon device status is 7 that means DRIVER OK
+>
+2017-03-01T12:54:37.525434+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
+>
+virtio-net device status is 7 that means DRIVER OK
+>
+2017-03-01T12:54:37.525484+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
+>
+virtio-blk device status is 7 that means DRIVER OK
+>
+2017-03-01T12:54:37.525562+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|:
+>
+virtio-serial device status is 7 that means DRIVER OK
+>
+2017-03-01T12:54:37.527653+08:00|info|qemu[17672]|[17672]|vm_start[981]|:
+>
+vm_state-notify:3ms
+>
+2017-03-01T12:54:37.528523+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|:
+>
+{"timestamp": {"seconds": 1488344077, "microseconds": 527699}, "event":
+>
+"RESUME"}
+>
+2017-03-01T12:54:37.530680+08:00|info|qemu[17672]|[33614]|migration_bitmap_sync[720]|:
+>
+this iteration cycle takes 3s, new dirtied data:0MB
+>
+2017-03-01T12:54:37.530909+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|:
+>
+{"timestamp": {"seconds": 1488344077, "microseconds": 530733}, "event":
+>
+"MIGRATION_PASS", "data": {"pass": 3}}
+>
+2017-03-01T04:54:37.530997Z qemu-kvm: socket_writev_buffer: Got err=32 for
+>
+(131583/18446744073709551615)
+>
+qemu-kvm:
+>
+/home/abuild/rpmbuild/BUILD/qemu-kvm-2.6.0/hw/net/virtio_net.c:1519:
+>
+virtio_net_save: Assertion `!n->vhost_started' failed.
+>
+2017-03-01 12:54:43.028: shutting down
+>
+>
+From qemu log, qemu received and processed migrate_cancel/cont qmp commands
+>
+after guest been stopped and entered the last round of migration. Then
+>
+migration thread try to save device state when guest is running(started by
+>
+cont command), causes assert and coredump.
+>
+This is because in last iter, we call cpu_synchronize_all_states() to
+>
+synchronize vcpu states, this call will release qemu_global_mutex and wait
+>
+for do_kvm_cpu_synchronize_state() to be executed on target vcpu:
+>
+(gdb) bt
+>
+#0  0x00007f763d1046d5 in pthread_cond_wait@@GLIBC_2.3.2 () from
+>
+/lib64/libpthread.so.0
+>
+#1  0x00007f7643e51d7f in qemu_cond_wait (cond=0x7f764445eca0
+>
+<qemu_work_cond>, mutex=0x7f764445eba0 <qemu_global_mutex>) at
+>
+util/qemu-thread-posix.c:132
+>
+#2  0x00007f7643a2e154 in run_on_cpu (cpu=0x7f7644e06d80, func=0x7f7643a46413
+>
+<do_kvm_cpu_synchronize_state>, data=0x7f7644e06d80) at
+>
+/mnt/public/yanghy/qemu-kvm/cpus.c:995
+>
+#3  0x00007f7643a46487 in kvm_cpu_synchronize_state (cpu=0x7f7644e06d80) at
+>
+/mnt/public/yanghy/qemu-kvm/kvm-all.c:1805
+>
+#4  0x00007f7643a2c700 in cpu_synchronize_state (cpu=0x7f7644e06d80) at
+>
+/mnt/public/yanghy/qemu-kvm/include/sysemu/kvm.h:457
+>
+#5  0x00007f7643a2db0c in cpu_synchronize_all_states () at
+>
+/mnt/public/yanghy/qemu-kvm/cpus.c:766
+>
+#6  0x00007f7643a67b5b in qemu_savevm_state_complete_precopy
+>
+(f=0x7f76462f2d30, iterable_only=false) at
+>
+/mnt/public/yanghy/qemu-kvm/migration/savevm.c:1051
+>
+#7  0x00007f7643d121e9 in migration_completion (s=0x7f76443e78c0
+>
+<current_migration.37571>, current_active_state=4,
+>
+old_vm_running=0x7f74343fda00, start_time=0x7f74343fda08) at
+>
+migration/migration.c:1753
+>
+#8  0x00007f7643d126c5 in migration_thread (opaque=0x7f76443e78c0
+>
+<current_migration.37571>) at migration/migration.c:1922
+>
+#9  0x00007f763d100dc5 in start_thread () from /lib64/libpthread.so.0
+>
+#10 0x00007f763ce2e71d in clone () from /lib64/libc.so.6
+>
+(gdb) p iothread_locked
+>
+$1 = true
+>
+>
+and then, qemu main thread been executed, it won't block because migration
+>
+thread released the qemu_global_mutex:
+>
+(gdb) thr 1
+>
+[Switching to thread 1 (Thread 0x7fe298e08bc0 (LWP 30767))]
+>
+#0  os_host_main_loop_wait (timeout=931565) at main-loop.c:270
+>
+270                 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout
+>
+%d\n", timeout);
+>
+(gdb) p iothread_locked
+>
+$2 = true
+>
+(gdb) l 268
+>
+263
+>
+264         ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len,
+>
+timeout);
+>
+265
+>
+266
+>
+267         if (timeout) {
+>
+268             qemu_mutex_lock_iothread();
+>
+269             if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
+>
+270                 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout
+>
+%d\n", timeout);
+>
+271             }
+>
+272         }
+>
+(gdb)
+>
+>
+So, although we've hold iothread_lock in stop&copy phase of migration, we
+>
+can't guarantee the iothread been locked all through the stop & copy phase,
+>
+any thoughts on how to solve this problem?
+Ouch, that's pretty nasty; I remember Paolo explaining to me a while ago that
+there were times when run_on_cpu would have to drop the BQL, and I worried
+about it, but this is the first time I've seen an error due to it.
+
+Do you know what the migration state was at that point? Was it
+MIGRATION_STATUS_CANCELLING?
+I'm thinking perhaps we should stop 'cont' from continuing while migration is in
+MIGRATION_STATUS_CANCELLING.  Do we send an event when we hit CANCELLED, so
+that perhaps libvirt could avoid sending the 'cont' until then?
+
+Dave
+
+
+>
+>
+Thanks,
+>
+-Gonglei
+>
+--
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+
+On 03/03/2017 13:00, Dr. David Alan Gilbert wrote:
+>
+Ouch that's pretty nasty; I remember Paolo explaining to me a while ago that
+>
+their were times when run_on_cpu would have to drop the BQL and I worried
+>
+about it,
+>
+but this is the 1st time I've seen an error due to it.
+>
+>
+Do you know what the migration state was at that point? Was it
+>
+MIGRATION_STATUS_CANCELLING?
+>
+I'm thinking perhaps we should stop 'cont' from continuing while migration is
+>
+in
+>
+MIGRATION_STATUS_CANCELLING.  Do we send an event when we hit CANCELLED - so
+>
+that
+>
+perhaps libvirt could avoid sending the 'cont' until then?
+No, there's no event, though I thought libvirt would poll until
+"query-migrate" returns the cancelled state.  Of course that is a small
+consolation, because a segfault is unacceptable.
+
+One possibility is to suspend the monitor in qmp_migrate_cancel and
+resume it (with add_migration_state_change_notifier) when we hit the
+CANCELLED state.  I'm not sure what the latency would be between the end
+of migrate_fd_cancel and finally reaching CANCELLED.
+
+Paolo
+
+* Paolo Bonzini (address@hidden) wrote:
+>
+>
+>
+On 03/03/2017 13:00, Dr. David Alan Gilbert wrote:
+>
+> Ouch that's pretty nasty; I remember Paolo explaining to me a while ago that
+>
+> their were times when run_on_cpu would have to drop the BQL and I worried
+>
+> about it,
+>
+> but this is the 1st time I've seen an error due to it.
+>
+>
+>
+> Do you know what the migration state was at that point? Was it
+>
+> MIGRATION_STATUS_CANCELLING?
+>
+> I'm thinking perhaps we should stop 'cont' from continuing while migration
+>
+> is in
+>
+> MIGRATION_STATUS_CANCELLING.  Do we send an event when we hit CANCELLED -
+>
+> so that
+>
+> perhaps libvirt could avoid sending the 'cont' until then?
+>
+>
+No, there's no event, though I thought libvirt would poll until
+>
+"query-migrate" returns the cancelled state.  Of course that is a small
+>
+consolation, because a segfault is unacceptable.
+I think you might get an event if you set the new migrate capability called
+'events' on!
+
+void migrate_set_state(int *state, int old_state, int new_state)
+{
+    if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
+        trace_migrate_set_state(new_state);
+        migrate_generate_event(new_state);
+    }
+}
+
+static void migrate_generate_event(int new_state)
+{
+    if (migrate_use_events()) {
+        qapi_event_send_migration(new_state, &error_abort); 
+    }
+}
+
+That event feature went in sometime after 2.3.0.
+
+>
+One possibility is to suspend the monitor in qmp_migrate_cancel and
+>
+resume it (with add_migration_state_change_notifier) when we hit the
+>
+CANCELLED state.  I'm not sure what the latency would be between the end
+>
+of migrate_fd_cancel and finally reaching CANCELLED.
+I don't like suspending monitors; it can potentially take quite a significant
+time to do a cancel.
+How about making 'cont' fail if we're in CANCELLING?
+
+I'd really love to see the 'run_on_cpu' being more careful about the BQL;
+we really need all of the rest of the devices to stay quiesced at times.
+
+Dave
+
+>
+Paolo
+--
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+
+On 03/03/2017 14:11, Dr. David Alan Gilbert wrote:
+>
+* Paolo Bonzini (address@hidden) wrote:
+>
+>
+>
+>
+>
+> On 03/03/2017 13:00, Dr. David Alan Gilbert wrote:
+>
+>> Ouch that's pretty nasty; I remember Paolo explaining to me a while ago that
+>
+>> their were times when run_on_cpu would have to drop the BQL and I worried
+>
+>> about it,
+>
+>> but this is the 1st time I've seen an error due to it.
+>
+>>
+>
+>> Do you know what the migration state was at that point? Was it
+>
+>> MIGRATION_STATUS_CANCELLING?
+>
+>> I'm thinking perhaps we should stop 'cont' from continuing while migration
+>
+>> is in
+>
+>> MIGRATION_STATUS_CANCELLING.  Do we send an event when we hit CANCELLED -
+>
+>> so that
+>
+>> perhaps libvirt could avoid sending the 'cont' until then?
+>
+>
+>
+> No, there's no event, though I thought libvirt would poll until
+>
+> "query-migrate" returns the cancelled state.  Of course that is a small
+>
+> consolation, because a segfault is unacceptable.
+>
+>
+I think you might get an event if you set the new migrate capability called
+>
+'events' on!
+>
+>
+void migrate_set_state(int *state, int old_state, int new_state)
+>
+{
+>
+if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
+>
+trace_migrate_set_state(new_state);
+>
+migrate_generate_event(new_state);
+>
+}
+>
+}
+>
+>
+static void migrate_generate_event(int new_state)
+>
+{
+>
+if (migrate_use_events()) {
+>
+qapi_event_send_migration(new_state, &error_abort);
+>
+}
+>
+}
+>
+>
+That event feature went in sometime after 2.3.0.
+>
+>
+> One possibility is to suspend the monitor in qmp_migrate_cancel and
+>
+> resume it (with add_migration_state_change_notifier) when we hit the
+>
+> CANCELLED state.  I'm not sure what the latency would be between the end
+>
+> of migrate_fd_cancel and finally reaching CANCELLED.
+>
+>
+I don't like suspending monitors; it can potentially take quite a significant
+>
+time to do a cancel.
+>
+How about making 'cont' fail if we're in CANCELLING?
+Actually I thought that would be the case already (in fact CANCELLING is
+internal only; the outside world sees it as "active" in query-migrate).
+
+Lei, what is the runstate?  (That is, why did cont succeed at all)?
+
+Paolo
+
+>
+I'd really love to see the 'run_on_cpu' being more careful about the BQL;
+>
+we really need all of the rest of the devices to stay quiesced at times.
+That's not really possible, because of how condition variables work. :(
+
+* Paolo Bonzini (address@hidden) wrote:
+>
+>
+>
+On 03/03/2017 14:11, Dr. David Alan Gilbert wrote:
+>
+> * Paolo Bonzini (address@hidden) wrote:
+>
+>>
+>
+>>
+>
+>> On 03/03/2017 13:00, Dr. David Alan Gilbert wrote:
+>
+>>> Ouch that's pretty nasty; I remember Paolo explaining to me a while ago
+>
+>>> that
+>
+>>> their were times when run_on_cpu would have to drop the BQL and I worried
+>
+>>> about it,
+>
+>>> but this is the 1st time I've seen an error due to it.
+>
+>>>
+>
+>>> Do you know what the migration state was at that point? Was it
+>
+>>> MIGRATION_STATUS_CANCELLING?
+>
+>>> I'm thinking perhaps we should stop 'cont' from continuing while
+>
+>>> migration is in
+>
+>>> MIGRATION_STATUS_CANCELLING.  Do we send an event when we hit CANCELLED -
+>
+>>> so that
+>
+>>> perhaps libvirt could avoid sending the 'cont' until then?
+>
+>>
+>
+>> No, there's no event, though I thought libvirt would poll until
+>
+>> "query-migrate" returns the cancelled state.  Of course that is a small
+>
+>> consolation, because a segfault is unacceptable.
+>
+>
+>
+> I think you might get an event if you set the new migrate capability called
+>
+> 'events' on!
+>
+>
+>
+> void migrate_set_state(int *state, int old_state, int new_state)
+>
+> {
+>
+>     if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
+>
+>         trace_migrate_set_state(new_state);
+>
+>         migrate_generate_event(new_state);
+>
+>     }
+>
+> }
+>
+>
+>
+> static void migrate_generate_event(int new_state)
+>
+> {
+>
+>     if (migrate_use_events()) {
+>
+>         qapi_event_send_migration(new_state, &error_abort);
+>
+>     }
+>
+> }
+>
+>
+>
+> That event feature went in sometime after 2.3.0.
+>
+>
+>
+>> One possibility is to suspend the monitor in qmp_migrate_cancel and
+>
+>> resume it (with add_migration_state_change_notifier) when we hit the
+>
+>> CANCELLED state.  I'm not sure what the latency would be between the end
+>
+>> of migrate_fd_cancel and finally reaching CANCELLED.
+>
+>
+>
+> I don't like suspending monitors; it can potentially take quite a
+>
+> significant
+>
+> time to do a cancel.
+>
+> How about making 'cont' fail if we're in CANCELLING?
+>
+>
+Actually I thought that would be the case already (in fact CANCELLING is
+>
+internal only; the outside world sees it as "active" in query-migrate).
+>
+>
+Lei, what is the runstate?  (That is, why did cont succeed at all)?
+I suspect it's RUN_STATE_FINISH_MIGRATE - we set that before we do the device
+save, and that's what we get at the end of a migrate and it's legal to restart
+from there.
+
+>
+Paolo
+>
+>
+> I'd really love to see the 'run_on_cpu' being more careful about the BQL;
+>
+> we really need all of the rest of the devices to stay quiesced at times.
+>
+>
+That's not really possible, because of how condition variables work. :(
+*Really* we need to find a solution to that - there's probably lots of 
+other things that can spring up in that small window other than the
+'cont'.
+
+Dave
+
+--
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+
+On 03/03/2017 14:26, Dr. David Alan Gilbert wrote:
+>
+* Paolo Bonzini (address@hidden) wrote:
+>
+>
+>
+>
+>
+> On 03/03/2017 14:11, Dr. David Alan Gilbert wrote:
+>
+>> * Paolo Bonzini (address@hidden) wrote:
+>
+>>>
+>
+>>>
+>
+>>> On 03/03/2017 13:00, Dr. David Alan Gilbert wrote:
+>
+>>>> Ouch that's pretty nasty; I remember Paolo explaining to me a while ago
+>
+>>>> that
+>
+>>>> their were times when run_on_cpu would have to drop the BQL and I worried
+>
+>>>> about it,
+>
+>>>> but this is the 1st time I've seen an error due to it.
+>
+>>>>
+>
+>>>> Do you know what the migration state was at that point? Was it
+>
+>>>> MIGRATION_STATUS_CANCELLING?
+>
+>>>> I'm thinking perhaps we should stop 'cont' from continuing while
+>
+>>>> migration is in
+>
+>>>> MIGRATION_STATUS_CANCELLING.  Do we send an event when we hit CANCELLED -
+>
+>>>> so that
+>
+>>>> perhaps libvirt could avoid sending the 'cont' until then?
+>
+>>>
+>
+>>> No, there's no event, though I thought libvirt would poll until
+>
+>>> "query-migrate" returns the cancelled state.  Of course that is a small
+>
+>>> consolation, because a segfault is unacceptable.
+>
+>>
+>
+>> I think you might get an event if you set the new migrate capability called
+>
+>> 'events' on!
+>
+>>
+>
+>> void migrate_set_state(int *state, int old_state, int new_state)
+>
+>> {
+>
+>>     if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
+>
+>>         trace_migrate_set_state(new_state);
+>
+>>         migrate_generate_event(new_state);
+>
+>>     }
+>
+>> }
+>
+>>
+>
+>> static void migrate_generate_event(int new_state)
+>
+>> {
+>
+>>     if (migrate_use_events()) {
+>
+>>         qapi_event_send_migration(new_state, &error_abort);
+>
+>>     }
+>
+>> }
+>
+>>
+>
+>> That event feature went in sometime after 2.3.0.
+>
+>>
+>
+>>> One possibility is to suspend the monitor in qmp_migrate_cancel and
+>
+>>> resume it (with add_migration_state_change_notifier) when we hit the
+>
+>>> CANCELLED state.  I'm not sure what the latency would be between the end
+>
+>>> of migrate_fd_cancel and finally reaching CANCELLED.
+>
+>>
+>
+>> I don't like suspending monitors; it can potentially take quite a
+>
+>> significant
+>
+>> time to do a cancel.
+>
+>> How about making 'cont' fail if we're in CANCELLING?
+>
+>
+>
+> Actually I thought that would be the case already (in fact CANCELLING is
+>
+> internal only; the outside world sees it as "active" in query-migrate).
+>
+>
+>
+> Lei, what is the runstate?  (That is, why did cont succeed at all)?
+>
+>
+I suspect it's RUN_STATE_FINISH_MIGRATE - we set that before we do the device
+>
+save, and that's what we get at the end of a migrate and it's legal to restart
+>
+from there.
+Yeah, but I think we get there at the end of a failed migrate only.  So
+perhaps we can introduce a new state RUN_STATE_FAILED_MIGRATE and forbid
+"cont" from finish-migrate (only allow it from failed-migrate)?
+
+Paolo
+
+>
+> Paolo
+>
+>
+>
+>> I'd really love to see the 'run_on_cpu' being more careful about the BQL;
+>
+>> we really need all of the rest of the devices to stay quiesced at times.
+>
+>
+>
+> That's not really possible, because of how condition variables work. :(
+>
+>
+*Really* we need to find a solution to that - there's probably lots of
+>
+other things that can spring up in that small window other than the
+>
+'cont'.
+>
+>
+Dave
+>
+>
+--
+>
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+>
+
+Hi Paolo,
+
+On Fri, Mar 3, 2017 at 9:33 PM, Paolo Bonzini <address@hidden> wrote:
+
+>
+>
+>
+On 03/03/2017 14:26, Dr. David Alan Gilbert wrote:
+>
+> * Paolo Bonzini (address@hidden) wrote:
+>
+>>
+>
+>>
+>
+>> On 03/03/2017 14:11, Dr. David Alan Gilbert wrote:
+>
+>>> * Paolo Bonzini (address@hidden) wrote:
+>
+>>>>
+>
+>>>>
+>
+>>>> On 03/03/2017 13:00, Dr. David Alan Gilbert wrote:
+>
+>>>>> Ouch that's pretty nasty; I remember Paolo explaining to me a while
+>
+ago that
+>
+>>>>> their were times when run_on_cpu would have to drop the BQL and I
+>
+worried about it,
+>
+>>>>> but this is the 1st time I've seen an error due to it.
+>
+>>>>>
+>
+>>>>> Do you know what the migration state was at that point? Was it
+>
+MIGRATION_STATUS_CANCELLING?
+>
+>>>>> I'm thinking perhaps we should stop 'cont' from continuing while
+>
+migration is in
+>
+>>>>> MIGRATION_STATUS_CANCELLING.  Do we send an event when we hit
+>
+CANCELLED - so that
+>
+>>>>> perhaps libvirt could avoid sending the 'cont' until then?
+>
+>>>>
+>
+>>>> No, there's no event, though I thought libvirt would poll until
+>
+>>>> "query-migrate" returns the cancelled state.  Of course that is a
+>
+small
+>
+>>>> consolation, because a segfault is unacceptable.
+>
+>>>
+>
+>>> I think you might get an event if you set the new migrate capability
+>
+called
+>
+>>> 'events' on!
+>
+>>>
+>
+>>> void migrate_set_state(int *state, int old_state, int new_state)
+>
+>>> {
+>
+>>>     if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
+>
+>>>         trace_migrate_set_state(new_state);
+>
+>>>         migrate_generate_event(new_state);
+>
+>>>     }
+>
+>>> }
+>
+>>>
+>
+>>> static void migrate_generate_event(int new_state)
+>
+>>> {
+>
+>>>     if (migrate_use_events()) {
+>
+>>>         qapi_event_send_migration(new_state, &error_abort);
+>
+>>>     }
+>
+>>> }
+>
+>>>
+>
+>>> That event feature went in sometime after 2.3.0.
+>
+>>>
+>
+>>>> One possibility is to suspend the monitor in qmp_migrate_cancel and
+>
+>>>> resume it (with add_migration_state_change_notifier) when we hit the
+>
+>>>> CANCELLED state.  I'm not sure what the latency would be between the
+>
+end
+>
+>>>> of migrate_fd_cancel and finally reaching CANCELLED.
+>
+>>>
+>
+>>> I don't like suspending monitors; it can potentially take quite a
+>
+significant
+>
+>>> time to do a cancel.
+>
+>>> How about making 'cont' fail if we're in CANCELLING?
+>
+>>
+>
+>> Actually I thought that would be the case already (in fact CANCELLING is
+>
+>> internal only; the outside world sees it as "active" in query-migrate).
+>
+>>
+>
+>> Lei, what is the runstate?  (That is, why did cont succeed at all)?
+>
+>
+>
+> I suspect it's RUN_STATE_FINISH_MIGRATE - we set that before we do the
+>
+device
+>
+> save, and that's what we get at the end of a migrate and it's legal to
+>
+restart
+>
+> from there.
+>
+>
+Yeah, but I think we get there at the end of a failed migrate only.  So
+>
+perhaps we can introduce a new state RUN_STATE_FAILED_MIGRATE
+I don't think we need to introduce a new state here. If we receive 'cont'
+while the run state is RUN_STATE_FINISH_MIGRATE, we can assume that the
+migration failed: RUN_STATE_FINISH_MIGRATE only exists on the source side
+and means we are finishing the migration, so a 'cont' at that point
+indicates that we are rolling back; otherwise the source side would have
+been destroyed.
+
+
+>
+and forbid
+>
+"cont" from finish-migrate (only allow it from failed-migrate)?
+>
+The problem with forbidding 'cont' here is that it results in a failed
+migration with the source side left paused. We actually expect a usable
+guest after a rollback.
+Is there a way to kill the migration thread from the main thread? If there
+is, we could do the following to solve this problem:
+1. 'cont' is received during runstate RUN_STATE_FINISH_MIGRATE
+2. kill the migration thread
+3. vm_start()
+
+But this only solves the 'cont' problem. As Dave said before, other things could
+happen during the small window while we are finishing migration; that's
+what I was worried about...
+
+
+>
+Paolo
+>
+>
+>> Paolo
+>
+>>
+>
+>>> I'd really love to see the 'run_on_cpu' being more careful about the
+>
+BQL;
+>
+>>> we really need all of the rest of the devices to stay quiesced at
+>
+times.
+>
+>>
+>
+>> That's not really possible, because of how condition variables work. :(
+>
+>
+>
+> *Really* we need to find a solution to that - there's probably lots of
+>
+> other things that can spring up in that small window other than the
+>
+> 'cont'.
+>
+>
+>
+> Dave
+>
+>
+>
+> --
+>
+> Dr. David Alan Gilbert / address@hidden / Manchester, UK
+>
+>
+>
+>
+
+* Paolo Bonzini (address@hidden) wrote:
+>
+>
+>
+On 03/03/2017 14:26, Dr. David Alan Gilbert wrote:
+>
+> * Paolo Bonzini (address@hidden) wrote:
+>
+>>
+>
+>>
+>
+>> On 03/03/2017 14:11, Dr. David Alan Gilbert wrote:
+>
+>>> * Paolo Bonzini (address@hidden) wrote:
+>
+>>>>
+>
+>>>>
+>
+>>>> On 03/03/2017 13:00, Dr. David Alan Gilbert wrote:
+>
+>>>>> Ouch that's pretty nasty; I remember Paolo explaining to me a while ago
+>
+>>>>> that
+>
+>>>>> their were times when run_on_cpu would have to drop the BQL and I
+>
+>>>>> worried about it,
+>
+>>>>> but this is the 1st time I've seen an error due to it.
+>
+>>>>>
+>
+>>>>> Do you know what the migration state was at that point? Was it
+>
+>>>>> MIGRATION_STATUS_CANCELLING?
+>
+>>>>> I'm thinking perhaps we should stop 'cont' from continuing while
+>
+>>>>> migration is in
+>
+>>>>> MIGRATION_STATUS_CANCELLING.  Do we send an event when we hit CANCELLED
+>
+>>>>> - so that
+>
+>>>>> perhaps libvirt could avoid sending the 'cont' until then?
+>
+>>>>
+>
+>>>> No, there's no event, though I thought libvirt would poll until
+>
+>>>> "query-migrate" returns the cancelled state.  Of course that is a small
+>
+>>>> consolation, because a segfault is unacceptable.
+>
+>>>
+>
+>>> I think you might get an event if you set the new migrate capability
+>
+>>> called
+>
+>>> 'events' on!
+>
+>>>
+>
+>>> void migrate_set_state(int *state, int old_state, int new_state)
+>
+>>> {
+>
+>>>     if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
+>
+>>>         trace_migrate_set_state(new_state);
+>
+>>>         migrate_generate_event(new_state);
+>
+>>>     }
+>
+>>> }
+>
+>>>
+>
+>>> static void migrate_generate_event(int new_state)
+>
+>>> {
+>
+>>>     if (migrate_use_events()) {
+>
+>>>         qapi_event_send_migration(new_state, &error_abort);
+>
+>>>     }
+>
+>>> }
+>
+>>>
+>
+>>> That event feature went in sometime after 2.3.0.
+>
+>>>
+>
+>>>> One possibility is to suspend the monitor in qmp_migrate_cancel and
+>
+>>>> resume it (with add_migration_state_change_notifier) when we hit the
+>
+>>>> CANCELLED state.  I'm not sure what the latency would be between the end
+>
+>>>> of migrate_fd_cancel and finally reaching CANCELLED.
+>
+>>>
+>
+>>> I don't like suspending monitors; it can potentially take quite a
+>
+>>> significant
+>
+>>> time to do a cancel.
+>
+>>> How about making 'cont' fail if we're in CANCELLING?
+>
+>>
+>
+>> Actually I thought that would be the case already (in fact CANCELLING is
+>
+>> internal only; the outside world sees it as "active" in query-migrate).
+>
+>>
+>
+>> Lei, what is the runstate?  (That is, why did cont succeed at all)?
+>
+>
+>
+> I suspect it's RUN_STATE_FINISH_MIGRATE - we set that before we do the
+>
+> device
+>
+> save, and that's what we get at the end of a migrate and it's legal to
+>
+> restart
+>
+> from there.
+>
+>
+Yeah, but I think we get there at the end of a failed migrate only.  So
+>
+perhaps we can introduce a new state RUN_STATE_FAILED_MIGRATE and forbid
+>
+"cont" from finish-migrate (only allow it from failed-migrate)?
+OK, I was wrong in my previous statement; we actually go
+FINISH_MIGRATE->POSTMIGRATE, so no new state is needed; you shouldn't be
+restarting the CPU in FINISH_MIGRATE.
+
+My preference is to get libvirt to wait for the transition to POSTMIGRATE before
+it issues the 'cont'.  I'd rather not block the monitor with 'cont' but I'm
+not sure how we'd cleanly make cont fail without breaking existing libvirts
+that usually don't hit this race. (cc'ing in Jiri).
+
+Dave
+
+>
+Paolo
+>
+>
+>> Paolo
+>
+>>
+>
+>>> I'd really love to see the 'run_on_cpu' being more careful about the BQL;
+>
+>>> we really need all of the rest of the devices to stay quiesced at times.
+>
+>>
+>
+>> That's not really possible, because of how condition variables work. :(
+>
+>
+>
+> *Really* we need to find a solution to that - there's probably lots of
+>
+> other things that can spring up in that small window other than the
+>
+> 'cont'.
+>
+>
+>
+> Dave
+>
+>
+>
+> --
+>
+> Dr. David Alan Gilbert / address@hidden / Manchester, UK
+>
+>
+--
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+
+Hi Dave,
+
+On Fri, Mar 3, 2017 at 9:26 PM, Dr. David Alan Gilbert <address@hidden>
+wrote:
+
+>
+* Paolo Bonzini (address@hidden) wrote:
+>
+>
+>
+>
+>
+> On 03/03/2017 14:11, Dr. David Alan Gilbert wrote:
+>
+> > * Paolo Bonzini (address@hidden) wrote:
+>
+> >>
+>
+> >>
+>
+> >> On 03/03/2017 13:00, Dr. David Alan Gilbert wrote:
+>
+...
+>
+> > That event feature went in sometime after 2.3.0.
+>
+> >
+>
+> >> One possibility is to suspend the monitor in qmp_migrate_cancel and
+>
+> >> resume it (with add_migration_state_change_notifier) when we hit the
+>
+> >> CANCELLED state.  I'm not sure what the latency would be between the
+>
+end
+>
+> >> of migrate_fd_cancel and finally reaching CANCELLED.
+>
+> >
+>
+> > I don't like suspending monitors; it can potentially take quite a
+>
+significant
+>
+> > time to do a cancel.
+>
+> > How about making 'cont' fail if we're in CANCELLING?
+>
+>
+>
+> Actually I thought that would be the case already (in fact CANCELLING is
+>
+> internal only; the outside world sees it as "active" in query-migrate).
+>
+>
+>
+> Lei, what is the runstate?  (That is, why did cont succeed at all)?
+>
+>
+I suspect it's RUN_STATE_FINISH_MIGRATE - we set that before we do the
+>
+device
+>
+It is RUN_STATE_FINISH_MIGRATE.
+
+
+>
+save, and that's what we get at the end of a migrate and it's legal to
+>
+restart
+>
+from there.
+>
+>
+> Paolo
+>
+>
+>
+> > I'd really love to see the 'run_on_cpu' being more careful about the
+>
+BQL;
+>
+> > we really need all of the rest of the devices to stay quiesced at
+>
+times.
+>
+>
+>
+> That's not really possible, because of how condition variables work. :(
+>
+>
+*Really* we need to find a solution to that - there's probably lots of
+>
+other things that can spring up in that small window other than the
+>
+'cont'.
+>
+This is what I was worried about. Not only does sync_cpu_state() call
+run_on_cpu(), but vm_stop_force_state() does too; both of them hit the
+small window in our tests.
+
+
+>
+>
+Dave
+>
+>
+--
+>
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+>
+>
+
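The race discussed in the thread above can be illustrated in miniature. The following is a minimal sketch, not QEMU code: every name in it (demo_big_lock, demo_cont_allowed, demo_cont_slipped_in, the DemoRunState values) is hypothetical and only mirrors the roles mentioned in the discussion. It assumes POSIX threads. One thread plays the migration thread, which holds a global lock and then blocks in pthread_cond_wait(); because that call releases the lock while waiting, a second thread playing the monitor can take the lock and process 'cont'. The guard shows the idea raised in the thread of refusing 'cont' while the run state is the equivalent of FINISH_MIGRATE.

```c
/* Illustrative sketch only: none of these names are QEMU APIs. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the run states named in the thread. */
typedef enum {
    DEMO_RUNNING,
    DEMO_FINISH_MIGRATE,
    DEMO_POSTMIGRATE
} DemoRunState;

static pthread_mutex_t demo_big_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t demo_work_cond = PTHREAD_COND_INITIALIZER;
static DemoRunState demo_state;
static bool demo_work_done;
static bool demo_guard_enabled;
static bool demo_cont_ran;

/* Guard idea from the thread: refuse 'cont' while device state is being saved. */
bool demo_cont_allowed(DemoRunState s)
{
    return s != DEMO_FINISH_MIGRATE;
}

/* Plays the monitor handling 'cont'; it can only run once the lock is free. */
static void *demo_monitor_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_big_lock);
    if (!demo_guard_enabled || demo_cont_allowed(demo_state)) {
        demo_state = DEMO_RUNNING;   /* guest restarted in the middle of the save */
        demo_cont_ran = true;
    }
    /* Simplification: this thread also stands in for the vCPU finishing its work. */
    demo_work_done = true;
    pthread_cond_signal(&demo_work_cond);
    pthread_mutex_unlock(&demo_big_lock);
    return NULL;
}

/* Plays the migration thread; returns true if 'cont' restarted the guest
 * while this thread was blocked in the condition wait. */
bool demo_cont_slipped_in(bool guard_enabled)
{
    pthread_t monitor;
    bool slipped;
    int rc;

    pthread_mutex_lock(&demo_big_lock);
    demo_state = DEMO_FINISH_MIGRATE;
    demo_work_done = false;
    demo_cont_ran = false;
    demo_guard_enabled = guard_enabled;

    rc = pthread_create(&monitor, NULL, demo_monitor_thread, NULL);
    assert(rc == 0);
    (void)rc;

    /* pthread_cond_wait() atomically releases demo_big_lock while blocked,
     * which is the window the monitor thread uses to get in. */
    while (!demo_work_done) {
        pthread_cond_wait(&demo_work_cond, &demo_big_lock);
    }

    slipped = demo_cont_ran;
    pthread_mutex_unlock(&demo_big_lock);
    pthread_join(monitor, NULL);
    return slipped;
}
```

Calling demo_cont_slipped_in(false) reports that the restart happened inside the wait, while demo_cont_slipped_in(true) reports that the guard rejected it. On older toolchains the sketch may need to be built with -pthread. As the thread notes, a guard like this would only cover 'cont'; anything else that takes the lock during the same window is not addressed by it.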
diff --git a/results/classifier/004/other/17743720 b/results/classifier/004/other/17743720
new file mode 100644
index 00000000..a3f02429
--- /dev/null
+++ b/results/classifier/004/other/17743720
@@ -0,0 +1,779 @@
+other: 0.984
+graphic: 0.972
+device: 0.971
+instruction: 0.966
+assembly: 0.966
+semantic: 0.962
+mistranslation: 0.959
+socket: 0.954
+vnc: 0.945
+boot: 0.945
+network: 0.944
+KVM: 0.933
+
+[Qemu-devel] [BUG] living migrate vm pause forever
+
+Occasionally, during live migration, the VM stays paused forever and the
+migration job stalls. The probability is very low, and I can't reproduce it.
+QEMU is waiting on a semaphore for libvirt to send 'migrate continue', while
+libvirt is waiting for QEMU to report that the VM has paused.
+
+The stacks follow:
+qemu:
+Thread 6 (Thread 0x7f50445f3700 (LWP 18120)):
+#0  0x00007f504b84d670 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
+#1  0x00005574eda1e164 in qemu_sem_wait (sem=sem@entry=0x5574ef6930e0) at 
+qemu-2.12/util/qemu-thread-posix.c:322
+#2  0x00005574ed8dd72e in migration_maybe_pause (s=0x5574ef692f50, 
+current_active_state=0x7f50445f2ae4, new_state=10)
+    at qemu-2.12/migration/migration.c:2106
+#3  0x00005574ed8df51a in migration_completion (s=0x5574ef692f50) at 
+qemu-2.12/migration/migration.c:2137
+#4  migration_iteration_run (s=0x5574ef692f50) at 
+qemu-2.12/migration/migration.c:2311
+#5  migration_thread (opaque=0x5574ef692f50) at
+qemu-2.12/migration/migration.c:2415
+#6  0x00007f504b847184 in start_thread () from 
+/lib/x86_64-linux-gnu/libpthread.so.0
+#7  0x00007f504b574bed in clone () from /lib/x86_64-linux-gnu/libc.so.6
+
+libvirt:
+Thread 95 (Thread 0x7fdb82ffd700 (LWP 28775)):
+#0  0x00007fdd177dc404 in pthread_cond_wait@@GLIBC_2.3.2 () from 
+/lib/x86_64-linux-gnu/libpthread.so.0
+#1  0x00007fdd198c3b07 in virCondWait (c=0x7fdbc4003000, m=0x7fdbc4002f30) at 
+../../../src/util/virthread.c:252
+#2  0x00007fdd198f36d2 in virDomainObjWait (vm=0x7fdbc4002f20) at 
+../../../src/conf/domain_conf.c:3303
+#3  0x00007fdd09ffaa44 in qemuMigrationRun (driver=0x7fdd000037b0, 
+vm=0x7fdbc4002f20, persist_xml=0x0,
+    cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n  
+<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n  <hostname>mss 
+</hostname>\n  
+<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., 
+cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, 
+flags=777,
+    resource=0, spec=0x7fdb82ffc670, dconn=0x0, graphicsuri=0x0, 
+nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, 
+migParams=0x7fdb82ffc900)
+    at ../../../src/qemu/qemu_migration.c:3937
+#4  0x00007fdd09ffb26a in doNativeMigrate (driver=0x7fdd000037b0, 
+vm=0x7fdbc4002f20, persist_xml=0x0, uri=0x7fdb780073a0 
+"tcp://172.16.202.17:49152",
+    cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n  
+<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n  <hostname>mss</hostname>\n 
+ <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., 
+cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, 
+flags=777,
+    resource=0, dconn=0x0, graphicsuri=0x0, nmigrate_disks=0, 
+migrate_disks=0x0, compression=0x7fdb78007990, migParams=0x7fdb82ffc900)
+    at ../../../src/qemu/qemu_migration.c:4118
+#5  0x00007fdd09ffd808 in qemuMigrationPerformPhase (driver=0x7fdd000037b0, 
+conn=0x7fdb500205d0, vm=0x7fdbc4002f20, persist_xml=0x0,
+    uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, 
+nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, 
+migParams=0x7fdb82ffc900,
+    cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n  
+<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n  <hostname>mss</hostname>\n 
+ <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., 
+cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, 
+flags=777,
+    resource=0) at ../../../src/qemu/qemu_migration.c:5030
+#6  0x00007fdd09ffdbb5 in qemuMigrationPerform (driver=0x7fdd000037b0, 
+conn=0x7fdb500205d0, vm=0x7fdbc4002f20, xmlin=0x0, persist_xml=0x0, 
+dconnuri=0x0,
+    uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, 
+listenAddress=0x0, nmigrate_disks=0, migrate_disks=0x0, nbdPort=0, 
+compression=0x7fdb78007990,
+    migParams=0x7fdb82ffc900,
+    cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n  
+<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n  <hostname>mss</hostname>\n 
+ <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., 
+cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, 
+flags=777,
+    dname=0x0, resource=0, v3proto=true) at 
+../../../src/qemu/qemu_migration.c:5124
+#7  0x00007fdd0a054725 in qemuDomainMigratePerform3 (dom=0x7fdb78007b00, 
+xmlin=0x0,
+    cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n  
+<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n  <hostname>mss</hostname>\n 
+ <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., 
+cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, 
+dconnuri=0x0,
+    uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, dname=0x0, 
+resource=0) at ../../../src/qemu/qemu_driver.c:12996
+#8  0x00007fdd199ad0f0 in virDomainMigratePerform3 (domain=0x7fdb78007b00, 
+xmlin=0x0,
+    cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n  
+<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n  <hostname>mss</hostname>\n 
+ <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., 
+cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, 
+dconnuri=0x0,
+    uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, dname=0x0, 
+bandwidth=0) at ../../../src/libvirt-domain.c:4698
+#9  0x000055d13923a939 in remoteDispatchDomainMigratePerform3 
+(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, 
+rerr=0x7fdb82ffcbc0,
+    args=0x7fdb7800b220, ret=0x7fdb78021e90) at ../../../daemon/remote.c:4528
+#10 0x000055d13921a043 in remoteDispatchDomainMigratePerform3Helper 
+(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, 
+rerr=0x7fdb82ffcbc0,
+    args=0x7fdb7800b220, ret=0x7fdb78021e90) at 
+../../../daemon/remote_dispatch.h:7944
+#11 0x00007fdd19a260b4 in virNetServerProgramDispatchCall (prog=0x55d13af98b50, 
+server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620)
+    at ../../../src/rpc/virnetserverprogram.c:436
+#12 0x00007fdd19a25c17 in virNetServerProgramDispatch (prog=0x55d13af98b50, 
+server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620)
+    at ../../../src/rpc/virnetserverprogram.c:307
+#13 0x000055d13925933b in virNetServerProcessMsg (srv=0x55d13af90e60, 
+client=0x55d13b0156f0, prog=0x55d13af98b50, msg=0x55d13afbf620)
+    at ../../../src/rpc/virnetserver.c:148
+-------------------------------------------------------------------------------------------------------------------------------------
+This e-mail and its attachments contain confidential information from New H3C, 
+which is
+intended only for the person or entity whose address is listed above. Any use 
+of the
+information contained herein in any way (including, but not limited to, total 
+or partial
+disclosure, reproduction, or dissemination) by persons other than the intended
+recipient(s) is prohibited. If you receive this e-mail in error, please notify 
+the sender
+by phone or email immediately and delete it!
+
+* Yuchen (address@hidden) wrote:
+>
+Occasionally a live migration leaves the VM paused forever and the migration
+>
+job stalls. The probability is very low and I can't reproduce it on demand.
+>
+QEMU waits on a semaphore for libvirt to send migrate-continue, while libvirt
+>
+waits for QEMU to report that the VM has paused.
+Hi,
+  I've copied in Jiri Denemark from libvirt.
+Can you confirm exactly which qemu and libvirt versions you're using
+please.
+
+>
+follow stack:
+>
+qemu:
+>
+Thread 6 (Thread 0x7f50445f3700 (LWP 18120)):
+>
+#0  0x00007f504b84d670 in sem_wait () from
+>
+/lib/x86_64-linux-gnu/libpthread.so.0
+>
+#1  0x00005574eda1e164 in qemu_sem_wait (sem=sem@entry=0x5574ef6930e0) at
+>
+qemu-2.12/util/qemu-thread-posix.c:322
+>
+#2  0x00005574ed8dd72e in migration_maybe_pause (s=0x5574ef692f50,
+>
+current_active_state=0x7f50445f2ae4, new_state=10)
+>
+at qemu-2.12/migration/migration.c:2106
+>
+#3  0x00005574ed8df51a in migration_completion (s=0x5574ef692f50) at
+>
+qemu-2.12/migration/migration.c:2137
+>
+#4  migration_iteration_run (s=0x5574ef692f50) at
+>
+qemu-2.12/migration/migration.c:2311
+>
+#5  migration_thread (opaque=0x5574ef692f50)
+>
+at qemu-2.12/migration/migration.c:2415
+>
+#6  0x00007f504b847184 in start_thread () from
+>
+/lib/x86_64-linux-gnu/libpthread.so.0
+>
+#7  0x00007f504b574bed in clone () from /lib/x86_64-linux-gnu/libc.so.6
+In migration_maybe_pause we have:
+
+    migrate_set_state(&s->state, *current_active_state,
+                      MIGRATION_STATUS_PRE_SWITCHOVER);
+    qemu_sem_wait(&s->pause_sem);
+    migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER,
+                      new_state);
+
+the line numbers don't match my 2.12.0 checkout; so I guess that it's
+that qemu_sem_wait it's stuck at.
+
+QEMU must have sent the switch to PRE_SWITCHOVER and that should have
+sent an event to libvirt, and libvirt should notice that - I'm
+not sure how to tell whether libvirt has seen that event yet or not?
+
+Dave
+
+>
+libvirt:
+>
+Thread 95 (Thread 0x7fdb82ffd700 (LWP 28775)):
+>
+#0  0x00007fdd177dc404 in pthread_cond_wait@@GLIBC_2.3.2 () from
+>
+/lib/x86_64-linux-gnu/libpthread.so.0
+>
+#1  0x00007fdd198c3b07 in virCondWait (c=0x7fdbc4003000, m=0x7fdbc4002f30) at
+>
+../../../src/util/virthread.c:252
+>
+#2  0x00007fdd198f36d2 in virDomainObjWait (vm=0x7fdbc4002f20) at
+>
+../../../src/conf/domain_conf.c:3303
+>
+#3  0x00007fdd09ffaa44 in qemuMigrationRun (driver=0x7fdd000037b0,
+>
+vm=0x7fdbc4002f20, persist_xml=0x0,
+>
+cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
+>
+<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n  <hostname>mss
+>
+</hostname>\n
+>
+<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
+>
+cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
+>
+flags=777,
+>
+resource=0, spec=0x7fdb82ffc670, dconn=0x0, graphicsuri=0x0,
+>
+nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990,
+>
+migParams=0x7fdb82ffc900)
+>
+at ../../../src/qemu/qemu_migration.c:3937
+>
+#4  0x00007fdd09ffb26a in doNativeMigrate (driver=0x7fdd000037b0,
+>
+vm=0x7fdbc4002f20, persist_xml=0x0, uri=0x7fdb780073a0
+>
+"tcp://172.16.202.17:49152",
+>
+cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
+>
+<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n
+>
+<hostname>mss</hostname>\n
+>
+<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
+>
+cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
+>
+flags=777,
+>
+resource=0, dconn=0x0, graphicsuri=0x0, nmigrate_disks=0,
+>
+migrate_disks=0x0, compression=0x7fdb78007990, migParams=0x7fdb82ffc900)
+>
+at ../../../src/qemu/qemu_migration.c:4118
+>
+#5  0x00007fdd09ffd808 in qemuMigrationPerformPhase (driver=0x7fdd000037b0,
+>
+conn=0x7fdb500205d0, vm=0x7fdbc4002f20, persist_xml=0x0,
+>
+uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0,
+>
+nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990,
+>
+migParams=0x7fdb82ffc900,
+>
+cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
+>
+<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n
+>
+<hostname>mss</hostname>\n
+>
+<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
+>
+cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
+>
+flags=777,
+>
+resource=0) at ../../../src/qemu/qemu_migration.c:5030
+>
+#6  0x00007fdd09ffdbb5 in qemuMigrationPerform (driver=0x7fdd000037b0,
+>
+conn=0x7fdb500205d0, vm=0x7fdbc4002f20, xmlin=0x0, persist_xml=0x0,
+>
+dconnuri=0x0,
+>
+uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0,
+>
+listenAddress=0x0, nmigrate_disks=0, migrate_disks=0x0, nbdPort=0,
+>
+compression=0x7fdb78007990,
+>
+migParams=0x7fdb82ffc900,
+>
+cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
+>
+<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n
+>
+<hostname>mss</hostname>\n
+>
+<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
+>
+cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
+>
+flags=777,
+>
+dname=0x0, resource=0, v3proto=true) at
+>
+../../../src/qemu/qemu_migration.c:5124
+>
+#7  0x00007fdd0a054725 in qemuDomainMigratePerform3 (dom=0x7fdb78007b00,
+>
+xmlin=0x0,
+>
+cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
+>
+<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n
+>
+<hostname>mss</hostname>\n
+>
+<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
+>
+cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
+>
+dconnuri=0x0,
+>
+uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, dname=0x0,
+>
+resource=0) at ../../../src/qemu/qemu_driver.c:12996
+>
+#8  0x00007fdd199ad0f0 in virDomainMigratePerform3 (domain=0x7fdb78007b00,
+>
+xmlin=0x0,
+>
+cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
+>
+<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n
+>
+<hostname>mss</hostname>\n
+>
+<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
+>
+cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
+>
+dconnuri=0x0,
+>
+uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, dname=0x0,
+>
+bandwidth=0) at ../../../src/libvirt-domain.c:4698
+>
+#9  0x000055d13923a939 in remoteDispatchDomainMigratePerform3
+>
+(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620,
+>
+rerr=0x7fdb82ffcbc0,
+>
+args=0x7fdb7800b220, ret=0x7fdb78021e90) at ../../../daemon/remote.c:4528
+>
+#10 0x000055d13921a043 in remoteDispatchDomainMigratePerform3Helper
+>
+(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620,
+>
+rerr=0x7fdb82ffcbc0,
+>
+args=0x7fdb7800b220, ret=0x7fdb78021e90) at
+>
+../../../daemon/remote_dispatch.h:7944
+>
+#11 0x00007fdd19a260b4 in virNetServerProgramDispatchCall
+>
+(prog=0x55d13af98b50, server=0x55d13af90e60, client=0x55d13b0156f0,
+>
+msg=0x55d13afbf620)
+>
+at ../../../src/rpc/virnetserverprogram.c:436
+>
+#12 0x00007fdd19a25c17 in virNetServerProgramDispatch (prog=0x55d13af98b50,
+>
+server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620)
+>
+at ../../../src/rpc/virnetserverprogram.c:307
+>
+#13 0x000055d13925933b in virNetServerProcessMsg (srv=0x55d13af90e60,
+>
+client=0x55d13b0156f0, prog=0x55d13af98b50, msg=0x55d13afbf620)
+>
+at ../../../src/rpc/virnetserver.c:148
+--
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+
+In migration_maybe_pause we have:
+
+    migrate_set_state(&s->state, *current_active_state,
+                      MIGRATION_STATUS_PRE_SWITCHOVER);
+    qemu_sem_wait(&s->pause_sem);
+    migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER,
+                      new_state);
+
+the line numbers don't match my 2.12.0 checkout; so I guess that it's that 
+qemu_sem_wait it's stuck at.
+
+QEMU must have sent the switch to PRE_SWITCHOVER and that should have sent an 
+event to libvirt, and libvirt should notice that - I'm not sure how to tell 
+whether libvirt has seen that event yet or not?
+
+
+Thank you for your attention.
+Yes, you are right: QEMU is waiting on the semaphore at this point.
+I am using qemu-2.12.1 and libvirt-4.0.0.
+The line numbers don't match upstream QEMU because I added some debug code.
+
+-----Original Message-----
+From: Dr. David Alan Gilbert [mailto:address@hidden]
+Sent: August 21, 2019 19:13
+To: yuchen (Cloud) <address@hidden>; address@hidden
+Cc: address@hidden
+Subject: Re: [Qemu-devel] [BUG] living migrate vm pause forever
+
+* Yuchen (address@hidden) wrote:
+>
+Occasionally a live migration leaves the VM paused forever and the migration
+>
+job stalls. The probability is very low and I can't reproduce it on demand.
+>
+QEMU waits on a semaphore for libvirt to send migrate-continue, while libvirt
+>
+waits for QEMU to report that the VM has paused.
+Hi,
+  I've copied in Jiri Denemark from libvirt.
+Can you confirm exactly which qemu and libvirt versions you're using please.
+
+>
+follow stack:
+>
+qemu:
+>
+Thread 6 (Thread 0x7f50445f3700 (LWP 18120)):
+>
+#0  0x00007f504b84d670 in sem_wait () from
+>
+/lib/x86_64-linux-gnu/libpthread.so.0
+>
+#1  0x00005574eda1e164 in qemu_sem_wait (sem=sem@entry=0x5574ef6930e0)
+>
+at qemu-2.12/util/qemu-thread-posix.c:322
+>
+#2  0x00005574ed8dd72e in migration_maybe_pause (s=0x5574ef692f50,
+>
+current_active_state=0x7f50445f2ae4, new_state=10)
+>
+at qemu-2.12/migration/migration.c:2106
+>
+#3  0x00005574ed8df51a in migration_completion (s=0x5574ef692f50) at
+>
+qemu-2.12/migration/migration.c:2137
+>
+#4  migration_iteration_run (s=0x5574ef692f50) at
+>
+qemu-2.12/migration/migration.c:2311
+>
+#5  migration_thread (opaque=0x5574ef692f50)
+>
+at qemu-2.12/migration/migration.c:2415
+>
+#6  0x00007f504b847184 in start_thread () from
+>
+/lib/x86_64-linux-gnu/libpthread.so.0
+>
+#7  0x00007f504b574bed in clone () from
+>
+/lib/x86_64-linux-gnu/libc.so.6
+In migration_maybe_pause we have:
+
+    migrate_set_state(&s->state, *current_active_state,
+                      MIGRATION_STATUS_PRE_SWITCHOVER);
+    qemu_sem_wait(&s->pause_sem);
+    migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER,
+                      new_state);
+
+the line numbers don't match my 2.12.0 checkout; so I guess that it's that 
+qemu_sem_wait it's stuck at.
+
+QEMU must have sent the switch to PRE_SWITCHOVER and that should have sent an 
+event to libvirt, and libvirt should notice that - I'm not sure how to tell 
+whether libvirt has seen that event yet or not?
+
+Dave
+
+>
+libvirt:
+>
+Thread 95 (Thread 0x7fdb82ffd700 (LWP 28775)):
+>
+#0  0x00007fdd177dc404 in pthread_cond_wait@@GLIBC_2.3.2 () from
+>
+/lib/x86_64-linux-gnu/libpthread.so.0
+>
+#1  0x00007fdd198c3b07 in virCondWait (c=0x7fdbc4003000,
+>
+m=0x7fdbc4002f30) at ../../../src/util/virthread.c:252
+>
+#2  0x00007fdd198f36d2 in virDomainObjWait (vm=0x7fdbc4002f20) at
+>
+../../../src/conf/domain_conf.c:3303
+>
+#3  0x00007fdd09ffaa44 in qemuMigrationRun (driver=0x7fdd000037b0,
+>
+vm=0x7fdbc4002f20, persist_xml=0x0,
+>
+cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
+>
+<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n  <hostname>mss
+>
+</hostname>\n
+>
+<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
+>
+cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
+>
+flags=777,
+>
+resource=0, spec=0x7fdb82ffc670, dconn=0x0, graphicsuri=0x0,
+>
+nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990,
+>
+migParams=0x7fdb82ffc900)
+>
+at ../../../src/qemu/qemu_migration.c:3937
+>
+#4  0x00007fdd09ffb26a in doNativeMigrate (driver=0x7fdd000037b0,
+>
+vm=0x7fdbc4002f20, persist_xml=0x0, uri=0x7fdb780073a0
+>
+"tcp://172.16.202.17:49152",
+>
+cookiein=0x7fdb780084e0 "<qemu-migration>\n
+>
+<name>mss-pl_652</name>\n
+>
+<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n
+>
+<hostname>mss</hostname>\n
+>
+<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
+>
+cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
+>
+flags=777,
+>
+resource=0, dconn=0x0, graphicsuri=0x0, nmigrate_disks=0,
+>
+migrate_disks=0x0, compression=0x7fdb78007990, migParams=0x7fdb82ffc900)
+>
+at ../../../src/qemu/qemu_migration.c:4118
+>
+#5  0x00007fdd09ffd808 in qemuMigrationPerformPhase (driver=0x7fdd000037b0,
+>
+conn=0x7fdb500205d0, vm=0x7fdbc4002f20, persist_xml=0x0,
+>
+uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0,
+>
+nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990,
+>
+migParams=0x7fdb82ffc900,
+>
+cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
+>
+<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n
+>
+<hostname>mss</hostname>\n
+>
+<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
+>
+cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
+>
+flags=777,
+>
+resource=0) at ../../../src/qemu/qemu_migration.c:5030
+>
+#6  0x00007fdd09ffdbb5 in qemuMigrationPerform (driver=0x7fdd000037b0,
+>
+conn=0x7fdb500205d0, vm=0x7fdbc4002f20, xmlin=0x0, persist_xml=0x0,
+>
+dconnuri=0x0,
+>
+uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0,
+>
+listenAddress=0x0, nmigrate_disks=0, migrate_disks=0x0, nbdPort=0,
+>
+compression=0x7fdb78007990,
+>
+migParams=0x7fdb82ffc900,
+>
+cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
+>
+<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n
+>
+<hostname>mss</hostname>\n
+>
+<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
+>
+cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
+>
+flags=777,
+>
+dname=0x0, resource=0, v3proto=true) at
+>
+../../../src/qemu/qemu_migration.c:5124
+>
+#7  0x00007fdd0a054725 in qemuDomainMigratePerform3 (dom=0x7fdb78007b00,
+>
+xmlin=0x0,
+>
+cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
+>
+<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n
+>
+<hostname>mss</hostname>\n
+>
+<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
+>
+cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
+>
+dconnuri=0x0,
+>
+uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777,
+>
+dname=0x0, resource=0) at ../../../src/qemu/qemu_driver.c:12996
+>
+#8  0x00007fdd199ad0f0 in virDomainMigratePerform3 (domain=0x7fdb78007b00,
+>
+xmlin=0x0,
+>
+cookiein=0x7fdb780084e0 "<qemu-migration>\n  <name>mss-pl_652</name>\n
+>
+<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n
+>
+<hostname>mss</hostname>\n
+>
+<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"...,
+>
+cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8,
+>
+dconnuri=0x0,
+>
+uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777,
+>
+dname=0x0, bandwidth=0) at ../../../src/libvirt-domain.c:4698
+>
+#9  0x000055d13923a939 in remoteDispatchDomainMigratePerform3
+>
+(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620,
+>
+rerr=0x7fdb82ffcbc0,
+>
+args=0x7fdb7800b220, ret=0x7fdb78021e90) at
+>
+../../../daemon/remote.c:4528
+>
+#10 0x000055d13921a043 in remoteDispatchDomainMigratePerform3Helper
+>
+(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620,
+>
+rerr=0x7fdb82ffcbc0,
+>
+args=0x7fdb7800b220, ret=0x7fdb78021e90) at
+>
+../../../daemon/remote_dispatch.h:7944
+>
+#11 0x00007fdd19a260b4 in virNetServerProgramDispatchCall
+>
+(prog=0x55d13af98b50, server=0x55d13af90e60, client=0x55d13b0156f0,
+>
+msg=0x55d13afbf620)
+>
+at ../../../src/rpc/virnetserverprogram.c:436
+>
+#12 0x00007fdd19a25c17 in virNetServerProgramDispatch (prog=0x55d13af98b50,
+>
+server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620)
+>
+at ../../../src/rpc/virnetserverprogram.c:307
+>
+#13 0x000055d13925933b in virNetServerProcessMsg (srv=0x55d13af90e60,
+>
+client=0x55d13b0156f0, prog=0x55d13af98b50, msg=0x55d13afbf620)
+>
+at ../../../src/rpc/virnetserver.c:148
+--
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+
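The hang discussed in this thread can be modelled as a two-party handshake: the migration thread enters PRE_SWITCHOVER and blocks on a semaphore until the management side reacts to the state-change event. The Python sketch below is a toy model only, not QEMU or libvirt code; `run_handshake`, `event_delivered`, the state strings, and the timeout are all assumptions made for illustration (the `qemu_sem_wait` call quoted above has no timeout).

```python
import threading


def run_handshake(event_delivered: bool, timeout: float = 0.2) -> str:
    """Toy model of the PRE_SWITCHOVER handshake; returns the final state."""
    pause_sem = threading.Semaphore(0)   # stands in for s->pause_sem
    event_seen = threading.Event()       # stands in for the state-change event
    state = {"value": "active"}

    def migration_thread() -> None:
        state["value"] = "pre-switchover"
        if event_delivered:              # hypothetical switch: deliver or drop the event
            event_seen.set()
        # The timeout exists only so that this demo terminates.
        if pause_sem.acquire(timeout=timeout):
            state["value"] = "resumed"   # illustrative name for new_state

    def manager_thread() -> None:
        # The management side sends "continue" only after it has seen the event.
        if event_seen.wait(timeout=timeout):
            pause_sem.release()

    threads = [
        threading.Thread(target=migration_thread),
        threading.Thread(target=manager_thread),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["value"]
```

With `event_delivered=False` the function returns `"pre-switchover"`, which corresponds to the stuck state in the report. Whether a lost or unprocessed event is the actual cause was not established in the thread.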
diff --git a/results/classifier/004/other/21221931 b/results/classifier/004/other/21221931
new file mode 100644
index 00000000..ed9cdbd7
--- /dev/null
+++ b/results/classifier/004/other/21221931
@@ -0,0 +1,336 @@
+other: 0.979
+network: 0.976
+instruction: 0.974
+device: 0.971
+semantic: 0.967
+assembly: 0.963
+socket: 0.957
+graphic: 0.948
+boot: 0.947
+vnc: 0.944
+mistranslation: 0.933
+KVM: 0.913
+
+[BUG] qemu git error with virgl
+
+Hello,
+
+I can't start any system if I use virgl. I get the following error:
+qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion
+`con->gl' failed.
+./and.sh: line 27: 3337167 Aborted                 qemu-x86_64 -m 4096
+-smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm -device
+virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on -device
+intel-hda,id=sound0,msi=on -device
+hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci
+-device usb-tablet,bus=xhci.0 -net
+nic,macaddr=52:54:00:12:34:62,model=e1000 -net
+tap,ifname=$INTERFACE,script=no,downscript=no -drive
+file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads
+Set 'tap3' nonpersistent
+
+I have bisected the issue:
+
+towo:Defiant> git bisect good
+b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit
+commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4
+Author: Paolo Bonzini <pbonzini@redhat.com>
+Date:   Tue Oct 27 08:44:23 2020 -0400
+
+    vl: remove separate preconfig main_loop
+    Move post-preconfig initialization to the x-exit-preconfig. If
+preconfig
+    is not requested, just exit preconfig mode immediately with the QMP
+    command.
+
+    As a result, the preconfig loop will run with accel_setup_post
+    and os_setup_post restrictions (xen_restrict, chroot, etc.)
+    already done.
+
+    Reviewed-by: Igor Mammedov <imammedo@redhat.com>
+    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
+
+ include/sysemu/runstate.h |  1 -
+ monitor/qmp-cmds.c        |  9 -----
+ softmmu/vl.c              | 95
+++++++++++++++++++++---------------------------
+ 3 files changed, 41 insertions(+), 64 deletions(-)
+
+Regards,
+
+Torsten Wohlfarth
+
+Cc'ing Gerd + patch author/reviewer.
+
+On 1/2/21 2:11 PM, Torsten Wohlfarth wrote:
+>
+Hello,
+>
+>
+I can't start any system if I use virgl. I get the following error:
+>
+>
+qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion
+>
+`con->gl' failed.
+>
+./and.sh: line 27: 3337167 Aborted                 qemu-x86_64 -m 4096
+>
+-smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm -device
+>
+virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on -device
+>
+intel-hda,id=sound0,msi=on -device
+>
+hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci
+>
+-device usb-tablet,bus=xhci.0 -net
+>
+nic,macaddr=52:54:00:12:34:62,model=e1000 -net
+>
+tap,ifname=$INTERFACE,script=no,downscript=no -drive
+>
+file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads
+>
+>
+Set 'tap3' nonpersistent
+>
+>
+I have bisected the issue:
+>
+>
+towo:Defiant> git bisect good
+>
+b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit
+>
+commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4
+>
+Author: Paolo Bonzini <pbonzini@redhat.com>
+>
+Date:   Tue Oct 27 08:44:23 2020 -0400
+>
+>
+    vl: remove separate preconfig main_loop
+>
+>
+    Move post-preconfig initialization to the x-exit-preconfig. If
+>
+preconfig
+>
+    is not requested, just exit preconfig mode immediately with the QMP
+>
+    command.
+>
+>
+    As a result, the preconfig loop will run with accel_setup_post
+>
+    and os_setup_post restrictions (xen_restrict, chroot, etc.)
+>
+    already done.
+>
+>
+    Reviewed-by: Igor Mammedov <imammedo@redhat.com>
+>
+    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
+>
+>
+ include/sysemu/runstate.h |  1 -
+>
+ monitor/qmp-cmds.c        |  9 -----
+>
+ softmmu/vl.c              | 95
+>
+++++++++++++++++++++---------------------------
+>
+ 3 files changed, 41 insertions(+), 64 deletions(-)
+>
+>
+Regards,
+>
+>
+Torsten Wohlfarth
+>
+>
+>
+
+On Sun, 3 Jan 2021 18:28:11 +0100
+Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
+
+>
+Cc'ing Gerd + patch author/reviewer.
+>
+>
+On 1/2/21 2:11 PM, Torsten Wohlfarth wrote:
+>
+> Hello,
+>
+>
+>
+> I can't start any system if I use virgl. I get the following error:
+>
+>
+>
+> qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion
+>
+> `con->gl' failed.
+Does the following patch fix the issue:
+  [PULL 12/55] vl: initialize displays _after_ exiting preconfiguration
+
+>
+> ./and.sh: line 27: 3337167 Aborted                 qemu-x86_64 -m 4096
+>
+> -smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm -device
+>
+> virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on -device
+>
+> intel-hda,id=sound0,msi=on -device
+>
+> hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci
+>
+> -device usb-tablet,bus=xhci.0 -net
+>
+> nic,macaddr=52:54:00:12:34:62,model=e1000 -net
+>
+> tap,ifname=$INTERFACE,script=no,downscript=no -drive
+>
+> file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads
+>
+>
+>
+> Set 'tap3' nonpersistent
+>
+>
+>
+> I have bisected the issue:
+>
+>
+>
+> towo:Defiant> git bisect good
+>
+> b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit
+>
+> commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4
+>
+> Author: Paolo Bonzini <pbonzini@redhat.com>
+>
+> Date:   Tue Oct 27 08:44:23 2020 -0400
+>
+>
+>
+>     vl: remove separate preconfig main_loop
+>
+>
+>
+>     Move post-preconfig initialization to the x-exit-preconfig. If
+>
+> preconfig
+>
+>     is not requested, just exit preconfig mode immediately with the QMP
+>
+>     command.
+>
+>
+>
+>     As a result, the preconfig loop will run with accel_setup_post
+>
+>     and os_setup_post restrictions (xen_restrict, chroot, etc.)
+>
+>     already done.
+>
+>
+>
+>     Reviewed-by: Igor Mammedov <imammedo@redhat.com>
+>
+>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
+>
+>
+>
+>  include/sysemu/runstate.h |  1 -
+>
+>  monitor/qmp-cmds.c        |  9 -----
+>
+>  softmmu/vl.c              | 95
+>
+> ++++++++++++++++++++---------------------------
+>
+>  3 files changed, 41 insertions(+), 64 deletions(-)
+>
+>
+>
+> Regards,
+>
+>
+>
+> Torsten Wohlfarth
+>
+>
+>
+>
+>
+>
+>
+>
+
+Hi Igor,
+
+yes, that fixes my issue.
+
+Regards, Torsten
+
+Am 04.01.21 um 19:50 schrieb Igor Mammedov:
+On Sun, 3 Jan 2021 18:28:11 +0100
+Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
+Cc'ing Gerd + patch author/reviewer.
+
+On 1/2/21 2:11 PM, Torsten Wohlfarth wrote:
+Hello,
+
+I can't start any system if I use virgl. I get the following error:
+
+qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion
+`con->gl' failed.
+Does the following patch fix the issue:
+   [PULL 12/55] vl: initialize displays _after_ exiting preconfiguration
+./and.sh: line 27: 3337167 Aborted                 qemu-x86_64 -m 4096
+-smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm -device
+virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on -device
+intel-hda,id=sound0,msi=on -device
+hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci
+-device usb-tablet,bus=xhci.0 -net
+nic,macaddr=52:54:00:12:34:62,model=e1000 -net
+tap,ifname=$INTERFACE,script=no,downscript=no -drive
+file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads
+
+Set 'tap3' nonpersistent
+
+I have bisected the issue:
+towo:Defiant> git bisect good
+b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit
+commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4
+Author: Paolo Bonzini <pbonzini@redhat.com>
+Date:   Tue Oct 27 08:44:23 2020 -0400
+
+     vl: remove separate preconfig main_loop
+
+     Move post-preconfig initialization to the x-exit-preconfig. If
+preconfig
+     is not requested, just exit preconfig mode immediately with the QMP
+     command.
+
+     As a result, the preconfig loop will run with accel_setup_post
+     and os_setup_post restrictions (xen_restrict, chroot, etc.)
+     already done.
+
+     Reviewed-by: Igor Mammedov <imammedo@redhat.com>
+     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
+
+  include/sysemu/runstate.h |  1 -
+  monitor/qmp-cmds.c        |  9 -----
+  softmmu/vl.c              | 95
+++++++++++++++++++++---------------------------
+  3 files changed, 41 insertions(+), 64 deletions(-)
+
+Regards,
+
+Torsten Wohlfarth
+
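The `git bisect` result quoted in this thread comes from a binary search over the commit history. A minimal Python sketch of that search follows; `first_bad_commit`, the commit labels, and the `is_bad` predicate are hypothetical stand-ins for checking out a revision and testing whether the assertion fires. The sketch assumes a linear history in which the oldest commit is good, the newest is bad, and every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad commit in an oldest-to-newest list.

    Assumes commits[0] is known good and commits[-1] is known bad.
    """
    lo, hi = 0, len(commits) - 1      # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid                  # first bad commit is at mid or earlier
        else:
            lo = mid                  # first bad commit is after mid
    return commits[hi]


if __name__ == "__main__":
    # Hypothetical history: commits c5 and later trigger the failure.
    history = [f"c{i}" for i in range(8)]
    print(first_bad_commit(history, lambda c: int(c[1:]) >= 5))
```

Each step halves the remaining range, so the number of builds to test grows with the logarithm of the history length rather than linearly.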
diff --git a/results/classifier/004/other/21247035 b/results/classifier/004/other/21247035
new file mode 100644
index 00000000..080554a9
--- /dev/null
+++ b/results/classifier/004/other/21247035
@@ -0,0 +1,1329 @@
+other: 0.640
+mistranslation: 0.584
+device: 0.525
+KVM: 0.514
+instruction: 0.508
+graphic: 0.426
+assembly: 0.391
+semantic: 0.374
+vnc: 0.367
+boot: 0.345
+network: 0.322
+socket: 0.322
+
+[Qemu-devel] [BUG] I/O thread segfault for QEMU on s390x
+
+Hi,
+I have been noticing some segfaults for QEMU on s390x, and I have been
+hitting this issue quite reliably (at least once in 10 runs of a test
+case). The qemu version is 2.11.50, and I have systemd created coredumps
+when this happens.
+
+Here is a back trace of the segfaulting thread:
+
+
+#0  0x000003ffafed202c in swapcontext () from /lib64/libc.so.6
+#1  0x000002aa355c02ee in qemu_coroutine_new () at
+util/coroutine-ucontext.c:164
+#2  0x000002aa355bec34 in qemu_coroutine_create
+(address@hidden <blk_aio_read_entry>,
+address@hidden) at util/qemu-coroutine.c:76
+#3  0x000002aa35510262 in blk_aio_prwv (blk=0x2aa65fbefa0,
+offset=<optimized out>, bytes=<optimized out>, qiov=0x3ffa002a9c0,
+address@hidden <blk_aio_read_entry>, flags=0,
+cb=0x2aa35340a50 <virtio_blk_rw_complete>, opaque=0x3ffa002a960) at
+block/block-backend.c:1299
+#4  0x000002aa35510376 in blk_aio_preadv (blk=<optimized out>,
+offset=<optimized out>, qiov=<optimized out>, flags=<optimized out>,
+cb=<optimized out>, opaque=0x3ffa002a960) at block/block-backend.c:1392
+#5  0x000002aa3534114e in submit_requests (niov=<optimized out>,
+num_reqs=<optimized out>, start=<optimized out>, mrb=<optimized out>,
+blk=<optimized out>) at
+/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:372
+#6  virtio_blk_submit_multireq (blk=<optimized out>,
+address@hidden) at
+/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:402
+#7  0x000002aa353422e0 in virtio_blk_handle_vq (s=0x2aa6611e7d8,
+vq=0x3ffb0f5f010) at /usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:620
+#8  0x000002aa3536655a in virtio_queue_notify_aio_vq
+(address@hidden) at
+/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1515
+#9  0x000002aa35366cd6 in virtio_queue_notify_aio_vq (vq=0x3ffb0f5f010)
+at /usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1511
+#10 virtio_queue_host_notifier_aio_poll (opaque=0x3ffb0f5f078) at
+/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:2409
+#11 0x000002aa355a8ba4 in run_poll_handlers_once
+(address@hidden) at util/aio-posix.c:497
+#12 0x000002aa355a9b74 in run_poll_handlers (max_ns=<optimized out>,
+ctx=0x2aa65f99310) at util/aio-posix.c:534
+#13 try_poll_mode (blocking=true, ctx=0x2aa65f99310) at util/aio-posix.c:562
+#14 aio_poll (ctx=0x2aa65f99310, address@hidden) at
+util/aio-posix.c:602
+#15 0x000002aa353d2d0a in iothread_run (opaque=0x2aa65f990f0) at
+iothread.c:60
+#16 0x000003ffb0f07e82 in start_thread () from /lib64/libpthread.so.0
+#17 0x000003ffaff91596 in thread_start () from /lib64/libc.so.6
+I don't have much knowledge about i/o threads and the block layer code
+in QEMU, so I would like to report to the community about this issue.
+I believe this is very similar to the bug that I reported upstream a couple
+of days ago
+(https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04452.html).
+Any help would be greatly appreciated.
+
+Thanks
+Farhan
+
+On Thu, Mar 1, 2018 at 10:33 PM, Farhan Ali <address@hidden> wrote:
+>
+Hi,
+>
+>
+I have been noticing some segfaults for QEMU on s390x, and I have been
+>
+hitting this issue quite reliably (at least once in 10 runs of a test case).
+>
+The qemu version is 2.11.50, and I have systemd created coredumps
+>
+when this happens.
+Can you describe the test case or suggest how to reproduce it for us?
+
+Fam
+
+On 03/02/2018 01:13 AM, Fam Zheng wrote:
+On Thu, Mar 1, 2018 at 10:33 PM, Farhan Ali <address@hidden> wrote:
+Hi,
+
+I have been noticing some segfaults for QEMU on s390x, and I have been
+hitting this issue quite reliably (at least once in 10 runs of a test case).
+The qemu version is 2.11.50, and I have systemd created coredumps
+when this happens.
+Can you describe the test case or suggest how to reproduce it for us?
+
+Fam
+The test case uses a single guest running a memory-intensive
+workload. The guest has 8 vCPUs and 4G of memory.
+Here is the QEMU command line, if that helps:
+
+/usr/bin/qemu-kvm -name guest=sles,debug-threads=on \
+  -S \
+  -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-sles/master-key.aes \
+  -machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off \
+  -m 4096 -realtime mlock=off -smp 8,sockets=8,cores=1,threads=1 \
+  -object iothread,id=iothread1 -object iothread,id=iothread2 \
+  -uuid b83a596b-3a1a-4ac9-9f3e-d9a4032ee52c \
+  -display none -no-user-config -nodefaults \
+  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-2-sles/monitor.sock,server,nowait \
+  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown \
+  -boot strict=on \
+  -drive file=/dev/mapper/360050763998b0883980000002400002b,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native \
+  -device virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
+  -drive file=/dev/mapper/360050763998b0883980000002800002f,format=raw,if=none,id=drive-virtio-disk1,cache=none,aio=native \
+  -device virtio-blk-ccw,iothread=iothread2,scsi=off,devno=fe.0.0002,drive=drive-virtio-disk1,id=virtio-disk1 \
+  -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 \
+  -device virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:38:a6:36:e8:1f,devno=fe.0.0000 \
+  -chardev pty,id=charconsole0 \
+  -device sclpconsole,chardev=charconsole0,id=console0 \
+  -device virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba \
+  -msg timestamp=on
+Please let me know if I need to provide any other information.
+
+Thanks
+Farhan
+
+On Thu, Mar 01, 2018 at 09:33:35AM -0500, Farhan Ali wrote:
+>
+Hi,
+>
+>
+I have been noticing some segfaults for QEMU on s390x, and I have been
+>
+hitting this issue quite reliably (at least once in 10 runs of a test case).
+>
+The qemu version is 2.11.50, and I have systemd created coredumps
+>
+when this happens.
+>
+>
+Here is a back trace of the segfaulting thread:
+The backtrace looks normal.
+
+Please post the QEMU command-line and the details of the segfault (which
+memory access faulted?).
+
+>
+#0  0x000003ffafed202c in swapcontext () from /lib64/libc.so.6
+>
+#1  0x000002aa355c02ee in qemu_coroutine_new () at
+>
+util/coroutine-ucontext.c:164
+>
+#2  0x000002aa355bec34 in qemu_coroutine_create
+>
+(address@hidden <blk_aio_read_entry>,
+>
+address@hidden) at util/qemu-coroutine.c:76
+>
+#3  0x000002aa35510262 in blk_aio_prwv (blk=0x2aa65fbefa0, offset=<optimized
+>
+out>, bytes=<optimized out>, qiov=0x3ffa002a9c0,
+>
+address@hidden <blk_aio_read_entry>, flags=0,
+>
+cb=0x2aa35340a50 <virtio_blk_rw_complete>, opaque=0x3ffa002a960) at
+>
+block/block-backend.c:1299
+>
+#4  0x000002aa35510376 in blk_aio_preadv (blk=<optimized out>,
+>
+offset=<optimized out>, qiov=<optimized out>, flags=<optimized out>,
+>
+cb=<optimized out>, opaque=0x3ffa002a960) at block/block-backend.c:1392
+>
+#5  0x000002aa3534114e in submit_requests (niov=<optimized out>,
+>
+num_reqs=<optimized out>, start=<optimized out>, mrb=<optimized out>,
+>
+blk=<optimized out>) at
+>
+/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:372
+>
+#6  virtio_blk_submit_multireq (blk=<optimized out>,
+>
+address@hidden) at
+>
+/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:402
+>
+#7  0x000002aa353422e0 in virtio_blk_handle_vq (s=0x2aa6611e7d8,
+>
+vq=0x3ffb0f5f010) at /usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:620
+>
+#8  0x000002aa3536655a in virtio_queue_notify_aio_vq
+>
+(address@hidden) at
+>
+/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1515
+>
+#9  0x000002aa35366cd6 in virtio_queue_notify_aio_vq (vq=0x3ffb0f5f010) at
+>
+/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1511
+>
+#10 virtio_queue_host_notifier_aio_poll (opaque=0x3ffb0f5f078) at
+>
+/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:2409
+>
+#11 0x000002aa355a8ba4 in run_poll_handlers_once
+>
+(address@hidden) at util/aio-posix.c:497
+>
+#12 0x000002aa355a9b74 in run_poll_handlers (max_ns=<optimized out>,
+>
+ctx=0x2aa65f99310) at util/aio-posix.c:534
+>
+#13 try_poll_mode (blocking=true, ctx=0x2aa65f99310) at util/aio-posix.c:562
+>
+#14 aio_poll (ctx=0x2aa65f99310, address@hidden) at
+>
+util/aio-posix.c:602
+>
+#15 0x000002aa353d2d0a in iothread_run (opaque=0x2aa65f990f0) at
+>
+iothread.c:60
+>
+#16 0x000003ffb0f07e82 in start_thread () from /lib64/libpthread.so.0
+>
+#17 0x000003ffaff91596 in thread_start () from /lib64/libc.so.6
+>
+>
+>
+I don't have much knowledge about i/o threads and the block layer code in
+>
+QEMU, so I would like to report to the community about this issue.
+>
+I believe this very similar to the bug that I reported upstream couple of
+>
+days ago
+>
+(
+https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04452.html
+).
+>
+>
+Any help would be greatly appreciated.
+>
+>
+Thanks
+>
+Farhan
+>
+
+On 03/02/2018 04:23 AM, Stefan Hajnoczi wrote:
+On Thu, Mar 01, 2018 at 09:33:35AM -0500, Farhan Ali wrote:
+Hi,
+
+I have been noticing some segfaults for QEMU on s390x, and I have been
+hitting this issue quite reliably (at least once in 10 runs of a test case).
+The qemu version is 2.11.50, and I have systemd created coredumps
+when this happens.
+
+Here is a back trace of the segfaulting thread:
+The backtrace looks normal.
+
+Please post the QEMU command-line and the details of the segfault (which
+memory access faulted?).
+I was able to reproduce the crash again today; here is the QEMU command line:
+
+/usr/bin/qemu-kvm -name guest=sles,debug-threads=on \
+  -S \
+  -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-sles/master-key.aes \
+  -machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off \
+  -m 4096 -realtime mlock=off -smp 8,sockets=8,cores=1,threads=1 \
+  -object iothread,id=iothread1 -object iothread,id=iothread2 \
+  -uuid b83a596b-3a1a-4ac9-9f3e-d9a4032ee52c \
+  -display none -no-user-config -nodefaults \
+  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-2-sles/monitor.sock,server,nowait \
+  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown \
+  -boot strict=on \
+  -drive file=/dev/mapper/360050763998b0883980000002400002b,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native \
+  -device virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
+  -drive file=/dev/mapper/360050763998b0883980000002800002f,format=raw,if=none,id=drive-virtio-disk1,cache=none,aio=native \
+  -device virtio-blk-ccw,iothread=iothread2,scsi=off,devno=fe.0.0002,drive=drive-virtio-disk1,id=virtio-disk1 \
+  -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 \
+  -device virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:38:a6:36:e8:1f,devno=fe.0.0000 \
+  -chardev pty,id=charconsole0 \
+  -device sclpconsole,chardev=charconsole0,id=console0 \
+  -device virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba \
+  -msg timestamp=on
+This is the latest backtrace of the segfaulting thread; it seems to
+segfault in swapcontext.
+Program terminated with signal SIGSEGV, Segmentation fault.
+#0  0x000003ff8595202c in swapcontext () from /lib64/libc.so.6
+
+
+This is the remaining back trace:
+
+#0  0x000003ff8595202c in swapcontext () from /lib64/libc.so.6
+#1  0x000002aa33b45566 in qemu_coroutine_new () at
+util/coroutine-ucontext.c:164
+#2  0x000002aa33b43eac in qemu_coroutine_create
+(address@hidden <blk_aio_write_entry>,
+address@hidden) at util/qemu-coroutine.c:76
+#3  0x000002aa33a954da in blk_aio_prwv (blk=0x2aa4f0efda0,
+offset=<optimized out>, bytes=<optimized out>, qiov=0x3ff74019080,
+address@hidden <blk_aio_write_entry>, flags=0,
+cb=0x2aa338c62e8 <virtio_blk_rw_complete>, opaque=0x3ff74019020) at
+block/block-backend.c:1299
+#4  0x000002aa33a9563e in blk_aio_pwritev (blk=<optimized out>,
+offset=<optimized out>, qiov=<optimized out>, flags=<optimized out>,
+cb=<optimized out>, opaque=0x3ff74019020) at block/block-backend.c:1400
+#5  0x000002aa338c6a38 in submit_requests (niov=<optimized out>,
+num_reqs=1, start=<optimized out>, mrb=0x3ff831fe6e0, blk=<optimized
+out>) at /usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:369
+#6  virtio_blk_submit_multireq (blk=<optimized out>,
+address@hidden) at
+/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:426
+#7  0x000002aa338c7b78 in virtio_blk_handle_vq (s=0x2aa4f2507c8,
+vq=0x3ff869df010) at /usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:620
+#8  0x000002aa338ebdf2 in virtio_queue_notify_aio_vq (vq=0x3ff869df010)
+at /usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1515
+#9  0x000002aa33b2df46 in aio_dispatch_handlers
+(address@hidden) at util/aio-posix.c:406
+#10 0x000002aa33b2eb50 in aio_poll (ctx=0x2aa4f0ca050,
+address@hidden) at util/aio-posix.c:692
+#11 0x000002aa33957f6a in iothread_run (opaque=0x2aa4f0c9630) at
+iothread.c:60
+#12 0x000003ff86987e82 in start_thread () from /lib64/libpthread.so.0
+#13 0x000003ff85a11596 in thread_start () from /lib64/libc.so.6
+Backtrace stopped: previous frame identical to this frame (corrupt stack?)
+
+On Fri, Mar 02, 2018 at 10:30:57AM -0500, Farhan Ali wrote:
+>
+>
+>
+On 03/02/2018 04:23 AM, Stefan Hajnoczi wrote:
+>
+> On Thu, Mar 01, 2018 at 09:33:35AM -0500, Farhan Ali wrote:
+>
+> > Hi,
+>
+> >
+>
+> > I have been noticing some segfaults for QEMU on s390x, and I have been
+>
+> > hitting this issue quite reliably (at least once in 10 runs of a test
+>
+> > case).
+>
+> > The qemu version is 2.11.50, and I have systemd created coredumps
+>
+> > when this happens.
+>
+> >
+>
+> > Here is a back trace of the segfaulting thread:
+>
+> The backtrace looks normal.
+>
+>
+>
+> Please post the QEMU command-line and the details of the segfault (which
+>
+> memory access faulted?).
+>
+>
+>
+>
+>
+I was able to create another crash today and here is the qemu command line
+>
+>
+/usr/bin/qemu-kvm -name guest=sles,debug-threads=on \
+>
+-S -object
+>
+secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-sles/master-key.aes
+>
+\
+>
+-machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off \
+>
+-m 4096 -realtime mlock=off -smp 8,sockets=8,cores=1,threads=1 \
+>
+-object iothread,id=iothread1 -object iothread,id=iothread2 -uuid
+>
+b83a596b-3a1a-4ac9-9f3e-d9a4032ee52c \
+>
+-display none -no-user-config -nodefaults -chardev
+>
+socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-2-sles/monitor.sock,server,nowait
+>
+>
+-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
+>
+\
+>
+-boot strict=on -drive
+>
+file=/dev/mapper/360050763998b0883980000002400002b,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native
+>
+-device
+>
+virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
+>
+-drive
+>
+file=/dev/mapper/360050763998b0883980000002800002f,format=raw,if=none,id=drive-virtio-disk1,cache=none,aio=native
+>
+-device
+>
+virtio-blk-ccw,iothread=iothread2,scsi=off,devno=fe.0.0002,drive=drive-virtio-disk1,id=virtio-disk1
+>
+-netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device
+>
+virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:38:a6:36:e8:1f,devno=fe.0.0000
+>
+-chardev pty,id=charconsole0 -device
+>
+sclpconsole,chardev=charconsole0,id=console0 -device
+>
+virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -msg timestamp=on
+>
+>
+>
+This is the latest back trace on the segfaulting thread, and it seems to
+>
+segfault in swapcontext.
+>
+>
+Program terminated with signal SIGSEGV, Segmentation fault.
+>
+#0  0x000003ff8595202c in swapcontext () from /lib64/libc.so.6
+Please include the following gdb output:
+
+  (gdb) disas swapcontext
+  (gdb) i r
+
+That way it's possible to see which instruction faulted and which
+registers were being accessed.
+
+>
+This is the remaining back trace:
+>
+>
+#0  0x000003ff8595202c in swapcontext () from /lib64/libc.so.6
+>
+#1  0x000002aa33b45566 in qemu_coroutine_new () at
+>
+util/coroutine-ucontext.c:164
+>
+#2  0x000002aa33b43eac in qemu_coroutine_create
+>
+(address@hidden <blk_aio_write_entry>,
+>
+address@hidden) at util/qemu-coroutine.c:76
+>
+#3  0x000002aa33a954da in blk_aio_prwv (blk=0x2aa4f0efda0, offset=<optimized
+>
+out>, bytes=<optimized out>, qiov=0x3ff74019080,
+>
+address@hidden <blk_aio_write_entry>, flags=0,
+>
+cb=0x2aa338c62e8 <virtio_blk_rw_complete>, opaque=0x3ff74019020) at
+>
+block/block-backend.c:1299
+>
+#4  0x000002aa33a9563e in blk_aio_pwritev (blk=<optimized out>,
+>
+offset=<optimized out>, qiov=<optimized out>, flags=<optimized out>,
+>
+cb=<optimized out>, opaque=0x3ff74019020) at block/block-backend.c:1400
+>
+#5  0x000002aa338c6a38 in submit_requests (niov=<optimized out>, num_reqs=1,
+>
+start=<optimized out>, mrb=0x3ff831fe6e0, blk=<optimized out>) at
+>
+/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:369
+>
+#6  virtio_blk_submit_multireq (blk=<optimized out>,
+>
+address@hidden) at
+>
+/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:426
+>
+#7  0x000002aa338c7b78 in virtio_blk_handle_vq (s=0x2aa4f2507c8,
+>
+vq=0x3ff869df010) at /usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:620
+>
+#8  0x000002aa338ebdf2 in virtio_queue_notify_aio_vq (vq=0x3ff869df010) at
+>
+/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1515
+>
+#9  0x000002aa33b2df46 in aio_dispatch_handlers
+>
+(address@hidden) at util/aio-posix.c:406
+>
+#10 0x000002aa33b2eb50 in aio_poll (ctx=0x2aa4f0ca050,
+>
+address@hidden) at util/aio-posix.c:692
+>
+#11 0x000002aa33957f6a in iothread_run (opaque=0x2aa4f0c9630) at
+>
+iothread.c:60
+>
+#12 0x000003ff86987e82 in start_thread () from /lib64/libpthread.so.0
+>
+#13 0x000003ff85a11596 in thread_start () from /lib64/libc.so.6
+>
+Backtrace stopped: previous frame identical to this frame (corrupt stack?)
+>
+
+On 03/05/2018 06:03 AM, Stefan Hajnoczi wrote:
+Please include the following gdb output:
+
+   (gdb) disas swapcontext
+   (gdb) i r
+
+That way it's possible to see which instruction faulted and which
+registers were being accessed.
+Here is the disassembly output for swapcontext. This is from a coredump with
+debugging symbols enabled for QEMU, so the addresses differ slightly from
+the previous dump.
+(gdb) disas swapcontext
+Dump of assembler code for function swapcontext:
+   0x000003ff90751fb8 <+0>:       lgr     %r1,%r2
+   0x000003ff90751fbc <+4>:       lgr     %r0,%r3
+   0x000003ff90751fc0 <+8>:       stfpc   248(%r1)
+   0x000003ff90751fc4 <+12>:      std     %f0,256(%r1)
+   0x000003ff90751fc8 <+16>:      std     %f1,264(%r1)
+   0x000003ff90751fcc <+20>:      std     %f2,272(%r1)
+   0x000003ff90751fd0 <+24>:      std     %f3,280(%r1)
+   0x000003ff90751fd4 <+28>:      std     %f4,288(%r1)
+   0x000003ff90751fd8 <+32>:      std     %f5,296(%r1)
+   0x000003ff90751fdc <+36>:      std     %f6,304(%r1)
+   0x000003ff90751fe0 <+40>:      std     %f7,312(%r1)
+   0x000003ff90751fe4 <+44>:      std     %f8,320(%r1)
+   0x000003ff90751fe8 <+48>:      std     %f9,328(%r1)
+   0x000003ff90751fec <+52>:      std     %f10,336(%r1)
+   0x000003ff90751ff0 <+56>:      std     %f11,344(%r1)
+   0x000003ff90751ff4 <+60>:      std     %f12,352(%r1)
+   0x000003ff90751ff8 <+64>:      std     %f13,360(%r1)
+   0x000003ff90751ffc <+68>:      std     %f14,368(%r1)
+   0x000003ff90752000 <+72>:      std     %f15,376(%r1)
+   0x000003ff90752004 <+76>:      slgr    %r2,%r2
+   0x000003ff90752008 <+80>:      stam    %a0,%a15,184(%r1)
+   0x000003ff9075200c <+84>:      stmg    %r0,%r15,56(%r1)
+   0x000003ff90752012 <+90>:      la      %r2,2
+   0x000003ff90752016 <+94>:      lgr     %r5,%r0
+   0x000003ff9075201a <+98>:      la      %r3,384(%r5)
+   0x000003ff9075201e <+102>:     la      %r4,384(%r1)
+   0x000003ff90752022 <+106>:     lghi    %r5,8
+   0x000003ff90752026 <+110>:     svc     175
+   0x000003ff90752028 <+112>:     lgr     %r5,%r0
+=> 0x000003ff9075202c <+116>:  lfpc    248(%r5)
+   0x000003ff90752030 <+120>:     ld      %f0,256(%r5)
+   0x000003ff90752034 <+124>:     ld      %f1,264(%r5)
+   0x000003ff90752038 <+128>:     ld      %f2,272(%r5)
+   0x000003ff9075203c <+132>:     ld      %f3,280(%r5)
+   0x000003ff90752040 <+136>:     ld      %f4,288(%r5)
+   0x000003ff90752044 <+140>:     ld      %f5,296(%r5)
+   0x000003ff90752048 <+144>:     ld      %f6,304(%r5)
+   0x000003ff9075204c <+148>:     ld      %f7,312(%r5)
+   0x000003ff90752050 <+152>:     ld      %f8,320(%r5)
+   0x000003ff90752054 <+156>:     ld      %f9,328(%r5)
+   0x000003ff90752058 <+160>:     ld      %f10,336(%r5)
+   0x000003ff9075205c <+164>:     ld      %f11,344(%r5)
+   0x000003ff90752060 <+168>:     ld      %f12,352(%r5)
+   0x000003ff90752064 <+172>:     ld      %f13,360(%r5)
+   0x000003ff90752068 <+176>:     ld      %f14,368(%r5)
+   0x000003ff9075206c <+180>:     ld      %f15,376(%r5)
+   0x000003ff90752070 <+184>:     lam     %a2,%a15,192(%r5)
+   0x000003ff90752074 <+188>:     lmg     %r0,%r15,56(%r5)
+   0x000003ff9075207a <+194>:     br      %r14
+End of assembler dump.
+
+(gdb) i r
+r0             0x0      0
+r1             0x3ff8fe7de40    4396165881408
+r2             0x0      0
+r3             0x3ff8fe7e1c0    4396165882304
+r4             0x3ff8fe7dfc0    4396165881792
+r5             0x0      0
+r6             0xffffffff88004880       18446744071696304256
+r7             0x3ff880009e0    4396033247712
+r8             0x27ff89000      10736930816
+r9             0x3ff88001460    4396033250400
+r10            0x1000   4096
+r11            0x1261be0        19274720
+r12            0x3ff88001e00    4396033252864
+r13            0x14d0bc0        21826496
+r14            0x1312ac8        19999432
+r15            0x3ff8fe7dc80    4396165880960
+pc             0x3ff9075202c    0x3ff9075202c <swapcontext+116>
+cc             0x2      2
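+
+The disassembly and register dump above can be combined mechanically to
+find the address the faulting instruction tried to access. The following
+is only a sketch: the helper names parse_registers and faulting_access are
+made up for illustration, and the parsing assumes gdb output shaped like
+the lines posted in this thread.
+
```python
import re

def parse_registers(reg_lines):
    """Map register name -> value from 'info registers' style lines."""
    regs = {}
    for line in reg_lines:
        parts = line.split()
        if len(parts) >= 2 and parts[1].startswith("0x"):
            regs[parts[0]] = int(parts[1], 16)
    return regs

def faulting_access(disas_lines, regs):
    """Return (base register, effective address) for the '=>' instruction."""
    for line in disas_lines:
        if line.lstrip().startswith("=>"):
            # Match a "displacement(%base)" operand such as 248(%r5).
            match = re.search(r"(\d+)\(%(r\d+)\)", line)
            if match:
                disp, base = int(match.group(1)), match.group(2)
                return base, regs[base] + disp
    return None

# Two lines copied from the dump above.
disas = [
    "   0x000003ff90752028 <+112>:     lgr     %r5,%r0",
    "=> 0x000003ff9075202c <+116>:  lfpc    248(%r5)",
]
registers = parse_registers([
    "r0             0x0      0",
    "r5             0x0      0",
])
print(faulting_access(disas, registers))
```
+
+With r5 equal to zero, the lfpc instruction reads from address 248, which
+is not mapped in the process, hence the SIGSEGV.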
+
+On 03/05/2018 07:45 PM, Farhan Ali wrote:
+>
+>
+>
+On 03/05/2018 06:03 AM, Stefan Hajnoczi wrote:
+>
+> Please include the following gdb output:
+>
+>
+>
+>    (gdb) disas swapcontext
+>
+>    (gdb) i r
+>
+>
+>
+> That way it's possible to see which instruction faulted and which
+>
+> registers were being accessed.
+>
+>
+>
+here is the disas out for swapcontext, this is on a coredump with debugging
+>
+symbols enabled for qemu. So the addresses from the previous dump is a little
+>
+different.
+>
+>
+>
+(gdb) disas swapcontext
+>
+Dump of assembler code for function swapcontext:
+>
+   0x000003ff90751fb8 <+0>:    lgr    %r1,%r2
+>
+   0x000003ff90751fbc <+4>:    lgr    %r0,%r3
+>
+   0x000003ff90751fc0 <+8>:    stfpc    248(%r1)
+>
+   0x000003ff90751fc4 <+12>:    std    %f0,256(%r1)
+>
+   0x000003ff90751fc8 <+16>:    std    %f1,264(%r1)
+>
+   0x000003ff90751fcc <+20>:    std    %f2,272(%r1)
+>
+   0x000003ff90751fd0 <+24>:    std    %f3,280(%r1)
+>
+   0x000003ff90751fd4 <+28>:    std    %f4,288(%r1)
+>
+   0x000003ff90751fd8 <+32>:    std    %f5,296(%r1)
+>
+   0x000003ff90751fdc <+36>:    std    %f6,304(%r1)
+>
+   0x000003ff90751fe0 <+40>:    std    %f7,312(%r1)
+>
+   0x000003ff90751fe4 <+44>:    std    %f8,320(%r1)
+>
+   0x000003ff90751fe8 <+48>:    std    %f9,328(%r1)
+>
+   0x000003ff90751fec <+52>:    std    %f10,336(%r1)
+>
+   0x000003ff90751ff0 <+56>:    std    %f11,344(%r1)
+>
+   0x000003ff90751ff4 <+60>:    std    %f12,352(%r1)
+>
+   0x000003ff90751ff8 <+64>:    std    %f13,360(%r1)
+>
+   0x000003ff90751ffc <+68>:    std    %f14,368(%r1)
+>
+   0x000003ff90752000 <+72>:    std    %f15,376(%r1)
+>
+   0x000003ff90752004 <+76>:    slgr    %r2,%r2
+>
+   0x000003ff90752008 <+80>:    stam    %a0,%a15,184(%r1)
+>
+   0x000003ff9075200c <+84>:    stmg    %r0,%r15,56(%r1)
+>
+   0x000003ff90752012 <+90>:    la    %r2,2
+>
+   0x000003ff90752016 <+94>:    lgr    %r5,%r0
+>
+   0x000003ff9075201a <+98>:    la    %r3,384(%r5)
+>
+   0x000003ff9075201e <+102>:    la    %r4,384(%r1)
+>
+   0x000003ff90752022 <+106>:    lghi    %r5,8
+>
+   0x000003ff90752026 <+110>:    svc    175
+sys_rt_sigprocmask. r0 should not be changed by the system call.
+
+>
+   0x000003ff90752028 <+112>:    lgr    %r5,%r0
+>
+=> 0x000003ff9075202c <+116>:    lfpc    248(%r5)
+so r5 is zero and it was loaded from r0. r0 was loaded from r3 (which is the 
+2nd parameter to this
+function). Now this is odd.
+
+>
+   0x000003ff90752030 <+120>:    ld    %f0,256(%r5)
+>
+   0x000003ff90752034 <+124>:    ld    %f1,264(%r5)
+>
+   0x000003ff90752038 <+128>:    ld    %f2,272(%r5)
+>
+   0x000003ff9075203c <+132>:    ld    %f3,280(%r5)
+>
+   0x000003ff90752040 <+136>:    ld    %f4,288(%r5)
+>
+   0x000003ff90752044 <+140>:    ld    %f5,296(%r5)
+>
+   0x000003ff90752048 <+144>:    ld    %f6,304(%r5)
+>
+   0x000003ff9075204c <+148>:    ld    %f7,312(%r5)
+>
+   0x000003ff90752050 <+152>:    ld    %f8,320(%r5)
+>
+   0x000003ff90752054 <+156>:    ld    %f9,328(%r5)
+>
+   0x000003ff90752058 <+160>:    ld    %f10,336(%r5)
+>
+   0x000003ff9075205c <+164>:    ld    %f11,344(%r5)
+>
+   0x000003ff90752060 <+168>:    ld    %f12,352(%r5)
+>
+   0x000003ff90752064 <+172>:    ld    %f13,360(%r5)
+>
+   0x000003ff90752068 <+176>:    ld    %f14,368(%r5)
+>
+   0x000003ff9075206c <+180>:    ld    %f15,376(%r5)
+>
+   0x000003ff90752070 <+184>:    lam    %a2,%a15,192(%r5)
+>
+   0x000003ff90752074 <+188>:    lmg    %r0,%r15,56(%r5)
+>
+   0x000003ff9075207a <+194>:    br    %r14
+>
+End of assembler dump.
+>
+>
+(gdb) i r
+>
+r0             0x0      0
+>
+r1             0x3ff8fe7de40    4396165881408
+>
+r2             0x0      0
+>
+r3             0x3ff8fe7e1c0    4396165882304
+>
+r4             0x3ff8fe7dfc0    4396165881792
+>
+r5             0x0      0
+>
+r6             0xffffffff88004880       18446744071696304256
+>
+r7             0x3ff880009e0    4396033247712
+>
+r8             0x27ff89000      10736930816
+>
+r9             0x3ff88001460    4396033250400
+>
+r10            0x1000   4096
+>
+r11            0x1261be0        19274720
+>
+r12            0x3ff88001e00    4396033252864
+>
+r13            0x14d0bc0        21826496
+>
+r14            0x1312ac8        19999432
+>
+r15            0x3ff8fe7dc80    4396165880960
+>
+pc             0x3ff9075202c    0x3ff9075202c <swapcontext+116>
+>
+cc             0x2      2
+
+On 5 March 2018 at 18:54, Christian Borntraeger <address@hidden> wrote:
+>
+>
+>
+On 03/05/2018 07:45 PM, Farhan Ali wrote:
+>
+>    0x000003ff90752026 <+110>:    svc    175
+>
+>
+sys_rt_sigprocmask. r0 should not be changed by the system call.
+>
+>
+>    0x000003ff90752028 <+112>:    lgr    %r5,%r0
+>
+> => 0x000003ff9075202c <+116>:    lfpc    248(%r5)
+>
+>
+so r5 is zero and it was loaded from r0. r0 was loaded from r3 (which is the
+>
+2nd parameter to this
+>
+function). Now this is odd.
+...particularly given that the only place we call swapcontext()
+the second parameter is always the address of a local variable
+and can't be 0...
+
+thanks
+-- PMM
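+
+The register flow discussed above can be traced with a toy model. This is
+only a sketch and not real s390 emulation: trace_swapcontext and its
+syscall_preserves_r0 flag are hypothetical names, and the value of the
+second argument is back-computed from the dumped r3 (r3 minus 384), which
+is an assumption about the state just before the svc.
+
```python
SIGMASK_OFFSET = 384   # displacement in "la %r3,384(%r5)"
FPC_OFFSET = 248       # displacement in "lfpc 248(%r5)"

def trace_swapcontext(ucp, syscall_preserves_r0=True):
    """Toy trace of swapcontext+0 .. +116.

    Returns (r3 at the time of the svc, address read by lfpc).
    """
    r3 = ucp                      # 2nd argument arrives in r3
    r0 = r3                       # lgr  %r0,%r3
    r5 = r0                       # lgr  %r5,%r0
    r3 = r5 + SIGMASK_OFFSET      # la   %r3,384(%r5)
    r5 = 8                        # lghi %r5,8
    if not syscall_preserves_r0:  # svc 175 (rt_sigprocmask)
        r0 = 0                    # the misbehaviour seen in the core dump
    r5 = r0                       # lgr  %r5,%r0
    return r3, r5 + FPC_OFFSET    # lfpc 248(%r5)

# Assumption: ucp back-computed from the dumped r3 value.
UCP = 0x3ff8fe7e1c0 - SIGMASK_OFFSET

print([hex(v) for v in trace_swapcontext(UCP)])
print([hex(v) for v in trace_swapcontext(UCP, syscall_preserves_r0=False)])
```
+
+In both runs r3 matches the dumped 0x3ff8fe7e1c0, which suggests the second
+argument was a valid pointer on entry; only the run where r0 is lost across
+the svc ends up reading from address 248.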
+
+Do you happen to run with a recent host kernel that has 
+
+commit 7041d28115e91f2144f811ffe8a195c696b1e1d0
+    s390: scrub registers on kernel entry and KVM exit
+
+
+
+
+
+Can you run with this on top
+diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
+index 13a133a6015c..d6dc0e5e8f74 100644
+--- a/arch/s390/kernel/entry.S
++++ b/arch/s390/kernel/entry.S
+@@ -426,13 +426,13 @@ ENTRY(system_call)
+        UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER
+        BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP
+        stmg    %r0,%r7,__PT_R0(%r11)
+-       # clear user controlled register to prevent speculative use
+-       xgr     %r0,%r0
+        mvc     __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
+        mvc     __PT_PSW(16,%r11),__LC_SVC_OLD_PSW
+        mvc     __PT_INT_CODE(4,%r11),__LC_SVC_ILC
+        stg     %r14,__PT_FLAGS(%r11)
+ .Lsysc_do_svc:
++       # clear user controlled register to prevent speculative use
++       xgr     %r0,%r0
+        # load address of system call table
+        lg      %r10,__THREAD_sysc_table(%r13,%r12)
+        llgh    %r8,__PT_INT_CODE+2(%r11)
+
+
+To me it looks like the critical section cleanup (interrupt during system
+call entry) might save the registers into ptregs again after we have
+already zeroed out r0.
+This patch moves the clearing of r0 after sysc_do_svc, which should fix the
+critical section cleanup.
+
+Adding Martin and Heiko. Will spin a patch.
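+
+The suspected ordering problem can be illustrated with a toy model. This is
+a sketch of the hypothesis described above, not of the actual kernel code:
+run_syscall_entry, the instruction labels and the "cleanup saves the live
+registers again" behaviour are all simplifying assumptions.
+
```python
def run_syscall_entry(user_r0, order, interrupt_after=None):
    """Toy model: return the r0 value that ends up in the saved pt_regs.

    order: instruction labels executed before the .Lsysc_do_svc label.
    interrupt_after: index into `order` after which an interrupt hits; the
    modelled cleanup then redoes the register save from the live registers.
    """
    live_r0 = user_r0
    saved_r0 = None
    for i, insn in enumerate(order):
        if insn == "save":        # stmg %r0,%r7,__PT_R0(%r11)
            saved_r0 = live_r0
        elif insn == "clear":     # xgr %r0,%r0
            live_r0 = 0
        if interrupt_after == i:
            saved_r0 = live_r0    # modelled cleanup saves the registers again
    return saved_r0

BEFORE_PATCH = ["save", "clear", "mvc", "mvc", "mvc", "stg"]
AFTER_PATCH = ["save", "mvc", "mvc", "mvc", "stg"]  # clear moved past the label

for name, order in (("before", BEFORE_PATCH), ("after", AFTER_PATCH)):
    results = [run_syscall_entry(0x1234, order, i) for i in range(len(order))]
    print(name, [hex(r) for r in results])
```
+
+In this model the unpatched ordering loses the user's r0 whenever the
+interrupt lands after the clear, while the patched ordering keeps it for
+every interrupt point before the label.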
+
+
+
+On 03/05/2018 02:08 PM, Christian Borntraeger wrote:
+Do you happen to run with a recent host kernel that has
+
+commit 7041d28115e91f2144f811ffe8a195c696b1e1d0
+     s390: scrub registers on kernel entry and KVM exit
+Yes.
+Can you run with this on top
+diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
+index 13a133a6015c..d6dc0e5e8f74 100644
+--- a/arch/s390/kernel/entry.S
++++ b/arch/s390/kernel/entry.S
+@@ -426,13 +426,13 @@ ENTRY(system_call)
+         UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER
+         BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP
+         stmg    %r0,%r7,__PT_R0(%r11)
+-       # clear user controlled register to prevent speculative use
+-       xgr     %r0,%r0
+         mvc     __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
+         mvc     __PT_PSW(16,%r11),__LC_SVC_OLD_PSW
+         mvc     __PT_INT_CODE(4,%r11),__LC_SVC_ILC
+         stg     %r14,__PT_FLAGS(%r11)
+  .Lsysc_do_svc:
++       # clear user controlled register to prevent speculative use
++       xgr     %r0,%r0
+         # load address of system call table
+         lg      %r10,__THREAD_sysc_table(%r13,%r12)
+         llgh    %r8,__PT_INT_CODE+2(%r11)
+
+
+To me it looks like the critical section cleanup (interrupt during system
+call entry) might
+save the registers again into ptregs, but we have already zeroed out r0.
+This patch moves the clearing of r0 after sysc_do_svc, which should fix the
+critical
+section cleanup.
+Okay I will run with this.
+Adding Martin and Heiko. Will spin a patch.
+
+
+On 03/05/2018 07:54 PM, Christian Borntraeger wrote:
+On 03/05/2018 07:45 PM, Farhan Ali wrote:
+On 03/05/2018 06:03 AM, Stefan Hajnoczi wrote:
+Please include the following gdb output:
+
+    (gdb) disas swapcontext
+    (gdb) i r
+
+That way it's possible to see which instruction faulted and which
+registers were being accessed.
+Here is the disas output for swapcontext. This is from a coredump with debugging
+symbols enabled for QEMU, so the addresses are a little different from the
+previous dump.
+
+
+(gdb) disas swapcontext
+Dump of assembler code for function swapcontext:
+    0x000003ff90751fb8 <+0>:    lgr    %r1,%r2
+    0x000003ff90751fbc <+4>:    lgr    %r0,%r3
+    0x000003ff90751fc0 <+8>:    stfpc    248(%r1)
+    0x000003ff90751fc4 <+12>:    std    %f0,256(%r1)
+    0x000003ff90751fc8 <+16>:    std    %f1,264(%r1)
+    0x000003ff90751fcc <+20>:    std    %f2,272(%r1)
+    0x000003ff90751fd0 <+24>:    std    %f3,280(%r1)
+    0x000003ff90751fd4 <+28>:    std    %f4,288(%r1)
+    0x000003ff90751fd8 <+32>:    std    %f5,296(%r1)
+    0x000003ff90751fdc <+36>:    std    %f6,304(%r1)
+    0x000003ff90751fe0 <+40>:    std    %f7,312(%r1)
+    0x000003ff90751fe4 <+44>:    std    %f8,320(%r1)
+    0x000003ff90751fe8 <+48>:    std    %f9,328(%r1)
+    0x000003ff90751fec <+52>:    std    %f10,336(%r1)
+    0x000003ff90751ff0 <+56>:    std    %f11,344(%r1)
+    0x000003ff90751ff4 <+60>:    std    %f12,352(%r1)
+    0x000003ff90751ff8 <+64>:    std    %f13,360(%r1)
+    0x000003ff90751ffc <+68>:    std    %f14,368(%r1)
+    0x000003ff90752000 <+72>:    std    %f15,376(%r1)
+    0x000003ff90752004 <+76>:    slgr    %r2,%r2
+    0x000003ff90752008 <+80>:    stam    %a0,%a15,184(%r1)
+    0x000003ff9075200c <+84>:    stmg    %r0,%r15,56(%r1)
+    0x000003ff90752012 <+90>:    la    %r2,2
+    0x000003ff90752016 <+94>:    lgr    %r5,%r0
+    0x000003ff9075201a <+98>:    la    %r3,384(%r5)
+    0x000003ff9075201e <+102>:    la    %r4,384(%r1)
+    0x000003ff90752022 <+106>:    lghi    %r5,8
+    0x000003ff90752026 <+110>:    svc    175
+sys_rt_sigprocmask. r0 should not be changed by the system call.
+    0x000003ff90752028 <+112>:    lgr    %r5,%r0
+=> 0x000003ff9075202c <+116>:    lfpc    248(%r5)
+so r5 is zero and it was loaded from r0. r0 was loaded from r3 (which is the 
+2nd parameter to this
+function). Now this is odd.
+    0x000003ff90752030 <+120>:    ld    %f0,256(%r5)
+    0x000003ff90752034 <+124>:    ld    %f1,264(%r5)
+    0x000003ff90752038 <+128>:    ld    %f2,272(%r5)
+    0x000003ff9075203c <+132>:    ld    %f3,280(%r5)
+    0x000003ff90752040 <+136>:    ld    %f4,288(%r5)
+    0x000003ff90752044 <+140>:    ld    %f5,296(%r5)
+    0x000003ff90752048 <+144>:    ld    %f6,304(%r5)
+    0x000003ff9075204c <+148>:    ld    %f7,312(%r5)
+    0x000003ff90752050 <+152>:    ld    %f8,320(%r5)
+    0x000003ff90752054 <+156>:    ld    %f9,328(%r5)
+    0x000003ff90752058 <+160>:    ld    %f10,336(%r5)
+    0x000003ff9075205c <+164>:    ld    %f11,344(%r5)
+    0x000003ff90752060 <+168>:    ld    %f12,352(%r5)
+    0x000003ff90752064 <+172>:    ld    %f13,360(%r5)
+    0x000003ff90752068 <+176>:    ld    %f14,368(%r5)
+    0x000003ff9075206c <+180>:    ld    %f15,376(%r5)
+    0x000003ff90752070 <+184>:    lam    %a2,%a15,192(%r5)
+    0x000003ff90752074 <+188>:    lmg    %r0,%r15,56(%r5)
+    0x000003ff9075207a <+194>:    br    %r14
+End of assembler dump.
+
+(gdb) i r
+r0             0x0    0
+r1             0x3ff8fe7de40    4396165881408
+r2             0x0    0
+r3             0x3ff8fe7e1c0    4396165882304
+r4             0x3ff8fe7dfc0    4396165881792
+r5             0x0    0
+r6             0xffffffff88004880    18446744071696304256
+r7             0x3ff880009e0    4396033247712
+r8             0x27ff89000    10736930816
+r9             0x3ff88001460    4396033250400
+r10            0x1000    4096
+r11            0x1261be0    19274720
+r12            0x3ff88001e00    4396033252864
+r13            0x14d0bc0    21826496
+r14            0x1312ac8    19999432
+r15            0x3ff8fe7dc80    4396165880960
+pc             0x3ff9075202c    0x3ff9075202c <swapcontext+116>
+cc             0x2    2
+
+On Mon, 5 Mar 2018 20:08:45 +0100
+Christian Borntraeger <address@hidden> wrote:
+
+>
+Do you happen to run with a recent host kernel that has
+>
+>
+commit 7041d28115e91f2144f811ffe8a195c696b1e1d0
+>
+s390: scrub registers on kernel entry and KVM exit
+>
+>
+Can you run with this on top
+>
+diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
+>
+index 13a133a6015c..d6dc0e5e8f74 100644
+>
+--- a/arch/s390/kernel/entry.S
+>
++++ b/arch/s390/kernel/entry.S
+>
+@@ -426,13 +426,13 @@ ENTRY(system_call)
+>
+UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER
+>
+BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP
+>
+stmg    %r0,%r7,__PT_R0(%r11)
+>
+-       # clear user controlled register to prevent speculative use
+>
+-       xgr     %r0,%r0
+>
+mvc     __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
+>
+mvc     __PT_PSW(16,%r11),__LC_SVC_OLD_PSW
+>
+mvc     __PT_INT_CODE(4,%r11),__LC_SVC_ILC
+>
+stg     %r14,__PT_FLAGS(%r11)
+>
+.Lsysc_do_svc:
+>
++       # clear user controlled register to prevent speculative use
+>
++       xgr     %r0,%r0
+>
+# load address of system call table
+>
+lg      %r10,__THREAD_sysc_table(%r13,%r12)
+>
+llgh    %r8,__PT_INT_CODE+2(%r11)
+>
+>
+>
+To me it looks like the critical section cleanup (interrupt during
+>
+system call entry) might
+>
+save the registers again into ptregs but we have already zeroed out r0.
+>
+This patch moves the clearing of r0 after sysc_do_svc, which should fix the
+>
+critical
+>
+section cleanup.
+>
+>
+Adding Martin and Heiko. Will spin a patch.
+Argh, yes. Thanks Christian, this is it. I have been searching for the bug
+for days now. The point is that if the system call handler is interrupted
+after the xgr but before .Lsysc_do_svc, the code at .Lcleanup_system_call
+repeats the stmg for %r0-%r7, but now %r0 is already zero.
+
+Please commit a patch for this and I will queue it up immediately.
+
+-- 
+blue skies,
+   Martin.
+
+"Reality continues to ruin my life." - Calvin.
+
+On 03/06/2018 01:34 AM, Martin Schwidefsky wrote:
+On Mon, 5 Mar 2018 20:08:45 +0100
+Christian Borntraeger <address@hidden> wrote:
+Do you happen to run with a recent host kernel that has
+
+commit 7041d28115e91f2144f811ffe8a195c696b1e1d0
+     s390: scrub registers on kernel entry and KVM exit
+
+Can you run with this on top
+diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
+index 13a133a6015c..d6dc0e5e8f74 100644
+--- a/arch/s390/kernel/entry.S
++++ b/arch/s390/kernel/entry.S
+@@ -426,13 +426,13 @@ ENTRY(system_call)
+         UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER
+         BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP
+         stmg    %r0,%r7,__PT_R0(%r11)
+-       # clear user controlled register to prevent speculative use
+-       xgr     %r0,%r0
+         mvc     __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
+         mvc     __PT_PSW(16,%r11),__LC_SVC_OLD_PSW
+         mvc     __PT_INT_CODE(4,%r11),__LC_SVC_ILC
+         stg     %r14,__PT_FLAGS(%r11)
+  .Lsysc_do_svc:
++       # clear user controlled register to prevent speculative use
++       xgr     %r0,%r0
+         # load address of system call table
+         lg      %r10,__THREAD_sysc_table(%r13,%r12)
+         llgh    %r8,__PT_INT_CODE+2(%r11)
+
+
+To me it looks like the critical section cleanup (interrupt during system
+call entry) might
+save the registers again into ptregs but we have already zeroed out r0.
+This patch moves the clearing of r0 after sysc_do_svc, which should fix the 
+critical
+section cleanup.
+
+Adding Martin and Heiko. Will spin a patch.
+Argh, yes. Thanks Christian, this is it. I have been searching for the bug
+for days now. The point is that if the system call handler is interrupted
+after the xgr but before .Lsysc_do_svc, the code at .Lcleanup_system_call
+repeats the stmg for %r0-%r7, but now %r0 is already zero.
+
+Please commit a patch for this and I will queue it up immediately.
+This patch does fix the QEMU crash. I haven't seen the crash after
+running the test case for more than a day. Thanks to everyone for taking
+a look at this problem :)
+Thanks
+Farhan
+
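Annotation: the race described in the thread above can be modelled in a few
lines of C. This is a toy sketch under stated assumptions, not kernel code:
struct toy_cpu, toy_syscall() and toy_check() are hypothetical names, only r0
and its pt_regs slot are modelled, and the interrupt is assumed to arrive at
the single point Martin describes (after the first stmg and before
.Lsysc_do_svc), after which the cleanup path repeats the stmg.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: the live r0 register and the pt_regs slot it is saved to. */
struct toy_cpu {
    uint64_t r0;     /* live register value */
    uint64_t pt_r0;  /* saved copy, restored when returning to user space */
};

/*
 * Model of the system_call entry path.
 *   clear_before_label: true  = old ordering (xgr before .Lsysc_do_svc),
 *                       false = patched ordering (xgr after the label).
 *   interrupted:        an interrupt hits before .Lsysc_do_svc and the
 *                       cleanup code repeats the stmg.
 * Returns the r0 value user space gets back after the system call.
 */
uint64_t toy_syscall(uint64_t user_r0, bool clear_before_label, bool interrupted)
{
    struct toy_cpu cpu = { .r0 = user_r0, .pt_r0 = 0 };

    cpu.pt_r0 = cpu.r0;          /* stmg %r0,%r7,__PT_R0(%r11) */
    if (clear_before_label)
        cpu.r0 = 0;              /* xgr %r0,%r0 in its old position */
    if (interrupted)
        cpu.pt_r0 = cpu.r0;      /* cleanup repeats the stmg */

    /* .Lsysc_do_svc: */
    if (!clear_before_label)
        cpu.r0 = 0;              /* xgr %r0,%r0 in its patched position */

    return cpu.pt_r0;
}

/* The three cases from the thread; 0x1234 is an arbitrary user value. */
void toy_check(void)
{
    assert(toy_syscall(0x1234, true, false) == 0x1234);  /* old, no interrupt */
    assert(toy_syscall(0x1234, true, true) == 0);        /* old, interrupted: r0 lost */
    assert(toy_syscall(0x1234, false, true) == 0x1234);  /* patched, interrupted */
}
```

In this model the second case corresponds to the crash in swapcontext, where
r0 came back as zero after svc 175; the third case shows why moving the xgr
after the label preserves the saved value.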
diff --git a/results/classifier/004/other/23300761 b/results/classifier/004/other/23300761
new file mode 100644
index 00000000..e546903c
--- /dev/null
+++ b/results/classifier/004/other/23300761
@@ -0,0 +1,321 @@
+other: 0.963
+assembly: 0.958
+semantic: 0.950
+device: 0.932
+boot: 0.929
+instruction: 0.929
+socket: 0.927
+vnc: 0.926
+graphic: 0.924
+KVM: 0.897
+network: 0.879
+mistranslation: 0.770
+
+[Qemu-devel] [BUG] 216 Alerts reported by LGTM for QEMU (some might be release critical)
+
+Hi,
+LGTM reports 16 errors, 81 warnings and 119 recommendations:
+https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
+.
+Some of them are already known (wrong format strings), others look like
+real errors:
+- several multiplication results which don't work as they should in
+contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
+32 bit!), target/i386/translate.c and other files
+- potential buffer overflows in gdbstub.c and other files
+I am afraid that the overflows in the block code are release critical,
+maybe that in target/i386/translate.c and other errors, too.
+About half of the alerts are issues which can be fixed later.
+
+Regards
+
+Stefan
+
+On 13/07/19 19:46, Stefan Weil wrote:
+>
+>
+LGTM reports 16 errors, 81 warnings and 119 recommendations:
+>
+https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
+.
+>
+>
+Some of them are already known (wrong format strings), others look like
+>
+real errors:
+>
+>
+- several multiplication results which don't work as they should in
+>
+contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
+>
+32 bit!), target/i386/translate.c and other files
+m->nb_clusters here is limited by s->l2_slice_size (see for example
+handle_alloc) so I wouldn't be surprised if this is a false positive.  I
+couldn't find this particular multiplication in Coverity, but it has
+about 250 issues marked as intentional or false positive so there's
+probably a lot of overlap with what LGTM found.
+
+Paolo
+
+Am 13.07.2019 um 21:42 schrieb Paolo Bonzini:
+>
+On 13/07/19 19:46, Stefan Weil wrote:
+>
+> LGTM reports 16 errors, 81 warnings and 119 recommendations:
+>
+>
+https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
+.
+>
+>
+>
+> Some of them are already known (wrong format strings), others look like
+>
+> real errors:
+>
+>
+>
+> - several multiplication results which don't work as they should in
+>
+> contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
+>
+> 32 bit!), target/i386/translate.c and other files
+>
+m->nb_clusters here is limited by s->l2_slice_size (see for example
+>
+handle_alloc) so I wouldn't be surprised if this is a false positive.  I
+>
+couldn't find this particular multiplication in Coverity, but it has
+>
+about 250 issues marked as intentional or false positive so there's
+>
+probably a lot of overlap with what LGTM found.
+>
+>
+Paolo
+>
+From other projects I know that there is a certain overlap between the
+results from Coverity Scan and LGTM, but it is good to have both
+analyzers, and the results from LGTM are typically quite reliable.
+
+Even if we know that there is no multiplication overflow, the code could
+be modified. Either the assigned value should use the same data type as
+the factors (possible when there is never an overflow, avoids a size
+extension), or the multiplication could use the larger data type by
+adding a type cast to one of the factors (then an overflow cannot
+happen, static code analysers and human reviewers have an easier job,
+but the multiplication costs more time).
+
+Stefan
+
+Am 14.07.2019 um 15:28 hat Stefan Weil geschrieben:
+>
+Am 13.07.2019 um 21:42 schrieb Paolo Bonzini:
+>
+> On 13/07/19 19:46, Stefan Weil wrote:
+>
+>> LGTM reports 16 errors, 81 warnings and 119 recommendations:
+>
+>>
+https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
+.
+>
+>>
+>
+>> Some of them are already known (wrong format strings), others look like
+>
+>> real errors:
+>
+>>
+>
+>> - several multiplication results which don't work as they should in
+>
+>> contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
+>
+>> 32 bit!), target/i386/translate.c and other files
+Request sizes are limited to 32 bit in the generic block layer before
+they are even passed to the individual block drivers, so most if not all
+of these are going to be false positives.
+
+>
+> m->nb_clusters here is limited by s->l2_slice_size (see for example
+>
+> handle_alloc) so I wouldn't be surprised if this is a false positive.  I
+>
+> couldn't find this particular multiplication in Coverity, but it has
+>
+> about 250 issues marked as intentional or false positive so there's
+>
+> probably a lot of overlap with what LGTM found.
+>
+>
+>
+> Paolo
+>
+>
+From other projects I know that there is a certain overlap between the
+>
+results from Coverity Scan and LGTM, but it is good to have both
+>
+analyzers, and the results from LGTM are typically quite reliable.
+>
+>
+Even if we know that there is no multiplication overflow, the code could
+>
+be modified. Either the assigned value should use the same data type as
+>
+the factors (possible when there is never an overflow, avoids a size
+>
+extension), or the multiplication could use the larger data type by
+>
+adding a type cast to one of the factors (then an overflow cannot
+>
+happen, static code analysers and human reviewers have an easier job,
+>
+but the multiplication costs more time).
+But if you look at the code we're talking about, you see that it's
+complaining about things where being more explicit would make things
+less readable.
+
+For example, it complains about the multiplication in this line:
+
+    s->file_size += n * s->header.cluster_size;
+
+We know that n * s->header.cluster_size fits in 32 bits, but
+s->file_size is 64 bits (and has to be 64 bits). Do you really think we
+should introduce another uint32_t variable to store the intermediate
+result? And if we cast n to uint64_t, not only might the multiplication
+cost more time, but also human readers would wonder why the result could
+become larger than 32 bits. So a cast would be misleading.
+
+
+It also complains about this line:
+
+    ret = bdrv_truncate(bs->file, (3 + l1_clusters) * s->cluster_size,
+                        PREALLOC_MODE_OFF, &local_err);
+
+Here, we don't even assign the result to a 64 bit variable, but just
+pass it to a function which takes a 64 bit parameter. Again, I don't
+think introducing additional variables for the intermediate result or
+adding casts would be an improvement of the situation.
+
+
+So I don't think this is a good enough tool to base our code on what it
+does and doesn't understand. It would have too much of a negative impact
+on our code. We'd rather need a way to mark false positives as such and
+move on without changing the code in such cases.
+
+Kevin
+
+On Sat, 13 Jul 2019 at 18:46, Stefan Weil <address@hidden> wrote:
+>
+LGTM reports 16 errors, 81 warnings and 119 recommendations:
+>
+https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
+.
+I had a look at some of these before, but mostly I came
+to the conclusion that it wasn't worth trying to put the
+effort into keeping up with the site because they didn't
+seem to provide any useful way to mark things as false
+positives. Coverity has its flaws but at least you can do
+that kind of thing in its UI (it runs at about a 33% fp
+rate, I think.) "Analyzer thinks this multiply can overflow
+but in fact it's not possible" is quite a common false
+positive cause...
+
+Anyway, if you want to fish out specific issues, analyse
+whether they're false positive or real, and report them
+to the mailing list as followups to the patches which
+introduced the issue, that's probably the best way for
+us to make use of this analyzer. (That is essentially
+what I do for coverity.)
+
+thanks
+-- PMM
+
+Am 14.07.2019 um 19:30 schrieb Peter Maydell:
+[...]
+>
+"Analyzer thinks this multiply can overflow
+>
+but in fact it's not possible" is quite a common false
+>
+positive cause...
+The analysers don't complain because a multiply can overflow.
+
+They complain because the code indicates that a larger result is
+expected, for example uint64_t = uint32_t * uint32_t. They would not
+complain for the same multiplication if it were assigned to a uint32_t.
+
+So there is a simple solution to write the code in a way which avoids
+false positives...
+
+Stefan
+
+Stefan Weil <address@hidden> writes:
+
+>
+Am 14.07.2019 um 19:30 schrieb Peter Maydell:
+>
+[...]
+>
+> "Analyzer thinks this multiply can overflow
+>
+> but in fact it's not possible" is quite a common false
+>
+> positive cause...
+>
+>
+>
+The analysers don't complain because a multiply can overflow.
+>
+>
+They complain because the code indicates that a larger result is
+>
+expected, for example uint64_t = uint32_t * uint32_t. They would not
+>
+complain for the same multiplication if it were assigned to a uint32_t.
+I agree this is an anti-pattern.
+
+>
+So there is a simple solution to write the code in a way which avoids
+>
+false positives...
+You wrote elsewhere in this thread:
+
+    Either the assigned value should use the same data type as the
+    factors (possible when there is never an overflow, avoids a size
+    extension), or the multiplication could use the larger data type by
+    adding a type cast to one of the factors (then an overflow cannot
+    happen, static code analysers and human reviewers have an easier
+    job, but the multiplication costs more time).
+
+Makes sense to me.
+
+On 7/14/19 5:30 PM, Peter Maydell wrote:
+>
+I had a look at some of these before, but mostly I came
+>
+to the conclusion that it wasn't worth trying to put the
+>
+effort into keeping up with the site because they didn't
+>
+seem to provide any useful way to mark things as false
+>
+positives. Coverity has its flaws but at least you can do
+>
+that kind of thing in its UI (it runs at about a 33% fp
+>
+rate, I think.)
+Yes, LGTM wants you to modify the source code with
+
+  /* lgtm [cpp/some-warning-code] */
+
+and on the same line as the reported problem.  Which is mildly annoying in that
+you're definitely committing to LGTM in the long term.  Also for any
+non-trivial bit of code, it will almost certainly run over 80 columns.
+
+
+r~
+
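Annotation: the pattern debated in the thread above (a 64-bit result assigned
from a product of two 32-bit factors) can be sketched in C as follows. This is
a minimal illustration, assuming a platform where unsigned int is 32 bits
wide; mul_narrow(), mul_wide() and mul_check() are hypothetical names, and the
parameter names only echo the quoted line
"s->file_size += n * s->header.cluster_size" without claiming the actual types
used in QEMU.

```c
#include <assert.h>
#include <stdint.h>

/*
 * The product is computed in 32 bits and only then widened to 64 bits,
 * so any bits above bit 31 are lost before the conversion happens.
 * This is the form the analyser flags.
 */
uint64_t mul_narrow(uint32_t n, uint32_t cluster_size)
{
    return n * cluster_size;
}

/*
 * Casting one factor first makes the multiplication itself 64 bits wide,
 * so the full product is kept.
 */
uint64_t mul_wide(uint32_t n, uint32_t cluster_size)
{
    return (uint64_t)n * cluster_size;
}

/* Both forms agree while the product fits in 32 bits, and differ beyond. */
void mul_check(void)
{
    assert(mul_narrow(3, 65536) == 196608);
    assert(mul_wide(3, 65536) == 196608);
    assert(mul_narrow(65536, 65536) == 0);               /* wrapped around */
    assert(mul_wide(65536, 65536) == UINT64_C(4294967296));
}
```

When the factors are known to be bounded, as Kevin and Paolo argue for the
block layer, both functions return the same value; the disagreement in the
thread is about whether the cast is worth its cost in readability.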
diff --git a/results/classifier/004/other/23448582 b/results/classifier/004/other/23448582
new file mode 100644
index 00000000..1d086d26
--- /dev/null
+++ b/results/classifier/004/other/23448582
@@ -0,0 +1,273 @@
+other: 0.990
+semantic: 0.987
+graphic: 0.987
+assembly: 0.986
+instruction: 0.983
+socket: 0.982
+mistranslation: 0.982
+device: 0.979
+network: 0.973
+vnc: 0.973
+boot: 0.967
+KVM: 0.958
+
+[BUG REPORT] cxl process in infinity loop
+
+Hi, all
+
+While running a CXL memory hot-plug test on QEMU, I accidentally connected
+two memdevs to the same downstream port, using a command line like the one below:
+
+>
+-object memory-backend-ram,size=262144k,share=on,id=vmem0 \
+>
+-object memory-backend-ram,size=262144k,share=on,id=vmem1 \
+>
+-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
+>
+-device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
+>
+-device cxl-upstream,bus=root_port0,id=us0 \
+>
+-device cxl-downstream,port=0,bus=us0,id=swport00,chassis=0,slot=5 \
+>
+-device cxl-downstream,port=0,bus=us0,id=swport01,chassis=0,slot=7 \
+same downstream port but has different slot!
+
+>
+-device cxl-type3,bus=swport00,volatile-memdev=vmem0,id=cxl-vmem0 \
+>
+-device cxl-type3,bus=swport01,volatile-memdev=vmem1,id=cxl-vmem1 \
+>
+-M
+>
+cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=64G,cxl-fmw.0.interleave-granularity=4k
+>
+\
+No error occurred when the VM started, but when I executed the "cxl list"
+command to view
+the CXL objects info, the process never terminated.
+
+Then I used strace to trace the process and found that it is stuck in an
+infinite loop:
+# strace cxl list
+......
+clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
+openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
+write(3, "1\n\0", 3)                    = 3
+close(3)                                = 0
+access("/run/udev/queue", F_OK)         = 0
+clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
+openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
+write(3, "1\n\0", 3)                    = 3
+close(3)                                = 0
+access("/run/udev/queue", F_OK)         = 0
+clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
+openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
+write(3, "1\n\0", 3)                    = 3
+close(3)                                = 0
+access("/run/udev/queue", F_OK)         = 0
+clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
+openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
+write(3, "1\n\0", 3)                    = 3
+close(3)                                = 0
+access("/run/udev/queue", F_OK)         = 0
+clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
+openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
+write(3, "1\n\0", 3)                    = 3
+close(3)                                = 0
+access("/run/udev/queue", F_OK)         = 0
+clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
+openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
+write(3, "1\n\0", 3)                    = 3
+close(3)                                = 0
+access("/run/udev/queue", F_OK)         = 0
+clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
+openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
+write(3, "1\n\0", 3)                    = 3
+close(3)                                = 0
+access("/run/udev/queue", F_OK)         = 0
+clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
+openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
+write(3, "1\n\0", 3)                    = 3
+close(3)                                = 0
+access("/run/udev/queue", F_OK)         = 0
+
+[Environment]:
+linux: V6.10-rc3
+QEMU: V9.0.0
+ndctl: v79
+
+I know this is caused by incorrect use of the QEMU command line, but I think
+at least one of
+QEMU, the OS, or ndctl should detect this error.
+
+Thanks
+Xingtao
+
+On Tue, 2 Jul 2024 00:30:06 +0000
+"Xingtao Yao (Fujitsu)" <yaoxt.fnst@fujitsu.com> wrote:
+
+>
+Hi, all
+>
+>
+While running a CXL memory hot-plug test on QEMU, I accidentally connected
+>
+two memdevs to the same downstream port, using a command line like the one below:
+>
+>
+> -object memory-backend-ram,size=262144k,share=on,id=vmem0 \
+>
+> -object memory-backend-ram,size=262144k,share=on,id=vmem1 \
+>
+> -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
+>
+> -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
+>
+> -device cxl-upstream,bus=root_port0,id=us0 \
+>
+> -device cxl-downstream,port=0,bus=us0,id=swport00,chassis=0,slot=5 \
+>
+> -device cxl-downstream,port=0,bus=us0,id=swport01,chassis=0,slot=7 \
+>
+same downstream port but has different slot!
+>
+>
+> -device cxl-type3,bus=swport00,volatile-memdev=vmem0,id=cxl-vmem0 \
+>
+> -device cxl-type3,bus=swport01,volatile-memdev=vmem1,id=cxl-vmem1 \
+>
+> -M
+>
+> cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=64G,cxl-fmw.0.interleave-granularity=4k
+>
+>  \
+>
+>
+No error occurred when the VM started, but when I executed the "cxl list"
+>
+command to view
+>
+the CXL objects info, the process never terminated.
+I'd be happy to look at a patch preventing this on the QEMU side if you send one,
+but in general there are lots of ways to shoot yourself in the
+foot with CXL and PCI device emulation in QEMU, so I'm not going
+to rush to solve this specific one.
+
+Likewise, some hardening in kernel / userspace probably makes sense, but
+this is a non-compliant switch, so the priority of a fix is probably fairly low.
+
+Jonathan
+
+>
+>
+Then I used strace to trace the process and found that it is stuck in an
+>
+infinite loop:
+>
+# strace cxl list
+>
+......
+>
+clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
+>
+openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
+>
+write(3, "1\n\0", 3)                    = 3
+>
+close(3)                                = 0
+>
+access("/run/udev/queue", F_OK)         = 0
+>
+clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
+>
+openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
+>
+write(3, "1\n\0", 3)                    = 3
+>
+close(3)                                = 0
+>
+access("/run/udev/queue", F_OK)         = 0
+>
+clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
+>
+openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
+>
+write(3, "1\n\0", 3)                    = 3
+>
+close(3)                                = 0
+>
+access("/run/udev/queue", F_OK)         = 0
+>
+clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
+>
+openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
+>
+write(3, "1\n\0", 3)                    = 3
+>
+close(3)                                = 0
+>
+access("/run/udev/queue", F_OK)         = 0
+>
+clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
+>
+openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
+>
+write(3, "1\n\0", 3)                    = 3
+>
+close(3)                                = 0
+>
+access("/run/udev/queue", F_OK)         = 0
+>
+clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
+>
+openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
+>
+write(3, "1\n\0", 3)                    = 3
+>
+close(3)                                = 0
+>
+access("/run/udev/queue", F_OK)         = 0
+>
+clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
+>
+openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
+>
+write(3, "1\n\0", 3)                    = 3
+>
+close(3)                                = 0
+>
+access("/run/udev/queue", F_OK)         = 0
+>
+clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
+>
+openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3
+>
+write(3, "1\n\0", 3)                    = 3
+>
+close(3)                                = 0
+>
+access("/run/udev/queue", F_OK)         = 0
+>
+>
+[Environment]:
+>
+linux: V6.10-rc3
+>
+QEMU: V9.0.0
+>
+ndctl: v79
+>
+>
+I know this is caused by incorrect use of the QEMU command line, but I think
+>
+at least one of
+>
+QEMU, the OS, or ndctl should detect this error.
+>
+>
+Thanks
+>
+Xingtao
+
diff --git a/results/classifier/004/other/25892827 b/results/classifier/004/other/25892827
new file mode 100644
index 00000000..4477b5ae
--- /dev/null
+++ b/results/classifier/004/other/25892827
@@ -0,0 +1,1085 @@
+other: 0.892
+KVM: 0.872
+vnc: 0.846
+instruction: 0.842
+mistranslation: 0.842
+boot: 0.839
+network: 0.839
+device: 0.839
+graphic: 0.832
+assembly: 0.829
+semantic: 0.825
+socket: 0.822
+
+[Qemu-devel] [BUG/RFC] Two cpus are not brought up normally in SLES11 sp3 VM after reboot
+
+Hi,
+
+Recently we encountered a problem in our project: 2 CPUs in VM are not brought 
+up normally after reboot.
+
+Our host is using KVM kmod 3.6 and QEMU 2.1.
+The guest is a SLES 11 SP3 VM configured with 8 vcpus;
+the cpu model is configured as 'host-passthrough'.
+
+After the VM first started up, everything seemed to be OK.
+Then the VM panicked and rebooted.
+After the reboot, only 6 cpus were brought up in the VM; cpu1 and cpu7 were not online.
+
+This is the only message we can get from VM:
+VM dmesg shows:
+[    0.069867] Booting Node   0, Processors  #1
+[    5.060042] CPU1: Stuck ??
+[    5.060499]  #2
+[    5.088322] kvm-clock: cpu 2, msr 6:3fc90901, secondary cpu clock
+[    5.088335] KVM setup async PF for cpu 2
+[    5.092967] NMI watchdog enabled, takes one hw-pmu counter.
+[    5.094405]  #3
+[    5.108324] kvm-clock: cpu 3, msr 6:3fcd0901, secondary cpu clock
+[    5.108333] KVM setup async PF for cpu 3
+[    5.113553] NMI watchdog enabled, takes one hw-pmu counter.
+[    5.114970]  #4
+[    5.128325] kvm-clock: cpu 4, msr 6:3fd10901, secondary cpu clock
+[    5.128336] KVM setup async PF for cpu 4
+[    5.134576] NMI watchdog enabled, takes one hw-pmu counter.
+[    5.135998]  #5
+[    5.152324] kvm-clock: cpu 5, msr 6:3fd50901, secondary cpu clock
+[    5.152334] KVM setup async PF for cpu 5
+[    5.154764] NMI watchdog enabled, takes one hw-pmu counter.
+[    5.156467]  #6
+[    5.172327] kvm-clock: cpu 6, msr 6:3fd90901, secondary cpu clock
+[    5.172341] KVM setup async PF for cpu 6
+[    5.180738] NMI watchdog enabled, takes one hw-pmu counter.
+[    5.182173]  #7 Ok.
+[   10.170815] CPU7: Stuck ??
+[   10.171648] Brought up 6 CPUs
+[   10.172394] Total of 6 processors activated (28799.97 BogoMIPS).
+
+From the host, we found that the QEMU vcpu1 and vcpu7 threads were not consuming
+any cpu (they should be in the idle state).
+The stacks of all VCPU threads on the host look like this:
+
+[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
+[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
+[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
+[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
+[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
+[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
+[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
+[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
+[<ffffffffffffffff>] 0xffffffffffffffff
+
+We looked into the kernel code that could lead to the above 'Stuck' warning,
+and found that the only possible cause is that the emulation of the 'cpuid'
+instruction in kvm/qemu has something wrong.
+But since we can't reproduce this problem, we are not quite sure.
+Is it possible that the cpuid emulation in kvm/qemu has a bug?
+
+Has anyone come across this problem before? Any ideas?
+
+Thanks,
+zhanghailiang
+
+On 06/07/2015 09:54, zhanghailiang wrote:
+>
+>
+From the host, we found that the QEMU vcpu1 and vcpu7 threads were not
+>
+consuming any cpu (they should be in the idle state).
+>
+The stacks of all VCPU threads on the host look like this:
+>
+>
+[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
+>
+[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
+>
+[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
+>
+[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
+>
+[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
+>
+[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
+>
+[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
+>
+[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
+>
+[<ffffffffffffffff>] 0xffffffffffffffff
+>
+>
+We looked into the kernel codes that could leading to the above 'Stuck'
+>
+warning,
+>
+and found that the only possible is the emulation of 'cpuid' instruct in
+>
+kvm/qemu has something wrong.
+>
+But since we can't reproduce this problem, we are not quite sure.
+>
+Is there any possible that the cpuid emulation in kvm/qemu has some bug ?
+Can you explain the relationship to the cpuid emulation?  What do the
+traces say about vcpus 1 and 7?
+
+Paolo
+
+On 2015/7/6 16:45, Paolo Bonzini wrote:
+On 06/07/2015 09:54, zhanghailiang wrote:
+From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
+consuming any cpu (Should be in idle state),
+All of VCPUs' stacks in host is like bellow:
+
+[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
+[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
+[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
+[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
+[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
+[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
+[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
+[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
+[<ffffffffffffffff>] 0xffffffffffffffff
+
+We looked into the kernel codes that could leading to the above 'Stuck'
+warning,
+and found that the only possible is the emulation of 'cpuid' instruct in
+kvm/qemu has something wrong.
+But since we can't reproduce this problem, we are not quite sure.
+Is there any possible that the cpuid emulation in kvm/qemu has some bug ?
+Can you explain the relationship to the cpuid emulation?  What do the
+traces say about vcpus 1 and 7?
+OK, we searched the guest kernel code for the 'Stuck' message, and it is
+located in
+do_boot_cpu(). It runs in BSP context; the call path is:
+BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu()
+-> wakeup_secondary_cpu_via_init() to trigger the APs.
+It waits 5s for each AP to start up; if an AP does not start up normally, it
+prints 'CPU%d Stuck' or 'CPU%d: Not responding'.
+
+If it prints 'Stuck', it means the AP has received the SIPI and has
+begun to execute the code at
+'ENTRY(trampoline_data)' (trampoline_64.S), but got stuck somewhere before
+smp_callin() (smpboot.c).
+The following is the startup process of the BSP and the APs.
+BSP:
+start_kernel()
+  ->smp_init()
+     ->smp_boot_cpus()
+       ->do_boot_cpu()
+           ->start_ip = trampoline_address(); //set the address that AP will go 
+to execute
+           ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
+           ->for (timeout = 0; timeout < 50000; timeout++)
+               if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if AP 
+startup or not
+
+APs:
+ENTRY(trampoline_data) (trampoline_64.S)
+      ->ENTRY(secondary_startup_64) (head_64.S)
+         ->start_secondary() (smpboot.c)
+            ->cpu_init();
+            ->smp_callin();
+                ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if the AP gets
+here, the BSP will not print the error message.
+
+From the above call path, we can be sure that the AP got stuck between
+trampoline_data and the cpumask_set_cpu() call in
+smp_callin(). We looked through this code path carefully and found only one
+'hlt' instruction that could block the AP.
+It is located in trampoline_data():
+
+ENTRY(trampoline_data)
+        ...
+
+        call    verify_cpu              # Verify the cpu supports long mode
+        testl   %eax, %eax              # Check for return code
+        jnz     no_longmode
+
+        ...
+
+no_longmode:
+        hlt
+        jmp no_longmode
+
+In verify_cpu(),
+the only sensitive instruction we can find is 'cpuid', which causes a VM exit
+from non-root mode.
+This is why we suspect that wrong cpuid emulation in KVM/QEMU caused
+verify_cpu to fail.
+
+From the messages in the VM, we know that something is wrong with vcpu1 and vcpu7.
+[    5.060042] CPU1: Stuck ??
+[   10.170815] CPU7: Stuck ??
+[   10.171648] Brought up 6 CPUs
+
+Besides, the following is the CPU information obtained from the host.
+80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command 
+instance-0000000
+* CPU #0: pc=0x00007f64160c683d thread_id=68570
+  CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
+  CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
+  CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
+  CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
+  CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
+  CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
+  CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
+
+Oh, I also forgot to mention in the above message that we have bound each vCPU
+to a different physical CPU on the
+host.
+
+Thanks,
+zhanghailiang
+
+On 06/07/2015 11:59, zhanghailiang wrote:
+>
+>
+>
+Besides, the follow is the cpus message got from host.
+>
+80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh
+>
+qemu-monitor-command instance-0000000
+>
+* CPU #0: pc=0x00007f64160c683d thread_id=68570
+>
+CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
+>
+CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
+>
+CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
+>
+CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
+>
+CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
+>
+CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
+>
+CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
+>
+>
+Oh, i also forgot to mention in the above message that, we have bond
+>
+each vCPU to different physical CPU in
+>
+host.
+Can you capture a trace on the host (trace-cmd record -e kvm) and send
+it privately?  Please note which CPUs get stuck, since I guess it's not
+always 1 and 7.
+
+Paolo
+
+On Mon, 6 Jul 2015 17:59:10 +0800
+zhanghailiang <address@hidden> wrote:
+
+>
+On 2015/7/6 16:45, Paolo Bonzini wrote:
+>
+>
+>
+>
+>
+> On 06/07/2015 09:54, zhanghailiang wrote:
+>
+>>
+>
+>>  From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
+>
+>> consuming any cpu (Should be in idle state),
+>
+>> All of VCPUs' stacks in host is like bellow:
+>
+>>
+>
+>> [<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
+>
+>> [<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
+>
+>> [<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
+>
+>> [<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
+>
+>> [<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
+>
+>> [<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
+>
+>> [<ffffffff81468092>] system_call_fastpath+0x16/0x1b
+>
+>> [<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
+>
+>> [<ffffffffffffffff>] 0xffffffffffffffff
+>
+>>
+>
+>> We looked into the kernel codes that could leading to the above 'Stuck'
+>
+>> warning,
+In current upstream there isn't any printk(...Stuck...) left, since that code
+path
+has been reworked.
+I've often seen this on an over-committed host during guest CPU up/down torture
+tests.
+Could you update the guest kernel to upstream and see if the issue reproduces?
+
+>
+>> and found that the only possible is the emulation of 'cpuid' instruct in
+>
+>> kvm/qemu has something wrong.
+>
+>> But since we can't reproduce this problem, we are not quite sure.
+>
+>> Is there any possible that the cpuid emulation in kvm/qemu has some bug ?
+>
+>
+>
+> Can you explain the relationship to the cpuid emulation?  What do the
+>
+> traces say about vcpus 1 and 7?
+>
+>
+OK, we searched the VM's kernel codes with the 'Stuck' message, and  it is
+>
+located in
+>
+do_boot_cpu(). It's in BSP context, the call process is:
+>
+BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu()
+>
+-> wakeup_secondary_via_INIT() to trigger APs.
+>
+It will wait 5s for APs to startup, if some AP not startup normally, it will
+>
+print 'CPU%d Stuck' or 'CPU%d: Not responding'.
+>
+>
+If it prints 'Stuck', it means the AP has received the SIPI interrupt and
+>
+begins to execute the code
+>
+'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places
+>
+before smp_callin()(smpboot.c).
+>
+The follow is the starup process of BSP and AP.
+>
+BSP:
+>
+start_kernel()
+>
+->smp_init()
+>
+->smp_boot_cpus()
+>
+->do_boot_cpu()
+>
+->start_ip = trampoline_address(); //set the address that AP will
+>
+go to execute
+>
+->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
+>
+->for (timeout = 0; timeout < 50000; timeout++)
+>
+if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if
+>
+AP startup or not
+>
+>
+APs:
+>
+ENTRY(trampoline_data) (trampoline_64.S)
+>
+->ENTRY(secondary_startup_64) (head_64.S)
+>
+->start_secondary() (smpboot.c)
+>
+->cpu_init();
+>
+->smp_callin();
+>
+->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP
+>
+comes here, the BSP will not prints the error message.
+>
+>
+From above call process, we can be sure that, the AP has been stuck between
+>
+trampoline_data and the cpumask_set_cpu() in
+>
+smp_callin(), we look through these codes path carefully, and only found a
+>
+'hlt' instruct that could block the process.
+>
+It is located in trampoline_data():
+>
+>
+ENTRY(trampoline_data)
+>
+...
+>
+>
+call    verify_cpu              # Verify the cpu supports long mode
+>
+testl   %eax, %eax              # Check for return code
+>
+jnz     no_longmode
+>
+>
+...
+>
+>
+no_longmode:
+>
+hlt
+>
+jmp no_longmode
+>
+>
+For the process verify_cpu(),
+>
+we can only find the 'cpuid' sensitive instruct that could lead VM exit from
+>
+No-root mode.
+>
+This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to
+>
+the fail in verify_cpu.
+>
+>
+From the message in VM, we know vcpu1 and vcpu7 is something wrong.
+>
+[    5.060042] CPU1: Stuck ??
+>
+[   10.170815] CPU7: Stuck ??
+>
+[   10.171648] Brought up 6 CPUs
+>
+>
+Besides, the follow is the cpus message got from host.
+>
+80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh
+>
+qemu-monitor-command instance-0000000
+>
+* CPU #0: pc=0x00007f64160c683d thread_id=68570
+>
+CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
+>
+CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
+>
+CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
+>
+CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
+>
+CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
+>
+CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
+>
+CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
+>
+>
+Oh, i also forgot to mention in the above message that, we have bond each
+>
+vCPU to different physical CPU in
+>
+host.
+>
+>
+Thanks,
+>
+zhanghailiang
+>
+>
+>
+>
+>
+--
+>
+To unsubscribe from this list: send the line "unsubscribe kvm" in
+>
+the body of a message to address@hidden
+>
+More majordomo info at
+http://vger.kernel.org/majordomo-info.html
+
+On 2015/7/7 19:23, Igor Mammedov wrote:
+On Mon, 6 Jul 2015 17:59:10 +0800
+zhanghailiang <address@hidden> wrote:
+On 2015/7/6 16:45, Paolo Bonzini wrote:
+On 06/07/2015 09:54, zhanghailiang wrote:
+From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
+consuming any cpu (Should be in idle state),
+All of VCPUs' stacks in host is like bellow:
+
+[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
+[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
+[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
+[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
+[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
+[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
+[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
+[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
+[<ffffffffffffffff>] 0xffffffffffffffff
+
+We looked into the kernel codes that could leading to the above 'Stuck'
+warning,
+in current upstream there isn't any printk(...Stuck...) left since that code 
+path
+has been reworked.
+I've often seen this on over-committed host during guest CPUs up/down torture 
+test.
+Could you update guest kernel to upstream and see if issue reproduces?
+Hmm, unfortunately, it is very hard to reproduce, and we are still trying to
+reproduce it.
+
+For your test case, is it a kernel bug?
+Or has any related patch that solves your test problem been merged
+upstream?
+
+Thanks,
+zhanghailiang
+and found that the only possible is the emulation of 'cpuid' instruct in
+kvm/qemu has something wrong.
+But since we can't reproduce this problem, we are not quite sure.
+Is there any possible that the cpuid emulation in kvm/qemu has some bug ?
+Can you explain the relationship to the cpuid emulation?  What do the
+traces say about vcpus 1 and 7?
+OK, we searched the VM's kernel codes with the 'Stuck' message, and  it is 
+located in
+do_boot_cpu(). It's in BSP context, the call process is:
+BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu() 
+-> wakeup_secondary_via_INIT() to trigger APs.
+It will wait 5s for APs to startup, if some AP not startup normally, it will 
+print 'CPU%d Stuck' or 'CPU%d: Not responding'.
+
+If it prints 'Stuck', it means the AP has received the SIPI interrupt and 
+begins to execute the code
+'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places before 
+smp_callin()(smpboot.c).
+The follow is the starup process of BSP and AP.
+BSP:
+start_kernel()
+    ->smp_init()
+       ->smp_boot_cpus()
+         ->do_boot_cpu()
+             ->start_ip = trampoline_address(); //set the address that AP will 
+go to execute
+             ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
+             ->for (timeout = 0; timeout < 50000; timeout++)
+                 if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if 
+AP startup or not
+
+APs:
+ENTRY(trampoline_data) (trampoline_64.S)
+        ->ENTRY(secondary_startup_64) (head_64.S)
+           ->start_secondary() (smpboot.c)
+              ->cpu_init();
+              ->smp_callin();
+                  ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP 
+comes here, the BSP will not prints the error message.
+
+  From above call process, we can be sure that, the AP has been stuck between 
+trampoline_data and the cpumask_set_cpu() in
+smp_callin(), we look through these codes path carefully, and only found a 
+'hlt' instruct that could block the process.
+It is located in trampoline_data():
+
+ENTRY(trampoline_data)
+          ...
+
+        call    verify_cpu              # Verify the cpu supports long mode
+        testl   %eax, %eax              # Check for return code
+        jnz     no_longmode
+
+          ...
+
+no_longmode:
+        hlt
+        jmp no_longmode
+
+For the process verify_cpu(),
+we can only find the 'cpuid' sensitive instruct that could lead VM exit from 
+No-root mode.
+This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to 
+the fail in verify_cpu.
+
+  From the message in VM, we know vcpu1 and vcpu7 is something wrong.
+[    5.060042] CPU1: Stuck ??
+[   10.170815] CPU7: Stuck ??
+[   10.171648] Brought up 6 CPUs
+
+Besides, the follow is the cpus message got from host.
+80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command 
+instance-0000000
+* CPU #0: pc=0x00007f64160c683d thread_id=68570
+    CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
+    CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
+    CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
+    CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
+    CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
+    CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
+    CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
+
+Oh, i also forgot to mention in the above message that, we have bond each vCPU 
+to different physical CPU in
+host.
+
+Thanks,
+zhanghailiang
+
+
+
+
+--
+To unsubscribe from this list: send the line "unsubscribe kvm" in
+the body of a message to address@hidden
+More majordomo info at
+http://vger.kernel.org/majordomo-info.html
+.
+
+On Tue, 7 Jul 2015 19:43:35 +0800
+zhanghailiang <address@hidden> wrote:
+
+>
+On 2015/7/7 19:23, Igor Mammedov wrote:
+>
+> On Mon, 6 Jul 2015 17:59:10 +0800
+>
+> zhanghailiang <address@hidden> wrote:
+>
+>
+>
+>> On 2015/7/6 16:45, Paolo Bonzini wrote:
+>
+>>>
+>
+>>>
+>
+>>> On 06/07/2015 09:54, zhanghailiang wrote:
+>
+>>>>
+>
+>>>>   From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
+>
+>>>> consuming any cpu (Should be in idle state),
+>
+>>>> All of VCPUs' stacks in host is like bellow:
+>
+>>>>
+>
+>>>> [<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
+>
+>>>> [<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
+>
+>>>> [<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
+>
+>>>> [<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
+>
+>>>> [<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
+>
+>>>> [<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
+>
+>>>> [<ffffffff81468092>] system_call_fastpath+0x16/0x1b
+>
+>>>> [<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
+>
+>>>> [<ffffffffffffffff>] 0xffffffffffffffff
+>
+>>>>
+>
+>>>> We looked into the kernel codes that could leading to the above 'Stuck'
+>
+>>>> warning,
+>
+> in current upstream there isn't any printk(...Stuck...) left since that
+>
+> code path
+>
+> has been reworked.
+>
+> I've often seen this on over-committed host during guest CPUs up/down
+>
+> torture test.
+>
+> Could you update guest kernel to upstream and see if issue reproduces?
+>
+>
+>
+>
+Hmm, Unfortunately, it is very hard to reproduce, and we are still trying to
+>
+reproduce it.
+>
+>
+For your test case, is it a kernel bug?
+>
+Or is there any related patch could solve your test problem been merged into
+>
+upstream ?
+I don't remember all prerequisite patches but you should be able to find
+http://marc.info/?l=linux-kernel&m=140326703108009&w=2
+"x86/smpboot: Initialize secondary CPU only if master CPU will wait for it"
+and then look for dependencies.
+
+
+>
+>
+Thanks,
+>
+zhanghailiang
+>
+>
+>>>> and found that the only possible is the emulation of 'cpuid' instruct in
+>
+>>>> kvm/qemu has something wrong.
+>
+>>>> But since we can't reproduce this problem, we are not quite sure.
+>
+>>>> Is there any possible that the cpuid emulation in kvm/qemu has some bug ?
+>
+>>>
+>
+>>> Can you explain the relationship to the cpuid emulation?  What do the
+>
+>>> traces say about vcpus 1 and 7?
+>
+>>
+>
+>> OK, we searched the VM's kernel codes with the 'Stuck' message, and  it is
+>
+>> located in
+>
+>> do_boot_cpu(). It's in BSP context, the call process is:
+>
+>> BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() ->
+>
+>> do_boot_cpu() -> wakeup_secondary_via_INIT() to trigger APs.
+>
+>> It will wait 5s for APs to startup, if some AP not startup normally, it
+>
+>> will print 'CPU%d Stuck' or 'CPU%d: Not responding'.
+>
+>>
+>
+>> If it prints 'Stuck', it means the AP has received the SIPI interrupt and
+>
+>> begins to execute the code
+>
+>> 'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places
+>
+>> before smp_callin()(smpboot.c).
+>
+>> The follow is the starup process of BSP and AP.
+>
+>> BSP:
+>
+>> start_kernel()
+>
+>>     ->smp_init()
+>
+>>        ->smp_boot_cpus()
+>
+>>          ->do_boot_cpu()
+>
+>>              ->start_ip = trampoline_address(); //set the address that AP
+>
+>> will go to execute
+>
+>>              ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
+>
+>>              ->for (timeout = 0; timeout < 50000; timeout++)
+>
+>>                  if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;//
+>
+>> check if AP startup or not
+>
+>>
+>
+>> APs:
+>
+>> ENTRY(trampoline_data) (trampoline_64.S)
+>
+>>         ->ENTRY(secondary_startup_64) (head_64.S)
+>
+>>            ->start_secondary() (smpboot.c)
+>
+>>               ->cpu_init();
+>
+>>               ->smp_callin();
+>
+>>                   ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP
+>
+>> comes here, the BSP will not prints the error message.
+>
+>>
+>
+>>   From above call process, we can be sure that, the AP has been stuck
+>
+>> between trampoline_data and the cpumask_set_cpu() in
+>
+>> smp_callin(), we look through these codes path carefully, and only found a
+>
+>> 'hlt' instruct that could block the process.
+>
+>> It is located in trampoline_data():
+>
+>>
+>
+>> ENTRY(trampoline_data)
+>
+>>           ...
+>
+>>
+>
+>>    call    verify_cpu              # Verify the cpu supports long mode
+>
+>>    testl   %eax, %eax              # Check for return code
+>
+>>    jnz     no_longmode
+>
+>>
+>
+>>           ...
+>
+>>
+>
+>> no_longmode:
+>
+>>    hlt
+>
+>>    jmp no_longmode
+>
+>>
+>
+>> For the process verify_cpu(),
+>
+>> we can only find the 'cpuid' sensitive instruct that could lead VM exit
+>
+>> from No-root mode.
+>
+>> This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading
+>
+>> to the fail in verify_cpu.
+>
+>>
+>
+>>   From the message in VM, we know vcpu1 and vcpu7 is something wrong.
+>
+>> [    5.060042] CPU1: Stuck ??
+>
+>> [   10.170815] CPU7: Stuck ??
+>
+>> [   10.171648] Brought up 6 CPUs
+>
+>>
+>
+>> Besides, the follow is the cpus message got from host.
+>
+>> 80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh
+>
+>> qemu-monitor-command instance-0000000
+>
+>> * CPU #0: pc=0x00007f64160c683d thread_id=68570
+>
+>>     CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
+>
+>>     CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
+>
+>>     CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
+>
+>>     CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
+>
+>>     CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
+>
+>>     CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
+>
+>>     CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
+>
+>>
+>
+>> Oh, i also forgot to mention in the above message that, we have bond each
+>
+>> vCPU to different physical CPU in
+>
+>> host.
+>
+>>
+>
+>> Thanks,
+>
+>> zhanghailiang
+>
+>>
+>
+>>
+>
+>>
+>
+>>
+>
+>> --
+>
+>> To unsubscribe from this list: send the line "unsubscribe kvm" in
+>
+>> the body of a message to address@hidden
+>
+>> More majordomo info at
+http://vger.kernel.org/majordomo-info.html
+>
+>
+>
+>
+>
+> .
+>
+>
+>
+>
+>
+
+On 2015/7/7 20:21, Igor Mammedov wrote:
+On Tue, 7 Jul 2015 19:43:35 +0800
+zhanghailiang <address@hidden> wrote:
+On 2015/7/7 19:23, Igor Mammedov wrote:
+On Mon, 6 Jul 2015 17:59:10 +0800
+zhanghailiang <address@hidden> wrote:
+On 2015/7/6 16:45, Paolo Bonzini wrote:
+On 06/07/2015 09:54, zhanghailiang wrote:
+From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
+consuming any cpu (Should be in idle state),
+All of VCPUs' stacks in host is like bellow:
+
+[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
+[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
+[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
+[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
+[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
+[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
+[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
+[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
+[<ffffffffffffffff>] 0xffffffffffffffff
+
+We looked into the kernel codes that could leading to the above 'Stuck'
+warning,
+in current upstream there isn't any printk(...Stuck...) left since that code 
+path
+has been reworked.
+I've often seen this on over-committed host during guest CPUs up/down torture 
+test.
+Could you update guest kernel to upstream and see if issue reproduces?
+Hmm, Unfortunately, it is very hard to reproduce, and we are still trying to 
+reproduce it.
+
+For your test case, is it a kernel bug?
+Or is there any related patch could solve your test problem been merged into
+upstream ?
+I don't remember all prerequisite patches but you should be able to find
+http://marc.info/?l=linux-kernel&m=140326703108009&w=2
+"x86/smpboot: Initialize secondary CPU only if master CPU will wait for it"
+and then look for dependencies.
+Er, we have investigated this patch, and it is not related to our problem. :)
+
+Thanks.
+Thanks,
+zhanghailiang
+and found that the only possible is the emulation of 'cpuid' instruct in
+kvm/qemu has something wrong.
+But since we can't reproduce this problem, we are not quite sure.
+Is there any possible that the cpuid emulation in kvm/qemu has some bug ?
+Can you explain the relationship to the cpuid emulation?  What do the
+traces say about vcpus 1 and 7?
+OK, we searched the VM's kernel codes with the 'Stuck' message, and  it is 
+located in
+do_boot_cpu(). It's in BSP context, the call process is:
+BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu() 
+-> wakeup_secondary_via_INIT() to trigger APs.
+It will wait 5s for APs to startup, if some AP not startup normally, it will 
+print 'CPU%d Stuck' or 'CPU%d: Not responding'.
+
+If it prints 'Stuck', it means the AP has received the SIPI interrupt and 
+begins to execute the code
+'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places before 
+smp_callin()(smpboot.c).
+The follow is the starup process of BSP and AP.
+BSP:
+start_kernel()
+     ->smp_init()
+        ->smp_boot_cpus()
+          ->do_boot_cpu()
+              ->start_ip = trampoline_address(); //set the address that AP will 
+go to execute
+              ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
+              ->for (timeout = 0; timeout < 50000; timeout++)
+                  if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if 
+AP startup or not
+
+APs:
+ENTRY(trampoline_data) (trampoline_64.S)
+         ->ENTRY(secondary_startup_64) (head_64.S)
+            ->start_secondary() (smpboot.c)
+               ->cpu_init();
+               ->smp_callin();
+                   ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP 
+comes here, the BSP will not prints the error message.
+
+   From above call process, we can be sure that, the AP has been stuck between 
+trampoline_data and the cpumask_set_cpu() in
+smp_callin(), we look through these codes path carefully, and only found a 
+'hlt' instruct that could block the process.
+It is located in trampoline_data():
+
+ENTRY(trampoline_data)
+           ...
+
+        call    verify_cpu              # Verify the cpu supports long mode
+        testl   %eax, %eax              # Check for return code
+        jnz     no_longmode
+
+           ...
+
+no_longmode:
+        hlt
+        jmp no_longmode
+
+For the process verify_cpu(),
+we can only find the 'cpuid' sensitive instruct that could lead VM exit from 
+No-root mode.
+This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to 
+the fail in verify_cpu.
+
+   From the message in VM, we know vcpu1 and vcpu7 is something wrong.
+[    5.060042] CPU1: Stuck ??
+[   10.170815] CPU7: Stuck ??
+[   10.171648] Brought up 6 CPUs
+
+Besides, the follow is the cpus message got from host.
+80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command 
+instance-0000000
+* CPU #0: pc=0x00007f64160c683d thread_id=68570
+     CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
+     CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
+     CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
+     CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
+     CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
+     CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
+     CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
+
+Oh, i also forgot to mention in the above message that, we have bond each vCPU 
+to different physical CPU in
+host.
+
+Thanks,
+zhanghailiang
+
+
+
+
+--
+To unsubscribe from this list: send the line "unsubscribe kvm" in
+the body of a message to address@hidden
+More majordomo info at
+http://vger.kernel.org/majordomo-info.html
+.
+.
+
diff --git a/results/classifier/004/other/31349848 b/results/classifier/004/other/31349848
new file mode 100644
index 00000000..2177a1bc
--- /dev/null
+++ b/results/classifier/004/other/31349848
@@ -0,0 +1,162 @@
+other: 0.901
+device: 0.881
+graphic: 0.876
+assembly: 0.855
+vnc: 0.854
+semantic: 0.846
+socket: 0.846
+instruction: 0.845
+KVM: 0.827
+boot: 0.815
+mistranslation: 0.781
+network: 0.769
+
+[Qemu-devel]  [BUG] qemu stuck when detach host-usb device
+
+Description of problem:
+The guest has a host-usb device (Kingston Technology DataTraveler 100 G3/G4/SE9
+G2), which is attached
+to an xHCI controller (on the host). QEMU gets stuck if I detach it from the guest.
+
+How reproducible:
+100%
+
+Steps to Reproduce:
+1. Use the USB stick to copy files in the guest, so that it is kept busy.
+2. virsh detach-device vm_name usb.xml
+
+QEMU then gets stuck for 20s. I found that this is because libusb_release_interface
+blocks for 20s.
+dmesg prints:
+
+[35442.034861] usb 4-2.1: Disable of device-initiated U1 failed.
+[35447.034993] usb 4-2.1: Disable of device-initiated U2 failed.
+[35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed.
+[35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed.
+
+Is this a hardware error or a software bug?
+
+On Tue, Nov 27, 2018 at 01:26:24AM +0000, linzhecheng wrote:
+>
+Description of problem:
+>
+The guest has a host-usb device(Kingston Technology DataTraveler 100
+>
+G3/G4/SE9 G2), which is attached
+>
+to xhci controller(on host). Qemu will stuck if I detach it from guest.
+>
+>
+How reproducible:
+>
+100%
+>
+>
+Steps to Reproduce:
+>
+1.            Use usb stick to copy files in guest , make it busy working.
+>
+2.            virsh detach-device vm_name usb.xml
+>
+>
+Then qemu will stuck for 20s, I found this is because
+>
+libusb_release_interface block for 20s.
+>
+Dmesg prints:
+>
+>
+[35442.034861] usb 4-2.1: Disable of device-initiated U1 failed.
+>
+[35447.034993] usb 4-2.1: Disable of device-initiated U2 failed.
+>
+[35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed.
+>
+[35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed.
+>
+>
+Is this a hardware error or software's bug?
+I'd guess software error; it could be libusb or the (host) Linux kernel.
+Cc'ing libusb-devel.
+
+cheers,
+  Gerd
+
+>
+-----Original Message-----
+>
+From: Gerd Hoffmann [
+mailto:address@hidden
+>
+Sent: Tuesday, November 27, 2018 2:09 PM
+>
+To: linzhecheng <address@hidden>
+>
+Cc: address@hidden; wangxin (U) <address@hidden>;
+>
+Zhoujian (jay) <address@hidden>; address@hidden
+>
+Subject: Re: [Qemu-devel] [BUG] qemu stuck when detach host-usb device
+>
+>
+On Tue, Nov 27, 2018 at 01:26:24AM +0000, linzhecheng wrote:
+>
+> Description of problem:
+>
+> The guest has a host-usb device(Kingston Technology DataTraveler 100
+>
+> G3/G4/SE9 G2), which is attached to xhci controller(on host). Qemu will
+>
+> stuck
+>
+if I detach it from guest.
+>
+>
+>
+> How reproducible:
+>
+> 100%
+>
+>
+>
+> Steps to Reproduce:
+>
+> 1.            Use usb stick to copy files in guest , make it busy working.
+>
+> 2.            virsh detach-device vm_name usb.xml
+>
+>
+>
+> Then qemu will stuck for 20s, I found this is because
+>
+> libusb_release_interface
+>
+block for 20s.
+>
+> Dmesg prints:
+>
+>
+>
+> [35442.034861] usb 4-2.1: Disable of device-initiated U1 failed.
+>
+> [35447.034993] usb 4-2.1: Disable of device-initiated U2 failed.
+>
+> [35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed.
+>
+> [35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed.
+>
+>
+>
+> Is this a hardware error or software's bug?
+>
+>
+I'd guess software error, could be is libusb or (host) linux kernel.
+>
+Cc'ing libusb-devel.
+Perhaps it's a bug in the USB driver. Could you also reproduce it?
+>
+>
+cheers,
+>
+Gerd
+
diff --git a/results/classifier/004/other/32484936 b/results/classifier/004/other/32484936
new file mode 100644
index 00000000..8cada8da
--- /dev/null
+++ b/results/classifier/004/other/32484936
@@ -0,0 +1,231 @@
+other: 0.856
+assembly: 0.835
+semantic: 0.832
+vnc: 0.830
+device: 0.830
+instruction: 0.829
+socket: 0.829
+graphic: 0.813
+network: 0.811
+boot: 0.810
+mistranslation: 0.794
+KVM: 0.793
+
+[Qemu-devel] [Snapshot Bug?]Qcow2 meta data corruption
+
+Hi all,
+A problem with qcow2 image files occurred in several of my VMs and I could not figure it out,
+so I have to ask for some help.
+Here is the thing:
+At first, I found some data corruption in a VM, so I ran qemu-img check on all my VMs.
+Parts of the check report:
+3-Leaked cluster 2926229 refcount=1 reference=0
+4-Leaked cluster 3021181 refcount=1 reference=0
+5-Leaked cluster 3021182 refcount=1 reference=0
+6-Leaked cluster 3021183 refcount=1 reference=0
+7-Leaked cluster 3021184 refcount=1 reference=0
+8-ERROR cluster 3102547 refcount=3 reference=4
+9-ERROR cluster 3111536 refcount=3 reference=4
+10-ERROR cluster 3113369 refcount=3 reference=4
+11-ERROR cluster 3235590 refcount=10 reference=11
+12-ERROR cluster 3235591 refcount=10 reference=11
+423-Warning: cluster offset=0xc000c00020000 is after the end of the image file, can't properly check refcounts.
+424-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
+425-Warning: cluster offset=0xc0001000c0000 is after the end of the image file, can't properly check refcounts.
+426-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
+427-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
+428-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
+429-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
+430-Warning: cluster offset=0xc000c00010000 is after the end of the image file, can't properly check refcounts.
+After a further look, I found two L2 entries pointing to the same cluster, and this was found in several qcow2 image files of different VMs.
+Like this:
+table entry conflict (with our qcow2 check tool):
+a table offset : 0x00000093f7080000 level : 2, l1 table entry 100, l2 table entry 7
+b table offset : 0x00000093f7080000 level : 2, l1 table entry 5, l2 table entry 7
+table entry conflict :
+a table offset : 0x00000000a01e0000 level : 2, l1 table entry 100, l2 table entry 19
+b table offset : 0x00000000a01e0000 level : 2, l1 table entry 5, l2 table entry 19
+table entry conflict :
+a table offset : 0x00000000a01d0000 level : 2, l1 table entry 100, l2 table entry 18
+b table offset : 0x00000000a01d0000 level : 2, l1 table entry 5, l2 table entry 18
+table entry conflict :
+a table offset : 0x00000000a01c0000 level : 2, l1 table entry 100, l2 table entry 17
+b table offset : 0x00000000a01c0000 level : 2, l1 table entry 5, l2 table entry 17
+table entry conflict :
+a table offset : 0x00000000a01b0000 level : 2, l1 table entry 100, l2 table entry 16
+b table offset : 0x00000000a01b0000 level : 2, l1 table entry 5, l2 table entry 16
+I think the problem is related to snapshot creation and deletion, but I can't reproduce it.
+Can anyone give a hint about how this happens?
+QEMU version 2.0.1; I downloaded the source code and installed it with make install.
+Qemu parameters:
+/usr/bin/kvm -chardev socket,id=qmp,path=/var/run/qemu-server/5855899639838.qmp,server,nowait -mon chardev=qmp,mode=control -vnc :0,websocket,to=200 -enable-kvm -pidfile /var/run/qemu-server/5855899639838.pid -daemonize -name yfMailSvr-200.200.0.14 -smp sockets=1,cores=4 -cpu core2duo,hv_spinlocks=0xffff,hv_relaxed,hv_time,hv_vapic,+sse4.1,+sse4.2,+x2apic,+erms,+smep,+fsgsbase,+f16c,+dca,+pcid,+pdcm,+xtpr,+ht,+ss,+acpi,+ds -nodefaults -vga cirrus -k en-us -boot menu=on,splash-time=8000 -m 8192 -usb -drive if=none,id=drive-ide0,media=cdrom,aio=native -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0 -drive file=/sf/data/36b82a720d3a278001ba904e80c20c13e_ecf4bbbf3e94/images/host-ecf4bbbf3e94/784f3f08532a/yfMailSvr-200.200.0.14.vm/vm-disk-1.qcow2,if=none,id=drive-virtio1,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb -drive file=/sf/data/36b82a720d3a278001ba904e80c20c13e_ecf4bbbf3e94/images/host-ecf4bbbf3e94/784f3f08532a/yfMailSvr-200.200.0.14.vm/vm-disk-2.qcow2,if=none,id=drive-virtio2,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio2,id=virtio2,bus=pci.0,addr=0xc,bootindex=101 -netdev type=tap,id=net0,ifname=585589963983800,script=/sf/etc/kvm/vtp-bridge,vhost=on,vhostforce=on -device virtio-net-pci,romfile=,mac=FE:FC:FE:F0:AB:BA,netdev=net0,bus=pci.0,addr=0x12,id=net0 -rtc driftfix=slew,clock=rt,base=localtime -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1
+Thanks
+Sangfor VT.
+leijian
+
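A minimal sketch of how the check report above can be read. In the report, the lines marked "Leaked" show a stored refcount above the number of references found (refcount=1 reference=0), while the lines marked "ERROR" show the opposite (refcount=3 reference=4). Assuming that convention holds in general, the classification could look like this. All names are hypothetical, and this is not qemu-img's actual code.

```c
#include <assert.h>

/* Hypothetical verdicts for one cluster, mirroring the report wording. */
typedef enum {
    SKETCH_OK,      /* stored refcount matches the references found */
    SKETCH_LEAKED,  /* stored refcount is too high: space is wasted */
    SKETCH_ERROR    /* stored refcount is too low: possible corruption */
} SketchVerdict;

/* Compare the refcount stored in the image with the number of references
 * found by walking the L1/L2 tables (both values are assumed inputs). */
static SketchVerdict sketch_classify(unsigned stored_refcount,
                                     unsigned references_found)
{
    if (stored_refcount > references_found) {
        return SKETCH_LEAKED;
    }
    if (stored_refcount < references_found) {
        return SKETCH_ERROR;
    }
    return SKETCH_OK;
}
```

The ERROR case is the dangerous one: a cluster whose stored refcount is too low can drop to zero and be allocated again while a table entry still points to it.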
+On 03.04.2015 at 12:04, leijian wrote:
+> Hi all,
+>
+> There was a problem about qcow2 image file happened in my serval vms and I
+> could not figure it out,
+> so have to ask for some help.
+> [...]
+> I think the problem is relate to the snapshot create, delete. But I cant
+> reproduce it .
+> Can Anyone give a hint about how this happen?
+
+How did you create/delete your snapshots?
+
+More specifically, did you take care to never access your image from
+more than one process (except if both are read-only)? It happens
+occasionally that people use 'qemu-img snapshot' while the VM is
+running. This is wrong and can corrupt the image.
+
+Kevin
+
+On 04/07/2015 03:33 AM, Kevin Wolf wrote:
+> More specifically, did you take care to never access your image from
+> more than one process (except if both are read-only)? It happens
+> occasionally that people use 'qemu-img snapshot' while the VM is
+> running. This is wrong and can corrupt the image.
+
+Since this has been done by more than one person, I'm wondering if there
+is something we can do in the qcow2 format itself to make it harder for
+the casual user to cause corruption.  Maybe if we declare some bit or
+extension header for an image open for writing, which other readers can
+use as a warning ("this image is being actively modified; reading it may
+fail"), and other writers can use to deny access ("another process is
+already modifying this image"), where a writer should set that bit
+before writing anything else in the file, then clear it on exit.  Of
+course, you'd need a way to override the bit to actively clear it to
+recover from the case of a writer dying unexpectedly without resetting
+it normally.  And it won't help the case of a reader opening the file
+first, followed by a writer, where the reader could still get thrown off
+track.
+
+Or maybe we could document in the qcow2 format that all readers and
+writers should attempt to obtain the appropriate flock() permissions [or
+other appropriate advisory locking scheme] over the file header, so that
+cooperating processes that both use advisory locking will know when the
+file is in use by another process.
+
+-- 
+Eric Blake   eblake redhat com    +1-919-301-3266
+Libvirt virtualization library
+http://libvirt.org
+
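A minimal sketch of the advisory-locking idea described above, assuming Linux flock() semantics. The helper open_image_locked() and the demo function are hypothetical and are not QEMU functions. A second writer that opens the same file is refused with EWOULDBLOCK, which on Linux is reported as "Resource temporarily unavailable".

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: open an image file and try to take an exclusive
 * advisory lock without blocking. Returns the fd on success, or -1 with
 * errno set (EWOULDBLOCK if another open file description holds the lock). */
static int open_image_locked(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        return -1;
    }
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Demo on a temporary file: the second "writer" must be refused while the
 * first holds the lock, and a later one must succeed after it is released.
 * Returns 0 if both expectations hold, -1 otherwise. */
static int sketch_demo_two_writers(void)
{
    char path[] = "/tmp/flock-sketch-XXXXXX";
    int tmp = mkstemp(path);
    if (tmp < 0) {
        return -1;
    }
    close(tmp);

    int first = open_image_locked(path);
    int second = open_image_locked(path);
    int second_errno = errno;
    int ok = (first >= 0 && second < 0 && second_errno == EWOULDBLOCK);

    if (second >= 0) {
        close(second);
    }
    if (first >= 0) {
        close(first);           /* closing the fd releases the lock */
    }

    int third = open_image_locked(path);
    ok = ok && (third >= 0);
    if (third >= 0) {
        close(third);
    }
    unlink(path);
    return ok ? 0 : -1;
}
```

As the message notes, this only protects cooperating processes: a program that never calls flock() can still open and modify the file.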
+I created/deleted the snapshots using the qmp commands "snapshot_blkdev_internal"/"snapshot_delete_blkdev_internal", and to avoid the case you mentioned above, I have added flock() locking in qemu_open().
+Here is the result of running qemu-img snapshot against a running VM:
+Diskfile:/sf/data/36c81f660e38b3b001b183da50b477d89_f8bc123b3e74/images/host-f8bc123b3e74/4a8d8728fcdc/Devried30030.vm/vm-disk-1.qcow2 is used! errno=Resource temporarily unavailable
+Could the two L2 entries point to the same cluster because the cluster's refcount unexpectedly dropped to 0 and the cluster was then allocated again?
+If the cause was not access to the image from more than one process, are there any other exceptional cases I can test for?
+Thanks
+leijian
+From: Eric Blake
+Date: 2015-04-07 23:27
+To: Kevin Wolf; leijian
+CC: qemu-devel; stefanha
+Subject: Re: [Qemu-devel] [Snapshot Bug?]Qcow2 meta data corruption
+On 04/07/2015 03:33 AM, Kevin Wolf wrote:
+> More specifically, did you take care to never access your image from
+> more than one process (except if both are read-only)? It happens
+> occasionally that people use 'qemu-img snapshot' while the VM is
+> running. This is wrong and can corrupt the image.
+Since this has been done by more than one person, I'm wondering if there
+is something we can do in the qcow2 format itself to make it harder for
+the casual user to cause corruption.  Maybe if we declare some bit or
+extension header for an image open for writing, which other readers can
+use as a warning ("this image is being actively modified; reading it may
+fail"), and other writers can use to deny access ("another process is
+already modifying this image"), where a writer should set that bit
+before writing anything else in the file, then clear it on exit.  Of
+course, you'd need a way to override the bit to actively clear it to
+recover from the case of a writer dying unexpectedly without resetting
+it normally.  And it won't help the case of a reader opening the file
+first, followed by a writer, where the reader could still get thrown off
+track.
+Or maybe we could document in the qcow2 format that all readers and
+writers should attempt to obtain the appropriate flock() permissions [or
+other appropriate advisory locking scheme] over the file header, so that
+cooperating processes that both use advisory locking will know when the
+file is in use by another process.
+--
+Eric Blake   eblake redhat com    +1-919-301-3266
+Libvirt virtualization library http://libvirt.org
+
diff --git a/results/classifier/004/other/35170175 b/results/classifier/004/other/35170175
new file mode 100644
index 00000000..9d4185af
--- /dev/null
+++ b/results/classifier/004/other/35170175
@@ -0,0 +1,529 @@
+other: 0.933
+graphic: 0.844
+instruction: 0.812
+semantic: 0.798
+device: 0.787
+assembly: 0.767
+boot: 0.719
+KVM: 0.709
+socket: 0.681
+network: 0.666
+mistranslation: 0.641
+vnc: 0.633
+
+[Qemu-devel] [BUG] QEMU crashes with dpdk virtio pmd
+
+Qemu crashes, with pre-condition:
+vm xml config with multiqueue, and the vm's driver virtio-net support 
+multi-queue
+
+reproduce steps:
+i. start dpdk testpmd in VM with the virtio nic
+ii. stop testpmd
+iii. reboot the VM
+
+The crash appears after commit f9d6dbf0 ("remove virtio queues if the guest doesn't support
+multiqueue") was introduced.
+
+Qemu version: QEMU emulator version 2.9.50 (v2.9.0-137-g32c7e0a)
+VM DPDK version:  DPDK-1.6.1
+
+Call Trace:
+#0  0x00007f60881fe5d7 in raise () from /usr/lib64/libc.so.6
+#1  0x00007f60881ffcc8 in abort () from /usr/lib64/libc.so.6
+#2  0x00007f608823e2f7 in __libc_message () from /usr/lib64/libc.so.6
+#3  0x00007f60882456d3 in _int_free () from /usr/lib64/libc.so.6
+#4  0x00007f608900158f in g_free () from /usr/lib64/libglib-2.0.so.0
+#5  0x00007f6088fea32c in iter_remove_or_steal () from 
+/usr/lib64/libglib-2.0.so.0
+#6  0x00007f608edc0986 in object_property_del_all (obj=0x7f6091e74800) at 
+qom/object.c:410
+#7  object_finalize (data=0x7f6091e74800) at qom/object.c:467
+#8  object_unref (address@hidden) at qom/object.c:903
+#9  0x00007f608eaf1fd3 in phys_section_destroy (mr=0x7f6091e74800) at 
+git/qemu/exec.c:1154
+#10 phys_sections_free (map=0x7f6090b72bb0) at git/qemu/exec.c:1163
+#11 address_space_dispatch_free (d=0x7f6090b72b90) at git/qemu/exec.c:2514
+#12 0x00007f608ee91ace in call_rcu_thread (opaque=<optimized out>) at 
+util/rcu.c:272
+#13 0x00007f6089b0ddc5 in start_thread () from /usr/lib64/libpthread.so.0
+#14 0x00007f60882bf71d in clone () from /usr/lib64/libc.so.6
+
+Call Trace:
+#0  0x00007fdccaeb9790 in ?? ()
+#1  0x00007fdcd82d09fc in object_property_del_all (obj=0x7fdcdb8acf60) at 
+qom/object.c:405
+#2  object_finalize (data=0x7fdcdb8acf60) at qom/object.c:467
+#3  object_unref (address@hidden) at qom/object.c:903
+#4  0x00007fdcd8001fd3 in phys_section_destroy (mr=0x7fdcdb8acf60) at 
+git/qemu/exec.c:1154
+#5  phys_sections_free (map=0x7fdcdc86aa00) at git/qemu/exec.c:1163
+#6  address_space_dispatch_free (d=0x7fdcdc86a9e0) at git/qemu/exec.c:2514
+#7  0x00007fdcd83a1ace in call_rcu_thread (opaque=<optimized out>) at 
+util/rcu.c:272
+#8  0x00007fdcd301ddc5 in start_thread () from /usr/lib64/libpthread.so.0
+#9  0x00007fdcd17cf71d in clone () from /usr/lib64/libc.so.6
+
+q->tx_bh is freed in virtio_net_del_queue() when the virtio queues are removed
+because the guest doesn't support multiqueue. However, it might still be referenced
+elsewhere (e.g. virtio_net_set_status()), so it needs to be set to NULL.
+
+diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
+index 7d091c9..98bd683 100644
+--- a/hw/net/virtio-net.c
++++ b/hw/net/virtio-net.c
+@@ -1522,9 +1522,12 @@ static void virtio_net_del_queue(VirtIONet *n, int index)
+     if (q->tx_timer) {
+         timer_del(q->tx_timer);
+         timer_free(q->tx_timer);
++        q->tx_timer = NULL;
+     } else {
+         qemu_bh_delete(q->tx_bh);
++        q->tx_bh = NULL;
+     }
++    q->tx_waiting = 0;
+     virtio_del_queue(vdev, index * 2 + 1);
+ }
+
+From: wangyunjian 
+Sent: Monday, April 24, 2017 6:10 PM
+To: address@hidden; Michael S. Tsirkin <address@hidden>; 'Jason Wang' 
+<address@hidden>
+Cc: wangyunjian <address@hidden>; caihe <address@hidden>
+Subject: [Qemu-devel][BUG] QEMU crashes with dpdk virtio pmd 
+
+Qemu crashes, with pre-condition:
+vm xml config with multiqueue, and the vm's driver virtio-net support 
+multi-queue
+
+reproduce steps:
+i. start dpdk testpmd in VM with the virtio nic
+ii. stop testpmd
+iii. reboot the VM
+
+This commit "f9d6dbf0  remove virtio queues if the guest doesn't support 
+multiqueue" is introduced.
+
+Qemu version: QEMU emulator version 2.9.50 (v2.9.0-137-g32c7e0a)
+VM DPDK version:  DPDK-1.6.1
+
+Call Trace:
+#0  0x00007f60881fe5d7 in raise () from /usr/lib64/libc.so.6
+#1  0x00007f60881ffcc8 in abort () from /usr/lib64/libc.so.6
+#2  0x00007f608823e2f7 in __libc_message () from /usr/lib64/libc.so.6
+#3  0x00007f60882456d3 in _int_free () from /usr/lib64/libc.so.6
+#4  0x00007f608900158f in g_free () from /usr/lib64/libglib-2.0.so.0
+#5  0x00007f6088fea32c in iter_remove_or_steal () from 
+/usr/lib64/libglib-2.0.so.0
+#6  0x00007f608edc0986 in object_property_del_all (obj=0x7f6091e74800) at 
+qom/object.c:410
+#7  object_finalize (data=0x7f6091e74800) at qom/object.c:467
+#8  object_unref (address@hidden) at qom/object.c:903
+#9  0x00007f608eaf1fd3 in phys_section_destroy (mr=0x7f6091e74800) at 
+git/qemu/exec.c:1154
+#10 phys_sections_free (map=0x7f6090b72bb0) at git/qemu/exec.c:1163
+#11 address_space_dispatch_free (d=0x7f6090b72b90) at git/qemu/exec.c:2514
+#12 0x00007f608ee91ace in call_rcu_thread (opaque=<optimized out>) at 
+util/rcu.c:272
+#13 0x00007f6089b0ddc5 in start_thread () from /usr/lib64/libpthread.so.0
+#14 0x00007f60882bf71d in clone () from /usr/lib64/libc.so.6
+
+Call Trace:
+#0  0x00007fdccaeb9790 in ?? ()
+#1  0x00007fdcd82d09fc in object_property_del_all (obj=0x7fdcdb8acf60) at 
+qom/object.c:405
+#2  object_finalize (data=0x7fdcdb8acf60) at qom/object.c:467
+#3  object_unref (address@hidden) at qom/object.c:903
+#4  0x00007fdcd8001fd3 in phys_section_destroy (mr=0x7fdcdb8acf60) at 
+git/qemu/exec.c:1154
+#5  phys_sections_free (map=0x7fdcdc86aa00) at git/qemu/exec.c:1163
+#6  address_space_dispatch_free (d=0x7fdcdc86a9e0) at git/qemu/exec.c:2514
+#7Â  0x00007fdcd83a1ace in call_rcu_thread (opaque=<optimized out>) at 
+util/rcu.c:272
+#8Â  0x00007fdcd301ddc5 in start_thread () from /usr/lib64/libpthread.so.0
+#9Â  0x00007fdcd17cf71d in clone () from /usr/lib64/libc.so.6
+
+On 2017-04-25 19:37, wangyunjian wrote:
+> The q->tx_bh will free in virtio_net_del_queue() function, when remove virtio
+> queues
+> if the guest doesn't support multiqueue. But it might be still referenced by
+> others (eg . virtio_net_set_status()),
+> which need so set NULL.
+>
+> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
+> index 7d091c9..98bd683 100644
+> --- a/hw/net/virtio-net.c
+> +++ b/hw/net/virtio-net.c
+> @@ -1522,9 +1522,12 @@ static void virtio_net_del_queue(VirtIONet *n, int index)
+>       if (q->tx_timer) {
+>           timer_del(q->tx_timer);
+>           timer_free(q->tx_timer);
+> +        q->tx_timer = NULL;
+>       } else {
+>           qemu_bh_delete(q->tx_bh);
+> +        q->tx_bh = NULL;
+>       }
+> +    q->tx_waiting = 0;
+>       virtio_del_queue(vdev, index * 2 + 1);
+>   }
+
+Thanks a lot for the fix.
+
+Two questions:
+- If virtio_net_set_status() is the only function that may access tx_bh,
+it looks like setting tx_waiting to zero is sufficient?
+- Can you post a formal patch for this?
+
+Thanks
+
+CCing Paolo and Stefan, since it has a relationship with bh in Qemu.
+
+> -----Original Message-----
+> From: Jason Wang [mailto:address@hidden
+>
+> On 2017-04-25 19:37, wangyunjian wrote:
+> > The q->tx_bh will free in virtio_net_del_queue() function, when remove
+> > virtio queues
+> > if the guest doesn't support multiqueue. But it might be still referenced by
+> > others (eg . virtio_net_set_status()),
+> > which need so set NULL.
+> >
+> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
+> > index 7d091c9..98bd683 100644
+> > --- a/hw/net/virtio-net.c
+> > +++ b/hw/net/virtio-net.c
+> > @@ -1522,9 +1522,12 @@ static void virtio_net_del_queue(VirtIONet *n, int index)
+> >       if (q->tx_timer) {
+> >           timer_del(q->tx_timer);
+> >           timer_free(q->tx_timer);
+> > +        q->tx_timer = NULL;
+> >       } else {
+> >           qemu_bh_delete(q->tx_bh);
+> > +        q->tx_bh = NULL;
+> >       }
+> > +    q->tx_waiting = 0;
+> >       virtio_del_queue(vdev, index * 2 + 1);
+> >   }
+>
+> Thanks a lot for the fix.
+>
+> Two questions:
+>
+> - If virtio_net_set_status() is the only function that may access tx_bh,
+> it looks like setting tx_waiting to zero is sufficient?
+
+Currently yes, but we can't guarantee that it works in all scenarios, so
+we set tx_bh and tx_timer to NULL to avoid a possible access through a wild pointer,
+which is the common practice for bh usage in QEMU.
+
+I have another question about the root cause of this issue.
+
+The trace below shows the path that sets tx_waiting to one in
+virtio_net_handle_tx_bh():
+
+Breakpoint 1, virtio_net_handle_tx_bh (vdev=0x0, vq=0x7f335ad13900) at 
+/data/wyj/git/qemu/hw/net/virtio-net.c:1398
+1398    {
+(gdb) bt
+#0  virtio_net_handle_tx_bh (vdev=0x0, vq=0x7f335ad13900) at 
+/data/wyj/git/qemu/hw/net/virtio-net.c:1398
+#1  0x00007f3357bddf9c in virtio_bus_set_host_notifier (bus=<optimized out>, 
+address@hidden, address@hidden) at hw/virtio/virtio-bus.c:297
+#2  0x00007f3357a0055d in vhost_dev_disable_notifiers (address@hidden, 
+address@hidden) at /data/wyj/git/qemu/hw/virtio/vhost.c:1422
+#3  0x00007f33579e3373 in vhost_net_stop_one (net=0x7f335ad84dc0, 
+dev=0x7f335c6f5f90) at /data/wyj/git/qemu/hw/net/vhost_net.c:289
+#4  0x00007f33579e385b in vhost_net_stop (address@hidden, ncs=<optimized out>, 
+address@hidden) at /data/wyj/git/qemu/hw/net/vhost_net.c:367
+#5  0x00007f33579e15de in virtio_net_vhost_status (status=<optimized out>, 
+n=0x7f335c6f5f90) at /data/wyj/git/qemu/hw/net/virtio-net.c:176
+#6  virtio_net_set_status (vdev=0x7f335c6f5f90, status=0 '\000') at 
+/data/wyj/git/qemu/hw/net/virtio-net.c:250
+#7  0x00007f33579f8dc6 in virtio_set_status (address@hidden, address@hidden 
+'\000') at /data/wyj/git/qemu/hw/virtio/virtio.c:1146
+#8  0x00007f3357bdd3cc in virtio_ioport_write (val=0, addr=18, 
+opaque=0x7f335c6eda80) at hw/virtio/virtio-pci.c:387
+#9  virtio_pci_config_write (opaque=0x7f335c6eda80, addr=18, val=0, 
+size=<optimized out>) at hw/virtio/virtio-pci.c:511
+#10 0x00007f33579b2155 in memory_region_write_accessor (mr=0x7f335c6ee470, 
+addr=18, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized 
+out>, attrs=...) at /data/wyj/git/qemu/memory.c:526
+#11 0x00007f33579af2e9 in access_with_adjusted_size (address@hidden, 
+address@hidden, address@hidden, access_size_min=<optimized out>, 
+access_size_max=<optimized out>, address@hidden
+    0x7f33579b20f0 <memory_region_write_accessor>, address@hidden, 
+address@hidden) at /data/wyj/git/qemu/memory.c:592
+#12 0x00007f33579b2e15 in memory_region_dispatch_write (address@hidden, 
+address@hidden, data=0, address@hidden, address@hidden) at 
+/data/wyj/git/qemu/memory.c:1319
+#13 0x00007f335796cd93 in address_space_write_continue (mr=0x7f335c6ee470, l=1, 
+addr1=18, len=1, buf=0x7f335773d000 "", attrs=..., addr=49170, 
+as=0x7f3358317060 <address_space_io>) at /data/wyj/git/qemu/exec.c:2834
+#14 address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., 
+buf=<optimized out>, len=<optimized out>) at /data/wyj/git/qemu/exec.c:2879
+#15 0x00007f335796d3ad in address_space_rw (as=<optimized out>, address@hidden, 
+attrs=..., address@hidden, buf=<optimized out>, address@hidden, address@hidden) 
+at /data/wyj/git/qemu/exec.c:2981
+#16 0x00007f33579ae226 in kvm_handle_io (count=1, size=1, direction=<optimized 
+out>, data=<optimized out>, attrs=..., port=49170) at 
+/data/wyj/git/qemu/kvm-all.c:1803
+#17 kvm_cpu_exec (address@hidden) at /data/wyj/git/qemu/kvm-all.c:2032
+#18 0x00007f335799b632 in qemu_kvm_cpu_thread_fn (arg=0x7f335ae82070) at 
+/data/wyj/git/qemu/cpus.c:1118
+#19 0x00007f3352983dc5 in start_thread () from /usr/lib64/libpthread.so.0
+#20 0x00007f335113571d in clone () from /usr/lib64/libc.so.6
+
+It calls qemu_bh_schedule(q->tx_bh) at the bottom of virtio_net_handle_tx_bh(),
+but I don't know why virtio_net_tx_bh() is not invoked, which leaves
+q->tx_waiting non-zero.
+[PS: we added logs in virtio_net_tx_bh() to verify that.]
+
+Some other information: 
+
+It won't crash if we don't use vhost-net.
+
+
+Thanks,
+-Gonglei
+
+> - Can you post a formal patch for this?
+>
+> Thanks
+
+On 25/04/2017 14:02, Jason Wang wrote:
+> Thanks a lot for the fix.
+>
+> Two questions:
+>
+> - If virtio_net_set_status() is the only function that may access tx_bh,
+> it looks like setting tx_waiting to zero is sufficient?
+
+I think clearing tx_bh is better anyway, as leaving a dangling pointer
+is not very hygienic.
+
+Paolo
+
+> - Can you post a formal patch for this?
+
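A minimal sketch of the point discussed in this thread: clearing the pointer when the handler is deleted, so that a later caller can test it instead of touching freed memory. SketchBH, SketchQueue and the functions below are hypothetical stand-ins and do not reproduce QEMU's bottom-half API or VirtIONetQueue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bottom-half handle. */
typedef struct SketchBH {
    int scheduled;
} SketchBH;

/* Hypothetical stand-in for a per-queue state with a TX handler. */
typedef struct SketchQueue {
    SketchBH *tx_bh;
    int tx_waiting;
} SketchQueue;

/* Delete the queue's handler. Besides freeing it, reset the pointer and the
 * pending flag so no stale state survives the deletion. */
static void sketch_del_queue(SketchQueue *q)
{
    free(q->tx_bh);
    q->tx_bh = NULL;
    q->tx_waiting = 0;
}

/* A later caller that reschedules pending work. It touches tx_bh only when
 * work is pending and the handler still exists. Returns 1 if it scheduled. */
static int sketch_set_status(SketchQueue *q)
{
    if (!q->tx_waiting || q->tx_bh == NULL) {
        return 0;
    }
    q->tx_bh->scheduled = 1;
    return 1;
}

/* Demo: delete the queue while work is pending, then run the status path.
 * Returns what sketch_set_status() reports, or -1 on allocation failure. */
static int sketch_demo_del_then_status(void)
{
    SketchQueue q = { .tx_bh = calloc(1, sizeof(SketchBH)), .tx_waiting = 1 };
    if (q.tx_bh == NULL) {
        return -1;
    }
    sketch_del_queue(&q);
    return sketch_set_status(&q);
}
```

In this sketch either reset alone keeps the status path away from the freed handler, which matches the question raised in the thread; resetting both simply removes the dependence on every caller checking the flag first.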
diff --git a/results/classifier/004/other/42974450 b/results/classifier/004/other/42974450
new file mode 100644
index 00000000..05f9acca
--- /dev/null
+++ b/results/classifier/004/other/42974450
@@ -0,0 +1,437 @@
+other: 0.940
+device: 0.921
+instruction: 0.920
+semantic: 0.917
+assembly: 0.917
+boot: 0.909
+network: 0.907
+graphic: 0.906
+KVM: 0.901
+socket: 0.899
+mistranslation: 0.882
+vnc: 0.869
+
+[Bug Report] Possible Missing Endianness Conversion
+
+The virtio packed virtqueue support patch[1] suggests converting
+endianness by lines:
+
+virtio_tswap16s(vdev, &e->off_wrap);
+virtio_tswap16s(vdev, &e->flags);
+
+Though both of these conversion statements aren't present in the
+latest qemu code here[2]
+
+Is this intentional?
+
+[1]:
+https://mail.gnu.org/archive/html/qemu-block/2019-10/msg01492.html
+[2]:
+https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L314
+
+CCing Jason.
+
+On Mon, Jun 24, 2024 at 4:30 PM Xoykie <xoykie@gmail.com> wrote:
+>
+>
+The virtio packed virtqueue support patch[1] suggests converting
+>
+endianness by lines:
+>
+>
+virtio_tswap16s(vdev, &e->off_wrap);
+>
+virtio_tswap16s(vdev, &e->flags);
+>
+>
+Though both of these conversion statements aren't present in the
+>
+latest qemu code here[2]
+>
+>
+Is this intentional?
+Good catch!
+
+It looks like it was removed (maybe by mistake) by commit
+d152cdd6f6 ("virtio: use virtio accessor to access packed event")
+
+Jason can you confirm that?
+
+Thanks,
+Stefano
+
+>
+>
+[1]:
+https://mail.gnu.org/archive/html/qemu-block/2019-10/msg01492.html
+>
+[2]:
+https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L314
+>
+
+On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote:
+>
+>
+CCing Jason.
+>
+>
+On Mon, Jun 24, 2024 at 4:30 PM Xoykie <xoykie@gmail.com> wrote:
+>
+>
+>
+> The virtio packed virtqueue support patch[1] suggests converting
+>
+> endianness by lines:
+>
+>
+>
+> virtio_tswap16s(vdev, &e->off_wrap);
+>
+> virtio_tswap16s(vdev, &e->flags);
+>
+>
+>
+> Though both of these conversion statements aren't present in the
+>
+> latest qemu code here[2]
+>
+>
+>
+> Is this intentional?
+>
+>
+Good catch!
+>
+>
+It looks like it was removed (maybe by mistake) by commit
+>
+d152cdd6f6 ("virtio: use virtio accessor to access packed event")
+That commit changes from:
+
+-    address_space_read_cached(cache, off_off, &e->off_wrap,
+-                              sizeof(e->off_wrap));
+-    virtio_tswap16s(vdev, &e->off_wrap);
+
+which does a byte read of 2 bytes and then swaps the bytes
+depending on the host endianness and the value of
+virtio_access_is_big_endian()
+
+to this:
+
++    e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
+
+virtio_lduw_phys_cached() is a small function which calls
+either lduw_be_phys_cached() or lduw_le_phys_cached()
+depending on the value of virtio_access_is_big_endian().
+(And lduw_be_phys_cached() and lduw_le_phys_cached() do
+the right thing for the host-endianness to do a "load
+a specifically big or little endian 16-bit value".)
+
+Which is to say that because we use a load/store function that's
+explicit about the size of the data type it is accessing, the
+function itself can handle doing the load as big or little
+endian, rather than the calling code having to do a manual swap after
+it has done a load-as-bag-of-bytes. This is generally preferable
+as it's less error-prone.
+
+(Explicit swap-after-loading still has a place where the
+code is doing a load of a whole structure out of the
+guest and then swapping each struct field after the fact,
+because it means we can do a single load-from-guest-memory
+rather than a whole sequence of calls all the way down
+through the memory subsystem.)
+
+thanks
+-- PMM
+
+On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote:
+On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote:
+CCing Jason.
+
+On Mon, Jun 24, 2024 at 4:30 PM Xoykie <xoykie@gmail.com> wrote:
+>
+> The virtio packed virtqueue support patch[1] suggests converting
+> endianness by lines:
+>
+> virtio_tswap16s(vdev, &e->off_wrap);
+> virtio_tswap16s(vdev, &e->flags);
+>
+> Though both of these conversion statements aren't present in the
+> latest qemu code here[2]
+>
+> Is this intentional?
+
+Good catch!
+
+It looks like it was removed (maybe by mistake) by commit
+d152cdd6f6 ("virtio: use virtio accessor to access packed event")
+That commit changes from:
+
+-    address_space_read_cached(cache, off_off, &e->off_wrap,
+-                              sizeof(e->off_wrap));
+-    virtio_tswap16s(vdev, &e->off_wrap);
+
+which does a byte read of 2 bytes and then swaps the bytes
+depending on the host endianness and the value of
+virtio_access_is_big_endian()
+
+to this:
+
++    e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
+
+virtio_lduw_phys_cached() is a small function which calls
+either lduw_be_phys_cached() or lduw_le_phys_cached()
+depending on the value of virtio_access_is_big_endian().
+(And lduw_be_phys_cached() and lduw_le_phys_cached() do
+the right thing for the host-endianness to do a "load
+a specifically big or little endian 16-bit value".)
+
+Which is to say that because we use a load/store function that's
+explicit about the size of the data type it is accessing, the
+function itself can handle doing the load as big or little
+endian, rather than the calling code having to do a manual swap after
+it has done a load-as-bag-of-bytes. This is generally preferable
+as it's less error-prone.
+Thanks for the details!
+
+So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ?
+
+I mean:
+diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
+index 893a072c9d..2e5e67bdb9 100644
+--- a/hw/virtio/virtio.c
++++ b/hw/virtio/virtio.c
+@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
+     /* Make sure flags is seen before off_wrap */
+     smp_rmb();
+     e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
+-    virtio_tswap16s(vdev, &e->flags);
+ }
+
+ static void vring_packed_off_wrap_write(VirtIODevice *vdev,
+
+Thanks,
+Stefano
+(Explicit swap-after-loading still has a place where the
+code is doing a load of a whole structure out of the
+guest and then swapping each struct field after the fact,
+because it means we can do a single load-from-guest-memory
+rather than a whole sequence of calls all the way down
+through the memory subsystem.)
+
+thanks
+-- PMM
+
+On Tue, 25 Jun 2024 at 08:18, Stefano Garzarella <sgarzare@redhat.com> wrote:
+>
+>
+On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote:
+>
+>On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote:
+>
+>>
+>
+>> CCing Jason.
+>
+>>
+>
+>> On Mon, Jun 24, 2024 at 4:30 PM Xoykie <xoykie@gmail.com> wrote:
+>
+>> >
+>
+>> > The virtio packed virtqueue support patch[1] suggests converting
+>
+>> > endianness by lines:
+>
+>> >
+>
+>> > virtio_tswap16s(vdev, &e->off_wrap);
+>
+>> > virtio_tswap16s(vdev, &e->flags);
+>
+>> >
+>
+>> > Though both of these conversion statements aren't present in the
+>
+>> > latest qemu code here[2]
+>
+>> >
+>
+>> > Is this intentional?
+>
+>>
+>
+>> Good catch!
+>
+>>
+>
+>> It looks like it was removed (maybe by mistake) by commit
+>
+>> d152cdd6f6 ("virtio: use virtio accessor to access packed event")
+>
+>
+>
+>That commit changes from:
+>
+>
+>
+>-    address_space_read_cached(cache, off_off, &e->off_wrap,
+>
+>-                              sizeof(e->off_wrap));
+>
+>-    virtio_tswap16s(vdev, &e->off_wrap);
+>
+>
+>
+>which does a byte read of 2 bytes and then swaps the bytes
+>
+>depending on the host endianness and the value of
+>
+>virtio_access_is_big_endian()
+>
+>
+>
+>to this:
+>
+>
+>
+>+    e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
+>
+>
+>
+>virtio_lduw_phys_cached() is a small function which calls
+>
+>either lduw_be_phys_cached() or lduw_le_phys_cached()
+>
+>depending on the value of virtio_access_is_big_endian().
+>
+>(And lduw_be_phys_cached() and lduw_le_phys_cached() do
+>
+>the right thing for the host-endianness to do a "load
+>
+>a specifically big or little endian 16-bit value".)
+>
+>
+>
+>Which is to say that because we use a load/store function that's
+>
+>explicit about the size of the data type it is accessing, the
+>
+>function itself can handle doing the load as big or little
+>
+>endian, rather than the calling code having to do a manual swap after
+>
+>it has done a load-as-bag-of-bytes. This is generally preferable
+>
+>as it's less error-prone.
+>
+>
+Thanks for the details!
+>
+>
+So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ?
+>
+>
+I mean:
+>
+diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
+>
+index 893a072c9d..2e5e67bdb9 100644
+>
+--- a/hw/virtio/virtio.c
+>
++++ b/hw/virtio/virtio.c
+>
+@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
+>
+/* Make sure flags is seen before off_wrap */
+>
+smp_rmb();
+>
+e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
+>
+-    virtio_tswap16s(vdev, &e->flags);
+>
+}
+That definitely looks like it's probably not correct...
+
+-- PMM
+
+On Fri, Jun 28, 2024 at 03:53:09PM GMT, Peter Maydell wrote:
+On Tue, 25 Jun 2024 at 08:18, Stefano Garzarella <sgarzare@redhat.com> wrote:
+On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote:
+>On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote:
+>>
+>> CCing Jason.
+>>
+>> On Mon, Jun 24, 2024 at 4:30 PM Xoykie <xoykie@gmail.com> wrote:
+>> >
+>> > The virtio packed virtqueue support patch[1] suggests converting
+>> > endianness by lines:
+>> >
+>> > virtio_tswap16s(vdev, &e->off_wrap);
+>> > virtio_tswap16s(vdev, &e->flags);
+>> >
+>> > Though both of these conversion statements aren't present in the
+>> > latest qemu code here[2]
+>> >
+>> > Is this intentional?
+>>
+>> Good catch!
+>>
+>> It looks like it was removed (maybe by mistake) by commit
+>> d152cdd6f6 ("virtio: use virtio accessor to access packed event")
+>
+>That commit changes from:
+>
+>-    address_space_read_cached(cache, off_off, &e->off_wrap,
+>-                              sizeof(e->off_wrap));
+>-    virtio_tswap16s(vdev, &e->off_wrap);
+>
+>which does a byte read of 2 bytes and then swaps the bytes
+>depending on the host endianness and the value of
+>virtio_access_is_big_endian()
+>
+>to this:
+>
+>+    e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
+>
+>virtio_lduw_phys_cached() is a small function which calls
+>either lduw_be_phys_cached() or lduw_le_phys_cached()
+>depending on the value of virtio_access_is_big_endian().
+>(And lduw_be_phys_cached() and lduw_le_phys_cached() do
+>the right thing for the host-endianness to do a "load
+>a specifically big or little endian 16-bit value".)
+>
+>Which is to say that because we use a load/store function that's
+>explicit about the size of the data type it is accessing, the
+>function itself can handle doing the load as big or little
+>endian, rather than the calling code having to do a manual swap after
+>it has done a load-as-bag-of-bytes. This is generally preferable
+>as it's less error-prone.
+
+Thanks for the details!
+
+So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ?
+
+I mean:
+diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
+index 893a072c9d..2e5e67bdb9 100644
+--- a/hw/virtio/virtio.c
++++ b/hw/virtio/virtio.c
+@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
+      /* Make sure flags is seen before off_wrap */
+      smp_rmb();
+      e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
+-    virtio_tswap16s(vdev, &e->flags);
+  }
+That definitely looks like it's probably not correct...
+Yeah, I just sent that patch:
+
+https://lore.kernel.org/qemu-devel/20240701075208.19634-1-sgarzare@redhat.com
+
+We can continue the discussion there.
+
+Thanks,
+Stefano
+
diff --git a/results/classifier/004/other/55247116 b/results/classifier/004/other/55247116
new file mode 100644
index 00000000..a42111a5
--- /dev/null
+++ b/results/classifier/004/other/55247116
@@ -0,0 +1,1318 @@
+other: 0.945
+assembly: 0.938
+graphic: 0.933
+socket: 0.929
+semantic: 0.928
+instruction: 0.928
+device: 0.919
+boot: 0.918
+network: 0.916
+vnc: 0.916
+KVM: 0.894
+mistranslation: 0.841
+
+[Qemu-devel]  [RFC/BUG] xen-mapcache: buggy invalidate map cache?
+
+Hi,
+
+In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
+instead of first level entry (if map to rom other than guest memory
+comes first), while in xen_invalidate_map_cache(), when VM ballooned
+out memory, qemu did not invalidate cache entries in linked
+list(entry->next), so when VM balloon back in memory, gfns probably
+mapped to different mfns, thus if guest asks device to DMA to these
+GPA, qemu may DMA to stale MFNs.
+
+So I think in xen_invalidate_map_cache() linked lists should also be
+checked and invalidated.
+
+What's your opinion? Is this a bug? Is my analyze correct?
+
+On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
+>
+Hi,
+>
+>
+In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
+>
+instead of first level entry (if map to rom other than guest memory
+>
+comes first), while in xen_invalidate_map_cache(), when VM ballooned
+>
+out memory, qemu did not invalidate cache entries in linked
+>
+list(entry->next), so when VM balloon back in memory, gfns probably
+>
+mapped to different mfns, thus if guest asks device to DMA to these
+>
+GPA, qemu may DMA to stale MFNs.
+>
+>
+So I think in xen_invalidate_map_cache() linked lists should also be
+>
+checked and invalidated.
+>
+>
+What's your opinion? Is this a bug? Is my analyze correct?
+Added Jun Nakajima and Alexander Graf
+
+On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
+>
+On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
+>
+> Hi,
+>
+>
+>
+> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
+>
+> instead of first level entry (if map to rom other than guest memory
+>
+> comes first), while in xen_invalidate_map_cache(), when VM ballooned
+>
+> out memory, qemu did not invalidate cache entries in linked
+>
+> list(entry->next), so when VM balloon back in memory, gfns probably
+>
+> mapped to different mfns, thus if guest asks device to DMA to these
+>
+> GPA, qemu may DMA to stale MFNs.
+>
+>
+>
+> So I think in xen_invalidate_map_cache() linked lists should also be
+>
+> checked and invalidated.
+>
+>
+>
+> What's your opinion? Is this a bug? Is my analyze correct?
+>
+>
+Added Jun Nakajima and Alexander Graf
+And correct Stefano Stabellini's email address.
+
+On Mon, 10 Apr 2017 00:36:02 +0800
+hrg <address@hidden> wrote:
+
+Hi,
+
+>
+On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
+>
+> On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
+>
+>> Hi,
+>
+>>
+>
+>> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
+>
+>> instead of first level entry (if map to rom other than guest memory
+>
+>> comes first), while in xen_invalidate_map_cache(), when VM ballooned
+>
+>> out memory, qemu did not invalidate cache entries in linked
+>
+>> list(entry->next), so when VM balloon back in memory, gfns probably
+>
+>> mapped to different mfns, thus if guest asks device to DMA to these
+>
+>> GPA, qemu may DMA to stale MFNs.
+>
+>>
+>
+>> So I think in xen_invalidate_map_cache() linked lists should also be
+>
+>> checked and invalidated.
+>
+>>
+>
+>> What's your opinion? Is this a bug? Is my analyze correct?
+>
+>
+>
+> Added Jun Nakajima and Alexander Graf
+>
+And correct Stefano Stabellini's email address.
+There is a real issue with the xen-mapcache corruption in fact. I encountered
+it a few months ago while experimenting with Q35 support on Xen. Q35 emulation
+uses an AHCI controller by default, along with NCQ mode enabled. The issue can
+be (somewhat) easily reproduced there, though using a normal i440 emulation
+might possibly allow to reproduce the issue as well, using a dedicated test
+code from a guest side. In case of Q35+NCQ the issue can be reproduced "as is".
+
+The issue occurs when a guest domain performs an intensive disk I/O, ex. while
+guest OS booting. QEMU crashes with "Bad ram offset 980aa000"
+message logged, where the address is different each time. The hard thing with
+this issue is that it has a very low reproducibility rate.
+
+The corruption happens when there are multiple I/O commands in the NCQ queue.
+So there are overlapping emulated DMA operations in flight and QEMU uses a
+sequence of mapcache actions which can be executed in the "wrong" order thus
+leading to an inconsistent xen-mapcache - so a bad address from the wrong
+entry is returned.
+
+The bad thing with this issue is that QEMU crash due to "Bad ram offset"
+appearance is a relatively good situation in the sense that this is a caught
+error. But there might be a much worse (artificial) situation where the returned
+address looks valid but points to a different mapped memory.
+
+The fix itself is not hard (ex. an additional checked field in MapCacheEntry),
+but there is a need of some reliable way to test it considering the low
+reproducibility rate.
+
+Regards,
+Alex
+
+On Mon, 10 Apr 2017, hrg wrote:
+>
+On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
+>
+> On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
+>
+>> Hi,
+>
+>>
+>
+>> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
+>
+>> instead of first level entry (if map to rom other than guest memory
+>
+>> comes first), while in xen_invalidate_map_cache(), when VM ballooned
+>
+>> out memory, qemu did not invalidate cache entries in linked
+>
+>> list(entry->next), so when VM balloon back in memory, gfns probably
+>
+>> mapped to different mfns, thus if guest asks device to DMA to these
+>
+>> GPA, qemu may DMA to stale MFNs.
+>
+>>
+>
+>> So I think in xen_invalidate_map_cache() linked lists should also be
+>
+>> checked and invalidated.
+>
+>>
+>
+>> What's your opinion? Is this a bug? Is my analyze correct?
+Yes, you are right. We need to go through the list for each element of
+the array in xen_invalidate_map_cache. Can you come up with a patch?
+
+On Mon, 10 Apr 2017, Stefano Stabellini wrote:
+>
+On Mon, 10 Apr 2017, hrg wrote:
+>
+> On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
+>
+> > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
+>
+> >> Hi,
+>
+> >>
+>
+> >> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
+>
+> >> instead of first level entry (if map to rom other than guest memory
+>
+> >> comes first), while in xen_invalidate_map_cache(), when VM ballooned
+>
+> >> out memory, qemu did not invalidate cache entries in linked
+>
+> >> list(entry->next), so when VM balloon back in memory, gfns probably
+>
+> >> mapped to different mfns, thus if guest asks device to DMA to these
+>
+> >> GPA, qemu may DMA to stale MFNs.
+>
+> >>
+>
+> >> So I think in xen_invalidate_map_cache() linked lists should also be
+>
+> >> checked and invalidated.
+>
+> >>
+>
+> >> What's your opinion? Is this a bug? Is my analyze correct?
+>
+>
+Yes, you are right. We need to go through the list for each element of
+>
+the array in xen_invalidate_map_cache. Can you come up with a patch?
+I spoke too soon. In the regular case there should be no locked mappings
+when xen_invalidate_map_cache is called (see the DPRINTF warning at the
+beginning of the functions). Without locked mappings, there should never
+be more than one element in each list (see xen_map_cache_unlocked:
+entry->lock == true is a necessary condition to append a new entry to
+the list, otherwise it is just remapped).
+
+Can you confirm that what you are seeing are locked mappings
+when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
+by turning it into a printf or by defininig MAPCACHE_DEBUG.
+
+On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
+<address@hidden> wrote:
+>
+On Mon, 10 Apr 2017, Stefano Stabellini wrote:
+>
+> On Mon, 10 Apr 2017, hrg wrote:
+>
+> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
+>
+> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
+>
+> > >> Hi,
+>
+> > >>
+>
+> > >> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
+>
+> > >> instead of first level entry (if map to rom other than guest memory
+>
+> > >> comes first), while in xen_invalidate_map_cache(), when VM ballooned
+>
+> > >> out memory, qemu did not invalidate cache entries in linked
+>
+> > >> list(entry->next), so when VM balloon back in memory, gfns probably
+>
+> > >> mapped to different mfns, thus if guest asks device to DMA to these
+>
+> > >> GPA, qemu may DMA to stale MFNs.
+>
+> > >>
+>
+> > >> So I think in xen_invalidate_map_cache() linked lists should also be
+>
+> > >> checked and invalidated.
+>
+> > >>
+>
+> > >> What's your opinion? Is this a bug? Is my analyze correct?
+>
+>
+>
+> Yes, you are right. We need to go through the list for each element of
+>
+> the array in xen_invalidate_map_cache. Can you come up with a patch?
+>
+>
+I spoke too soon. In the regular case there should be no locked mappings
+>
+when xen_invalidate_map_cache is called (see the DPRINTF warning at the
+>
+beginning of the functions). Without locked mappings, there should never
+>
+be more than one element in each list (see xen_map_cache_unlocked:
+>
+entry->lock == true is a necessary condition to append a new entry to
+>
+the list, otherwise it is just remapped).
+>
+>
+Can you confirm that what you are seeing are locked mappings
+>
+when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
+>
+by turning it into a printf or by defininig MAPCACHE_DEBUG.
+In fact, I think the DPRINTF above is incorrect too. In
+pci_add_option_rom(), rtl8139 rom is locked mapped in
+pci_add_option_rom->memory_region_get_ram_ptr (after
+memory_region_init_ram). So actually I think we should remove the
+DPRINTF warning as it is normal.
+
+On Tue, 11 Apr 2017, hrg wrote:
+>
+On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
+>
+<address@hidden> wrote:
+>
+> On Mon, 10 Apr 2017, Stefano Stabellini wrote:
+>
+>> On Mon, 10 Apr 2017, hrg wrote:
+>
+>> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
+>
+>> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
+>
+>> > >> Hi,
+>
+>> > >>
+>
+>> > >> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
+>
+>> > >> instead of first level entry (if map to rom other than guest memory
+>
+>> > >> comes first), while in xen_invalidate_map_cache(), when VM ballooned
+>
+>> > >> out memory, qemu did not invalidate cache entries in linked
+>
+>> > >> list(entry->next), so when VM balloon back in memory, gfns probably
+>
+>> > >> mapped to different mfns, thus if guest asks device to DMA to these
+>
+>> > >> GPA, qemu may DMA to stale MFNs.
+>
+>> > >>
+>
+>> > >> So I think in xen_invalidate_map_cache() linked lists should also be
+>
+>> > >> checked and invalidated.
+>
+>> > >>
+>
+>> > >> What's your opinion? Is this a bug? Is my analyze correct?
+>
+>>
+>
+>> Yes, you are right. We need to go through the list for each element of
+>
+>> the array in xen_invalidate_map_cache. Can you come up with a patch?
+>
+>
+>
+> I spoke too soon. In the regular case there should be no locked mappings
+>
+> when xen_invalidate_map_cache is called (see the DPRINTF warning at the
+>
+> beginning of the functions). Without locked mappings, there should never
+>
+> be more than one element in each list (see xen_map_cache_unlocked:
+>
+> entry->lock == true is a necessary condition to append a new entry to
+>
+> the list, otherwise it is just remapped).
+>
+>
+>
+> Can you confirm that what you are seeing are locked mappings
+>
+> when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
+>
+> by turning it into a printf or by defininig MAPCACHE_DEBUG.
+>
+>
+In fact, I think the DPRINTF above is incorrect too. In
+>
+pci_add_option_rom(), rtl8139 rom is locked mapped in
+>
+pci_add_option_rom->memory_region_get_ram_ptr (after
+>
+memory_region_init_ram). So actually I think we should remove the
+>
+DPRINTF warning as it is normal.
+Let me explain why the DPRINTF warning is there: emulated dma operations
+can involve locked mappings. Once a dma operation completes, the related
+mapping is unlocked and can be safely destroyed. But if we destroy a
+locked mapping in xen_invalidate_map_cache, while a dma is still
+ongoing, QEMU will crash. We cannot handle that case.
+
+However, the scenario you described is different. It has nothing to do
+with DMA. It looks like pci_add_option_rom calls
+memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
+locked mapping and it is never unlocked or destroyed.
+
+It looks like "ptr" is not used after pci_add_option_rom returns. Does
+the append patch fix the problem you are seeing? For the proper fix, I
+think we probably need some sort of memory_region_unmap wrapper or maybe
+a call to address_space_unmap.
+
+
+diff --git a/hw/pci/pci.c b/hw/pci/pci.c
+index e6b08e1..04f98b7 100644
+--- a/hw/pci/pci.c
++++ b/hw/pci/pci.c
+@@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool 
+is_default_rom,
+     }
+ 
+     pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
++    xen_invalidate_map_cache_entry(ptr);
+ }
+ 
+ static void pci_del_option_rom(PCIDevice *pdev)
+
+On Tue, 11 Apr 2017 15:32:09 -0700 (PDT)
+Stefano Stabellini <address@hidden> wrote:
+
+>
+On Tue, 11 Apr 2017, hrg wrote:
+>
+> On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
+>
+> <address@hidden> wrote:
+>
+> > On Mon, 10 Apr 2017, Stefano Stabellini wrote:
+>
+> >> On Mon, 10 Apr 2017, hrg wrote:
+>
+> >> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
+>
+> >> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
+>
+> >> > >> Hi,
+>
+> >> > >>
+>
+> >> > >> In xen_map_cache_unlocked(), map to guest memory maybe in
+>
+> >> > >> entry->next instead of first level entry (if map to rom other than
+>
+> >> > >> guest memory comes first), while in xen_invalidate_map_cache(),
+>
+> >> > >> when VM ballooned out memory, qemu did not invalidate cache entries
+>
+> >> > >> in linked list(entry->next), so when VM balloon back in memory,
+>
+> >> > >> gfns probably mapped to different mfns, thus if guest asks device
+>
+> >> > >> to DMA to these GPA, qemu may DMA to stale MFNs.
+>
+> >> > >>
+>
+> >> > >> So I think in xen_invalidate_map_cache() linked lists should also be
+>
+> >> > >> checked and invalidated.
+>
+> >> > >>
+>
+> >> > >> What's your opinion? Is this a bug? Is my analyze correct?
+>
+> >>
+>
+> >> Yes, you are right. We need to go through the list for each element of
+>
+> >> the array in xen_invalidate_map_cache. Can you come up with a patch?
+>
+> >
+>
+> > I spoke too soon. In the regular case there should be no locked mappings
+>
+> > when xen_invalidate_map_cache is called (see the DPRINTF warning at the
+>
+> > beginning of the functions). Without locked mappings, there should never
+>
+> > be more than one element in each list (see xen_map_cache_unlocked:
+>
+> > entry->lock == true is a necessary condition to append a new entry to
+>
+> > the list, otherwise it is just remapped).
+>
+> >
+>
+> > Can you confirm that what you are seeing are locked mappings
+>
+> > when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
+>
+> > by turning it into a printf or by defininig MAPCACHE_DEBUG.
+>
+>
+>
+> In fact, I think the DPRINTF above is incorrect too. In
+>
+> pci_add_option_rom(), rtl8139 rom is locked mapped in
+>
+> pci_add_option_rom->memory_region_get_ram_ptr (after
+>
+> memory_region_init_ram). So actually I think we should remove the
+>
+> DPRINTF warning as it is normal.
+>
+>
+Let me explain why the DPRINTF warning is there: emulated dma operations
+>
+can involve locked mappings. Once a dma operation completes, the related
+>
+mapping is unlocked and can be safely destroyed. But if we destroy a
+>
+locked mapping in xen_invalidate_map_cache, while a dma is still
+>
+ongoing, QEMU will crash. We cannot handle that case.
+>
+>
+However, the scenario you described is different. It has nothing to do
+>
+with DMA. It looks like pci_add_option_rom calls
+>
+memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
+>
+locked mapping and it is never unlocked or destroyed.
+>
+>
+It looks like "ptr" is not used after pci_add_option_rom returns. Does
+>
+the append patch fix the problem you are seeing? For the proper fix, I
+>
+think we probably need some sort of memory_region_unmap wrapper or maybe
+>
+a call to address_space_unmap.
+Hmm, for some reason my message to the Xen-devel list got rejected but was sent
+to Qemu-devel instead, without any notice. Sorry if I'm missing something
+obvious as a list newbie.
+
+Stefano, hrg,
+
+There is an issue with inconsistency between the list of normal MapCacheEntry's
+and their 'reverse' counterparts - MapCacheRev's in locked_entries.
+When bad situation happens, there are multiple (locked) MapCacheEntry
+entries in the bucket's linked list along with a number of MapCacheRev's. And
+when it comes to a reverse lookup, xen-mapcache picks the wrong entry from the
+first list and calculates a wrong pointer from it which may then be caught with
+the "Bad RAM offset" check (or not). Mapcache invalidation might be related to
+this issue as well I think.
+
+I'll try to provide a test code which can reproduce the issue from the
+guest side using an emulated IDE controller, though it's much simpler to achieve
+this result with an AHCI controller using multiple NCQ I/O commands. So far I've
+seen this issue only with Windows 7 (and above) guest on AHCI, but any block I/O
+DMA should be enough I think.
+
+On 2017/4/12 14:17, Alexey G wrote:
+On Tue, 11 Apr 2017 15:32:09 -0700 (PDT)
+Stefano Stabellini <address@hidden> wrote:
+On Tue, 11 Apr 2017, hrg wrote:
+On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
+<address@hidden> wrote:
+On Mon, 10 Apr 2017, Stefano Stabellini wrote:
+On Mon, 10 Apr 2017, hrg wrote:
+On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
+On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
+Hi,
+
+In xen_map_cache_unlocked(), map to guest memory maybe in
+entry->next instead of first level entry (if map to rom other than
+guest memory comes first), while in xen_invalidate_map_cache(),
+when VM ballooned out memory, qemu did not invalidate cache entries
+in linked list(entry->next), so when VM balloon back in memory,
+gfns probably mapped to different mfns, thus if guest asks device
+to DMA to these GPA, qemu may DMA to stale MFNs.
+
+So I think in xen_invalidate_map_cache() linked lists should also be
+checked and invalidated.
+
+What's your opinion? Is this a bug? Is my analyze correct?
+Yes, you are right. We need to go through the list for each element of
+the array in xen_invalidate_map_cache. Can you come up with a patch?
+I spoke too soon. In the regular case there should be no locked mappings
+when xen_invalidate_map_cache is called (see the DPRINTF warning at the
+beginning of the functions). Without locked mappings, there should never
+be more than one element in each list (see xen_map_cache_unlocked:
+entry->lock == true is a necessary condition to append a new entry to
+the list, otherwise it is just remapped).
+
+Can you confirm that what you are seeing are locked mappings
+when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
+by turning it into a printf or by defininig MAPCACHE_DEBUG.
+In fact, I think the DPRINTF above is incorrect too. In
+pci_add_option_rom(), rtl8139 rom is locked mapped in
+pci_add_option_rom->memory_region_get_ram_ptr (after
+memory_region_init_ram). So actually I think we should remove the
+DPRINTF warning as it is normal.
+Let me explain why the DPRINTF warning is there: emulated dma operations
+can involve locked mappings. Once a dma operation completes, the related
+mapping is unlocked and can be safely destroyed. But if we destroy a
+locked mapping in xen_invalidate_map_cache, while a dma is still
+ongoing, QEMU will crash. We cannot handle that case.
+
+However, the scenario you described is different. It has nothing to do
+with DMA. It looks like pci_add_option_rom calls
+memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
+locked mapping and it is never unlocked or destroyed.
+
+It looks like "ptr" is not used after pci_add_option_rom returns. Does
+the append patch fix the problem you are seeing? For the proper fix, I
+think we probably need some sort of memory_region_unmap wrapper or maybe
+a call to address_space_unmap.
+Hmm, for some reason my message to the Xen-devel list got rejected but was sent
+to Qemu-devel instead, without any notice. Sorry if I'm missing something
+obvious as a list newbie.
+
+Stefano, hrg,
+
+There is an issue with inconsistency between the list of normal MapCacheEntry's
+and their 'reverse' counterparts - MapCacheRev's in locked_entries.
+When the bad situation happens, there are multiple (locked) MapCacheEntry
+entries in the bucket's linked list along with a number of MapCacheRev's. And
+when it comes to a reverse lookup, xen-mapcache picks the wrong entry from the
+first list and calculates a wrong pointer from it which may then be caught with
+the "Bad RAM offset" check (or not). Mapcache invalidation might be related to
+this issue as well I think.
+
+I'll try to provide a test code which can reproduce the issue from the
+guest side using an emulated IDE controller, though it's much simpler to achieve
+this result with an AHCI controller using multiple NCQ I/O commands. So far I've
+seen this issue only with Windows 7 (and above) guest on AHCI, but any block I/O
+DMA should be enough I think.
+Yes, I think there may be other bugs lurking, considering the complexity, 
+though we need to reproduce it if we want to delve into it.
+
+On Wed, 12 Apr 2017, Alexey G wrote:
+>
+On Tue, 11 Apr 2017 15:32:09 -0700 (PDT)
+>
+Stefano Stabellini <address@hidden> wrote:
+>
+>
+> On Tue, 11 Apr 2017, hrg wrote:
+>
+> > On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
+>
+> > <address@hidden> wrote:
+>
+> > > On Mon, 10 Apr 2017, Stefano Stabellini wrote:
+>
+> > >> On Mon, 10 Apr 2017, hrg wrote:
+>
+> > >> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
+>
+> > >> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
+>
+> > >> > >> Hi,
+>
+> > >> > >>
+>
+> > >> > >> In xen_map_cache_unlocked(), map to guest memory maybe in
+>
+> > >> > >> entry->next instead of first level entry (if map to rom other than
+>
+> > >> > >> guest memory comes first), while in xen_invalidate_map_cache(),
+>
+> > >> > >> when VM ballooned out memory, qemu did not invalidate cache
+>
+> > >> > >> entries
+>
+> > >> > >> in linked list(entry->next), so when VM balloon back in memory,
+>
+> > >> > >> gfns probably mapped to different mfns, thus if guest asks device
+>
+> > >> > >> to DMA to these GPA, qemu may DMA to stale MFNs.
+>
+> > >> > >>
+>
+> > >> > >> So I think in xen_invalidate_map_cache() linked lists should also
+>
+> > >> > >> be
+>
+> > >> > >> checked and invalidated.
+>
+> > >> > >>
+>
+> > >> > >> What's your opinion? Is this a bug? Is my analysis correct?
+>
+> > >>
+>
+> > >> Yes, you are right. We need to go through the list for each element of
+>
+> > >> the array in xen_invalidate_map_cache. Can you come up with a patch?
+>
+> > >
+>
+> > > I spoke too soon. In the regular case there should be no locked mappings
+>
+> > > when xen_invalidate_map_cache is called (see the DPRINTF warning at the
+>
+> > > beginning of the functions). Without locked mappings, there should never
+>
+> > > be more than one element in each list (see xen_map_cache_unlocked:
+>
+> > > entry->lock == true is a necessary condition to append a new entry to
+>
+> > > the list, otherwise it is just remapped).
+>
+> > >
+>
+> > > Can you confirm that what you are seeing are locked mappings
+>
+> > > when xen_invalidate_map_cache is called? To find out, enable the DPRINTF
+>
+> > > by turning it into a printf or by defining MAPCACHE_DEBUG.
+>
+> >
+>
+> > In fact, I think the DPRINTF above is incorrect too. In
+>
+> > pci_add_option_rom(), rtl8139 rom is locked mapped in
+>
+> > pci_add_option_rom->memory_region_get_ram_ptr (after
+>
+> > memory_region_init_ram). So actually I think we should remove the
+>
+> > DPRINTF warning as it is normal.
+>
+>
+>
+> Let me explain why the DPRINTF warning is there: emulated dma operations
+>
+> can involve locked mappings. Once a dma operation completes, the related
+>
+> mapping is unlocked and can be safely destroyed. But if we destroy a
+>
+> locked mapping in xen_invalidate_map_cache, while a dma is still
+>
+> ongoing, QEMU will crash. We cannot handle that case.
+>
+>
+>
+> However, the scenario you described is different. It has nothing to do
+>
+> with DMA. It looks like pci_add_option_rom calls
+>
+> memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
+>
+> locked mapping and it is never unlocked or destroyed.
+>
+>
+>
+> It looks like "ptr" is not used after pci_add_option_rom returns. Does
+>
+> the append patch fix the problem you are seeing? For the proper fix, I
+>
+> think we probably need some sort of memory_region_unmap wrapper or maybe
+>
+> a call to address_space_unmap.
+>
+>
+Hmm, for some reason my message to the Xen-devel list got rejected but was
+>
+sent
+>
+to Qemu-devel instead, without any notice. Sorry if I'm missing something
+>
+obvious as a list newbie.
+>
+>
+Stefano, hrg,
+>
+>
+There is an issue with inconsistency between the list of normal
+>
+MapCacheEntry's
+>
+and their 'reverse' counterparts - MapCacheRev's in locked_entries.
+>
+When the bad situation happens, there are multiple (locked) MapCacheEntry
+>
+entries in the bucket's linked list along with a number of MapCacheRev's. And
+>
+when it comes to a reverse lookup, xen-mapcache picks the wrong entry from the
+>
+first list and calculates a wrong pointer from it which may then be caught
+>
+with
+>
+the "Bad RAM offset" check (or not). Mapcache invalidation might be related to
+>
+this issue as well I think.
+>
+>
+I'll try to provide a test code which can reproduce the issue from the
+>
+guest side using an emulated IDE controller, though it's much simpler to
+>
+achieve
+>
+this result with an AHCI controller using multiple NCQ I/O commands. So far
+>
+I've
+>
+seen this issue only with Windows 7 (and above) guest on AHCI, but any block
+>
+I/O
+>
+DMA should be enough I think.
+That would be helpful. Please see if you can reproduce it after fixing
+the other issue (
+http://marc.info/?l=qemu-devel&m=149195042500707&w=2
+).
+
+On 2017/4/12 6:32, Stefano Stabellini wrote:
+On Tue, 11 Apr 2017, hrg wrote:
+On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
+<address@hidden> wrote:
+On Mon, 10 Apr 2017, Stefano Stabellini wrote:
+On Mon, 10 Apr 2017, hrg wrote:
+On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
+On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
+Hi,
+
+In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
+instead of first level entry (if map to rom other than guest memory
+comes first), while in xen_invalidate_map_cache(), when VM ballooned
+out memory, qemu did not invalidate cache entries in linked
+list(entry->next), so when VM balloon back in memory, gfns probably
+mapped to different mfns, thus if guest asks device to DMA to these
+GPA, qemu may DMA to stale MFNs.
+
+So I think in xen_invalidate_map_cache() linked lists should also be
+checked and invalidated.
+
+What's your opinion? Is this a bug? Is my analysis correct?
+Yes, you are right. We need to go through the list for each element of
+the array in xen_invalidate_map_cache. Can you come up with a patch?
+I spoke too soon. In the regular case there should be no locked mappings
+when xen_invalidate_map_cache is called (see the DPRINTF warning at the
+beginning of the functions). Without locked mappings, there should never
+be more than one element in each list (see xen_map_cache_unlocked:
+entry->lock == true is a necessary condition to append a new entry to
+the list, otherwise it is just remapped).
+
+Can you confirm that what you are seeing are locked mappings
+when xen_invalidate_map_cache is called? To find out, enable the DPRINTF
+by turning it into a printf or by defining MAPCACHE_DEBUG.
+In fact, I think the DPRINTF above is incorrect too. In
+pci_add_option_rom(), rtl8139 rom is locked mapped in
+pci_add_option_rom->memory_region_get_ram_ptr (after
+memory_region_init_ram). So actually I think we should remove the
+DPRINTF warning as it is normal.
+Let me explain why the DPRINTF warning is there: emulated dma operations
+can involve locked mappings. Once a dma operation completes, the related
+mapping is unlocked and can be safely destroyed. But if we destroy a
+locked mapping in xen_invalidate_map_cache, while a dma is still
+ongoing, QEMU will crash. We cannot handle that case.
+
+However, the scenario you described is different. It has nothing to do
+with DMA. It looks like pci_add_option_rom calls
+memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
+locked mapping and it is never unlocked or destroyed.
+
+It looks like "ptr" is not used after pci_add_option_rom returns. Does
+the append patch fix the problem you are seeing? For the proper fix, I
+think we probably need some sort of memory_region_unmap wrapper or maybe
+a call to address_space_unmap.
+Yes, I think so, maybe this is the proper way to fix this.
+diff --git a/hw/pci/pci.c b/hw/pci/pci.c
+index e6b08e1..04f98b7 100644
+--- a/hw/pci/pci.c
++++ b/hw/pci/pci.c
+@@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom,
+     }
+     pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
++    xen_invalidate_map_cache_entry(ptr);
+ }
+ static void pci_del_option_rom(PCIDevice *pdev)
+
+On Wed, 12 Apr 2017, Herongguang (Stephen) wrote:
+>
+On 2017/4/12 6:32, Stefano Stabellini wrote:
+>
+> On Tue, 11 Apr 2017, hrg wrote:
+>
+> > On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
+>
+> > <address@hidden> wrote:
+>
+> > > On Mon, 10 Apr 2017, Stefano Stabellini wrote:
+>
+> > > > On Mon, 10 Apr 2017, hrg wrote:
+>
+> > > > > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
+>
+> > > > > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
+>
+> > > > > > > Hi,
+>
+> > > > > > >
+>
+> > > > > > > In xen_map_cache_unlocked(), map to guest memory maybe in
+>
+> > > > > > > entry->next
+>
+> > > > > > > instead of first level entry (if map to rom other than guest
+>
+> > > > > > > memory
+>
+> > > > > > > comes first), while in xen_invalidate_map_cache(), when VM
+>
+> > > > > > > ballooned
+>
+> > > > > > > out memory, qemu did not invalidate cache entries in linked
+>
+> > > > > > > list(entry->next), so when VM balloon back in memory, gfns
+>
+> > > > > > > probably
+>
+> > > > > > > mapped to different mfns, thus if guest asks device to DMA to
+>
+> > > > > > > these
+>
+> > > > > > > GPA, qemu may DMA to stale MFNs.
+>
+> > > > > > >
+>
+> > > > > > > So I think in xen_invalidate_map_cache() linked lists should
+>
+> > > > > > > also be
+>
+> > > > > > > checked and invalidated.
+>
+> > > > > > >
+>
+> > > > > > > What's your opinion? Is this a bug? Is my analysis correct?
+>
+> > > > Yes, you are right. We need to go through the list for each element of
+>
+> > > > the array in xen_invalidate_map_cache. Can you come up with a patch?
+>
+> > > I spoke too soon. In the regular case there should be no locked mappings
+>
+> > > when xen_invalidate_map_cache is called (see the DPRINTF warning at the
+>
+> > > beginning of the functions). Without locked mappings, there should never
+>
+> > > be more than one element in each list (see xen_map_cache_unlocked:
+>
+> > > entry->lock == true is a necessary condition to append a new entry to
+>
+> > > the list, otherwise it is just remapped).
+>
+> > >
+>
+> > > Can you confirm that what you are seeing are locked mappings
+>
+> > > when xen_invalidate_map_cache is called? To find out, enable the DPRINTF
+>
+> > > by turning it into a printf or by defining MAPCACHE_DEBUG.
+>
+> > In fact, I think the DPRINTF above is incorrect too. In
+>
+> > pci_add_option_rom(), rtl8139 rom is locked mapped in
+>
+> > pci_add_option_rom->memory_region_get_ram_ptr (after
+>
+> > memory_region_init_ram). So actually I think we should remove the
+>
+> > DPRINTF warning as it is normal.
+>
+> Let me explain why the DPRINTF warning is there: emulated dma operations
+>
+> can involve locked mappings. Once a dma operation completes, the related
+>
+> mapping is unlocked and can be safely destroyed. But if we destroy a
+>
+> locked mapping in xen_invalidate_map_cache, while a dma is still
+>
+> ongoing, QEMU will crash. We cannot handle that case.
+>
+>
+>
+> However, the scenario you described is different. It has nothing to do
+>
+> with DMA. It looks like pci_add_option_rom calls
+>
+> memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
+>
+> locked mapping and it is never unlocked or destroyed.
+>
+>
+>
+> It looks like "ptr" is not used after pci_add_option_rom returns. Does
+>
+> the append patch fix the problem you are seeing? For the proper fix, I
+>
+> think we probably need some sort of memory_region_unmap wrapper or maybe
+>
+> a call to address_space_unmap.
+>
+>
+Yes, I think so, maybe this is the proper way to fix this.
+Would you be up for sending a proper patch and testing it? We cannot call
+xen_invalidate_map_cache_entry directly from pci.c though, it would need
+to be one of the other functions like address_space_unmap for example.
+
+
+>
+> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
+>
+> index e6b08e1..04f98b7 100644
+>
+> --- a/hw/pci/pci.c
+>
+> +++ b/hw/pci/pci.c
+>
+> @@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool
+>
+> is_default_rom,
+>
+>       }
+>
+>         pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
+>
+> +    xen_invalidate_map_cache_entry(ptr);
+>
+>   }
+>
+>     static void pci_del_option_rom(PCIDevice *pdev)
+
+On 2017/4/13 7:51, Stefano Stabellini wrote:
+On Wed, 12 Apr 2017, Herongguang (Stephen) wrote:
+On 2017/4/12 6:32, Stefano Stabellini wrote:
+On Tue, 11 Apr 2017, hrg wrote:
+On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
+<address@hidden> wrote:
+On Mon, 10 Apr 2017, Stefano Stabellini wrote:
+On Mon, 10 Apr 2017, hrg wrote:
+On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
+On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
+Hi,
+
+In xen_map_cache_unlocked(), map to guest memory maybe in
+entry->next
+instead of first level entry (if map to rom other than guest
+memory
+comes first), while in xen_invalidate_map_cache(), when VM
+ballooned
+out memory, qemu did not invalidate cache entries in linked
+list(entry->next), so when VM balloon back in memory, gfns
+probably
+mapped to different mfns, thus if guest asks device to DMA to
+these
+GPA, qemu may DMA to stale MFNs.
+
+So I think in xen_invalidate_map_cache() linked lists should
+also be
+checked and invalidated.
+
+What's your opinion? Is this a bug? Is my analysis correct?
+Yes, you are right. We need to go through the list for each element of
+the array in xen_invalidate_map_cache. Can you come up with a patch?
+I spoke too soon. In the regular case there should be no locked mappings
+when xen_invalidate_map_cache is called (see the DPRINTF warning at the
+beginning of the functions). Without locked mappings, there should never
+be more than one element in each list (see xen_map_cache_unlocked:
+entry->lock == true is a necessary condition to append a new entry to
+the list, otherwise it is just remapped).
+
+Can you confirm that what you are seeing are locked mappings
+when xen_invalidate_map_cache is called? To find out, enable the DPRINTF
+by turning it into a printf or by defining MAPCACHE_DEBUG.
+In fact, I think the DPRINTF above is incorrect too. In
+pci_add_option_rom(), rtl8139 rom is locked mapped in
+pci_add_option_rom->memory_region_get_ram_ptr (after
+memory_region_init_ram). So actually I think we should remove the
+DPRINTF warning as it is normal.
+Let me explain why the DPRINTF warning is there: emulated dma operations
+can involve locked mappings. Once a dma operation completes, the related
+mapping is unlocked and can be safely destroyed. But if we destroy a
+locked mapping in xen_invalidate_map_cache, while a dma is still
+ongoing, QEMU will crash. We cannot handle that case.
+
+However, the scenario you described is different. It has nothing to do
+with DMA. It looks like pci_add_option_rom calls
+memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
+locked mapping and it is never unlocked or destroyed.
+
+It looks like "ptr" is not used after pci_add_option_rom returns. Does
+the append patch fix the problem you are seeing? For the proper fix, I
+think we probably need some sort of memory_region_unmap wrapper or maybe
+a call to address_space_unmap.
+Yes, I think so, maybe this is the proper way to fix this.
+Would you be up for sending a proper patch and testing it? We cannot call
+xen_invalidate_map_cache_entry directly from pci.c though, it would need
+to be one of the other functions like address_space_unmap for example.
+Yes, I will look into this.
+diff --git a/hw/pci/pci.c b/hw/pci/pci.c
+index e6b08e1..04f98b7 100644
+--- a/hw/pci/pci.c
++++ b/hw/pci/pci.c
+@@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom,
+     }
+     pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
++    xen_invalidate_map_cache_entry(ptr);
+ }
+ static void pci_del_option_rom(PCIDevice *pdev)
+
+On Thu, 13 Apr 2017, Herongguang (Stephen) wrote:
+>
+On 2017/4/13 7:51, Stefano Stabellini wrote:
+>
+> On Wed, 12 Apr 2017, Herongguang (Stephen) wrote:
+>
+> > On 2017/4/12 6:32, Stefano Stabellini wrote:
+>
+> > > On Tue, 11 Apr 2017, hrg wrote:
+>
+> > > > On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
+>
+> > > > <address@hidden> wrote:
+>
+> > > > > On Mon, 10 Apr 2017, Stefano Stabellini wrote:
+>
+> > > > > > On Mon, 10 Apr 2017, hrg wrote:
+>
+> > > > > > > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden>
+>
+> > > > > > > wrote:
+>
+> > > > > > > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden>
+>
+> > > > > > > > wrote:
+>
+> > > > > > > > > Hi,
+>
+> > > > > > > > >
+>
+> > > > > > > > > In xen_map_cache_unlocked(), map to guest memory maybe in
+>
+> > > > > > > > > entry->next
+>
+> > > > > > > > > instead of first level entry (if map to rom other than guest
+>
+> > > > > > > > > memory
+>
+> > > > > > > > > comes first), while in xen_invalidate_map_cache(), when VM
+>
+> > > > > > > > > ballooned
+>
+> > > > > > > > > out memory, qemu did not invalidate cache entries in linked
+>
+> > > > > > > > > list(entry->next), so when VM balloon back in memory, gfns
+>
+> > > > > > > > > probably
+>
+> > > > > > > > > mapped to different mfns, thus if guest asks device to DMA
+>
+> > > > > > > > > to
+>
+> > > > > > > > > these
+>
+> > > > > > > > > GPA, qemu may DMA to stale MFNs.
+>
+> > > > > > > > >
+>
+> > > > > > > > > So I think in xen_invalidate_map_cache() linked lists should
+>
+> > > > > > > > > also be
+>
+> > > > > > > > > checked and invalidated.
+>
+> > > > > > > > >
+>
+> > > > > > > > > What's your opinion? Is this a bug? Is my analysis correct?
+>
+> > > > > > Yes, you are right. We need to go through the list for each
+>
+> > > > > > element of
+>
+> > > > > > the array in xen_invalidate_map_cache. Can you come up with a
+>
+> > > > > > patch?
+>
+> > > > > I spoke too soon. In the regular case there should be no locked
+>
+> > > > > mappings
+>
+> > > > > when xen_invalidate_map_cache is called (see the DPRINTF warning at
+>
+> > > > > the
+>
+> > > > > beginning of the functions). Without locked mappings, there should
+>
+> > > > > never
+>
+> > > > > be more than one element in each list (see xen_map_cache_unlocked:
+>
+> > > > > entry->lock == true is a necessary condition to append a new entry
+>
+> > > > > to
+>
+> > > > > the list, otherwise it is just remapped).
+>
+> > > > >
+>
+> > > > > Can you confirm that what you are seeing are locked mappings
+>
+> > > > > when xen_invalidate_map_cache is called? To find out, enable the
+>
+> > > > > DPRINTF
+>
+> > > > > by turning it into a printf or by defining MAPCACHE_DEBUG.
+>
+> > > > In fact, I think the DPRINTF above is incorrect too. In
+>
+> > > > pci_add_option_rom(), rtl8139 rom is locked mapped in
+>
+> > > > pci_add_option_rom->memory_region_get_ram_ptr (after
+>
+> > > > memory_region_init_ram). So actually I think we should remove the
+>
+> > > > DPRINTF warning as it is normal.
+>
+> > > Let me explain why the DPRINTF warning is there: emulated dma operations
+>
+> > > can involve locked mappings. Once a dma operation completes, the related
+>
+> > > mapping is unlocked and can be safely destroyed. But if we destroy a
+>
+> > > locked mapping in xen_invalidate_map_cache, while a dma is still
+>
+> > > ongoing, QEMU will crash. We cannot handle that case.
+>
+> > >
+>
+> > > However, the scenario you described is different. It has nothing to do
+>
+> > > with DMA. It looks like pci_add_option_rom calls
+>
+> > > memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
+>
+> > > locked mapping and it is never unlocked or destroyed.
+>
+> > >
+>
+> > > It looks like "ptr" is not used after pci_add_option_rom returns. Does
+>
+> > > the append patch fix the problem you are seeing? For the proper fix, I
+>
+> > > think we probably need some sort of memory_region_unmap wrapper or maybe
+>
+> > > a call to address_space_unmap.
+>
+> >
+>
+> > Yes, I think so, maybe this is the proper way to fix this.
+>
+>
+>
+> Would you be up for sending a proper patch and testing it? We cannot call
+>
+> xen_invalidate_map_cache_entry directly from pci.c though, it would need
+>
+> to be one of the other functions like address_space_unmap for example.
+>
+>
+>
+>
+>
+Yes, I will look into this.
+Any updates?
+
+
+>
+> > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
+>
+> > > index e6b08e1..04f98b7 100644
+>
+> > > --- a/hw/pci/pci.c
+>
+> > > +++ b/hw/pci/pci.c
+>
+> > > @@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev,
+>
+> > > bool
+>
+> > > is_default_rom,
+>
+> > >        }
+>
+> > >          pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
+>
+> > > +    xen_invalidate_map_cache_entry(ptr);
+>
+> > >    }
+>
+> > >      static void pci_del_option_rom(PCIDevice *pdev)
+>
+
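
Editorial sketch for the thread above (not part of the archived messages): a toy Python model of the bucket and linked-list behaviour that hrg and Stefano describe. It is a simplification under stated assumptions, not QEMU's actual xen-mapcache code. The names `Entry`, `MapCache`, `invalidate_heads_only` and `invalidate_walking_lists` are hypothetical, `mapped` stands in for a non-NULL `vaddr_base`, and size and bitmap checks are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Simplified stand-in for a MapCacheEntry (hypothetical field subset)."""
    paddr_index: Optional[int] = None
    mapped: bool = False              # stands in for vaddr_base != NULL
    lock: int = 0                     # > 0 while a caller holds the mapping
    next: Optional["Entry"] = None    # collision chain within the bucket


class MapCache:
    def __init__(self, nr_buckets: int) -> None:
        self.buckets = [Entry() for _ in range(nr_buckets)]

    def map(self, index: int, lock: bool = False) -> Entry:
        entry = self.buckets[index % len(self.buckets)]
        prev = None
        # Skip entries that are locked and mapped to a different index.
        while (entry is not None and entry.lock and entry.mapped
               and entry.paddr_index != index):
            prev, entry = entry, entry.next
        if entry is None:
            # Every entry in the chain was locked: append a new one.
            entry = Entry()
            prev.next = entry
        if not entry.lock and (not entry.mapped or entry.paddr_index != index):
            # Unlocked entries are simply remapped in place.
            entry.paddr_index, entry.mapped = index, True
        if lock:
            entry.lock += 1
        return entry

    @staticmethod
    def _drop(entry: Entry) -> None:
        # Locked mappings are never destroyed; unlocked ones are unmapped.
        if entry.mapped and entry.lock == 0:
            entry.paddr_index, entry.mapped = None, False

    def invalidate_heads_only(self) -> None:
        # Behaviour reported in the thread: only first-level entries are visited.
        for head in self.buckets:
            self._drop(head)

    def invalidate_walking_lists(self) -> None:
        # Behaviour proposed in the thread: also visit every chained entry.
        for head in self.buckets:
            entry = head
            while entry is not None:
                self._drop(entry)
                entry = entry.next


cache = MapCache(nr_buckets=4)
rom = cache.map(0, lock=True)   # locked mapping that is never released
ram = cache.map(4)              # same bucket, so it is appended after rom
cache.invalidate_heads_only()
stale_after_heads_only = ram.mapped
cache.invalidate_walking_lists()
stale_after_walk = ram.mapped
```

In this model the guest-memory entry survives the heads-only invalidation because the locked ROM mapping occupies the first-level slot, which matches the stale-mapping scenario in the report. Walking the chain drops it while leaving the locked entry alone.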
diff --git a/results/classifier/004/other/55367348 b/results/classifier/004/other/55367348
new file mode 100644
index 00000000..39f042ab
--- /dev/null
+++ b/results/classifier/004/other/55367348
@@ -0,0 +1,540 @@
+other: 0.626
+mistranslation: 0.615
+device: 0.586
+instruction: 0.572
+semantic: 0.555
+graphic: 0.532
+assembly: 0.531
+network: 0.518
+socket: 0.501
+boot: 0.486
+KVM: 0.470
+vnc: 0.465
+
+[Qemu-devel] [Bug] Docs build fails at interop.rst
+
+https://paste.fedoraproject.org/paste/kOPx4jhtUli---TmxSLrlw
+running python3-sphinx-2.0.1-1.fc31.noarch on Fedora release 31
+(Rawhide)
+
+uname -a
+Linux iouring 5.1.0-0.rc6.git3.1.fc31.x86_64 #1 SMP Thu Apr 25 14:25:32
+UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
+
+Reverting commit 90edef80a0852cf8a3d2668898ee40e8970e431
+allows for the build to occur
+
+Regards
+Aarushi Mehta
+
+On 5/20/19 7:30 AM, Aarushi Mehta wrote:
+>
+https://paste.fedoraproject.org/paste/kOPx4jhtUli---TmxSLrlw
+>
+running python3-sphinx-2.0.1-1.fc31.noarch on Fedora release 31
+>
+(Rawhide)
+>
+>
+uname -a
+>
+Linux iouring 5.1.0-0.rc6.git3.1.fc31.x86_64 #1 SMP Thu Apr 25 14:25:32
+>
+UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
+>
+>
+Reverting commit 90edef80a0852cf8a3d2668898ee40e8970e431
+>
+allows for the build to occur
+>
+>
+Regards
+>
+Aarushi Mehta
+>
+>
+Ah, dang. The blocks aren't strictly conforming json, but the version I
+tested this under didn't seem to care. Your version is much newer. (I
+was using 1.7 as provided by Fedora 29.)
+
+For now, try reverting 9e5b6cb87db66dfb606604fe6cf40e5ddf1ef0e7 instead,
+which should at least turn off the "warnings as errors" option, but I
+don't think that reverting -n will turn off this warning.
+
+I'll try to get ahold of this newer version and see if I can't fix it
+more appropriately.
+
+--js
+
+On 5/20/19 12:37 PM, John Snow wrote:
+>
+>
+>
+On 5/20/19 7:30 AM, Aarushi Mehta wrote:
+>
+>
+https://paste.fedoraproject.org/paste/kOPx4jhtUli---TmxSLrlw
+>
+> running python3-sphinx-2.0.1-1.fc31.noarch on Fedora release 31
+>
+> (Rawhide)
+>
+>
+>
+> uname -a
+>
+> Linux iouring 5.1.0-0.rc6.git3.1.fc31.x86_64 #1 SMP Thu Apr 25 14:25:32
+>
+> UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
+>
+>
+>
+> Reverting commit 90edef80a0852cf8a3d2668898ee40e8970e431
+>
+> allows for the build to occur
+>
+>
+>
+> Regards
+>
+> Aarushi Mehta
+>
+>
+>
+>
+>
+>
+Ah, dang. The blocks aren't strictly conforming json, but the version I
+>
+tested this under didn't seem to care. Your version is much newer. (I
+>
+was using 1.7 as provided by Fedora 29.)
+>
+>
+For now, try reverting 9e5b6cb87db66dfb606604fe6cf40e5ddf1ef0e7 instead,
+>
+which should at least turn off the "warnings as errors" option, but I
+>
+don't think that reverting -n will turn off this warning.
+>
+>
+I'll try to get ahold of this newer version and see if I can't fix it
+>
+more appropriately.
+>
+>
+--js
+>
+...Sigh, okay.
+
+So, I am still not actually sure what changed from pygments 2.2 and
+sphinx 1.7 to pygments 2.4 and sphinx 2.0.1, but it appears as if Sphinx
+by default always tries to add a filter to the pygments lexer that
+raises an error on highlighting failure, instead of the default behavior
+which is to just highlight those errors in the output. There is no
+option to Sphinx that I am aware of to retain this lexing behavior.
+(Effectively, it's strict or nothing.)
+
+This approach, apparently, is broken in Sphinx 1.7/Pygments 2.2, so the
+build works with our malformed json.
+
+There are a few options:
+
+1. Update conf.py to ignore these warnings (and all future lexing
+errors), and settle for the fact that there will be no QMP highlighting
+wherever we use the directionality indicators ('->', '<-').
+
+2. Update bitmaps.rst to remove the directionality indicators.
+
+3. Update bitmaps.rst to format the QMP blocks as raw text instead of JSON.
+
+4. Update bitmaps.rst to remove the "json" specification from the code
+block. This will cause sphinx to "guess" the formatting, and the
+pygments guesser will decide it's Python3.
+
+This will parse well enough, but will mis-highlight 'true' and 'false'
+which are not python keywords. This approach may break in the future if
+the Python3 lexer is upgraded to be stricter (because '->' and '<-' are
+still invalid), and leaves us at the mercy of both the guesser and the
+lexer.
+
+I'm not actually sure what I dislike the least; I think I dislike #1 the
+most. #4 gets us most of what we want but is perhaps porcelain.
+
+I suspect if we attempt to move more of our documentation to ReST and
+Sphinx that we will need to answer for ourselves how we intend to
+document QMP code flow examples.
+
+--js
+
+On Mon, May 20, 2019 at 05:25:28PM -0400, John Snow wrote:
+>
+>
+>
+On 5/20/19 12:37 PM, John Snow wrote:
+>
+>
+>
+>
+>
+> On 5/20/19 7:30 AM, Aarushi Mehta wrote:
+>
+>>
+https://paste.fedoraproject.org/paste/kOPx4jhtUli---TmxSLrlw
+>
+>> running python3-sphinx-2.0.1-1.fc31.noarch on Fedora release 31
+>
+>> (Rawhide)
+>
+>>
+>
+>> uname -a
+>
+>> Linux iouring 5.1.0-0.rc6.git3.1.fc31.x86_64 #1 SMP Thu Apr 25 14:25:32
+>
+>> UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
+>
+>>
+>
+>> Reverting commit 90edef80a0852cf8a3d2668898ee40e8970e431
+>
+>> allows for the build to occur
+>
+>>
+>
+>> Regards
+>
+>> Aarushi Mehta
+>
+>>
+>
+>>
+>
+>
+>
+> Ah, dang. The blocks aren't strictly conforming json, but the version I
+>
+> tested this under didn't seem to care. Your version is much newer. (I
+>
+> was using 1.7 as provided by Fedora 29.)
+>
+>
+>
+> For now, try reverting 9e5b6cb87db66dfb606604fe6cf40e5ddf1ef0e7 instead,
+>
+> which should at least turn off the "warnings as errors" option, but I
+>
+> don't think that reverting -n will turn off this warning.
+>
+>
+>
+> I'll try to get ahold of this newer version and see if I can't fix it
+>
+> more appropriately.
+>
+>
+>
+> --js
+>
+>
+>
+>
+...Sigh, okay.
+>
+>
+So, I am still not actually sure what changed from pygments 2.2 and
+>
+sphinx 1.7 to pygments 2.4 and sphinx 2.0.1, but it appears as if Sphinx
+>
+by default always tries to add a filter to the pygments lexer that
+>
+raises an error on highlighting failure, instead of the default behavior
+>
+which is to just highlight those errors in the output. There is no
+>
+option to Sphinx that I am aware of to retain this lexing behavior.
+>
+(Effectively, it's strict or nothing.)
+>
+>
+This approach, apparently, is broken in Sphinx 1.7/Pygments 2.2, so the
+>
+build works with our malformed json.
+>
+>
+There are a few options:
+>
+>
+1. Update conf.py to ignore these warnings (and all future lexing
+>
+errors), and settle for the fact that there will be no QMP highlighting
+>
+wherever we use the directionality indicators ('->', '<-').
+>
+>
+2. Update bitmaps.rst to remove the directionality indicators.
+>
+>
+3. Update bitmaps.rst to format the QMP blocks as raw text instead of JSON.
+>
+>
+4. Update bitmaps.rst to remove the "json" specification from the code
+>
+block. This will cause sphinx to "guess" the formatting, and the
+>
+pygments guesser will decide it's Python3.
+>
+>
+This will parse well enough, but will mis-highlight 'true' and 'false'
+>
+which are not python keywords. This approach may break in the future if
+>
+the Python3 lexer is upgraded to be stricter (because '->' and '<-' are
+>
+still invalid), and leaves us at the mercy of both the guesser and the
+>
+lexer.
+>
+>
+I'm not actually sure what I dislike the least; I think I dislike #1 the
+>
+most. #4 gets us most of what we want but is perhaps porcelain.
+>
+>
+I suspect if we attempt to move more of our documentation to ReST and
+>
+Sphinx that we will need to answer for ourselves how we intend to
+>
+document QMP code flow examples.
+Writing a custom lexer that handles "<-" and "->" was simple (see below).
+
+Now, is it possible to convince Sphinx to register and use a custom lexer?
+
+$ cat > /tmp/lexer.py <<EOF
+from pygments.lexer import RegexLexer, DelegatingLexer
+from pygments.lexers.data import JsonLexer
+from pygments.token import Error, Generic
+
+class QMPExampleMarkersLexer(RegexLexer):
+    tokens = {
+        'root': [
+            (r' *-> *', Generic.Prompt),
+            (r' *<- *', Generic.Output),
+        ]
+    }
+
+class QMPExampleLexer(DelegatingLexer):
+    def __init__(self, **options):
+        super(QMPExampleLexer, self).__init__(
+            JsonLexer, QMPExampleMarkersLexer, Error, **options)
+EOF
+$ pygmentize -l /tmp/lexer.py:QMPExampleLexer -x -f html <<EOF
+    -> {
+         "execute": "drive-backup",
+         "arguments": {
+           "device": "drive0",
+           "bitmap": "bitmap0",
+           "target": "drive0.inc0.qcow2",
+           "format": "qcow2",
+           "sync": "incremental",
+           "mode": "existing"
+         }
+       }
+
+    <- { "return": {} }
+EOF
+<div class="highlight"><pre><span></span><span class="gp">    -&gt; 
+</span><span class="p">{</span>
+         <span class="nt">&quot;execute&quot;</span><span class="p">:</span> 
+<span class="s2">&quot;drive-backup&quot;</span><span class="p">,</span>
+         <span class="nt">&quot;arguments&quot;</span><span class="p">:</span> 
+<span class="p">{</span>
+           <span class="nt">&quot;device&quot;</span><span class="p">:</span> 
+<span class="s2">&quot;drive0&quot;</span><span class="p">,</span>
+           <span class="nt">&quot;bitmap&quot;</span><span class="p">:</span> 
+<span class="s2">&quot;bitmap0&quot;</span><span class="p">,</span>
+           <span class="nt">&quot;target&quot;</span><span class="p">:</span> 
+<span class="s2">&quot;drive0.inc0.qcow2&quot;</span><span class="p">,</span>
+           <span class="nt">&quot;format&quot;</span><span class="p">:</span> 
+<span class="s2">&quot;qcow2&quot;</span><span class="p">,</span>
+           <span class="nt">&quot;sync&quot;</span><span class="p">:</span> 
+<span class="s2">&quot;incremental&quot;</span><span class="p">,</span>
+           <span class="nt">&quot;mode&quot;</span><span class="p">:</span> 
+<span class="s2">&quot;existing&quot;</span>
+         <span class="p">}</span>
+       <span class="p">}</span>
+
+<span class="go">    &lt;- </span><span class="p">{</span> <span 
+class="nt">&quot;return&quot;</span><span class="p">:</span> <span 
+class="p">{}</span> <span class="p">}</span>
+</pre></div>
+$ 
+
+
+-- 
+Eduardo
+
+On 5/20/19 7:04 PM, Eduardo Habkost wrote:
+> On Mon, May 20, 2019 at 05:25:28PM -0400, John Snow wrote:
+>>
+>> On 5/20/19 12:37 PM, John Snow wrote:
+>>>
+>>> On 5/20/19 7:30 AM, Aarushi Mehta wrote:
+>>>> https://paste.fedoraproject.org/paste/kOPx4jhtUli---TmxSLrlw
+>>>> running python3-sphinx-2.0.1-1.fc31.noarch on Fedora release 31
+>>>> (Rawhide)
+>>>>
+>>>> uname -a
+>>>> Linux iouring 5.1.0-0.rc6.git3.1.fc31.x86_64 #1 SMP Thu Apr 25 14:25:32
+>>>> UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
+>>>>
+>>>> Reverting commit 90edef80a0852cf8a3d2668898ee40e8970e431
+>>>> allows for the build to occur
+>>>>
+>>>> Regards
+>>>> Aarushi Mehta
+>>>>
+>>>
+>>> Ah, dang. The blocks aren't strictly conforming json, but the version I
+>>> tested this under didn't seem to care. Your version is much newer. (I
+>>> was using 1.7 as provided by Fedora 29.)
+>>>
+>>> For now, try reverting 9e5b6cb87db66dfb606604fe6cf40e5ddf1ef0e7 instead,
+>>> which should at least turn off the "warnings as errors" option, but I
+>>> don't think that reverting -n will turn off this warning.
+>>>
+>>> I'll try to get ahold of this newer version and see if I can't fix it
+>>> more appropriately.
+>>>
+>>> --js
+>>>
+>>
+>> ...Sigh, okay.
+>>
+>> So, I am still not actually sure what changed from pygments 2.2 and
+>> sphinx 1.7 to pygments 2.4 and sphinx 2.0.1, but it appears as if Sphinx
+>> by default always tries to add a filter to the pygments lexer that
+>> raises an error on highlighting failure, instead of the default behavior
+>> which is to just highlight those errors in the output. There is no
+>> option to Sphinx that I am aware of to retain this lexing behavior.
+>> (Effectively, it's strict or nothing.)
+>>
+>> This approach, apparently, is broken in Sphinx 1.7/Pygments 2.2, so the
+>> build works with our malformed json.
+>>
+>> There are a few options:
+>>
+>> 1. Update conf.py to ignore these warnings (and all future lexing
+>> errors), and settle for the fact that there will be no QMP highlighting
+>> wherever we use the directionality indicators ('->', '<-').
+>>
+>> 2. Update bitmaps.rst to remove the directionality indicators.
+>>
+>> 3. Update bitmaps.rst to format the QMP blocks as raw text instead of JSON.
+>>
+>> 4. Update bitmaps.rst to remove the "json" specification from the code
+>> block. This will cause sphinx to "guess" the formatting, and the
+>> pygments guesser will decide it's Python3.
+>>
+>> This will parse well enough, but will mis-highlight 'true' and 'false'
+>> which are not python keywords. This approach may break in the future if
+>> the Python3 lexer is upgraded to be stricter (because '->' and '<-' are
+>> still invalid), and leaves us at the mercy of both the guesser and the
+>> lexer.
+>>
+>> I'm not actually sure what I dislike the least; I think I dislike #1 the
+>> most. #4 gets us most of what we want but is perhaps porcelain.
+>>
+>> I suspect if we attempt to move more of our documentation to ReST and
+>> Sphinx that we will need to answer for ourselves how we intend to
+>> document QMP code flow examples.
+>
+> Writing a custom lexer that handles "<-" and "->" was simple (see below).
+>
+> Now, is it possible to convince Sphinx to register and use a custom lexer?
+
+Spoilers, yes, and I've sent a patch to the list. Thanks for your help!
+
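The thread above turns on the fact that the QMP examples are valid JSON values with a "->" or "<-" marker in front of each message, which is why a strict JSON lexer rejects them. A minimal sketch of that structure, assuming only Python's standard json module: the helper name split_qmp_example is made up for illustration and is not part of QEMU, Sphinx or Pygments.

```python
import json


def split_qmp_example(text):
    """Split a QMP example into (marker, message) pairs.

    Each message is expected to start with "->" or "<-", followed by
    exactly one JSON value.
    """
    decoder = json.JSONDecoder()
    messages = []
    pos = 0
    while True:
        # Skip whitespace between messages.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        marker = text[pos:pos + 2]
        if marker not in ("->", "<-"):
            raise ValueError("expected '->' or '<-' at offset %d" % pos)
        pos += 2
        # raw_decode() does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        # Parse one JSON value and learn where it ended.
        value, pos = decoder.raw_decode(text, pos)
        messages.append((marker, value))
    return messages
```

Feeding the same text to json.loads() directly fails on the first marker, which is the plain-Python analogue of the highlighting error discussed in the thread.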
diff --git a/results/classifier/004/other/55753058 b/results/classifier/004/other/55753058
new file mode 100644
index 00000000..ff699cdc
--- /dev/null
+++ b/results/classifier/004/other/55753058
@@ -0,0 +1,301 @@
+other: 0.734
+KVM: 0.713
+vnc: 0.682
+mistranslation: 0.649
+graphic: 0.630
+device: 0.623
+instruction: 0.581
+semantic: 0.577
+assembly: 0.529
+network: 0.525
+boot: 0.478
+socket: 0.462
+
+[RESEND][BUG FIX HELP] QEMU main thread endlessly hangs in __ppoll()
+
+Hi Genius,
+I am a user of QEMU v4.2.0 and am stuck on an interesting bug, which may still
+exist in the mainline.
+Thanks in advance to the heroes who can take a look and share their understanding.
+
+The QEMU main thread hangs endlessly while handling this QMP command:
+{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
+'drive_del replication0' } }
+and the call trace looks like this:
+#0 0x00007f3c22045bf6 in __ppoll (fds=0x555611328410, nfds=1,
+timeout=<optimized out>, timeout@entry=0x7ffc56c66db0,
+sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
+#1 0x000055561021f415 in ppoll (__ss=0x0, __timeout=0x7ffc56c66db0,
+__nfds=<optimized out>, __fds=<optimized out>)
+at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
+#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
+timeout=<optimized out>) at util/qemu-timer.c:348
+#3 0x0000555610221430 in aio_poll (ctx=ctx@entry=0x5556113010f0,
+blocking=blocking@entry=true) at util/aio-posix.c:669
+#4 0x000055561019268d in bdrv_do_drained_begin (poll=true,
+ignore_bds_parents=false, parent=0x0, recursive=false,
+bs=0x55561138b0a0) at block/io.c:430
+#5 bdrv_do_drained_begin (bs=0x55561138b0a0, recursive=<optimized out>,
+parent=0x0, ignore_bds_parents=<optimized out>,
+poll=<optimized out>) at block/io.c:396
+#6 0x000055561017b60b in quorum_del_child (bs=0x55561138b0a0,
+child=0x7f36dc0ce380, errp=<optimized out>)
+at block/quorum.c:1063
+#7 0x000055560ff5836b in qmp_x_blockdev_change (parent=0x555612373120
+"colo-disk0", has_child=<optimized out>,
+child=0x5556112df3e0 "children.1", has_node=<optimized out>, node=0x0,
+errp=0x7ffc56c66f98) at blockdev.c:4494
+#8 0x00005556100f8f57 in qmp_marshal_x_blockdev_change (args=<optimized
+out>, ret=<optimized out>, errp=0x7ffc56c67018)
+at qapi/qapi-commands-block-core.c:1538
+#9 0x00005556101d8290 in do_qmp_dispatch (errp=0x7ffc56c67010,
+allow_oob=<optimized out>, request=<optimized out>,
+cmds=0x5556109c69a0 <qmp_commands>) at qapi/qmp-dispatch.c:132
+#10 qmp_dispatch (cmds=0x5556109c69a0 <qmp_commands>, request=<optimized
+out>, allow_oob=<optimized out>)
+at qapi/qmp-dispatch.c:175
+#11 0x00005556100d4c4d in monitor_qmp_dispatch (mon=0x5556113a6f40,
+req=<optimized out>) at monitor/qmp.c:145
+#12 0x00005556100d5437 in monitor_qmp_bh_dispatcher (data=<optimized out>)
+at monitor/qmp.c:234
+#13 0x000055561021dbec in aio_bh_call (bh=0x5556112164b0) at
+util/async.c:117
+#14 aio_bh_poll (ctx=ctx@entry=0x5556112151b0) at util/async.c:117
+#15 0x00005556102212c4 in aio_dispatch (ctx=0x5556112151b0) at
+util/aio-posix.c:459
+#16 0x000055561021dab2 in aio_ctx_dispatch (source=<optimized out>,
+callback=<optimized out>, user_data=<optimized out>)
+at util/async.c:260
+#17 0x00007f3c22302fbd in g_main_context_dispatch () from
+/lib/x86_64-linux-gnu/libglib-2.0.so.0
+#18 0x0000555610220358 in glib_pollfds_poll () at util/main-loop.c:219
+#19 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
+#20 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
+#21 0x000055560ff600fe in main_loop () at vl.c:1814
+#22 0x000055560fddbce9 in main (argc=<optimized out>, argv=<optimized out>,
+envp=<optimized out>) at vl.c:4503
+We found that we loop endlessly on this line of
+block/io.c:bdrv_do_drained_begin():
+    BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent));
+and it turns out that bdrv_drain_poll() always gets true from:
+- bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents)
+- AND atomic_read(&bs->in_flight)
+
+I personally think this is a deadlock issue in the QEMU block layer
+(as we know, there are some FIXME comments in the related code, such as the
+block permission update).
+Any comments are welcome and appreciated.
+
+---
+thx,likexu
+
+On 2/28/21 9:39 PM, Like Xu wrote:
+> Hi Genius,
+> I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may
+> still exist in the mainline.
+> Thanks in advance to heroes who can take a look and share understanding.
+
+Do you have a test case that reproduces on 5.2? It'd be nice to know if
+it was still a problem in the latest source tree or not.
+
+--js
+
+Hi John,
+
+Thanks for your comment.
+
+On 2021/3/5 7:53, John Snow wrote:
+> On 2/28/21 9:39 PM, Like Xu wrote:
+>> Hi Genius,
+>> I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may
+>> still exist in the mainline.
+>> Thanks in advance to heroes who can take a look and share understanding.
+>
+> Do you have a test case that reproduces on 5.2? It'd be nice to know if it
+> was still a problem in the latest source tree or not.
+
+We narrowed down the source of the bug, which basically came from
+the following QMP usage:
+{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
+'drive_del replication0' } }
+One of the test cases is the COLO usage (docs/colo-proxy.txt).
+
+This issue is sporadic; the probability may be 1/15 for an I/O-heavy guest.
+
+I believe it's reproducible on 5.2 and the latest tree.
+
+> --js
+
+On 3/4/21 10:08 PM, Like Xu wrote:
+> Hi John,
+>
+> Thanks for your comment.
+>
+> On 2021/3/5 7:53, John Snow wrote:
+>> On 2/28/21 9:39 PM, Like Xu wrote:
+>>> Hi Genius,
+>>> I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may
+>>> still exist in the mainline.
+>>> Thanks in advance to heroes who can take a look and share understanding.
+>>
+>> Do you have a test case that reproduces on 5.2? It'd be nice to know
+>> if it was still a problem in the latest source tree or not.
+>
+> We narrowed down the source of the bug, which basically came from
+> the following qmp usage:
+> {'execute': 'human-monitor-command', 'arguments':{ 'command-line':
+> 'drive_del replication0' } }
+> One of the test cases is the COLO usage (docs/colo-proxy.txt).
+>
+> This issue is sporadic; the probability may be 1/15 for an I/O-heavy guest.
+>
+> I believe it's reproducible on 5.2 and the latest tree.
+
+Can you please test and confirm that this is the case, and then file a
+bug report on the LP:
+https://launchpad.net/qemu
+and include:
+- The exact commit you used (current origin/master debug build would be
+  the most ideal.)
+- Which QEMU binary you are using (qemu-system-x86_64?)
+- The shortest command line you are aware of that reproduces the problem
+- The host OS and kernel version
+- An updated call trace
+- Any relevant commands issued prior to the one that caused the hang; or
+  detailed reproduction steps if possible.
+
+Thanks,
+--js
+
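For anyone putting together the reproduction steps requested above: the report writes the QMP command with single quotes, while strict JSON uses double quotes. A small sketch, assuming only Python's standard json module, that builds the same message as strict JSON. The function name hmp_via_qmp is hypothetical, and opening the QMP socket and sending the message are deliberately left out.

```python
import json


def hmp_via_qmp(command_line):
    """Wrap an HMP command line in a QMP 'human-monitor-command' message."""
    message = {
        "execute": "human-monitor-command",
        "arguments": {"command-line": command_line},
    }
    # json.dumps() emits double-quoted strings, i.e. strict JSON.
    return json.dumps(message)
```

Calling hmp_via_qmp("drive_del replication0") yields the command from the report in a form that can be pasted into a bug report or sent by a test script.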
diff --git a/results/classifier/004/other/56309929 b/results/classifier/004/other/56309929
new file mode 100644
index 00000000..2efc040b
--- /dev/null
+++ b/results/classifier/004/other/56309929
@@ -0,0 +1,188 @@
+other: 0.690
+device: 0.646
+KVM: 0.636
+assembly: 0.618
+vnc: 0.600
+network: 0.589
+instruction: 0.581
+boot: 0.578
+graphic: 0.570
+mistranslation: 0.554
+semantic: 0.521
+socket: 0.516
+
+[Qemu-devel] [BUG 2.6] Broken CONFIG_TPM?
+
+A compilation test with clang -Weverything reported this problem:
+
+config-host.h:112:20: warning: '$' in identifier
+[-Wdollar-in-identifier-extension]
+
+The line of code looks like this:
+
+#define CONFIG_TPM $(CONFIG_SOFTMMU)
+
+This is fine for Makefile code, but won't work as expected in C code.
+
+On 28.04.2016 at 22:33, Stefan Weil wrote:
+> A compilation test with clang -Weverything reported this problem:
+>
+> config-host.h:112:20: warning: '$' in identifier
+> [-Wdollar-in-identifier-extension]
+>
+> The line of code looks like this:
+>
+> #define CONFIG_TPM $(CONFIG_SOFTMMU)
+>
+> This is fine for Makefile code, but won't work as expected in C code.
+
+A complete 64 bit build with clang -Weverything creates a log file of
+1.7 GB.
+Here are the uniq warnings sorted by their frequency:
+
+      1 -Wflexible-array-extensions
+      1 -Wgnu-folding-constant
+      1 -Wunknown-pragmas
+      1 -Wunknown-warning-option
+      1 -Wunreachable-code-loop-increment
+      2 -Warray-bounds-pointer-arithmetic
+      2 -Wdollar-in-identifier-extension
+      3 -Woverlength-strings
+      3 -Wweak-vtables
+      4 -Wgnu-empty-struct
+      4 -Wstring-conversion
+      6 -Wclass-varargs
+      7 -Wc99-extensions
+      7 -Wc++-compat
+      8 -Wfloat-equal
+     11 -Wformat-nonliteral
+     16 -Wshift-negative-value
+     19 -Wglobal-constructors
+     28 -Wc++11-long-long
+     29 -Wembedded-directive
+     38 -Wvla
+     40 -Wcovered-switch-default
+     40 -Wmissing-variable-declarations
+     49 -Wold-style-cast
+     53 -Wgnu-conditional-omitted-operand
+     56 -Wformat-pedantic
+     61 -Wvariadic-macros
+     77 -Wc++11-extensions
+     83 -Wgnu-flexible-array-initializer
+     83 -Wzero-length-array
+     96 -Wgnu-designator
+    102 -Wmissing-noreturn
+    103 -Wconditional-uninitialized
+    107 -Wdisabled-macro-expansion
+    115 -Wunreachable-code-return
+    134 -Wunreachable-code
+    243 -Wunreachable-code-break
+    257 -Wfloat-conversion
+    280 -Wswitch-enum
+    291 -Wpointer-arith
+    298 -Wshadow
+    378 -Wassign-enum
+    395 -Wused-but-marked-unused
+    420 -Wreserved-id-macro
+    493 -Wdocumentation
+    510 -Wshift-sign-overflow
+    565 -Wgnu-case-range
+    566 -Wgnu-zero-variadic-macro-arguments
+    650 -Wbad-function-cast
+    705 -Wmissing-field-initializers
+    817 -Wgnu-statement-expression
+    968 -Wdocumentation-unknown-command
+   1021 -Wextra-semi
+   1112 -Wgnu-empty-initializer
+   1138 -Wcast-qual
+   1509 -Wcast-align
+   1766 -Wextended-offsetof
+   1937 -Wsign-compare
+   2130 -Wpacked
+   2404 -Wunused-macros
+   3081 -Wpadded
+   4182 -Wconversion
+   5430 -Wlanguage-extension-token
+   6655 -Wshorten-64-to-32
+   6995 -Wpedantic
+   7354 -Wunused-parameter
+  27659 -Wsign-conversion
+
+Stefan Weil <address@hidden> writes:
+
+> A compilation test with clang -Weverything reported this problem:
+>
+> config-host.h:112:20: warning: '$' in identifier
+> [-Wdollar-in-identifier-extension]
+>
+> The line of code looks like this:
+>
+> #define CONFIG_TPM $(CONFIG_SOFTMMU)
+>
+> This is fine for Makefile code, but won't work as expected in C code.
+
+Broken in commit 3b8acc1 "configure: fix TPM logic".  Cc'ing Paolo.
+
+Impact: #ifdef CONFIG_TPM never disables code.  There are no other uses
+of CONFIG_TPM in C code.
+
+I had a quick peek at configure and create_config, but refrained from
+attempting to fix this, since I don't understand when exactly CONFIG_TPM
+should be defined.
+
+On 29 April 2016 at 08:42, Markus Armbruster <address@hidden> wrote:
+> Stefan Weil <address@hidden> writes:
+>
+>> A compilation test with clang -Weverything reported this problem:
+>>
+>> config-host.h:112:20: warning: '$' in identifier
+>> [-Wdollar-in-identifier-extension]
+>>
+>> The line of code looks like this:
+>>
+>> #define CONFIG_TPM $(CONFIG_SOFTMMU)
+>>
+>> This is fine for Makefile code, but won't work as expected in C code.
+>
+> Broken in commit 3b8acc1 "configure: fix TPM logic".  Cc'ing Paolo.
+>
+> Impact: #ifdef CONFIG_TPM never disables code.  There are no other uses
+> of CONFIG_TPM in C code.
+>
+> I had a quick peek at configure and create_config, but refrained from
+> attempting to fix this, since I don't understand when exactly CONFIG_TPM
+> should be defined.
+
+Looking at 'git blame' suggests this has been wrong like this for
+some years, so we don't need to scramble to fix it for 2.6.
+
+thanks
+-- PMM
+
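The impact described in this thread ("#ifdef CONFIG_TPM never disables code") follows from #ifdef testing only whether a name is defined, not what it expands to. A toy Python sketch of that effect, under loud assumptions: the three helpers below are hypothetical and do not reproduce QEMU's configure or create_config scripts; they only model a NAME=value line being copied verbatim into a #define.

```python
import re


def to_c_define(mak_line):
    """Turn a 'NAME=value' line into a C '#define NAME value' line."""
    name, _, value = mak_line.strip().partition("=")
    return "#define %s %s" % (name, value)


def has_make_syntax(define_line):
    """Return True if the macro body still contains a make-style $(VAR)."""
    return re.search(r"\$\([A-Za-z_][A-Za-z0-9_]*\)", define_line) is not None


def ifdef_is_true(define_line, name):
    """Model #ifdef: it only asks whether NAME is defined at all."""
    parts = define_line.split()
    return len(parts) >= 2 and parts[0] == "#define" and parts[1] == name
```

With the line from the report, to_c_define("CONFIG_TPM=$(CONFIG_SOFTMMU)") gives the reported #define, has_make_syntax() flags it, and ifdef_is_true() still reports the symbol as defined regardless of what CONFIG_SOFTMMU would have been.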
diff --git a/results/classifier/004/other/56937788 b/results/classifier/004/other/56937788
new file mode 100644
index 00000000..e4043d1f
--- /dev/null
+++ b/results/classifier/004/other/56937788
@@ -0,0 +1,352 @@
+other: 0.791
+KVM: 0.755
+vnc: 0.743
+mistranslation: 0.735
+graphic: 0.720
+semantic: 0.705
+device: 0.697
+instruction: 0.653
+assembly: 0.638
+boot: 0.636
+network: 0.633
+socket: 0.613
+
+[Qemu-devel] [Bug] virtio-blk: qemu will crash if hotplug virtio-blk device failed
+
+I found that a failed hotplug of a virtio-blk device leads to a QEMU crash.
+
+Reproduction steps:
+
+1. Run a VM named vm001
+
+2. Create a virtio-blk.xml that contains an invalid configuration:
+<disk device="lun" rawio="yes" type="block">
+  <driver cache="none" io="native" name="qemu" type="raw" />
+  <source dev="/dev/mapper/11-dm" />
+  <target bus="virtio" dev="vdx" />
+</disk>
+
+3. Run command: virsh attach-device vm001 vm001
+
+Libvirt returns this error message:
+
+error: Failed to attach device from blk-scsi.xml
+
+error: internal error: unable to execute QEMU command 'device_add': Please set 
+scsi=off for virtio-blk devices in order to use virtio 1.0
+
+This means that hotplugging the virtio-blk device failed.
+
+4. Suspending or shutting down the VM then crashes QEMU.
+
+Backtrace from gdb:
+
+(gdb) bt
+#0  object_get_class (address@hidden) at qom/object.c:750
+#1  0x00007f9a72582e01 in virtio_vmstate_change (opaque=0x7f9a73d10960, 
+running=0, state=<optimized out>) at 
+/mnt/sdb/lzc/code/open/qemu/hw/virtio/virtio.c:2203
+#2  0x00007f9a7261ef52 in vm_state_notify (address@hidden, address@hidden) at 
+vl.c:1685
+#3  0x00007f9a7252603a in do_vm_stop (state=RUN_STATE_PAUSED) at 
+/mnt/sdb/lzc/code/open/qemu/cpus.c:941
+#4  vm_stop (address@hidden) at /mnt/sdb/lzc/code/open/qemu/cpus.c:1807
+#5  0x00007f9a7262eb1b in qmp_stop (address@hidden) at qmp.c:102
+#6  0x00007f9a7262c70a in qmp_marshal_stop (args=<optimized out>, 
+ret=<optimized out>, errp=0x7ffe63e255d8) at qmp-marshal.c:5854
+#7  0x00007f9a72897e79 in do_qmp_dispatch (errp=0x7ffe63e255d0, 
+request=0x7f9a76510120, cmds=0x7f9a72ee7980 <qmp_commands>) at 
+qapi/qmp-dispatch.c:104
+#8  qmp_dispatch (cmds=0x7f9a72ee7980 <qmp_commands>, address@hidden) at 
+qapi/qmp-dispatch.c:131
+#9  0x00007f9a725288d5 in handle_qmp_command (parser=<optimized out>, 
+tokens=<optimized out>) at /mnt/sdb/lzc/code/open/qemu/monitor.c:3852
+#10 0x00007f9a7289d514 in json_message_process_token (lexer=0x7f9a73ce4498, 
+input=0x7f9a73cc6880, type=JSON_RCURLY, x=36, y=17) at 
+qobject/json-streamer.c:105
+#11 0x00007f9a728bb69b in json_lexer_feed_char (address@hidden, ch=125 '}', 
+address@hidden) at qobject/json-lexer.c:323
+#12 0x00007f9a728bb75e in json_lexer_feed (lexer=0x7f9a73ce4498, 
+buffer=<optimized out>, size=<optimized out>) at qobject/json-lexer.c:373
+#13 0x00007f9a7289d5d9 in json_message_parser_feed (parser=<optimized out>, 
+buffer=<optimized out>, size=<optimized out>) at qobject/json-streamer.c:124
+#14 0x00007f9a7252722e in monitor_qmp_read (opaque=<optimized out>, 
+buf=<optimized out>, size=<optimized out>) at 
+/mnt/sdb/lzc/code/open/qemu/monitor.c:3894
+#15 0x00007f9a7284ee1b in tcp_chr_read (chan=<optimized out>, cond=<optimized 
+out>, opaque=<optimized out>) at chardev/char-socket.c:441
+#16 0x00007f9a6e03e99a in g_main_context_dispatch () from 
+/usr/lib64/libglib-2.0.so.0
+#17 0x00007f9a728a342c in glib_pollfds_poll () at util/main-loop.c:214
+#18 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
+#19 main_loop_wait (address@hidden) at util/main-loop.c:515
+#20 0x00007f9a724e7547 in main_loop () at vl.c:1999
+#21 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at 
+vl.c:4877
+
+The problem happens in virtio_vmstate_change(), which is called by vm_state_notify():
+static void virtio_vmstate_change(void *opaque, int running, RunState state)
+{
+    VirtIODevice *vdev = opaque;
+    BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
+    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+    bool backend_run = running && (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK);
+    vdev->vm_running = running;
+
+    if (backend_run) {
+        virtio_set_status(vdev, vdev->status);
+    }
+
+    if (k->vmstate_change) {
+        k->vmstate_change(qbus->parent, backend_run);
+    }
+
+    if (!backend_run) {
+        virtio_set_status(vdev, vdev->status);
+    }
+}
+
+vdev's parent_bus is NULL, so the code that uses
+qdev_get_parent_bus(DEVICE(vdev)) crashes.
+virtio_vmstate_change is added to the vm_change_state_head list in
+virtio_blk_device_realize() (via virtio_init()),
+but when the virtio-blk hotplug fails, virtio_vmstate_change is never removed
+from vm_change_state_head.
+
+
+I applied the following patch:
+
+diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
+index 5884ce3..ea532dc 100644
+--- a/hw/virtio/virtio.c
++++ b/hw/virtio/virtio.c
+@@ -2491,6 +2491,7 @@ static void virtio_device_realize(DeviceState *dev, Error **errp)
+     virtio_bus_device_plugged(vdev, &err);
+     if (err != NULL) {
+         error_propagate(errp, err);
++        vdc->unrealize(dev, NULL);
+         return;
+     }
+
+On Tue, Oct 31, 2017 at 05:19:08AM +0000, linzhecheng wrote:
+> I found that hotplug virtio-blk device will lead to qemu crash.
+
+The author posted a patch in a separate email thread.  Please see
+"[PATCH] fix: unrealize virtio device if we fail to hotplug it".
+
+>
+Re-production steps:
+>
+>
+1.       Run VM named vm001
+>
+>
+2.       Create a virtio-blk.xml which contains wrong configurations:
+>
+<disk device="lun" rawio="yes" type="block">
+>
+<driver cache="none" io="native" name="qemu" type="raw" />
+>
+<source dev="/dev/mapper/11-dm" />
+>
+<target bus="virtio" dev="vdx" />
+>
+</disk>
+>
+>
+3.       Run command : virsh attach-device vm001 vm001
+>
+>
+Libvirt will return err msg:
+>
+>
+error: Failed to attach device from blk-scsi.xml
+>
+>
+error: internal error: unable to execute QEMU command 'device_add': Please
+>
+set scsi=off for virtio-blk devices in order to use virtio 1.0
+>
+>
+it means hotplug virtio-blk device failed.
+>
+>
+4.       Suspend or shutdown VM will leads to qemu crash
+>
+>
+>
+>
+from gdb:
+>
+>
+>
+(gdb) bt
+>
+#0  object_get_class (address@hidden) at qom/object.c:750
+>
+#1  0x00007f9a72582e01 in virtio_vmstate_change (opaque=0x7f9a73d10960,
+>
+running=0, state=<optimized out>) at
+>
+/mnt/sdb/lzc/code/open/qemu/hw/virtio/virtio.c:2203
+>
+#2  0x00007f9a7261ef52 in vm_state_notify (address@hidden, address@hidden) at
+>
+vl.c:1685
+>
+#3  0x00007f9a7252603a in do_vm_stop (state=RUN_STATE_PAUSED) at
+>
+/mnt/sdb/lzc/code/open/qemu/cpus.c:941
+>
+#4  vm_stop (address@hidden) at /mnt/sdb/lzc/code/open/qemu/cpus.c:1807
+>
+#5  0x00007f9a7262eb1b in qmp_stop (address@hidden) at qmp.c:102
+>
+#6  0x00007f9a7262c70a in qmp_marshal_stop (args=<optimized out>,
+>
+ret=<optimized out>, errp=0x7ffe63e255d8) at qmp-marshal.c:5854
+>
+#7  0x00007f9a72897e79 in do_qmp_dispatch (errp=0x7ffe63e255d0,
+>
+request=0x7f9a76510120, cmds=0x7f9a72ee7980 <qmp_commands>) at
+>
+qapi/qmp-dispatch.c:104
+>
+#8  qmp_dispatch (cmds=0x7f9a72ee7980 <qmp_commands>, address@hidden) at
+>
+qapi/qmp-dispatch.c:131
+>
+#9  0x00007f9a725288d5 in handle_qmp_command (parser=<optimized out>,
+>
+tokens=<optimized out>) at /mnt/sdb/lzc/code/open/qemu/monitor.c:3852
+>
+#10 0x00007f9a7289d514 in json_message_process_token (lexer=0x7f9a73ce4498,
+>
+input=0x7f9a73cc6880, type=JSON_RCURLY, x=36, y=17) at
+>
+qobject/json-streamer.c:105
+>
+#11 0x00007f9a728bb69b in json_lexer_feed_char (address@hidden, ch=125 '}',
+>
+address@hidden) at qobject/json-lexer.c:323
+>
+#12 0x00007f9a728bb75e in json_lexer_feed (lexer=0x7f9a73ce4498,
+>
+buffer=<optimized out>, size=<optimized out>) at qobject/json-lexer.c:373
+>
+#13 0x00007f9a7289d5d9 in json_message_parser_feed (parser=<optimized out>,
+>
+buffer=<optimized out>, size=<optimized out>) at qobject/json-streamer.c:124
+>
+#14 0x00007f9a7252722e in monitor_qmp_read (opaque=<optimized out>,
+>
+buf=<optimized out>, size=<optimized out>) at
+>
+/mnt/sdb/lzc/code/open/qemu/monitor.c:3894
+>
+#15 0x00007f9a7284ee1b in tcp_chr_read (chan=<optimized out>, cond=<optimized
+>
+out>, opaque=<optimized out>) at chardev/char-socket.c:441
+>
+#16 0x00007f9a6e03e99a in g_main_context_dispatch () from
+>
+/usr/lib64/libglib-2.0.so.0
+>
+#17 0x00007f9a728a342c in glib_pollfds_poll () at util/main-loop.c:214
+>
+#18 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
+>
+#19 main_loop_wait (address@hidden) at util/main-loop.c:515
+>
+#20 0x00007f9a724e7547 in main_loop () at vl.c:1999
+>
+#21 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
+>
+at vl.c:4877
+>
+>
+Problem happens in virtio_vmstate_change which is called by vm_state_notify,
+>
+static void virtio_vmstate_change(void *opaque, int running, RunState state)
+>
+{
+>
+VirtIODevice *vdev = opaque;
+>
+BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
+>
+VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+>
+bool backend_run = running && (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK);
+>
+vdev->vm_running = running;
+>
+>
+if (backend_run) {
+>
+virtio_set_status(vdev, vdev->status);
+>
+}
+>
+>
+if (k->vmstate_change) {
+>
+k->vmstate_change(qbus->parent, backend_run);
+>
+}
+>
+>
+if (!backend_run) {
+>
+virtio_set_status(vdev, vdev->status);
+>
+}
+>
+}
+>
+>
+Vdev's parent_bus is NULL, so qdev_get_parent_bus(DEVICE(vdev)) will crash.
+>
+virtio_vmstate_change is added to the list vm_change_state_head at
+>
+virtio_blk_device_realize(virtio_init),
+>
+but after hotplug virtio-blk failed, virtio_vmstate_change will not be
+>
+removed from vm_change_state_head.
+>
+>
+>
+I applied a patch as follows:
+>
+>
+diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
+>
+index 5884ce3..ea532dc 100644
+>
+--- a/hw/virtio/virtio.c
+>
++++ b/hw/virtio/virtio.c
+>
+@@ -2491,6 +2491,7 @@ static void virtio_device_realize(DeviceState *dev,
+>
+Error **errp)
+>
+virtio_bus_device_plugged(vdev, &err);
+>
+if (err != NULL) {
+>
+error_propagate(errp, err);
+>
++        vdc->unrealize(dev, NULL);
+>
+return;
+>
+}
+signature.asc
+Description:
+PGP signature
+
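The report above comes down to a callback that was registered on a global list during device init and never removed when realization failed. A minimal sketch of that pattern follows, assuming a simplified model: every name here (`handler_add`, `dev_realize`, `fake_bus`, and so on) is hypothetical and is not a QEMU API. The assertion stands in for the NULL parent-bus dereference, and the failure path in `dev_realize` mirrors the cleanup step the proposed patch adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; none of these are QEMU APIs. */
typedef void (*state_cb)(void *opaque, int running);

struct handler {
    state_cb cb;
    void *opaque;
    struct handler *next;
};

/* Plays the role of the global vm_change_state_head list. */
static struct handler *handlers;

static struct handler *handler_add(state_cb cb, void *opaque)
{
    struct handler *h = malloc(sizeof(*h));
    if (h == NULL) {
        abort();
    }
    h->cb = cb;
    h->opaque = opaque;
    h->next = handlers;
    handlers = h;
    return h;
}

static void handler_del(struct handler *h)
{
    struct handler **p = &handlers;
    while (*p != NULL && *p != h) {
        p = &(*p)->next;
    }
    if (*p != NULL) {
        *p = h->next;
        free(h);
    }
}

static int handler_count(void)
{
    int n = 0;
    for (struct handler *h = handlers; h != NULL; h = h->next) {
        n++;
    }
    return n;
}

struct device {
    void *parent_bus;        /* stays NULL when plugging fails */
    struct handler *vmstate; /* registration made during init */
    int notified;
};

static int fake_bus;

static void dev_state_change(void *opaque, int running)
{
    struct device *d = opaque;
    (void)running;
    /* Models the crash: the real callback dereferences the parent bus. */
    assert(d->parent_bus != NULL);
    d->notified++;
}

static void dev_unrealize(struct device *d)
{
    if (d->vmstate != NULL) {
        handler_del(d->vmstate);
        d->vmstate = NULL;
    }
}

/* Returns 0 on success, -1 when plugging into the bus fails. */
static int dev_realize(struct device *d, int plug_ok)
{
    d->vmstate = handler_add(dev_state_change, d);
    if (!plug_ok) {
        dev_unrealize(d); /* the cleanup step the proposed patch adds */
        return -1;
    }
    d->parent_bus = &fake_bus;
    return 0;
}

static void notify_all(int running)
{
    for (struct handler *h = handlers; h != NULL; h = h->next) {
        h->cb(h->opaque, running);
    }
}
```

In this model, calling `notify_all()` after a failed `dev_realize()` is harmless because the handler list is empty again; without the `dev_unrealize()` call on the failure path, the stale handler would trip the assertion.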
diff --git a/results/classifier/004/other/57756589 b/results/classifier/004/other/57756589
new file mode 100644
index 00000000..7a168faa
--- /dev/null
+++ b/results/classifier/004/other/57756589
@@ -0,0 +1,1429 @@
+other: 0.899
+mistranslation: 0.861
+instruction: 0.854
+device: 0.853
+vnc: 0.851
+assembly: 0.841
+semantic: 0.835
+boot: 0.827
+graphic: 0.824
+network: 0.822
+socket: 0.820
+KVM: 0.817
+
+[Qemu-devel] Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang
+
+Almost the same as the wiki, but with a panic in the Primary Node.
+
+
+
+
+Steps:
+
+1 Primary node:
+
+
+
+x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 2 -qmp stdio 
+-vnc :7 -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci -usb 
+-usbdevice tablet\
+
+  -drive 
+if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,
+
+   
+children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=qcow2
+ -S \
+
+  -netdev 
+tap,id=hn1,vhost=off,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \
+
+  -device e1000,id=e1,netdev=hn1,mac=52:a4:00:12:78:67 \
+
+  -netdev 
+tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
+
+  -device e1000,id=e0,netdev=hn0,mac=52:a4:00:12:78:66 \
+
+  -chardev socket,id=mirror0,host=9.61.1.8,port=9003,server,nowait -chardev 
+socket,id=compare1,host=9.61.1.8,port=9004,server,nowait \
+
+  -chardev socket,id=compare0,host=9.61.1.8,port=9001,server,nowait -chardev 
+socket,id=compare0-0,host=9.61.1.8,port=9001 \
+
+  -chardev socket,id=compare_out,host=9.61.1.8,port=9005,server,nowait \
+
+  -chardev socket,id=compare_out0,host=9.61.1.8,port=9005 \
+
+  -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
+
+  -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out 
+-object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
+
+  -object 
+colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0
+
+2 Secondary node:
+
+x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 
+-name secondary -enable-kvm -cpu qemu64,+kvmclock -device piix3-usb-uhci -usb 
+-usbdevice tablet\
+
+  -drive 
+if=none,id=colo-disk0,file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,driver=qcow2,node-name=node0
+ \
+
+  -drive 
+if=virtio,id=active-disk0,driver=replication,mode=secondary,file.driver=qcow2,top-id=active-disk0,file.file.filename=/mnt/ramfstest/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfstest/hidden_disk.img,file.backing.backing=colo-disk0
+  \
+
+   -netdev 
+tap,id=hn1,vhost=off,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \
+
+  -device e1000,id=e1,netdev=hn1,mac=52:a4:00:12:78:67 \
+
+  -netdev 
+tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
+
+  -device e1000,netdev=hn0,mac=52:a4:00:12:78:66 -chardev 
+socket,id=red0,host=9.61.1.8,port=9003 \
+
+  -chardev socket,id=red1,host=9.61.1.8,port=9004 \
+
+  -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \
+
+  -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \
+
+  -object filter-rewriter,id=rew0,netdev=hn0,queue=all -incoming tcp:0:8888
+
+3  Secondary node:
+
+{'execute':'qmp_capabilities'}
+
+{ 'execute': 'nbd-server-start',
+
+  'arguments': {'addr': {'type': 'inet', 'data': {'host': '9.61.1.7', 'port': 
+'8889'} } }
+
+}
+
+{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': 
+true } }
+
+4 Primary node:
+
+{'execute':'qmp_capabilities'}
+
+
+{ 'execute': 'human-monitor-command',
+
+  'arguments': {'command-line': 'drive_add -n buddy 
+driver=replication,mode=primary,file.driver=nbd,file.host=9.61.1.7,file.port=8889,file.export=colo-disk0,node-name=node0'}}
+
+{ 'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 
+'node0' } }
+
+{ 'execute': 'migrate-set-capabilities',
+
+      'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } 
+] } }
+
+{ 'execute': 'migrate', 'arguments': {'uri': 'tcp:9.61.1.7:8888' } }
+
+
+
+
+You can then see two running VMs; whenever you make changes to the PVM, the SVM 
+is synced.
+
+
+
+
+5 Primary node:
+
+echo c > /proc/sysrq-trigger
+
+
+
+
+6 Secondary node:
+
+{ 'execute': 'nbd-server-stop' }
+
+{ "execute": "x-colo-lost-heartbeat" }
+
+
+
+
+The Secondary node then hangs at recvmsg.
+
+
+
+
+
+
+
+
+
+
+
+
+Original Message
+
+
+
+From: address@hidden
+To: 王广10165992 address@hidden
+Cc: address@hidden address@hidden
+Date: 2017-03-21 16:27
+Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang
+
+
+
+
+
+Hi,
+
+On 2017/3/21 16:10, address@hidden wrote:
+> Thank you.
+>
+> I have tested already.
+>
+> When the Primary Node panics, the Secondary Node qemu hangs at the same place.
+>
+> According to
+http://wiki.qemu-project.org/Features/COLO
+, killing the Primary Node qemu 
+will not reproduce the problem, but a Primary Node panic can.
+>
+> I think this is because the channel does not support 
+QIO_CHANNEL_FEATURE_SHUTDOWN.
+>
+>
+
+Yes, you are right. When we do failover for the primary/secondary VM, we 
+shut down the related
+fd in case a thread is stuck reading from or writing to it.
+
+It seems that you didn't follow the above instructions exactly in your test. 
+Could you
+share your test procedure, especially the commands used in the test?
+
+Thanks,
+Hailiang
+
+> When failing over, channel_shutdown cannot shut down the channel,
+>
+>
+> so colo_process_incoming_thread hangs at recvmsg.
+>
+>
+> I tested a patch:
+>
+>
+> diff --git a/migration/socket.c b/migration/socket.c
+>
+>
+> index 13966f1..d65a0ea 100644
+>
+>
+> --- a/migration/socket.c
+>
+>
+> +++ b/migration/socket.c
+>
+>
+> @@ -147,8 +147,9 @@ static gboolean 
+socket_accept_incoming_migration(QIOChannel *ioc,
+>
+>
+>       }
+>
+>
+>
+>
+>
+>       trace_migration_socket_incoming_accepted();
+>
+>
+>
+>
+>
+>       qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
+>
+>
+> +    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
+>
+>
+>       migration_channel_process_incoming(migrate_get_current(),
+>
+>
+>                                          QIO_CHANNEL(sioc));
+>
+>
+>       object_unref(OBJECT(sioc));
+>
+>
+>
+>
+> With it, my test no longer hangs.
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+> Original Message
+>
+>
+>
+> From: address@hidden
+> To: 王广10165992 address@hidden
+> Cc: address@hidden address@hidden
+> Date: 2017-03-21 15:58
+> Subject: Re: [Qemu-devel] Reply: Re: [BUG]COLO failover hang
+>
+>
+>
+>
+>
+> Hi, Wang.
+>
+> You can test this branch:
+>
+>
+https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
+>
+> and please follow the wiki to make sure your configuration is correct.
+>
+>
+http://wiki.qemu-project.org/Features/COLO
+>
+>
+> Thanks
+>
+> Zhang Chen
+>
+>
+> On 03/21/2017 03:27 PM, address@hidden wrote:
+> >
+> > hi.
+> >
+> > I tested git qemu master and it has the same problem.
+> >
+> > (gdb) bt
+> >
+> > #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
+> > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
+> >
+> > #1  0x00007f658e4aa0c2 in qio_channel_read
+> > (address@hidden, address@hidden "",
+> > address@hidden, address@hidden) at io/channel.c:114
+> >
+> > #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
+> > buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
+> > migration/qemu-file-channel.c:78
+> >
+> > #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
+> > migration/qemu-file.c:295
+> >
+> > #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
+> > address@hidden) at migration/qemu-file.c:555
+> >
+> > #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
+> > migration/qemu-file.c:568
+> >
+> > #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
+> > migration/qemu-file.c:648
+> >
+> > #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
+> > address@hidden) at migration/colo.c:244
+> >
+> > #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
+> > out>, address@hidden,
+> > address@hidden)
+> >
+> >     at migration/colo.c:264
+> >
+> > #9  0x00007f658e3e740e in colo_process_incoming_thread
+> > (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
+> >
+> > #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
+> >
+> > #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
+> >
+> > (gdb) p ioc->name
+> >
+> > $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
+> >
+> > (gdb) p ioc->features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
+> >
+> > $3 = 0
+> >
+> >
+> > (gdb) bt
+> >
+> > #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
+> > condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
+> >
+> > #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
+> > gmain.c:3054
+> >
+> > #2  g_main_context_dispatch (context=<optimized out>,
+> > address@hidden) at gmain.c:3630
+> >
+> > #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
+> >
+> > #4  os_host_main_loop_wait (timeout=<optimized out>) at
+> > util/main-loop.c:258
+> >
+> > #5  main_loop_wait (address@hidden) at
+> > util/main-loop.c:506
+> >
+> > #6  0x00007fdccb526187 in main_loop () at vl.c:1898
+> >
+> > #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
+> > out>) at vl.c:4709
+> >
+> > (gdb) p ioc->features
+> >
+> > $1 = 6
+> >
+> > (gdb) p ioc->name
+> >
+> > $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
+> >
+> >
+> > Maybe socket_accept_incoming_migration should
+> > call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
+> >
+> >
+> > thank you.
+> >
+> >
+> >
+> >
+> >
+> > Original Message
+> > address@hidden
+> > address@hidden
+> > address@hidden@huawei.com>
+> > *Date:* 2017-03-16 14:46
+> > *Subject:* *Re: [Qemu-devel] COLO failover hang*
+> >
+> >
+> >
+> >
+> > On 03/15/2017 05:06 PM, wangguang wrote:
+> > > I am testing the QEMU COLO feature described here [QEMU
+> > > Wiki](
+http://wiki.qemu-project.org/Features/COLO
+).
+> > >
+> > > When the Primary Node panics, the Secondary Node qemu hangs.
+> > > It hangs at recvmsg in qio_channel_socket_readv.
+> > > Even after I run { 'execute': 'nbd-server-stop' } and { "execute":
+> > > "x-colo-lost-heartbeat" } in the Secondary VM's
+> > > monitor, the Secondary Node qemu still hangs at recvmsg.
+> > >
+> > > I found that COLO in qemu is not complete yet.
+> > > Is there a development plan for COLO?
+> >
+> > Yes, we are developing it. You can see some of the patches we are pushing.
+> >
+> > > Has anyone ever run it successfully? Any help is appreciated!
+> >
+> > Our internal version can run it successfully.
+> > For failover details, you can ask Zhanghailiang for help.
+> > Next time, if you have questions about COLO,
+> > please cc me and zhanghailiang address@hidden
+> >
+> >
+> > Thanks
+> > Zhang Chen
+> >
+> >
+> > >
+> > >
+> > >
+> > > centos7.2+qemu2.7.50
+> > > (gdb) bt
+> > > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
+> > > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
+> > > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at
+> > > io/channel-socket.c:497
+> > > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
+> > > address@hidden "", address@hidden,
+> > > address@hidden) at io/channel.c:97
+> > > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
+> > > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
+> > > migration/qemu-file-channel.c:78
+> > > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
+> > > migration/qemu-file.c:257
+> > > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
+> > > address@hidden) at migration/qemu-file.c:510
+> > > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
+> > > migration/qemu-file.c:523
+> > > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
+> > > migration/qemu-file.c:603
+> > > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
+> > > address@hidden) at migration/colo.c:215
+> > > #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
+> > > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
+> > > migration/colo.c:546
+> > > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
+> > > migration/colo.c:649
+> > > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
+> > > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
+> > >
+> > >
+> > >
+> > >
+> > >
+> > > --
+> > > View this message in context:
+http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
+> > > Sent from the Developer mailing list archive at Nabble.com.
+> > >
+> > >
+> > >
+> > >
+> >
+> > --
+> > Thanks
+> > Zhang Chen
+> >
+> >
+> >
+> >
+> >
+>
+
+diff --git a/migration/socket.c b/migration/socket.c
+
+
+index 13966f1..d65a0ea 100644
+
+
+--- a/migration/socket.c
+
+
++++ b/migration/socket.c
+
+
+@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel 
+*ioc,
+
+
+     }
+
+
+ 
+
+
+     trace_migration_socket_incoming_accepted();
+
+
+    
+
+
+     qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
+
+
++    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
+
+
+     migration_channel_process_incoming(migrate_get_current(),
+
+
+                                        QIO_CHANNEL(sioc));
+
+
+     object_unref(OBJECT(sioc));
+
+
+
+
+Is this patch ok? 
+
+I have tested it. The test does not hang any more.
+
+
+
+
+
+
+
+
+
+
+
+
+Original Message
+
+
+
+From: address@hidden
+To: address@hidden address@hidden
+Cc: address@hidden address@hidden address@hidden
+Date: 2017-03-22 09:11
+Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang
+
+
+
+
+
+On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
+> * Hailiang Zhang (address@hidden) wrote:
+>> Hi,
+>>
+>> Thanks for reporting this. I confirmed it in my test, and it is a bug.
+>>
+>> We do try to call qemu_file_shutdown() to shut down the related fd in
+>> case the COLO thread/incoming thread is stuck in read/write() during failover,
+>> but it doesn't take effect, because all the fds used by COLO (and migration)
+>> are wrapped by a qio channel, which will not call the shutdown API unless
+>> we call qio_channel_set_feature(QIO_CHANNEL(sioc), 
+QIO_CHANNEL_FEATURE_SHUTDOWN).
+>>
+>> Cc: Dr. David Alan Gilbert address@hidden
+>>
+>> I suspect migration cancel has the same problem; it may be stuck in write()
+>> if we try to cancel migration.
+>>
+>> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, 
+Error **errp)
+>> {
+>>      qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
+>>      migration_channel_connect(s, ioc, NULL);
+>>      ... ...
+>> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
+QIO_CHANNEL_FEATURE_SHUTDOWN) above,
+>> and the
+>> migrate_fd_cancel()
+>> {
+>>   ... ...
+>>      if (s->state == MIGRATION_STATUS_CANCELLING && f) {
+>>          qemu_file_shutdown(f);  --> This will not take effect. No ?
+>>      }
+>> }
+>
+> (cc'd in Daniel Berrange).
+> I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN) 
+at the
+> top of qio_channel_socket_new  so I think that's safe isn't it?
+>
+
+Hmm, you are right, this problem only exists for the migration incoming fd, 
+thanks.
+
+> Dave
+>
+>> Thanks,
+>> Hailiang
+>>
+>> On 2017/3/21 16:10, address@hidden wrote:
+>>> Thank you.
+>>>
+>>> I have tested already.
+>>>
+>>> When the Primary Node panics, the Secondary Node qemu hangs at the same place.
+>>>
+>>> According to
+http://wiki.qemu-project.org/Features/COLO
+, killing the Primary Node 
+qemu will not reproduce the problem, but a Primary Node panic can.
+>>>
+>>> I think this is because the channel does not support 
+QIO_CHANNEL_FEATURE_SHUTDOWN.
+>>>
+>>>
+>>> When failing over, channel_shutdown cannot shut down the channel,
+>>>
+>>>
+>>> so colo_process_incoming_thread hangs at recvmsg.
+>>>
+>>>
+>>> I tested a patch:
+>>>
+>>>
+>>> diff --git a/migration/socket.c b/migration/socket.c
+>>>
+>>>
+>>> index 13966f1..d65a0ea 100644
+>>>
+>>>
+>>> --- a/migration/socket.c
+>>>
+>>>
+>>> +++ b/migration/socket.c
+>>>
+>>>
+>>> @@ -147,8 +147,9 @@ static gboolean 
+socket_accept_incoming_migration(QIOChannel *ioc,
+>>>
+>>>
+>>>        }
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>        trace_migration_socket_incoming_accepted();
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>        qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
+>>>
+>>>
+>>> +    qio_channel_set_feature(QIO_CHANNEL(sioc), 
+QIO_CHANNEL_FEATURE_SHUTDOWN);
+>>>
+>>>
+>>>        migration_channel_process_incoming(migrate_get_current(),
+>>>
+>>>
+>>>                                           QIO_CHANNEL(sioc));
+>>>
+>>>
+>>>        object_unref(OBJECT(sioc));
+>>>
+>>>
+>>>
+>>>
+>>> With it, my test no longer hangs.
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>> Original Message
+>>>
+>>>
+>>>
+>>> From: address@hidden
+>>> To: 王广10165992 address@hidden
+>>> Cc: address@hidden address@hidden
+>>> Date: 2017-03-21 15:58
+>>> Subject: Re: [Qemu-devel] Reply: Re: [BUG]COLO failover hang
+>>>
+>>>
+>>>
+>>>
+>>>
+>>> Hi, Wang.
+>>>
+>>> You can test this branch:
+>>>
+>>>
+https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
+>>>
+>>> and please follow the wiki to make sure your configuration is correct.
+>>>
+>>>
+http://wiki.qemu-project.org/Features/COLO
+>>>
+>>>
+>>> Thanks
+>>>
+>>> Zhang Chen
+>>>
+>>>
+>>> On 03/21/2017 03:27 PM, address@hidden wrote:
+>>> >
+>>> > hi.
+>>> >
+>>> > I tested git qemu master and it has the same problem.
+>>> >
+>>> > (gdb) bt
+>>> >
+>>> > #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
+>>> > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
+>>> >
+>>> > #1  0x00007f658e4aa0c2 in qio_channel_read
+>>> > (address@hidden, address@hidden "",
+>>> > address@hidden, address@hidden) at io/channel.c:114
+>>> >
+>>> > #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
+>>> > buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
+>>> > migration/qemu-file-channel.c:78
+>>> >
+>>> > #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
+>>> > migration/qemu-file.c:295
+>>> >
+>>> > #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
+>>> > address@hidden) at migration/qemu-file.c:555
+>>> >
+>>> > #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
+>>> > migration/qemu-file.c:568
+>>> >
+>>> > #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
+>>> > migration/qemu-file.c:648
+>>> >
+>>> > #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
+>>> > address@hidden) at migration/colo.c:244
+>>> >
+>>> > #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
+>>> > out>, address@hidden,
+>>> > address@hidden)
+>>> >
+>>> >     at migration/colo.c:264
+>>> >
+>>> > #9  0x00007f658e3e740e in colo_process_incoming_thread
+>>> > (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
+>>> >
+>>> > #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
+>>> >
+>>> > #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
+>>> >
+>>> > (gdb) p ioc->name
+>>> >
+>>> > $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
+>>> >
+>>> > (gdb) p ioc->features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
+>>> >
+>>> > $3 = 0
+>>> >
+>>> >
+>>> > (gdb) bt
+>>> >
+>>> > #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
+>>> > condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
+>>> >
+>>> > #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
+>>> > gmain.c:3054
+>>> >
+>>> > #2  g_main_context_dispatch (context=<optimized out>,
+>>> > address@hidden) at gmain.c:3630
+>>> >
+>>> > #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
+>>> >
+>>> > #4  os_host_main_loop_wait (timeout=<optimized out>) at
+>>> > util/main-loop.c:258
+>>> >
+>>> > #5  main_loop_wait (address@hidden) at
+>>> > util/main-loop.c:506
+>>> >
+>>> > #6  0x00007fdccb526187 in main_loop () at vl.c:1898
+>>> >
+>>> > #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
+>>> > out>) at vl.c:4709
+>>> >
+>>> > (gdb) p ioc->features
+>>> >
+>>> > $1 = 6
+>>> >
+>>> > (gdb) p ioc->name
+>>> >
+>>> > $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
+>>> >
+>>> >
+>>> > Maybe socket_accept_incoming_migration should
+>>> > call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
+>>> >
+>>> >
+>>> > thank you.
+>>> >
+>>> >
+>>> >
+>>> >
+>>> >
+>>> > Original Message
+>>> > address@hidden
+>>> > address@hidden
+>>> > address@hidden@huawei.com>
+>>> > *Date:* 2017-03-16 14:46
+>>> > *Subject:* *Re: [Qemu-devel] COLO failover hang*
+>>> >
+>>> >
+>>> >
+>>> >
+>>> > On 03/15/2017 05:06 PM, wangguang wrote:
+>>> > > I am testing the QEMU COLO feature described here [QEMU
+>>> > > Wiki](
+http://wiki.qemu-project.org/Features/COLO
+).
+>>> > >
+>>> > > When the Primary Node panics, the Secondary Node qemu hangs.
+>>> > > It hangs at recvmsg in qio_channel_socket_readv.
+>>> > > Even after I run { 'execute': 'nbd-server-stop' } and { "execute":
+>>> > > "x-colo-lost-heartbeat" } in the Secondary VM's
+>>> > > monitor, the Secondary Node qemu still hangs at recvmsg.
+>>> > >
+>>> > > I found that COLO in qemu is not complete yet.
+>>> > > Is there a development plan for COLO?
+>>> >
+>>> > Yes, we are developing it. You can see some of the patches we are pushing.
+>>> >
+>>> > > Has anyone ever run it successfully? Any help is appreciated!
+>>> >
+>>> > Our internal version can run it successfully.
+>>> > For failover details, you can ask Zhanghailiang for help.
+>>> > Next time, if you have questions about COLO,
+>>> > please cc me and zhanghailiang address@hidden
+>>> >
+>>> >
+>>> > Thanks
+>>> > Zhang Chen
+>>> >
+>>> >
+>>> > >
+>>> > >
+>>> > >
+>>> > > centos7.2+qemu2.7.50
+>>> > > (gdb) bt
+>>> > > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
+>>> > > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
+>>> > > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) 
+at
+>>> > > io/channel-socket.c:497
+>>> > > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
+>>> > > address@hidden "", address@hidden,
+>>> > > address@hidden) at io/channel.c:97
+>>> > > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
+>>> > > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
+>>> > > migration/qemu-file-channel.c:78
+>>> > > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
+>>> > > migration/qemu-file.c:257
+>>> > > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
+>>> > > address@hidden) at migration/qemu-file.c:510
+>>> > > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
+>>> > > migration/qemu-file.c:523
+>>> > > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
+>>> > > migration/qemu-file.c:603
+>>> > > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
+>>> > > address@hidden) at migration/colo.c:215
+>>> > > #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
+>>> > > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
+>>> > > migration/colo.c:546
+>>> > > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
+>>> > > migration/colo.c:649
+>>> > > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
+>>> > > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
+>>> > >
+>>> > >
+>>> > >
+>>> > >
+>>> > >
+>>> > > --
+>>> > > View this message in context:
+http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
+>>> > > Sent from the Developer mailing list archive at Nabble.com.
+>>> > >
+>>> > >
+>>> > >
+>>> > >
+>>> >
+>>> > --
+>>> > Thanks
+>>> > Zhang Chen
+>>> >
+>>> >
+>>> >
+>>> >
+>>> >
+>>>
+>>
+> --
+> Dr. David Alan Gilbert / address@hidden / Manchester, UK
+>
+> .
+>
+
+Hi,
+
+On 2017/3/22 9:42, address@hidden wrote:
+diff --git a/migration/socket.c b/migration/socket.c
+
+
+index 13966f1..d65a0ea 100644
+
+
+--- a/migration/socket.c
+
+
++++ b/migration/socket.c
+
+
+@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel 
+*ioc,
+
+
+      }
+
+
+
+
+
+      trace_migration_socket_incoming_accepted();
+
+
+
+
+
+      qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
+
+
++    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
+
+
+      migration_channel_process_incoming(migrate_get_current(),
+
+
+                                         QIO_CHANNEL(sioc));
+
+
+      object_unref(OBJECT(sioc));
+
+
+
+
+Is this patch ok?
+Yes, I think this works, but a better way may be to call 
+qio_channel_set_feature()
+in qio_channel_socket_accept(), since we don't set the SHUTDOWN feature for the 
+accepted socket fd.
+Or fix it like this:
+
+diff --git a/io/channel-socket.c b/io/channel-socket.c
+index f546c68..ce6894c 100644
+--- a/io/channel-socket.c
++++ b/io/channel-socket.c
+@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
+                           Error **errp)
+ {
+     QIOChannelSocket *cioc;
+-
+-    cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
+-    cioc->fd = -1;
++
++    cioc = qio_channel_socket_new();
+     cioc->remoteAddrLen = sizeof(ioc->remoteAddr);
+     cioc->localAddrLen = sizeof(ioc->localAddr);
+
+
+Thanks,
+Hailiang
+I have tested it. The test does not hang any more.
+
+
+
+
+
+
+
+
+
+
+
+
+Original Message
+
+
+
+From: address@hidden
+To: address@hidden address@hidden
+Cc: address@hidden address@hidden address@hidden
+Date: 2017-03-22 09:11
+Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang
+
+
+
+
+
+On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
+> * Hailiang Zhang (address@hidden) wrote:
+>> Hi,
+>>
+>> Thanks for reporting this. I confirmed it in my test, and it is a bug.
+>>
+>> We do try to call qemu_file_shutdown() to shut down the related fd in
+>> case the COLO thread/incoming thread is stuck in read/write() during failover,
+>> but it doesn't take effect, because all the fds used by COLO (and migration)
+>> are wrapped by a qio channel, which will not call the shutdown API unless
+>> we call qio_channel_set_feature(QIO_CHANNEL(sioc), 
+QIO_CHANNEL_FEATURE_SHUTDOWN).
+>>
+>> Cc: Dr. David Alan Gilbert address@hidden
+>>
+>> I suspect migration cancel has the same problem; it may be stuck in write()
+>> if we try to cancel migration.
+>>
+>> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, 
+Error **errp)
+>> {
+>>      qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
+>>      migration_channel_connect(s, ioc, NULL);
+>>      ... ...
+>> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
+QIO_CHANNEL_FEATURE_SHUTDOWN) above,
+>> and the
+>> migrate_fd_cancel()
+>> {
+>>   ... ...
+>>      if (s->state == MIGRATION_STATUS_CANCELLING && f) {
+>>          qemu_file_shutdown(f);  --> This will not take effect. No ?
+>>      }
+>> }
+>
+> (cc'd in Daniel Berrange).
+> I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN) 
+at the
+> top of qio_channel_socket_new  so I think that's safe isn't it?
+>
+
+Hmm, you are right, this problem only exists for the migration incoming fd, 
+thanks.
+
+> Dave
+>
+>> Thanks,
+>> Hailiang
+>>
+>> On 2017/3/21 16:10, address@hidden wrote:
+>>> Thank you.
+>>>
+>>> I have tested already.
+>>>
+>>> When the Primary Node panics, the Secondary Node qemu hangs at the same place.
+>>>
+>>> According to
+http://wiki.qemu-project.org/Features/COLO
+, killing the Primary Node 
+qemu will not reproduce the problem, but a Primary Node panic can.
+>>>
+>>> I think this is because the channel does not support 
+QIO_CHANNEL_FEATURE_SHUTDOWN.
+>>>
+>>>
+>>> When failing over, channel_shutdown cannot shut down the channel,
+>>>
+>>>
+>>> so colo_process_incoming_thread hangs at recvmsg.
+>>>
+>>>
+>>> I tested a patch:
+>>>
+>>>
+>>> diff --git a/migration/socket.c b/migration/socket.c
+>>>
+>>>
+>>> index 13966f1..d65a0ea 100644
+>>>
+>>>
+>>> --- a/migration/socket.c
+>>>
+>>>
+>>> +++ b/migration/socket.c
+>>>
+>>>
+>>> @@ -147,8 +147,9 @@ static gboolean 
+socket_accept_incoming_migration(QIOChannel *ioc,
+>>>
+>>>
+>>>        }
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>        trace_migration_socket_incoming_accepted();
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>        qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
+>>>
+>>>
+>>> +    qio_channel_set_feature(QIO_CHANNEL(sioc), 
+QIO_CHANNEL_FEATURE_SHUTDOWN);
+>>>
+>>>
+>>>        migration_channel_process_incoming(migrate_get_current(),
+>>>
+>>>
+>>>                                           QIO_CHANNEL(sioc));
+>>>
+>>>
+>>>        object_unref(OBJECT(sioc));
+>>>
+>>>
+>>>
+>>>
+>>> With it, my test no longer hangs.
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>>
+>>> Original Mail
+>>>
+>>> From: address@hidden
+>>> To: Wang Guang 10165992 address@hidden
+>>> Cc: address@hidden address@hidden
+>>> Date: 2017-03-21 15:58
+>>> Subject: Re: [Qemu-devel] Reply: Re: [BUG]COLO failover hang
+>>>
+>>> Hi, Wang.
+>>>
+>>> You can test this branch:
+>>>
+>>> https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
+>>>
+>>> and please follow the wiki to make sure your own configuration is correct.
+>>>
+>>> http://wiki.qemu-project.org/Features/COLO
+>>>
+>>> Thanks
+>>>
+>>> Zhang Chen
+>>>
+>>> On 03/21/2017 03:27 PM, address@hidden wrote:
+>>> >
+>>> > hi.
+>>> >
+>>> > I tested git qemu master and it has the same problem.
+>>> >
+>>> > (gdb) bt
+>>> > #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
+>>> > #1  0x00007f658e4aa0c2 in qio_channel_read (address@hidden, address@hidden "", address@hidden, address@hidden) at io/channel.c:114
+>>> > #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>, buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at migration/qemu-file-channel.c:78
+>>> > #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at migration/qemu-file.c:295
+>>> > #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, address@hidden) at migration/qemu-file.c:555
+>>> > #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at migration/qemu-file.c:568
+>>> > #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at migration/qemu-file.c:648
+>>> > #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, address@hidden) at migration/colo.c:244
+>>> > #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized out>, address@hidden, address@hidden) at migration/colo.c:264
+>>> > #9  0x00007f658e3e740e in colo_process_incoming_thread (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
+>>> > #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
+>>> > #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
+>>> >
+>>> > (gdb) p ioc->name
+>>> > $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
+>>> >
+>>> > (gdb) p ioc->features        (does not support QIO_CHANNEL_FEATURE_SHUTDOWN)
+>>> > $3 = 0
+>>> >
+>>> > (gdb) bt
+>>> > #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90, condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
+>>> > #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at gmain.c:3054
+>>> > #2  g_main_context_dispatch (context=<optimized out>, address@hidden) at gmain.c:3630
+>>> > #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
+>>> > #4  os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:258
+>>> > #5  main_loop_wait (address@hidden) at util/main-loop.c:506
+>>> > #6  0x00007fdccb526187 in main_loop () at vl.c:1898
+>>> > #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4709
+>>> >
+>>> > (gdb) p ioc->features
+>>> > $1 = 6
+>>> >
+>>> > (gdb) p ioc->name
+>>> > $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
+>>> >
+>>> > Maybe socket_accept_incoming_migration should
+>>> > call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?
+>>> >
+>>> > thank you.
+>>> >
+>>> > Original Mail
+>>> > address@hidden
+>>> > address@hidden
+>>> > address@hidden@huawei.com>
+>>> > *Date:* 2017-03-16 14:46
+>>> > *Subject:* *Re: [Qemu-devel] COLO failover hang*
+>>> >
+>>> > On 03/15/2017 05:06 PM, wangguang wrote:
+>>> > > I am testing the QEMU COLO feature described here: [QEMU
+>>> > > Wiki](http://wiki.qemu-project.org/Features/COLO).
+>>> > >
+>>> > > When the Primary Node panics, the Secondary Node qemu hangs
+>>> > > at recvmsg in qio_channel_socket_readv.
+>>> > > When I run { 'execute': 'nbd-server-stop' } and { "execute":
+>>> > > "x-colo-lost-heartbeat" } in the Secondary VM's
+>>> > > monitor, the Secondary Node qemu still hangs at recvmsg.
+>>> > >
+>>> > > I found that COLO in qemu is not complete yet.
+>>> > > Is there a development plan for COLO?
+>>> >
+>>> > Yes, we are developing it. You can see some of the patches we are pushing.
+>>> >
+>>> > > Has anyone ever run it successfully? Any help is appreciated!
+>>> >
+>>> > Our internal version runs it successfully.
+>>> > For failover details you can ask Zhanghailiang for help.
+>>> > Next time you have a question about COLO,
+>>> > please cc me and zhanghailiang address@hidden
+>>> >
+>>> > Thanks
+>>> > Zhang Chen
+>>> >
+>>> > >
+>>> > > centos7.2+qemu2.7.50
+>>> > > (gdb) bt
+>>> > > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
+>>> > > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>, iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:497
+>>> > > #2  0x00007f3e03329472 in qio_channel_read (address@hidden, address@hidden "", address@hidden, address@hidden) at io/channel.c:97
+>>> > > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>, buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at migration/qemu-file-channel.c:78
+>>> > > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at migration/qemu-file.c:257
+>>> > > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden, address@hidden) at migration/qemu-file.c:510
+>>> > > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at migration/qemu-file.c:523
+>>> > > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at migration/qemu-file.c:603
+>>> > > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00, address@hidden) at migration/colo.c:215
+>>> > > #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48, checkpoint_request=<synthetic pointer>, f=<optimized out>) at migration/colo.c:546
+>>> > > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at migration/colo.c:649
+>>> > > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
+>>> > > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
+>>> > >
+>>> > > --
+>>> > > View this message in context: http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
+>>> > > Sent from the Developer mailing list archive at Nabble.com.
+>>> > >
+>>> >
+>>> > --
+>>> > Thanks
+>>> > Zhang Chen
+>>> >
+>>>
+>>
+> --
+> Dr. David Alan Gilbert / address@hidden / Manchester, UK
+>
+> .
+>
+
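The failure mode discussed in the thread above (a shutdown request that is silently skipped because the channel never advertised the shutdown feature, leaving the reader stuck in recvmsg) can be reproduced in miniature with plain POSIX sockets. The sketch below is only an illustration under stated assumptions: ToyChannel, TOY_FEATURE_SHUTDOWN and the toy_* functions are hypothetical names, not QEMU's QIOChannel API, and a non-blocking probe (MSG_DONTWAIT) stands in for the thread that would otherwise block.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical feature bit; not the value of QEMU's enum. */
#define TOY_FEATURE_SHUTDOWN (1u << 1)

/* Hypothetical stand-in for a channel wrapping a socket fd. */
typedef struct ToyChannel {
    int fd;
    unsigned features;
} ToyChannel;

/* Shutdown is skipped unless the feature bit was set on the channel.
 * Returns true only if shutdown(2) was actually issued and succeeded. */
static bool toy_channel_shutdown(ToyChannel *c)
{
    if (!(c->features & TOY_FEATURE_SHUTDOWN)) {
        return false;
    }
    return shutdown(c->fd, SHUT_RDWR) == 0;
}

/* Probe what a blocking read would do, without blocking:
 * 1 = it would block, 0 = it would return EOF at once, -1 = other. */
static int toy_read_would_block(ToyChannel *c)
{
    char byte;
    ssize_t n = recv(c->fd, &byte, 1, MSG_DONTWAIT);

    if (n == 0) {
        return 0;
    }
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        return 1;
    }
    return -1;
}

/* Simulate failover on a channel created with the given feature mask and
 * report whether the reader would still be stuck afterwards. */
int toy_failover_outcome(unsigned features)
{
    int sv[2];
    ToyChannel c;
    int result;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    c.fd = sv[0];
    c.features = features;

    toy_channel_shutdown(&c);          /* may be a silent no-op */
    result = toy_read_would_block(&c);

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Calling toy_failover_outcome(0) models the incoming channel with features = 0 from the gdb output: the shutdown is skipped and the reader would still block. Passing TOY_FEATURE_SHUTDOWN models the patched accept path, where the read returns end-of-file immediately.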
diff --git a/results/classifier/004/other/59540920 b/results/classifier/004/other/59540920
new file mode 100644
index 00000000..ae43e07c
--- /dev/null
+++ b/results/classifier/004/other/59540920
@@ -0,0 +1,384 @@
+other: 0.989
+instruction: 0.986
+graphic: 0.985
+device: 0.985
+semantic: 0.985
+assembly: 0.984
+socket: 0.983
+network: 0.981
+boot: 0.980
+mistranslation: 0.978
+vnc: 0.977
+KVM: 0.970
+
+[BUG] No irqchip created after commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an accelerator property")
+
+I apologize if this was already reported,
+
+I just noticed that with the latest updates QEMU doesn't start with the
+following configuration:
+
+qemu-system-x86_64 -name guest=win10 -machine pc,accel=kvm -cpu host,hv_vpindex,hv_synic ...
+
+qemu-system-x86_64: failed to turn on HyperV SynIC in KVM: Invalid argument
+qemu-system-x86_64: kvm_init_vcpu failed: Invalid argument
+
+If I add 'kernel-irqchip=split' or ',kernel-irqchip=on' it starts as
+usual. I bisected this to the following commit:
+
+commit 11bc4a13d1f4b07dafbd1dda4d4bf0fdd7ad65f2 (HEAD, refs/bisect/bad)
+Author: Paolo Bonzini <address@hidden>
+Date:   Wed Nov 13 10:56:53 2019 +0100
+
+    kvm: convert "-machine kernel_irqchip" to an accelerator property
+    
+so apparently we now default to 'kernel_irqchip=off'. Is this the desired
+behavior?
+
+-- 
+Vitaly
+
+No, absolutely not. I was sure I had tested it, but I will take a look.
+Paolo
+On Fri, 20 Dec 2019 at 15:11, Vitaly Kuznetsov <address@hidden> wrote:
+> I apologize if this was already reported,
+>
+> I just noticed that with the latest updates QEMU doesn't start with the
+> following configuration:
+>
+> qemu-system-x86_64 -name guest=win10 -machine pc,accel=kvm -cpu host,hv_vpindex,hv_synic ...
+>
+> qemu-system-x86_64: failed to turn on HyperV SynIC in KVM: Invalid argument
+> qemu-system-x86_64: kvm_init_vcpu failed: Invalid argument
+>
+> If I add 'kernel-irqchip=split' or ',kernel-irqchip=on' it starts as
+> usual. I bisected this to the following commit:
+>
+> commit 11bc4a13d1f4b07dafbd1dda4d4bf0fdd7ad65f2 (HEAD, refs/bisect/bad)
+> Author: Paolo Bonzini <address@hidden>
+> Date:   Wed Nov 13 10:56:53 2019 +0100
+>
+>     kvm: convert "-machine kernel_irqchip" to an accelerator property
+>
+> so apparently we now default to 'kernel_irqchip=off'. Is this the desired
+> behavior?
+>
+> --
+> Vitaly
+
+Commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an
+accelerator property") moves kernel_irqchip property from "-machine" to
+"-accel kvm", but it forgets to set the default value of
+kernel_irqchip_allowed and kernel_irqchip_split.
+
+Also cleaning up the three useless members (kernel_irqchip_allowed,
+kernel_irqchip_required, kernel_irqchip_split) in struct MachineState.
+
+Fixes: 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an accelerator property")
+Signed-off-by: Xiaoyao Li <address@hidden>
+---
+ accel/kvm/kvm-all.c | 3 +++
+ include/hw/boards.h | 3 ---
+ 2 files changed, 3 insertions(+), 3 deletions(-)
+
+diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
+index b2f1a5bcb5ef..40f74094f8d3 100644
+--- a/accel/kvm/kvm-all.c
++++ b/accel/kvm/kvm-all.c
+@@ -3044,8 +3044,11 @@ bool kvm_kernel_irqchip_split(void)
+ static void kvm_accel_instance_init(Object *obj)
+ {
+     KVMState *s = KVM_STATE(obj);
++    MachineClass *mc = MACHINE_GET_CLASS(current_machine);
+ 
+     s->kvm_shadow_mem = -1;
++    s->kernel_irqchip_allowed = true;
++    s->kernel_irqchip_split = mc->default_kernel_irqchip_split;
+ }
+ 
+ static void kvm_accel_class_init(ObjectClass *oc, void *data)
+diff --git a/include/hw/boards.h b/include/hw/boards.h
+index 61f8bb8e5a42..fb1b43d5b972 100644
+--- a/include/hw/boards.h
++++ b/include/hw/boards.h
+@@ -271,9 +271,6 @@ struct MachineState {
+ 
+     /*< public >*/
+ 
+-    bool kernel_irqchip_allowed;
+-    bool kernel_irqchip_required;
+-    bool kernel_irqchip_split;
+     char *dtb;
+     char *dumpdtb;
+     int phandle_start;
+-- 
+2.19.1
+
+On Sat, 28 Dec 2019 at 09:48, Xiaoyao Li <address@hidden> wrote:
+> Commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an
+> accelerator property") moves kernel_irqchip property from "-machine" to
+> "-accel kvm", but it forgets to set the default value of
+> kernel_irqchip_allowed and kernel_irqchip_split.
+>
+> Also cleaning up the three useless members (kernel_irqchip_allowed,
+> kernel_irqchip_required, kernel_irqchip_split) in struct MachineState.
+>
+> Fixes: 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an accelerator property")
+> Signed-off-by: Xiaoyao Li <address@hidden>
+
+Please also add a Reported-by line for Vitaly Kuznetsov.
+
+> ---
+>  accel/kvm/kvm-all.c | 3 +++
+>  include/hw/boards.h | 3 ---
+>  2 files changed, 3 insertions(+), 3 deletions(-)
+>
+> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
+> index b2f1a5bcb5ef..40f74094f8d3 100644
+> --- a/accel/kvm/kvm-all.c
+> +++ b/accel/kvm/kvm-all.c
+> @@ -3044,8 +3044,11 @@ bool kvm_kernel_irqchip_split(void)
+>  static void kvm_accel_instance_init(Object *obj)
+>  {
+>      KVMState *s = KVM_STATE(obj);
+> +    MachineClass *mc = MACHINE_GET_CLASS(current_machine);
+>      s->kvm_shadow_mem = -1;
+> +    s->kernel_irqchip_allowed = true;
+> +    s->kernel_irqchip_split = mc->default_kernel_irqchip_split;
+
+Can you initialize this from the init_machine method instead of assuming that
+current_machine has been initialized earlier?
+
+Thanks for the quick fix!
+
+Paolo
+
+>  }
+>  static void kvm_accel_class_init(ObjectClass *oc, void *data)
+> diff --git a/include/hw/boards.h b/include/hw/boards.h
+> index 61f8bb8e5a42..fb1b43d5b972 100644
+> --- a/include/hw/boards.h
+> +++ b/include/hw/boards.h
+> @@ -271,9 +271,6 @@ struct MachineState {
+>      /*< public >*/
+> -    bool kernel_irqchip_allowed;
+> -    bool kernel_irqchip_required;
+> -    bool kernel_irqchip_split;
+>      char *dtb;
+>      char *dumpdtb;
+>      int phandle_start;
+> --
+> 2.19.1
+
+On Sat, 2019-12-28 at 10:02 +0000, Paolo Bonzini wrote:
+> On Sat, 28 Dec 2019 at 09:48, Xiaoyao Li <address@hidden> wrote:
+> > Commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an
+> > accelerator property") moves kernel_irqchip property from "-machine" to
+> > "-accel kvm", but it forgets to set the default value of
+> > kernel_irqchip_allowed and kernel_irqchip_split.
+> >
+> > Also cleaning up the three useless members (kernel_irqchip_allowed,
+> > kernel_irqchip_required, kernel_irqchip_split) in struct MachineState.
+> >
+> > Fixes: 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an
+> > accelerator property")
+> > Signed-off-by: Xiaoyao Li <address@hidden>
+>
+> Please also add a Reported-by line for Vitaly Kuznetsov.
+
+Sure.
+
+> > ---
+> >  accel/kvm/kvm-all.c | 3 +++
+> >  include/hw/boards.h | 3 ---
+> >  2 files changed, 3 insertions(+), 3 deletions(-)
+> >
+> > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
+> > index b2f1a5bcb5ef..40f74094f8d3 100644
+> > --- a/accel/kvm/kvm-all.c
+> > +++ b/accel/kvm/kvm-all.c
+> > @@ -3044,8 +3044,11 @@ bool kvm_kernel_irqchip_split(void)
+> >  static void kvm_accel_instance_init(Object *obj)
+> >  {
+> >      KVMState *s = KVM_STATE(obj);
+> > +    MachineClass *mc = MACHINE_GET_CLASS(current_machine);
+> >
+> >      s->kvm_shadow_mem = -1;
+> > +    s->kernel_irqchip_allowed = true;
+> > +    s->kernel_irqchip_split = mc->default_kernel_irqchip_split;
+>
+> Can you initialize this from the init_machine method instead of assuming that
+> current_machine has been initialized earlier?
+
+OK, will do it in v2.
+
+> Thanks for the quick fix!
+
+BTW, it seems that this patch makes kernel_irqchip default on to work around the
+bug.
+However, when explicitly configuring kernel_irqchip=off, guest still fails
+booting due to "KVM: failed to send PV IPI: -95" with a latest upstream kernel
+ubuntu guest. Any idea about this?
+
+> Paolo
+>
+> >  }
+> >
+> >  static void kvm_accel_class_init(ObjectClass *oc, void *data)
+> > diff --git a/include/hw/boards.h b/include/hw/boards.h
+> > index 61f8bb8e5a42..fb1b43d5b972 100644
+> > --- a/include/hw/boards.h
+> > +++ b/include/hw/boards.h
+> > @@ -271,9 +271,6 @@ struct MachineState {
+> >
+> >      /*< public >*/
+> >
+> > -    bool kernel_irqchip_allowed;
+> > -    bool kernel_irqchip_required;
+> > -    bool kernel_irqchip_split;
+> >      char *dtb;
+> >      char *dumpdtb;
+> >      int phandle_start;
+
+On Sat, 28 Dec 2019 at 10:24, Xiaoyao Li <address@hidden> wrote:
+> BTW, it seems that this patch makes kernel_irqchip default on to work around the
+> bug.
+> However, when explicitly configuring kernel_irqchip=off, guest still fails
+> booting due to "KVM: failed to send PV IPI: -95" with a latest upstream kernel
+> ubuntu guest. Any idea about this?
+
+We need to clear the PV IPI feature for userspace irqchip. Are you using -cpu host by chance?
+
+Paolo
+
+> > Paolo
+> > >  }
+> > >
+> > >  static void kvm_accel_class_init(ObjectClass *oc, void *data)
+> > > diff --git a/include/hw/boards.h b/include/hw/boards.h
+> > > index 61f8bb8e5a42..fb1b43d5b972 100644
+> > > --- a/include/hw/boards.h
+> > > +++ b/include/hw/boards.h
+> > > @@ -271,9 +271,6 @@ struct MachineState {
+> > >
+> > >      /*< public >*/
+> > >
+> > > -    bool kernel_irqchip_allowed;
+> > > -    bool kernel_irqchip_required;
+> > > -    bool kernel_irqchip_split;
+> > >      char *dtb;
+> > >      char *dumpdtb;
+> > >      int phandle_start;
+
+On Sat, 2019-12-28 at 10:57 +0000, Paolo Bonzini wrote:
+> On Sat, 28 Dec 2019 at 10:24, Xiaoyao Li <address@hidden> wrote:
+> > BTW, it seems that this patch makes kernel_irqchip default on to work around
+> > the bug.
+> > However, when explicitly configuring kernel_irqchip=off, guest still fails
+> > booting due to "KVM: failed to send PV IPI: -95" with a latest upstream
+> > kernel ubuntu guest. Any idea about this?
+>
+> We need to clear the PV IPI feature for userspace irqchip. Are you using -cpu
+> host by chance?
+
+Yes, I used -cpu host.
+
+After using "-cpu host,-kvm-pv-ipi" with kernel_irqchip=off, it can boot
+successfully.
+
+> Paolo
+>
+> > > Paolo
+> > > >  }
+> > > >
+> > > >  static void kvm_accel_class_init(ObjectClass *oc, void *data)
+> > > > diff --git a/include/hw/boards.h b/include/hw/boards.h
+> > > > index 61f8bb8e5a42..fb1b43d5b972 100644
+> > > > --- a/include/hw/boards.h
+> > > > +++ b/include/hw/boards.h
+> > > > @@ -271,9 +271,6 @@ struct MachineState {
+> > > >
+> > > >      /*< public >*/
+> > > >
+> > > > -    bool kernel_irqchip_allowed;
+> > > > -    bool kernel_irqchip_required;
+> > > > -    bool kernel_irqchip_split;
+> > > >      char *dtb;
+> > > >      char *dumpdtb;
+> > > >      int phandle_start;
+
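The regression and the review comment in the thread above come down to where a default gets set. A minimal sketch follows, assuming the accelerator state starts zero-filled; ToyMachineClass, ToyKvmState and the toy_* functions are hypothetical stand-ins for illustration, not QEMU's actual types or the patch that was finally merged. It shows a flag that silently defaults to false when nobody sets it, a machine-independent default set at instance init, and a machine-dependent default applied later from an init_machine-style hook rather than by reading a global during instance init.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the machine class; only the one field the
 * thread mentions is modelled. */
typedef struct ToyMachineClass {
    bool default_kernel_irqchip_split;
} ToyMachineClass;

/* Hypothetical stand-in for the accelerator state. */
typedef struct ToyKvmState {
    int  kvm_shadow_mem;
    bool kernel_irqchip_allowed;
    bool kernel_irqchip_split;
} ToyKvmState;

/* Behaviour after the regression: the object is zero-filled and nothing
 * sets kernel_irqchip_allowed, so it is effectively "off". */
void toy_instance_init_regressed(ToyKvmState *s)
{
    memset(s, 0, sizeof(*s));
    s->kvm_shadow_mem = -1;
}

/* Machine-independent default restored at instance init. */
void toy_instance_init_fixed(ToyKvmState *s)
{
    memset(s, 0, sizeof(*s));
    s->kvm_shadow_mem = -1;
    s->kernel_irqchip_allowed = true;
}

/* Machine-dependent default applied once the machine is known, which is
 * the ordering the review asked for. */
void toy_init_machine(ToyKvmState *s, const ToyMachineClass *mc)
{
    s->kernel_irqchip_split = mc->default_kernel_irqchip_split;
}
```

Splitting the two defaults this way means the instance-init step never depends on initialization order; only the later hook needs the machine class. How an explicit user setting should interact with the machine default is not covered by this sketch.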
diff --git a/results/classifier/004/other/64571620 b/results/classifier/004/other/64571620
new file mode 100644
index 00000000..e6da5ef4
--- /dev/null
+++ b/results/classifier/004/other/64571620
@@ -0,0 +1,793 @@
+other: 0.922
+mistranslation: 0.917
+assembly: 0.913
+semantic: 0.903
+device: 0.899
+graphic: 0.897
+instruction: 0.894
+boot: 0.879
+KVM: 0.867
+socket: 0.855
+network: 0.853
+vnc: 0.819
+
+[BUG] Migration hv_time rollback
+
+Hi,
+
+We are experiencing timestamp rollbacks during live-migration of
+Windows 10 guests with the following qemu configuration (linux 5.4.46
+and qemu master):
+```
+$ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...]
+```
+
+I have tracked the bug to the fact that `kvmclock` is not exposed and
+disabled from qemu PoV but is in fact used by `hv-time` (in KVM).
+
+I think we should enable the `kvmclock` (qemu device) if `hv-time` is
+present and add Hyper-V support for the `kvmclock_current_nsec`
+function.
+
+I'm asking for advice because I am unsure this is the _right_ approach
+and how to keep migration compatibility between qemu versions.
+
+Thank you all,
+
+-- 
+Antoine 'xdbob' Damhet
+
+cc'ing in Vitaly who knows about the hv stuff.
+
+* Antoine Damhet (antoine.damhet@blade-group.com) wrote:
+> Hi,
+>
+> We are experiencing timestamp rollbacks during live-migration of
+> Windows 10 guests with the following qemu configuration (linux 5.4.46
+> and qemu master):
+> ```
+> $ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...]
+> ```
+
+How big a jump are you seeing, and how did you notice it in the guest?
+
+Dave
+
+> I have tracked the bug to the fact that `kvmclock` is not exposed and
+> disabled from qemu PoV but is in fact used by `hv-time` (in KVM).
+>
+> I think we should enable the `kvmclock` (qemu device) if `hv-time` is
+> present and add Hyper-V support for the `kvmclock_current_nsec`
+> function.
+>
+> I'm asking for advice because I am unsure this is the _right_ approach
+> and how to keep migration compatibility between qemu versions.
+>
+> Thank you all,
+>
+> --
+> Antoine 'xdbob' Damhet
+-- 
+Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
+
+"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
+
+> cc'ing in Vitaly who knows about the hv stuff.
+
+cc'ing Marcelo who knows about clocksources :-)
+
+> * Antoine Damhet (antoine.damhet@blade-group.com) wrote:
+>> Hi,
+>>
+>> We are experiencing timestamp rollbacks during live-migration of
+>> Windows 10 guests
+
+Are you migrating to the same hardware (with the same TSC frequency)? Is
+TSC used as the clocksource on the host?
+
+>>  with the following qemu configuration (linux 5.4.46
+>> and qemu master):
+>> ```
+>> $ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...]
+>> ```
+
+Out of pure curiosity, what's the purpose of doing 'kvm=off'? Windows is
+not going to check for KVM identification anyway so we pretend we're
+Hyper-V.
+
+Also, have you tried adding more Hyper-V enlightenments?
+
+> How big a jump are you seeing, and how did you notice it in the guest?
+>
+> Dave
+>
+>> I have tracked the bug to the fact that `kvmclock` is not exposed and
+>> disabled from qemu PoV but is in fact used by `hv-time` (in KVM).
+>>
+>> I think we should enable the `kvmclock` (qemu device) if `hv-time` is
+>> present and add Hyper-V support for the `kvmclock_current_nsec`
+>> function.
+
+AFAICT kvmclock_current_nsec() checks whether kvmclock was enabled by
+the guest:
+
+    if (!(env->system_time_msr & 1ULL)) {
+        /* KVM clock not active */
+        return 0;
+    }
+
+and this is (and way) always false for Windows guests.
+
+>> I'm asking for advice because I am unsure this is the _right_ approach
+>> and how to keep migration compatibility between qemu versions.
+>>
+>> Thank you all,
+>>
+>> --
+>> Antoine 'xdbob' Damhet
+-- 
+Vitaly
+
+On Wed, Sep 16, 2020 at 01:59:43PM +0200, Vitaly Kuznetsov wrote:
+> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
+>
+> > cc'ing in Vitaly who knows about the hv stuff.
+>
+> cc'ing Marcelo who knows about clocksources :-)
+>
+> > * Antoine Damhet (antoine.damhet@blade-group.com) wrote:
+> >> Hi,
+> >>
+> >> We are experiencing timestamp rollbacks during live-migration of
+> >> Windows 10 guests
+>
+> Are you migrating to the same hardware (with the same TSC frequency)? Is
+> TSC used as the clocksource on the host?
+
+Yes, we are migrating to the exact same hardware. And yes, TSC is used as
+a clocksource in the host (but the bug is still happening with `hpet` as
+a clocksource).
+
+> >>  with the following qemu configuration (linux 5.4.46
+> >> and qemu master):
+> >> ```
+> >> $ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...]
+> >> ```
+>
+> Out of pure curiosity, what's the purpose of doing 'kvm=off'? Windows is
+> not going to check for KVM identification anyway so we pretend we're
+> Hyper-V.
+
+Some software explicitly checks for the presence of KVM and then crashes
+if it finds it in CPUID :/
+
+> Also, have you tried adding more Hyper-V enlightenments?
+
+Yes, I published a stripped-down command-line for a minimal reproducer
+but even `hv-frequencies` and `hv-reenlightenment` don't help.
+
+> > How big a jump are you seeing, and how did you notice it in the guest?
+> >
+> > Dave
+> >
+> >> I have tracked the bug to the fact that `kvmclock` is not exposed and
+> >> disabled from qemu PoV but is in fact used by `hv-time` (in KVM).
+> >>
+> >> I think we should enable the `kvmclock` (qemu device) if `hv-time` is
+> >> present and add Hyper-V support for the `kvmclock_current_nsec`
+> >> function.
+>
+> AFAICT kvmclock_current_nsec() checks whether kvmclock was enabled by
+> the guest:
+>
+>     if (!(env->system_time_msr & 1ULL)) {
+>         /* KVM clock not active */
+>         return 0;
+>     }
+>
+> and this is (and way) always false for Windows guests.
+
+Hooo, I missed this piece. When is `clock_is_reliable` expected to be
+false? Because if it is, I still think we should be able to query at
+least `HV_X64_MSR_REFERENCE_TSC`.
+
+> >> I'm asking for advice because I am unsure this is the _right_ approach
+> >> and how to keep migration compatibility between qemu versions.
+> >>
+> >> Thank you all,
+> >>
+> >> --
+> >> Antoine 'xdbob' Damhet
+>
+> --
+> Vitaly
+
+-- 
+Antoine 'xdbob' Damhet
+
+On Wed, Sep 16, 2020 at 12:29:56PM +0100, Dr. David Alan Gilbert wrote:
+> cc'ing in Vitaly who knows about the hv stuff.
+
+Thanks
+
+> * Antoine Damhet (antoine.damhet@blade-group.com) wrote:
+> > Hi,
+> >
+> > We are experiencing timestamp rollbacks during live-migration of
+> > Windows 10 guests with the following qemu configuration (linux 5.4.46
+> > and qemu master):
+> > ```
+> > $ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...]
+> > ```
+>
+> How big a jump are you seeing, and how did you notice it in the guest?
+
+I'm seeing jumps of about the guest uptime (indicating a reset of the
+counter). It's expected because we won't call `KVM_SET_CLOCK` to
+restore any value.
+
+We first noticed it because after some migrations `dwm.exe` crashes with
+the "(NTSTATUS) 0x8898009b - QueryPerformanceCounter returned a time in
+the past." error code.
+
+I can also confirm the following hack makes the behavior disappear:
+
+```
+diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
+index 64283358f9..f334bdf35f 100644
+--- a/hw/i386/kvm/clock.c
++++ b/hw/i386/kvm/clock.c
+@@ -332,11 +332,7 @@ void kvmclock_create(void)
+ {
+     X86CPU *cpu = X86_CPU(first_cpu);
+
+-    if (kvm_enabled() &&
+-        cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) |
+-                                       (1ULL << KVM_FEATURE_CLOCKSOURCE2))) {
+-        sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
+-    }
++    sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
+ }
+
+ static void kvmclock_register_types(void)
+diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
+index 32b1453e6a..11d980ba85 100644
+--- a/hw/i386/pc_piix.c
++++ b/hw/i386/pc_piix.c
+@@ -158,9 +158,7 @@ static void pc_init1(MachineState *machine,
+
+     x86_cpus_init(x86ms, pcmc->default_cpu_version);
+
+-    if (kvm_enabled() && pcmc->kvmclock_enabled) {
+-        kvmclock_create();
+-    }
++    kvmclock_create();
+
+     if (pcmc->pci_enabled) {
+         pci_memory = g_new(MemoryRegion, 1);
+```
+
+> Dave
+>
+> > I have tracked the bug to the fact that `kvmclock` is not exposed and
+> > disabled from qemu PoV but is in fact used by `hv-time` (in KVM).
+> >
+> > I think we should enable the `kvmclock` (qemu device) if `hv-time` is
+> > present and add Hyper-V support for the `kvmclock_current_nsec`
+> > function.
+> >
+> > I'm asking for advice because I am unsure this is the _right_ approach
+> > and how to keep migration compatibility between qemu versions.
+> >
+> > Thank you all,
+> >
+> > --
+> > Antoine 'xdbob' Damhet
+>
+> --
+> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
+
+-- 
+Antoine 'xdbob' Damhet
+
+Antoine Damhet <antoine.damhet@blade-group.com> writes:
+
+> On Wed, Sep 16, 2020 at 12:29:56PM +0100, Dr. David Alan Gilbert wrote:
+>> cc'ing in Vitaly who knows about the hv stuff.
+>
+> Thanks
+>
+>> * Antoine Damhet (antoine.damhet@blade-group.com) wrote:
+>> > Hi,
+>> >
+>> > We are experiencing timestamp rollbacks during live-migration of
+>> > Windows 10 guests with the following qemu configuration (linux 5.4.46
+>> > and qemu master):
+>> > ```
+>> > $ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...]
+>> > ```
+>>
+>> How big a jump are you seeing, and how did you notice it in the guest?
+>
+> I'm seeing jumps of about the guest uptime (indicating a reset of the
+> counter). It's expected because we won't call `KVM_SET_CLOCK` to
+> restore any value.
+>
+> We first noticed it because after some migrations `dwm.exe` crashes with
+> the "(NTSTATUS) 0x8898009b - QueryPerformanceCounter returned a time in
+> the past." error code.
+>
+> I can also confirm the following hack makes the behavior disappear:
+>
+> ```
+> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
+> index 64283358f9..f334bdf35f 100644
+> --- a/hw/i386/kvm/clock.c
+> +++ b/hw/i386/kvm/clock.c
+> @@ -332,11 +332,7 @@ void kvmclock_create(void)
+>  {
+>      X86CPU *cpu = X86_CPU(first_cpu);
+>
+> -    if (kvm_enabled() &&
+> -        cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) |
+> -                                       (1ULL << KVM_FEATURE_CLOCKSOURCE2))) {
+> -        sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
+> -    }
+> +    sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
+>  }
+
+Oh, I think I see what's going on. When you add 'kvm=off'
+cpu->env.features[FEAT_KVM] is reset (see x86_cpu_expand_features()) so
+kvmclock QEMU device is not created and nobody calls KVM_SET_CLOCK on
+migration.
+
+In case we really want to support 'kvm=off' I think we can add Hyper-V
+features check here along with KVM, this should do the job.
+
+-- 
+Vitaly
+
+Vitaly Kuznetsov <vkuznets@redhat.com> writes:
+
+> Antoine Damhet <antoine.damhet@blade-group.com> writes:
+>
+>> On Wed, Sep 16, 2020 at 12:29:56PM +0100, Dr. David Alan Gilbert wrote:
+>>> cc'ing in Vitaly who knows about the hv stuff.
+>>
+>> Thanks
+>>
+>>> * Antoine Damhet (antoine.damhet@blade-group.com) wrote:
+>>> > Hi,
+>>> >
+>>> > We are experiencing timestamp rollbacks during live-migration of
+>>> > Windows 10 guests with the following qemu configuration (linux 5.4.46
+>>> > and qemu master):
+>>> > ```
+>>> > $ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...]
+>>> > ```
+>>>
+>>> How big a jump are you seeing, and how did you notice it in the guest?
+>>
+>> I'm seeing jumps of about the guest uptime (indicating a reset of the
+>> counter). It's expected because we won't call `KVM_SET_CLOCK` to
+>> restore any value.
+>>
+>> We first noticed it because after some migrations `dwm.exe` crashes with
+>> the "(NTSTATUS) 0x8898009b - QueryPerformanceCounter returned a time in
+>> the past." error code.
+>>
+>> I can also confirm the following hack makes the behavior disappear:
+>>
+>> ```
+>> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
+>> index 64283358f9..f334bdf35f 100644
+>> --- a/hw/i386/kvm/clock.c
+>> +++ b/hw/i386/kvm/clock.c
+>> @@ -332,11 +332,7 @@ void kvmclock_create(void)
+>>  {
+>>      X86CPU *cpu = X86_CPU(first_cpu);
+>>
+>> -    if (kvm_enabled() &&
+>> -        cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) |
+>> -                                       (1ULL << KVM_FEATURE_CLOCKSOURCE2))) {
+>> -        sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
+>> -    }
+>> +    sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
+>>  }
+>>
+>
+> Oh, I think I see what's going on. When you add 'kvm=off'
+> cpu->env.features[FEAT_KVM] is reset (see x86_cpu_expand_features()) so
+> kvmclock QEMU device is not created and nobody calls KVM_SET_CLOCK on
+> migration.
+>
+> In case we really want to support 'kvm=off' I think we can add Hyper-V
+> features check here along with KVM, this should do the job.
+
+Does the untested
+
+diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
+index 64283358f91d..e03b2ca6d8f6 100644
+--- a/hw/i386/kvm/clock.c
++++ b/hw/i386/kvm/clock.c
+@@ -333,8 +333,9 @@ void kvmclock_create(void)
+     X86CPU *cpu = X86_CPU(first_cpu);
+ 
+     if (kvm_enabled() &&
+-        cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) |
+-                                       (1ULL << KVM_FEATURE_CLOCKSOURCE2))) {
++        ((cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) |
++                                         (1ULL << KVM_FEATURE_CLOCKSOURCE2))) ||
++         (cpu->env.features[FEAT_HYPERV_EAX] & HV_TIME_REF_COUNT_AVAILABLE))) {
+         sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
+     }
+ }
+
+help?
+
+(I don't think we need to remove all 'if (kvm_enabled())' checks from
+machine types as 'kvm=off' should not be related).
+
+-- 
+Vitaly
+
+On Wed, Sep 16, 2020 at 02:50:56PM +0200, Vitaly Kuznetsov wrote:
+[...]
+
+>> Oh, I think I see what's going on. When you add 'kvm=off'
+>> cpu->env.features[FEAT_KVM] is reset (see x86_cpu_expand_features()) so
+>> kvmclock QEMU device is not created and nobody calls KVM_SET_CLOCK on
+>> migration.
+>>
+>> In case we really want to support 'kvm=off' I think we can add Hyper-V
+>> features check here along with KVM, this should do the job.
+>
+> Does the untested
+>
+> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
+> index 64283358f91d..e03b2ca6d8f6 100644
+> --- a/hw/i386/kvm/clock.c
+> +++ b/hw/i386/kvm/clock.c
+> @@ -333,8 +333,9 @@ void kvmclock_create(void)
+>      X86CPU *cpu = X86_CPU(first_cpu);
+>
+>      if (kvm_enabled() &&
+> -        cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) |
+> -                                       (1ULL << KVM_FEATURE_CLOCKSOURCE2))) {
+> +        ((cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) |
+> +                                         (1ULL << KVM_FEATURE_CLOCKSOURCE2))) ||
+> +         (cpu->env.features[FEAT_HYPERV_EAX] & HV_TIME_REF_COUNT_AVAILABLE))) {
+>          sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL);
+>      }
+>  }
+>
+> help?
+It appears to work :)
+
+> (I don't think we need to remove all 'if (kvm_enabled())' checks from
+> machine types as 'kvm=off' should not be related).
+
+Indeed (I didn't look at the macro, it was just quick & dirty).
+
+> --
+> Vitaly
+
+-- 
+Antoine 'xdbob' Damhet
+
+On 16/09/20 13:29, Dr. David Alan Gilbert wrote:
+>> I have tracked the bug to the fact that `kvmclock` is not exposed and
+>> disabled from qemu PoV but is in fact used by `hv-time` (in KVM).
+>>
+>> I think we should enable the `kvmclock` (qemu device) if `hv-time` is
+>> present and add Hyper-V support for the `kvmclock_current_nsec`
+>> function.
+
+Yes, this seems correct.  I would have to check but it may even be
+better to _always_ send kvmclock data in the live migration stream.
+
+Paolo
+
+Paolo Bonzini <pbonzini@redhat.com> writes:
+
+> On 16/09/20 13:29, Dr. David Alan Gilbert wrote:
+>> I have tracked the bug to the fact that `kvmclock` is not exposed and
+>> disabled from qemu PoV but is in fact used by `hv-time` (in KVM).
+>>
+>> I think we should enable the `kvmclock` (qemu device) if `hv-time` is
+>> present and add Hyper-V support for the `kvmclock_current_nsec`
+>> function.
+>
+> Yes, this seems correct.  I would have to check but it may even be
+> better to _always_ send kvmclock data in the live migration stream.
+
+The question I have is: with 'kvm=off', do we actually restore the TSC
+reading on migration? (I guess the answer is 'no', or the Hyper-V TSC
+page would 'just work'.) So yeah, maybe dropping the
+'cpu->env.features[FEAT_KVM]' check is the right fix.
+
+-- 
+Vitaly
+
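Editor's sketch for the kvmclock thread above. The condition proposed in the
untested patch can be modeled outside QEMU to see why a guest started with
`kvm=off,hv_time` fails the original check but passes the extended one. This
is not QEMU code: `kvmclock_needed` is a hypothetical helper, and the bit
values are stand-ins assumed to mirror the KVM and Hyper-V headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in bit definitions; assumed to mirror the real headers. */
#define KVM_FEATURE_CLOCKSOURCE      0
#define KVM_FEATURE_CLOCKSOURCE2     3
#define HV_TIME_REF_COUNT_AVAILABLE  (1u << 1)

/*
 * Hypothetical helper: returns true when the kvmclock device should be
 * created, i.e. KVM is in use and the guest sees either a KVM paravirtual
 * clock or the Hyper-V reference time counter.
 */
static bool kvmclock_needed(bool kvm_on, uint64_t feat_kvm,
                            uint32_t feat_hyperv_eax)
{
    const uint64_t pv_clock_bits = (1ULL << KVM_FEATURE_CLOCKSOURCE) |
                                   (1ULL << KVM_FEATURE_CLOCKSOURCE2);

    if (!kvm_on) {
        return false;
    }
    /* 'kvm=off' clears feat_kvm, so only the Hyper-V bit can match then. */
    return (feat_kvm & pv_clock_bits) != 0 ||
           (feat_hyperv_eax & HV_TIME_REF_COUNT_AVAILABLE) != 0;
}
```

With `feat_kvm` cleared and only the Hyper-V bit set, the helper still returns
true, which is the case reported in the thread. Dropping the feature check
entirely, as suggested in the last message, would correspond to returning
`kvm_on` unconditionally.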
diff --git a/results/classifier/004/other/65781993 b/results/classifier/004/other/65781993
new file mode 100644
index 00000000..a7d133b6
--- /dev/null
+++ b/results/classifier/004/other/65781993
@@ -0,0 +1,2801 @@
+other: 0.727
+instruction: 0.670
+assembly: 0.666
+semantic: 0.665
+graphic: 0.664
+socket: 0.660
+network: 0.657
+mistranslation: 0.650
+device: 0.647
+boot: 0.635
+KVM: 0.627
+vnc: 0.590
+
+[Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang
+
+Thank you.
+
+I have tested already.
+
+When the Primary Node panics, the Secondary Node qemu hangs at the same place.
+
+According to http://wiki.qemu-project.org/Features/COLO, killing the Primary
+Node qemu will not reproduce the problem, but a Primary Node panic can.
+
+I think this is because the channel does not support
+QIO_CHANNEL_FEATURE_SHUTDOWN.
+
+During failover, channel_shutdown cannot shut down the channel,
+so colo_process_incoming_thread hangs at recvmsg.
+
+I tested a patch:
+
+
+diff --git a/migration/socket.c b/migration/socket.c
+index 13966f1..d65a0ea 100644
+--- a/migration/socket.c
++++ b/migration/socket.c
+@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
+     }
+ 
+     trace_migration_socket_incoming_accepted();
+ 
+     qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
++    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
+     migration_channel_process_incoming(migrate_get_current(),
+                                        QIO_CHANNEL(sioc));
+     object_unref(OBJECT(sioc));
+
+With this patch, my test no longer hangs.
+
+Original Mail
+
+From: address@hidden
+To: Wang Guang 10165992 address@hidden
+Cc: address@hidden address@hidden
+Date: 2017-03-21 15:58
+Subject: Re: [Qemu-devel] Reply: Re: [BUG]COLO failover hang
+
+Hi,Wang.
+
+You can test this branch:
+https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
+and please follow the wiki to make sure your own configuration is correct.
+http://wiki.qemu-project.org/Features/COLO
+Thanks
+
+Zhang Chen
+
+
+On 03/21/2017 03:27 PM, address@hidden wrote:
+> hi.
+>
+> I test the git qemu master have the same problem.
+>
+> (gdb) bt
+> #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
+> niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
+> #1  0x00007f658e4aa0c2 in qio_channel_read
+> (address@hidden, address@hidden "",
+> address@hidden, address@hidden) at io/channel.c:114
+> #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
+> buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
+> migration/qemu-file-channel.c:78
+> #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
+> migration/qemu-file.c:295
+> #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
+> address@hidden) at migration/qemu-file.c:555
+> #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
+> migration/qemu-file.c:568
+> #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
+> migration/qemu-file.c:648
+> #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
+> address@hidden) at migration/colo.c:244
+> #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
+> out>, address@hidden,
+> address@hidden)
+>     at migration/colo.c:264
+> #9  0x00007f658e3e740e in colo_process_incoming_thread
+> (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
+> #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
+> #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
+>
+> (gdb) p ioc->name
+> $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
+>
+> (gdb) p ioc->features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
+> $3 = 0
+>
+> (gdb) bt
+> #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
+> condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
+> #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
+> gmain.c:3054
+> #2  g_main_context_dispatch (context=<optimized out>,
+> address@hidden) at gmain.c:3630
+> #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
+> #4  os_host_main_loop_wait (timeout=<optimized out>) at
+> util/main-loop.c:258
+> #5  main_loop_wait (address@hidden) at
+> util/main-loop.c:506
+> #6  0x00007fdccb526187 in main_loop () at vl.c:1898
+> #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
+> out>) at vl.c:4709
+>
+> (gdb) p ioc->features
+> $1 = 6
+>
+> (gdb) p ioc->name
+> $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
+>
+> May be socket_accept_incoming_migration should
+> call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
+>
+> thank you.
+>
+> Original Mail
+> address@hidden
+> address@hidden
+> address@hidden@huawei.com>
+> *Date:* 2017-03-16 14:46
+> *Subject:* *Re: [Qemu-devel] COLO failover hang*
+>
+> On 03/15/2017 05:06 PM, wangguang wrote:
+> >   am testing QEMU COLO feature described here [QEMU
+> > Wiki](http://wiki.qemu-project.org/Features/COLO).
+> >
+> > When the Primary Node panic,the Secondary Node qemu hang.
+> > hang at recvmsg in qio_channel_socket_readv.
+> > And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
+> > "x-colo-lost-heartbeat" } in Secondary VM's
+> > monitor,the  Secondary Node qemu still hang at recvmsg .
+> >
+> > I found that the colo in qemu is not complete yet.
+> > Do the colo have any plan for development?
+>
+> Yes, We are developing. You can see some of patch we pushing.
+>
+> > Has anyone ever run it successfully? Any help is appreciated!
+>
+> In our internal version can run it successfully,
+> The failover detail you can ask Zhanghailiang for help.
+> Next time if you have some question about COLO,
+> please cc me and zhanghailiang address@hidden
+>
+> Thanks
+> Zhang Chen
+>
+> > centos7.2+qemu2.7.50
+> > (gdb) bt
+> > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
+> > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
+> > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at
+> > io/channel-socket.c:497
+> > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
+> > address@hidden "", address@hidden,
+> > address@hidden) at io/channel.c:97
+> > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
+> > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
+> > migration/qemu-file-channel.c:78
+> > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
+> > migration/qemu-file.c:257
+> > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
+> > address@hidden) at migration/qemu-file.c:510
+> > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
+> > migration/qemu-file.c:523
+> > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
+> > migration/qemu-file.c:603
+> > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
+> > address@hidden) at migration/colo.c:215
+> > #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
+> > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
+> > migration/colo.c:546
+> > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
+> > migration/colo.c:649
+> > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
+> > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
+> >
+> > --
+> > View this message in context:
+> > http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
+> > Sent from the Developer mailing list archive at Nabble.com.
+>
+> --
+> Thanks
+> Zhang Chen
+
+-- 
+Thanks
+Zhang Chen
+
+Hi,
+
+On 2017/3/21 16:10, address@hidden wrote:
+> Thank you.
+>
+> I have tested already.
+>
+> When the Primary Node panics, the Secondary Node qemu hangs at the same place.
+>
+> According to http://wiki.qemu-project.org/Features/COLO, killing the Primary
+> Node qemu will not reproduce the problem, but a Primary Node panic can.
+>
+> I think this is because the channel does not support
+> QIO_CHANNEL_FEATURE_SHUTDOWN.
+
+Yes, you are right. When we do failover for the primary/secondary VM, we
+shut down the related fd in case a thread is stuck reading or writing it.
+
+It seems that you didn't follow the above introduction exactly when testing.
+Could you share your test procedure, especially the commands used in the test?
+
+Thanks,
+Hailiang
+
+Hi,
+
+Thanks for reporting this. I confirmed it in my test, and it is a bug.
+
+We do call qemu_file_shutdown() to shut down the related fd in case the
+COLO thread/incoming thread is stuck in read()/write() during failover,
+but it has no effect: every fd used by COLO (and by migration) is wrapped
+in a qio channel, and the channel will not call the shutdown API unless
+qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)
+has been called.
+
+Cc: Dr. David Alan Gilbert <address@hidden>
+
+I suspect migration cancel has the same problem: it may be stuck in write()
+if we try to cancel migration.
+
+void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error 
+**errp)
+{
+    qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
+    migration_channel_connect(s, ioc, NULL);
+    ... ...
+We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
+QIO_CHANNEL_FEATURE_SHUTDOWN) above,
+and the
+migrate_fd_cancel()
+{
+ ... ...
+    if (s->state == MIGRATION_STATUS_CANCELLING && f) {
+        qemu_file_shutdown(f);  --> This will not take effect. No ?
+    }
+}
+
+Thanks,
+Hailiang
+
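Editor's sketch for the message above. The behaviour it describes, a shutdown
request that is silently skipped when the channel lacks the shutdown feature,
can be shown with a toy model. Nothing here is QEMU code: `toy_channel` and
`toy_channel_shutdown` are hypothetical names, and the feature bit values are
assumptions chosen to be consistent with the gdb output in the thread, where
the listener has `features = 6` and the accepted channel has `features = 0`.

```c
#include <stdbool.h>

/* Assumed bit layout, chosen to agree with the gdb values in the thread. */
enum {
    TOY_FEATURE_FD_PASS  = 1u << 0,
    TOY_FEATURE_SHUTDOWN = 1u << 1,
    TOY_FEATURE_LISTEN   = 1u << 2,
};

/* Hypothetical stand-in for a channel object. */
struct toy_channel {
    unsigned features;  /* feature bitmask */
    bool shut_down;     /* set once a shutdown was really performed */
};

/* Returns true only if the shutdown was actually carried out. */
static bool toy_channel_shutdown(struct toy_channel *ch)
{
    if (!(ch->features & TOY_FEATURE_SHUTDOWN)) {
        /* Skipped: a reader blocked on this channel stays blocked. */
        return false;
    }
    ch->shut_down = true;
    return true;
}
```

In this model a channel with `features = 0` ignores the request, which matches
the hang reported for "migration-socket-incoming". Setting the shutdown bit on
the accepted channel, as the tested patch does, makes the request take effect.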
+* Hailiang Zhang (address@hidden) wrote:
+> Hi,
+>
+> Thanks for reporting this, and i confirmed it in my test, and it is a bug.
+>
+> Though we tried to call qemu_file_shutdown() to shutdown the related fd, in
+> case COLO thread/incoming thread is stuck in read/write() while do failover,
+> but it didn't take effect, because all the fd used by COLO (also migration)
+> has been wrapped by qio channel, and it will not call the shutdown API if
+> we didn't qio_channel_set_feature(QIO_CHANNEL(sioc),
+> QIO_CHANNEL_FEATURE_SHUTDOWN).
+>
+> Cc: Dr. David Alan Gilbert <address@hidden>
+>
+> I doubted migration cancel has the same problem, it may be stuck in write()
+> if we tried to cancel migration.
+>
+> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error
+> **errp)
+> {
+>     qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
+>     migration_channel_connect(s, ioc, NULL);
+>     ... ...
+> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc),
+> QIO_CHANNEL_FEATURE_SHUTDOWN) above,
+> and the
+> migrate_fd_cancel()
+> {
+>  ... ...
+>     if (s->state == MIGRATION_STATUS_CANCELLING && f) {
+>         qemu_file_shutdown(f);  --> This will not take effect. No ?
+>     }
+> }
+
+(cc'd in Daniel Berrange).
+I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
+at the top of qio_channel_socket_new; so I think that's safe, isn't it?
+
+Dave
+
+>
+Thanks,
+>
+Hailiang
+>
+>
+On 2017/3/21 16:10, address@hidden wrote:
+>
+> Thank youã
+>
+>
+>
+> I have test areadyã
+>
+>
+>
+> When the Primary Node panic,the Secondary Node qemu hang at the same placeã
+>
+>
+>
+> Incorrding
+http://wiki.qemu-project.org/Features/COLO
+ï¼kill Primary Node
+>
+> qemu will not produce the problem,but Primary Node panic canã
+>
+>
+>
+> I think due to the feature of channel does not support
+>
+> QIO_CHANNEL_FEATURE_SHUTDOWN.
+>
+>
+>
+>
+>
+> when failover,channel_shutdown could not shut down the channel.
+>
+>
+>
+>
+>
+> so the colo_process_incoming_thread will hang at recvmsg.
+>
+>
+>
+>
+>
+> I test a patch:
+>
+>
+>
+>
+>
+> diff --git a/migration/socket.c b/migration/socket.c
+>
+>
+>
+>
+>
+> index 13966f1..d65a0ea 100644
+>
+>
+>
+>
+>
+> --- a/migration/socket.c
+>
+>
+>
+>
+>
+> +++ b/migration/socket.c
+>
+>
+>
+>
+>
+> @@ -147,8 +147,9 @@ static gboolean
+>
+> socket_accept_incoming_migration(QIOChannel *ioc,
+>
+>
+>
+>
+>
+>       }
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>       trace_migration_socket_incoming_accepted();
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>       qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
+>
+>
+>
+>
+>
+> +    qio_channel_set_feature(QIO_CHANNEL(sioc),
+>
+> QIO_CHANNEL_FEATURE_SHUTDOWN);
+>
+>
+>
+>
+>
+>       migration_channel_process_incoming(migrate_get_current(),
+>
+>
+>
+>
+>
+>                                          QIO_CHANNEL(sioc));
+>
+>
+>
+>
+>
+>       object_unref(OBJECT(sioc));
+>
+>
+>
+>
+>
+>
+>
+>
+>
+> My test will not hang any more.
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+> Original Mail
+>
+>
+>
+>
+>
+>
+>
+> From: address@hidden
+>
+> To: Wang Guang 10165992 address@hidden
+>
+> Cc: address@hidden address@hidden
+>
+> Date: 2017-03-21 15:58
+>
+> Subject: Re: [Qemu-devel]  Reply: Re:  [BUG]COLO failover hang
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+>
+> Hi,Wang.
+>
+>
+>
+> You can test this branch:
+>
+>
+>
+>
+https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
+>
+>
+>
+> and please follow wiki ensure your own configuration correctly.
+>
+>
+>
+>
+http://wiki.qemu-project.org/Features/COLO
+>
+>
+>
+>
+>
+> Thanks
+>
+>
+>
+> Zhang Chen
+>
+>
+>
+>
+>
+> On 03/21/2017 03:27 PM, address@hidden wrote:
+>
+> >
+>
+> > hi.
+>
+> >
+>
+> > I test the git qemu master have the same problem.
+>
+> >
+>
+> > (gdb) bt
+>
+> >
+>
+> > #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
+>
+> > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
+>
+> >
+>
+> > #1  0x00007f658e4aa0c2 in qio_channel_read
+>
+> > (address@hidden, address@hidden "",
+>
+> > address@hidden, address@hidden) at io/channel.c:114
+>
+> >
+>
+> > #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
+>
+> > buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
+>
+> > migration/qemu-file-channel.c:78
+>
+> >
+>
+> > #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
+>
+> > migration/qemu-file.c:295
+>
+> >
+>
+> > #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
+>
+> > address@hidden) at migration/qemu-file.c:555
+>
+> >
+>
+> > #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
+>
+> > migration/qemu-file.c:568
+>
+> >
+>
+> > #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
+>
+> > migration/qemu-file.c:648
+>
+> >
+>
+> > #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
+>
+> > address@hidden) at migration/colo.c:244
+>
+> >
+>
+> > #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
+>
+> > out>, address@hidden,
+>
+> > address@hidden)
+>
+> >
+>
+> >     at migration/colo.c:264
+>
+> >
+>
+> > #9  0x00007f658e3e740e in colo_process_incoming_thread
+>
+> > (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
+>
+> >
+>
+> > #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
+>
+> >
+>
+> > #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
+>
+> >
+>
+> > (gdb) p ioc->name
+>
+> >
+>
+> > $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
+>
+> >
+>
+> > (gdb) p ioc->features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
+>
+> >
+>
+> > $3 = 0
+>
+> >
+>
+> >
+>
+> > (gdb) bt
+>
+> >
+>
+> > #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
+>
+> > condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
+>
+> >
+>
+> > #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
+>
+> > gmain.c:3054
+>
+> >
+>
+> > #2  g_main_context_dispatch (context=<optimized out>,
+>
+> > address@hidden) at gmain.c:3630
+>
+> >
+>
+> > #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
+>
+> >
+>
+> > #4  os_host_main_loop_wait (timeout=<optimized out>) at
+>
+> > util/main-loop.c:258
+>
+> >
+>
+> > #5  main_loop_wait (address@hidden) at
+>
+> > util/main-loop.c:506
+>
+> >
+>
+> > #6  0x00007fdccb526187 in main_loop () at vl.c:1898
+>
+> >
+>
+> > #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
+>
+> > out>) at vl.c:4709
+>
+> >
+>
+> > (gdb) p ioc->features
+>
+> >
+>
+> > $1 = 6
+>
+> >
+>
+> > (gdb) p ioc->name
+>
+> >
+>
+> > $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
+>
+> >
+>
+> >
+>
+> > May be socket_accept_incoming_migration should
+>
+> > call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
+>
+> >
+>
+> >
+>
+> > thank you.
+>
+> >
+>
+> >
+>
+> >
+>
+> >
+>
+> >
+>
+> > Original Mail
+>
+> > address@hidden
+>
+> > address@hidden
+>
+> > address@hidden@huawei.com>
+>
+> > *Date:* 2017-03-16 14:46
+>
+> > *Subject:* *Re: [Qemu-devel] COLO failover hang*
+>
+> >
+>
+> >
+>
+> >
+>
+> >
+>
+> > On 03/15/2017 05:06 PM, wangguang wrote:
+>
+> > > I am testing QEMU COLO feature described here [QEMU
+>
+> > > Wiki](
+http://wiki.qemu-project.org/Features/COLO
+).
+>
+> > >
+>
+> > > When the Primary Node panic,the Secondary Node qemu hang.
+>
+> > > hang at recvmsg in qio_channel_socket_readv.
+>
+> > > And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
+>
+> > > "x-colo-lost-heartbeat" } in Secondary VM's
+>
+> > > monitor,the  Secondary Node qemu still hang at recvmsg .
+>
+> > >
+>
+> > > I found that the colo in qemu is not complete yet.
+>
+> > > Do the colo have any plan for development?
+>
+> >
+>
+> > Yes, We are developing. You can see some of patch we pushing.
+>
+> >
+>
+> > > Has anyone ever run it successfully? Any help is appreciated!
+>
+> >
+>
+> > In our internal version can run it successfully,
+>
+> > The failover detail you can ask Zhanghailiang for help.
+>
+> > Next time if you have some question about COLO,
+>
+> > please cc me and zhanghailiang address@hidden
+>
+> >
+>
+> >
+>
+> > Thanks
+>
+> > Zhang Chen
+>
+> >
+>
+> >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > > centos7.2+qemu2.7.50
+>
+> > > (gdb) bt
+>
+> > > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
+>
+> > > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
+>
+> > > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0)
+>
+> at
+>
+> > > io/channel-socket.c:497
+>
+> > > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
+>
+> > > address@hidden "", address@hidden,
+>
+> > > address@hidden) at io/channel.c:97
+>
+> > > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
+>
+> > > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
+>
+> > > migration/qemu-file-channel.c:78
+>
+> > > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
+>
+> > > migration/qemu-file.c:257
+>
+> > > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
+>
+> > > address@hidden) at migration/qemu-file.c:510
+>
+> > > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
+>
+> > > migration/qemu-file.c:523
+>
+> > > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
+>
+> > > migration/qemu-file.c:603
+>
+> > > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
+>
+> > > address@hidden) at migration/colo.c:215
+>
+> > > #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
+>
+> > > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
+>
+> > > migration/colo.c:546
+>
+> > > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
+>
+> > > migration/colo.c:649
+>
+> > > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
+>
+> > > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > > --
+>
+> > > View this message in context:
+>
+>
+http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
+>
+> > > Sent from the Developer mailing list archive at Nabble.com.
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> >
+>
+> > --
+>
+> > Thanks
+>
+> > Zhang Chen
+>
+> >
+>
+> >
+>
+> >
+>
+> >
+>
+> >
+>
+>
+>
+--
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+
+On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
+* Hailiang Zhang (address@hidden) wrote:
+Hi,
+
+Thanks for reporting this, and i confirmed it in my test, and it is a bug.
+
+Though we tried to call qemu_file_shutdown() to shutdown the related fd, in
+case COLO thread/incoming thread is stuck in read/write() while do failover,
+but it didn't take effect, because all the fd used by COLO (also migration)
+has been wrapped by qio channel, and it will not call the shutdown API if
+we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), 
+QIO_CHANNEL_FEATURE_SHUTDOWN).
+
+Cc: Dr. David Alan Gilbert <address@hidden>
+
+I suspect migration cancel has the same problem, it may be stuck in write()
+if we tried to cancel migration.
+
+void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error 
+**errp)
+{
+     qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
+     migration_channel_connect(s, ioc, NULL);
+     ... ...
+We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
+QIO_CHANNEL_FEATURE_SHUTDOWN) above,
+and the
+migrate_fd_cancel()
+{
+  ... ...
+     if (s->state == MIGRATION_STATUS_CANCELLING && f) {
+         qemu_file_shutdown(f);  --> This will not take effect. No ?
+     }
+}
+(cc'd in Daniel Berrange).
+I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN); 
+at the
+top of qio_channel_socket_new;  so I think that's safe isn't it?
+Hmm, you are right, this problem only exists for the migration incoming fd, 
+thanks.
+Dave
+Thanks,
+Hailiang
+
+On 2017/3/21 16:10, address@hidden wrote:
+Thank you.
+
+I have tested already.
+
+When the Primary Node panics, the Secondary Node qemu hangs at the same place.
+
+According to
+http://wiki.qemu-project.org/Features/COLO
+, killing the Primary Node qemu
+will not reproduce the problem, but a Primary Node panic can.
+
+I think due to the feature of channel does not support 
+QIO_CHANNEL_FEATURE_SHUTDOWN.
+
+
+when failover,channel_shutdown could not shut down the channel.
+
+
+so the colo_process_incoming_thread will hang at recvmsg.
+
+
+I test a patch:
+
+
+diff --git a/migration/socket.c b/migration/socket.c
+
+
+index 13966f1..d65a0ea 100644
+
+
+--- a/migration/socket.c
+
+
++++ b/migration/socket.c
+
+
+@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel 
+*ioc,
+
+
+       }
+
+
+
+
+
+       trace_migration_socket_incoming_accepted();
+
+
+
+
+
+       qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
+
+
++    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
+
+
+       migration_channel_process_incoming(migrate_get_current(),
+
+
+                                          QIO_CHANNEL(sioc));
+
+
+       object_unref(OBJECT(sioc));
+
+
+
+
+My test will not hang any more.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Original Mail
+
+
+
+From: address@hidden
+To: Wang Guang 10165992 address@hidden
+Cc: address@hidden address@hidden
+Date: 2017-03-21 15:58
+Subject: Re: [Qemu-devel]  Reply: Re:  [BUG]COLO failover hang
+
+
+
+
+
+Hi,Wang.
+
+You can test this branch:
+https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
+and please follow wiki ensure your own configuration correctly.
+http://wiki.qemu-project.org/Features/COLO
+Thanks
+
+Zhang Chen
+
+
+On 03/21/2017 03:27 PM, address@hidden wrote:
+>
+> hi.
+>
+> I test the git qemu master have the same problem.
+>
+> (gdb) bt
+>
+> #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
+> niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
+>
+> #1  0x00007f658e4aa0c2 in qio_channel_read
+> (address@hidden, address@hidden "",
+> address@hidden, address@hidden) at io/channel.c:114
+>
+> #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
+> buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
+> migration/qemu-file-channel.c:78
+>
+> #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
+> migration/qemu-file.c:295
+>
+> #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
+> address@hidden) at migration/qemu-file.c:555
+>
+> #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
+> migration/qemu-file.c:568
+>
+> #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
+> migration/qemu-file.c:648
+>
+> #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
+> address@hidden) at migration/colo.c:244
+>
+> #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
+> out>, address@hidden,
+> address@hidden)
+>
+>     at migration/colo.c:264
+>
+> #9  0x00007f658e3e740e in colo_process_incoming_thread
+> (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
+>
+> #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
+>
+> #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
+>
+> (gdb) p ioc->name
+>
+> $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
+>
+> (gdb) p ioc->features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
+>
+> $3 = 0
+>
+>
+> (gdb) bt
+>
+> #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
+> condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
+>
+> #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
+> gmain.c:3054
+>
+> #2  g_main_context_dispatch (context=<optimized out>,
+> address@hidden) at gmain.c:3630
+>
+> #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
+>
+> #4  os_host_main_loop_wait (timeout=<optimized out>) at
+> util/main-loop.c:258
+>
+> #5  main_loop_wait (address@hidden) at
+> util/main-loop.c:506
+>
+> #6  0x00007fdccb526187 in main_loop () at vl.c:1898
+>
+> #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
+> out>) at vl.c:4709
+>
+> (gdb) p ioc->features
+>
+> $1 = 6
+>
+> (gdb) p ioc->name
+>
+> $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
+>
+>
+> May be socket_accept_incoming_migration should
+> call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
+>
+>
+> thank you.
+>
+>
+>
+>
+>
+> Original Mail
+> address@hidden
+> address@hidden
+> address@hidden@huawei.com>
+> *Date:* 2017-03-16 14:46
+> *Subject:* *Re: [Qemu-devel] COLO failover hang*
+>
+>
+>
+>
+> On 03/15/2017 05:06 PM, wangguang wrote:
+> > I am testing QEMU COLO feature described here [QEMU
+> > Wiki](
+http://wiki.qemu-project.org/Features/COLO
+).
+> >
+> > When the Primary Node panic,the Secondary Node qemu hang.
+> > hang at recvmsg in qio_channel_socket_readv.
+> > And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
+> > "x-colo-lost-heartbeat" } in Secondary VM's
+> > monitor,the  Secondary Node qemu still hang at recvmsg .
+> >
+> > I found that the colo in qemu is not complete yet.
+> > Do the colo have any plan for development?
+>
+> Yes, We are developing. You can see some of patch we pushing.
+>
+> > Has anyone ever run it successfully? Any help is appreciated!
+>
+> In our internal version can run it successfully,
+> The failover detail you can ask Zhanghailiang for help.
+> Next time if you have some question about COLO,
+> please cc me and zhanghailiang address@hidden
+>
+>
+> Thanks
+> Zhang Chen
+>
+>
+> >
+> >
+> >
+> > centos7.2+qemu2.7.50
+> > (gdb) bt
+> > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
+> > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
+> > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at
+> > io/channel-socket.c:497
+> > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
+> > address@hidden "", address@hidden,
+> > address@hidden) at io/channel.c:97
+> > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
+> > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
+> > migration/qemu-file-channel.c:78
+> > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
+> > migration/qemu-file.c:257
+> > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
+> > address@hidden) at migration/qemu-file.c:510
+> > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
+> > migration/qemu-file.c:523
+> > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
+> > migration/qemu-file.c:603
+> > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
+> > address@hidden) at migration/colo.c:215
+> > #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
+> > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
+> > migration/colo.c:546
+> > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
+> > migration/colo.c:649
+> > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
+> > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
+> >
+> >
+> >
+> >
+> >
+> > --
+> > View this message in context:
+http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
+> > Sent from the Developer mailing list archive at Nabble.com.
+> >
+> >
+> >
+> >
+>
+> --
+> Thanks
+> Zhang Chen
+>
+>
+>
+>
+>
+--
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+
+.
+
+* Hailiang Zhang (address@hidden) wrote:
+>
+On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
+>
+> * Hailiang Zhang (address@hidden) wrote:
+>
+> > Hi,
+>
+> >
+>
+> > Thanks for reporting this, and i confirmed it in my test, and it is a bug.
+>
+> >
+>
+> > Though we tried to call qemu_file_shutdown() to shutdown the related fd,
+>
+> > in
+>
+> > case COLO thread/incoming thread is stuck in read/write() while do
+>
+> > failover,
+>
+> > but it didn't take effect, because all the fd used by COLO (also
+>
+> > migration)
+>
+> > has been wrapped by qio channel, and it will not call the shutdown API if
+>
+> > we didn't qio_channel_set_feature(QIO_CHANNEL(sioc),
+>
+> > QIO_CHANNEL_FEATURE_SHUTDOWN).
+>
+> >
+>
+> > Cc: Dr. David Alan Gilbert <address@hidden>
+>
+> >
+>
+> > I suspect migration cancel has the same problem, it may be stuck in
+>
+> > write()
+>
+> > if we tried to cancel migration.
+>
+> >
+>
+> > void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
+>
+> > Error **errp)
+>
+> > {
+>
+> >      qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
+>
+> >      migration_channel_connect(s, ioc, NULL);
+>
+> >      ... ...
+>
+> > We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc),
+>
+> > QIO_CHANNEL_FEATURE_SHUTDOWN) above,
+>
+> > and the
+>
+> > migrate_fd_cancel()
+>
+> > {
+>
+> >   ... ...
+>
+> >      if (s->state == MIGRATION_STATUS_CANCELLING && f) {
+>
+> >          qemu_file_shutdown(f);  --> This will not take effect. No ?
+>
+> >      }
+>
+> > }
+>
+>
+>
+> (cc'd in Daniel Berrange).
+>
+> I see that we call qio_channel_set_feature(ioc,
+>
+> QIO_CHANNEL_FEATURE_SHUTDOWN); at the
+>
+> top of qio_channel_socket_new;  so I think that's safe isn't it?
+>
+>
+>
+>
+Hmm, you are right, this problem only exists for the migration incoming fd,
+>
+thanks.
+Yes, and I don't think we normally do a cancel on the incoming side of a 
+migration.
+
+Dave
+
+>
+> Dave
+>
+>
+>
+> > Thanks,
+>
+> > Hailiang
+>
+> >
+>
+> > On 2017/3/21 16:10, address@hidden wrote:
+>
+> > > Thank you.
+>
+> > >
+>
+> > > I have tested already.
+>
+> > >
+>
+> > > When the Primary Node panics, the Secondary Node qemu hangs at the same
+>
+> > > place.
+>
+> > >
+>
+> > > According to
+http://wiki.qemu-project.org/Features/COLO
+, killing the Primary
+>
+> > > Node qemu will not reproduce the problem, but a Primary Node panic can.
+>
+> > >
+>
+> > > I think due to the feature of channel does not support
+>
+> > > QIO_CHANNEL_FEATURE_SHUTDOWN.
+>
+> > >
+>
+> > >
+>
+> > > when failover,channel_shutdown could not shut down the channel.
+>
+> > >
+>
+> > >
+>
+> > > so the colo_process_incoming_thread will hang at recvmsg.
+>
+> > >
+>
+> > >
+>
+> > > I test a patch:
+>
+> > >
+>
+> > >
+>
+> > > diff --git a/migration/socket.c b/migration/socket.c
+>
+> > >
+>
+> > >
+>
+> > > index 13966f1..d65a0ea 100644
+>
+> > >
+>
+> > >
+>
+> > > --- a/migration/socket.c
+>
+> > >
+>
+> > >
+>
+> > > +++ b/migration/socket.c
+>
+> > >
+>
+> > >
+>
+> > > @@ -147,8 +147,9 @@ static gboolean
+>
+> > > socket_accept_incoming_migration(QIOChannel *ioc,
+>
+> > >
+>
+> > >
+>
+> > >        }
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >        trace_migration_socket_incoming_accepted();
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >        qio_channel_set_name(QIO_CHANNEL(sioc),
+>
+> > > "migration-socket-incoming");
+>
+> > >
+>
+> > >
+>
+> > > +    qio_channel_set_feature(QIO_CHANNEL(sioc),
+>
+> > > QIO_CHANNEL_FEATURE_SHUTDOWN);
+>
+> > >
+>
+> > >
+>
+> > >        migration_channel_process_incoming(migrate_get_current(),
+>
+> > >
+>
+> > >
+>
+> > >                                           QIO_CHANNEL(sioc));
+>
+> > >
+>
+> > >
+>
+> > >        object_unref(OBJECT(sioc));
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > > My test will not hang any more.
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > > Original Mail
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > > From: address@hidden
+>
+> > > To: Wang Guang 10165992 address@hidden
+>
+> > > Cc: address@hidden address@hidden
+>
+> > > Date: 2017-03-21 15:58
+>
+> > > Subject: Re: [Qemu-devel]  Reply: Re:  [BUG]COLO failover hang
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > > Hi,Wang.
+>
+> > >
+>
+> > > You can test this branch:
+>
+> > >
+>
+> > >
+https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
+>
+> > >
+>
+> > > and please follow wiki ensure your own configuration correctly.
+>
+> > >
+>
+> > >
+http://wiki.qemu-project.org/Features/COLO
+>
+> > >
+>
+> > >
+>
+> > > Thanks
+>
+> > >
+>
+> > > Zhang Chen
+>
+> > >
+>
+> > >
+>
+> > > On 03/21/2017 03:27 PM, address@hidden wrote:
+>
+> > > >
+>
+> > > > hi.
+>
+> > > >
+>
+> > > > I test the git qemu master have the same problem.
+>
+> > > >
+>
+> > > > (gdb) bt
+>
+> > > >
+>
+> > > > #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
+>
+> > > > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
+>
+> > > >
+>
+> > > > #1  0x00007f658e4aa0c2 in qio_channel_read
+>
+> > > > (address@hidden, address@hidden "",
+>
+> > > > address@hidden, address@hidden) at io/channel.c:114
+>
+> > > >
+>
+> > > > #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
+>
+> > > > buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
+>
+> > > > migration/qemu-file-channel.c:78
+>
+> > > >
+>
+> > > > #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
+>
+> > > > migration/qemu-file.c:295
+>
+> > > >
+>
+> > > > #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
+>
+> > > > address@hidden) at migration/qemu-file.c:555
+>
+> > > >
+>
+> > > > #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
+>
+> > > > migration/qemu-file.c:568
+>
+> > > >
+>
+> > > > #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
+>
+> > > > migration/qemu-file.c:648
+>
+> > > >
+>
+> > > > #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
+>
+> > > > address@hidden) at migration/colo.c:244
+>
+> > > >
+>
+> > > > #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
+>
+> > > > out>, address@hidden,
+>
+> > > > address@hidden)
+>
+> > > >
+>
+> > > >     at migration/colo.c:264
+>
+> > > >
+>
+> > > > #9  0x00007f658e3e740e in colo_process_incoming_thread
+>
+> > > > (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
+>
+> > > >
+>
+> > > > #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
+>
+> > > >
+>
+> > > > #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
+>
+> > > >
+>
+> > > > (gdb) p ioc->name
+>
+> > > >
+>
+> > > > $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
+>
+> > > >
+>
+> > > > (gdb) p ioc->features        Do not support
+>
+> > > QIO_CHANNEL_FEATURE_SHUTDOWN
+>
+> > > >
+>
+> > > > $3 = 0
+>
+> > > >
+>
+> > > >
+>
+> > > > (gdb) bt
+>
+> > > >
+>
+> > > > #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
+>
+> > > > condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
+>
+> > > >
+>
+> > > > #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
+>
+> > > > gmain.c:3054
+>
+> > > >
+>
+> > > > #2  g_main_context_dispatch (context=<optimized out>,
+>
+> > > > address@hidden) at gmain.c:3630
+>
+> > > >
+>
+> > > > #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
+>
+> > > >
+>
+> > > > #4  os_host_main_loop_wait (timeout=<optimized out>) at
+>
+> > > > util/main-loop.c:258
+>
+> > > >
+>
+> > > > #5  main_loop_wait (address@hidden) at
+>
+> > > > util/main-loop.c:506
+>
+> > > >
+>
+> > > > #6  0x00007fdccb526187 in main_loop () at vl.c:1898
+>
+> > > >
+>
+> > > > #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
+>
+> > > > out>) at vl.c:4709
+>
+> > > >
+>
+> > > > (gdb) p ioc->features
+>
+> > > >
+>
+> > > > $1 = 6
+>
+> > > >
+>
+> > > > (gdb) p ioc->name
+>
+> > > >
+>
+> > > > $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
+>
+> > > >
+>
+> > > >
+>
+> > > > May be socket_accept_incoming_migration should
+>
+> > > > call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
+>
+> > > >
+>
+> > > >
+>
+> > > > thank you.
+>
+> > > >
+>
+> > > >
+>
+> > > >
+>
+> > > >
+>
+> > > >
+>
+> > > > Original Mail
+>
+> > > > address@hidden
+>
+> > > > address@hidden
+>
+> > > > address@hidden@huawei.com>
+>
+> > > > *Date:* 2017-03-16 14:46
+>
+> > > > *Subject:* *Re: [Qemu-devel] COLO failover hang*
+>
+> > > >
+>
+> > > >
+>
+> > > >
+>
+> > > >
+>
+> > > > On 03/15/2017 05:06 PM, wangguang wrote:
+>
+> > > > > I am testing QEMU COLO feature described here [QEMU
+>
+> > > > > Wiki](
+http://wiki.qemu-project.org/Features/COLO
+).
+>
+> > > > >
+>
+> > > > > When the Primary Node panic,the Secondary Node qemu hang.
+>
+> > > > > hang at recvmsg in qio_channel_socket_readv.
+>
+> > > > > And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
+>
+> > > > > "x-colo-lost-heartbeat" } in Secondary VM's
+>
+> > > > > monitor,the  Secondary Node qemu still hang at recvmsg .
+>
+> > > > >
+>
+> > > > > I found that the colo in qemu is not complete yet.
+>
+> > > > > Do the colo have any plan for development?
+>
+> > > >
+>
+> > > > Yes, We are developing. You can see some of patch we pushing.
+>
+> > > >
+>
+> > > > > Has anyone ever run it successfully? Any help is appreciated!
+>
+> > > >
+>
+> > > > In our internal version can run it successfully,
+>
+> > > > The failover detail you can ask Zhanghailiang for help.
+>
+> > > > Next time if you have some question about COLO,
+>
+> > > > please cc me and zhanghailiang address@hidden
+>
+> > > >
+>
+> > > >
+>
+> > > > Thanks
+>
+> > > > Zhang Chen
+>
+> > > >
+>
+> > > >
+>
+> > > > >
+>
+> > > > >
+>
+> > > > >
+>
+> > > > > centos7.2+qemu2.7.50
+>
+> > > > > (gdb) bt
+>
+> > > > > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
+>
+> > > > > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized
+>
+> > > out>,
+>
+> > > > > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0,
+>
+> > > errp=0x0) at
+>
+> > > > > io/channel-socket.c:497
+>
+> > > > > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
+>
+> > > > > address@hidden "", address@hidden,
+>
+> > > > > address@hidden) at io/channel.c:97
+>
+> > > > > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized
+>
+> > > out>,
+>
+> > > > > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
+>
+> > > > > migration/qemu-file-channel.c:78
+>
+> > > > > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
+>
+> > > > > migration/qemu-file.c:257
+>
+> > > > > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
+>
+> > > > > address@hidden) at migration/qemu-file.c:510
+>
+> > > > > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
+>
+> > > > > migration/qemu-file.c:523
+>
+> > > > > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
+>
+> > > > > migration/qemu-file.c:603
+>
+> > > > > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
+>
+> > > > > address@hidden) at migration/colo.c:215
+>
+> > > > > #9  0x00007f3e0327250d in colo_wait_handle_message
+>
+> > > (errp=0x7f3d62bfaa48,
+>
+> > > > > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
+>
+> > > > > migration/colo.c:546
+>
+> > > > > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
+>
+> > > > > migration/colo.c:649
+>
+> > > > > #11 0x00007f3e00cc1df3 in start_thread () from
+>
+> > > /lib64/libpthread.so.0
+>
+> > > > > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
+>
+> > > > >
+>
+> > > > >
+>
+> > > > >
+>
+> > > > >
+>
+> > > > >
+>
+> > > > > --
+>
+> > > > > View this message in context:
+>
+> > >
+http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
+>
+> > > > > Sent from the Developer mailing list archive at Nabble.com.
+>
+> > > > >
+>
+> > > > >
+>
+> > > > >
+>
+> > > > >
+>
+> > > >
+>
+> > > > --
+>
+> > > > Thanks
+>
+> > > > Zhang Chen
+>
+> > > >
+>
+> > > >
+>
+> > > >
+>
+> > > >
+>
+> > > >
+>
+> > >
+>
+> >
+>
+> --
+>
+> Dr. David Alan Gilbert / address@hidden / Manchester, UK
+>
+>
+>
+> .
+>
+>
+>
+--
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+
diff --git a/results/classifier/004/other/66743673 b/results/classifier/004/other/66743673
new file mode 100644
index 00000000..5562ad60
--- /dev/null
+++ b/results/classifier/004/other/66743673
@@ -0,0 +1,372 @@
+other: 0.967
+assembly: 0.954
+semantic: 0.951
+boot: 0.938
+instruction: 0.930
+network: 0.930
+graphic: 0.927
+socket: 0.926
+vnc: 0.926
+device: 0.926
+KVM: 0.891
+mistranslation: 0.855
+
+[Bug] QEMU TCG warnings after commit c6bd2dd63420 - HTT / CMP_LEG bits
+
+Hi Community,
+
+This email describes 3 bugs that appear to share the same root cause.
+
+[1] We ran into the following warnings when running QEMU v10.0.0 in TCG mode:
+
+qemu-system-x86_64 \
+  -machine q35 \
+  -m 4G -smp 4 \
+  -kernel ./arch/x86/boot/bzImage \
+  -bios /usr/share/ovmf/OVMF.fd \
+  -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \
+  -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \
+  -nographic \
+  -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr'
+qemu-system-x86_64: warning: TCG doesn't support requested feature:
+CPUID.01H:EDX.ht [bit 28]
+qemu-system-x86_64: warning: TCG doesn't support requested feature:
+CPUID.80000001H:ECX.cmp-legacy [bit 1]
+(repeats 4 times, once per vCPU)
+Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up CPUID_HT in
+x86_cpu_expand_features() instead of cpu_x86_cpuid()" is what introduced the
+warnings.
+Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28]) and
+CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT support, these
+bits trigger the warnings above.
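+
+As a rough illustration (not QEMU code), the two bits named in the warnings can
+be tested with plain bit masks. The helper names below are hypothetical; only
+the bit positions (EDX bit 28 of leaf 1, ECX bit 1 of leaf 0x80000001) are taken
+from the warning text above.
+

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as printed in the warnings; the macro names are made up. */
#define HTT_BIT      28  /* CPUID.01H:EDX.ht */
#define CMP_LEG_BIT   1  /* CPUID.80000001H:ECX.cmp-legacy */

/* Returns true if the given bit of a 32-bit CPUID register value is set. */
static bool bit_is_set(uint32_t reg, unsigned bit)
{
    assert(bit < 32);
    return ((reg >> bit) & 1u) != 0;
}

/* leaf1_edx: the EDX value returned by CPUID leaf 1. */
static bool has_htt(uint32_t leaf1_edx)
{
    return bit_is_set(leaf1_edx, HTT_BIT);
}

/* leaf80000001_ecx: the ECX value returned by CPUID leaf 0x80000001. */
static bool has_cmp_legacy(uint32_t leaf80000001_ecx)
{
    return bit_is_set(leaf80000001_ecx, CMP_LEG_BIT);
}
```

+
+For example, has_htt(1u << 28) is true and has_htt(0) is false; the register
+values themselves would have to come from the guest's CPUID output.
+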
+[2] Also, Zhao pointed me to a similar report on GitLab:
+https://gitlab.com/qemu-project/qemu/-/issues/2894
+The symptoms there look identical to what we're seeing.
+By convention we file one issue per email, but these two appear to share the
+same root cause, so I'm describing them together here.
+[3] My colleague Alan noticed what appears to be a related problem: if we launch
+a guest with '-cpu <model>,-ht --enable-kvm', which explicitly removes
+the ht flag, the guest still reports HT as enabled (cat /proc/cpuinfo in a
+Linux guest). In other words, under KVM the ht bit seems to be forced on even
+when the user tries to disable it.
+Best regards,
+Ewan
+
+On 4/29/25 11:02 AM, Ewan Hai wrote:
+Hi Community,
+
+This email contains 3 bugs appear to share the same root cause.
+
+[1] We ran into the following warnings when running QEMU v10.0.0 in TCG mode:
+
+qemu-system-x86_64 \
+  -machine q35 \
+  -m 4G -smp 4 \
+  -kernel ./arch/x86/boot/bzImage \
+  -bios /usr/share/ovmf/OVMF.fd \
+  -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \
+  -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \
+  -nographic \
+  -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr'
+qemu-system-x86_64: warning: TCG doesn't support requested feature:
+CPUID.01H:EDX.ht [bit 28]
+qemu-system-x86_64: warning: TCG doesn't support requested feature:
+CPUID.80000001H:ECX.cmp-legacy [bit 1]
+(repeats 4 times, once per vCPU)
+Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up CPUID_HT in
+x86_cpu_expand_features() instead of cpu_x86_cpuid()" is what introduced the
+warnings.
+Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28]) and
+CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT support, these
+bits trigger the warnings above.
+[2] Also, Zhao pointed me to a similar report on GitLab:
+https://gitlab.com/qemu-project/qemu/-/issues/2894
+The symptoms there look identical to what we're seeing.
+By convention we file one issue per email, but these two appear to share the
+same root cause, so I'm describing them together here.
+[3] My colleague Alan noticed what appears to be a related problem: if we launch
+a guest with '-cpu <model>,-ht --enable-kvm', which means explicitly removing
+the ht flag, but the guest still reports HT(cat /proc/cpuinfo in linux guest)
+enabled. In other words, under KVM the ht bit seems to be forced on even when
+the user tries to disable it.
+XiaoYao reminded me that issue [3] stems from a different patch. Please ignore
+it for now; I'll start a separate thread to discuss that one independently.
+Best regards,
+Ewan
+
+On 4/29/2025 11:02 AM, Ewan Hai wrote:
+Hi Community,
+
+This email contains 3 bugs appear to share the same root cause.
+[1] We ran into the following warnings when running QEMU v10.0.0 in TCG
+mode:
+qemu-system-x86_64 \
+  -machine q35 \
+  -m 4G -smp 4 \
+  -kernel ./arch/x86/boot/bzImage \
+  -bios /usr/share/ovmf/OVMF.fd \
+  -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \
+  -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \
+  -nographic \
+  -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr'
+qemu-system-x86_64: warning: TCG doesn't support requested feature:
+CPUID.01H:EDX.ht [bit 28]
+qemu-system-x86_64: warning: TCG doesn't support requested feature:
+CPUID.80000001H:ECX.cmp-legacy [bit 1]
+(repeats 4 times, once per vCPU)
+Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up
+CPUID_HT in x86_cpu_expand_features() instead of cpu_x86_cpuid()" is
+what introduced the warnings.
+Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28])
+and CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT
+support, these bits trigger the warnings above.
+[2] Also, Zhao pointed me to a similar report on GitLab:
+https://gitlab.com/qemu-project/qemu/-/issues/2894
+The symptoms there look identical to what we're seeing.
+By convention we file one issue per email, but these two appear to share
+the same root cause, so I'm describing them together here.
+It was caused by my two patches. I think the fix can be as follows.
+If there are no objections from the community, I can submit a formal patch.
+
+diff --git a/target/i386/cpu.c b/target/i386/cpu.c
+index 1f970aa4daa6..fb95aadd6161 100644
+--- a/target/i386/cpu.c
++++ b/target/i386/cpu.c
+@@ -776,11 +776,12 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
+           CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC | CPUID_SEP | \
+           CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV | CPUID_PAT | \
+           CPUID_PSE36 | CPUID_CLFLUSH | CPUID_ACPI | CPUID_MMX | \
+-          CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE)
++          CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE | \
++          CPUID_HT)
+           /* partly implemented:
+           CPUID_MTRR, CPUID_MCA, CPUID_CLFLUSH (needed for Win64) */
+           /* missing:
+-          CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_HT, CPUID_TM, CPUID_PBE */
++          CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_TM, CPUID_PBE */
+
+ /*
+  * Kernel-only features that can be shown to usermode programs even if
+@@ -848,7 +849,8 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
+ #define TCG_EXT3_FEATURES (CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM | \
+           CPUID_EXT3_CR8LEG | CPUID_EXT3_ABM | CPUID_EXT3_SSE4A | \
+-          CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES)
++          CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES | \
++          CPUID_EXT3_CMP_LEG)
+
+ #define TCG_EXT4_FEATURES 0
+[3] My colleague Alan noticed what appears to be a related problem: if
+we launch a guest with '-cpu <model>,-ht --enable-kvm', which means
+explicitly removing the ht flag, but the guest still reports HT(cat /
+proc/cpuinfo in linux guest) enabled. In other words, under KVM the ht
+bit seems to be forced on even when the user tries to disable it.
+This has been the behavior of QEMU for many years, not some regression
+introduced by my patches. We can discuss how to address it separately.
+Best regards,
+Ewan
+
+On Tue, Apr 29, 2025 at 01:55:59PM +0800, Xiaoyao Li wrote:
+>
+Date: Tue, 29 Apr 2025 13:55:59 +0800
+>
+From: Xiaoyao Li <xiaoyao.li@intel.com>
+>
+Subject: Re: [Bug] QEMU TCG warnings after commit c6bd2dd63420 - HTT /
+>
+CMP_LEG bits
+>
+>
+On 4/29/2025 11:02 AM, Ewan Hai wrote:
+>
+> Hi Community,
+>
+>
+>
+> This email contains 3 bugs appear to share the same root cause.
+>
+>
+>
+> [1] We ran into the following warnings when running QEMU v10.0.0 in TCG
+>
+> mode:
+>
+>
+>
+> qemu-system-x86_64 \
+>
+>   -machine q35 \
+>
+>   -m 4G -smp 4 \
+>
+>   -kernel ./arch/x86/boot/bzImage \
+>
+>   -bios /usr/share/ovmf/OVMF.fd \
+>
+>   -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \
+>
+>   -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \
+>
+>   -nographic \
+>
+>   -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr'
+>
+>
+>
+> qemu-system-x86_64: warning: TCG doesn't support requested feature:
+>
+> CPUID.01H:EDX.ht [bit 28]
+>
+> qemu-system-x86_64: warning: TCG doesn't support requested feature:
+>
+> CPUID.80000001H:ECX.cmp-legacy [bit 1]
+>
+> (repeats 4 times, once per vCPU)
+>
+>
+>
+> Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up
+>
+> CPUID_HT in x86_cpu_expand_features() instead of cpu_x86_cpuid()" is
+>
+> what introduced the warnings.
+>
+>
+>
+> Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28])
+>
+> and CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT
+>
+> support, these bits trigger the warnings above.
+>
+>
+>
+> [2] Also, Zhao pointed me to a similar report on GitLab:
+>
+>
+https://gitlab.com/qemu-project/qemu/-/issues/2894
+>
+> The symptoms there look identical to what we're seeing.
+>
+>
+>
+> By convention we file one issue per email, but these two appear to share
+>
+> the same root cause, so I'm describing them together here.
+>
+>
+It was caused by my two patches. I think the fix can be as follows.
+>
+If there are no objections from the community, I can submit a formal patch.
+>
+>
+diff --git a/target/i386/cpu.c b/target/i386/cpu.c
+>
+index 1f970aa4daa6..fb95aadd6161 100644
+>
+--- a/target/i386/cpu.c
+>
++++ b/target/i386/cpu.c
+>
+@@ -776,11 +776,12 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t
+>
+vendor1,
+>
+CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC | CPUID_SEP | \
+>
+CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV | CPUID_PAT | \
+>
+CPUID_PSE36 | CPUID_CLFLUSH | CPUID_ACPI | CPUID_MMX | \
+>
+-          CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE)
+>
++          CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE | \
+>
++          CPUID_HT)
+>
+/* partly implemented:
+>
+CPUID_MTRR, CPUID_MCA, CPUID_CLFLUSH (needed for Win64) */
+>
+/* missing:
+>
+-          CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_HT, CPUID_TM, CPUID_PBE */
+>
++          CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_TM, CPUID_PBE */
+>
+>
+/*
+>
+* Kernel-only features that can be shown to usermode programs even if
+>
+@@ -848,7 +849,8 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t
+>
+vendor1,
+>
+>
+#define TCG_EXT3_FEATURES (CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM | \
+>
+CPUID_EXT3_CR8LEG | CPUID_EXT3_ABM | CPUID_EXT3_SSE4A | \
+>
+-          CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES)
+>
++          CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES | \
+>
++          CPUID_EXT3_CMP_LEG)
+>
+>
+#define TCG_EXT4_FEATURES 0
+This fix looks fine to me. At least according to the SDM, HTT depends on
+topology, and it should be set when the user specifies "-smp 4".
+
+>
+> [3] My colleague Alan noticed what appears to be a related problem: if
+>
+> we launch a guest with '-cpu <model>,-ht --enable-kvm', which means
+>
+> explicitly removing the ht flag, but the guest still reports HT(cat
+>
+> /proc/cpuinfo in linux guest) enabled. In other words, under KVM the ht
+>
+> bit seems to be forced on even when the user tries to disable it.
+>
+>
+XiaoYao reminded me that issue [3] stems from a different patch. Please
+>
+ignore it for now; I'll start a separate thread to discuss that one
+>
+independently.
+I haven't found any other thread :-).
+
+By the way, just curious: in what cases do you need to disable the HT
+flag? "-smp 4" means 4 cores with 1 thread per core; is that not
+enough?
+
+As for the "-ht" behavior, I'm also unsure whether this should be fixed
+or not - one possible consideration is whether "-ht" would be useful.
+
+On 5/8/25 5:04 PM, Zhao Liu wrote:
+[3] My colleague Alan noticed what appears to be a related problem: if
+we launch a guest with '-cpu <model>,-ht --enable-kvm', which means
+explicitly removing the ht flag, but the guest still reports HT(cat
+/proc/cpuinfo in linux guest) enabled. In other words, under KVM the ht
+bit seems to be forced on even when the user tries to disable it.
+XiaoYao reminded me that issue [3] stems from a different patch. Please
+ignore it for now; I'll start a separate thread to discuss that one
+independently.
+I haven't found any other thread :-).
+Please refer to
+https://lore.kernel.org/all/db6ae3bb-f4e5-4719-9beb-623fcff56af2@zhaoxin.com/
+.
+By the way, just curious: in what cases do you need to disable the HT
+flag? "-smp 4" means 4 cores with 1 thread per core; is that not
+enough?
+
+As for the "-ht" behavior, I'm also unsure whether this should be fixed
+or not - one possible consideration is whether "-ht" would be useful.
+I wasn't trying to target any specific use case; using "-ht" was simply a way to
+check how the ht feature behaves under both KVM and TCG. There's no special
+workload behind it; I just wanted to confirm that the flag is respected (or not)
+in each mode.
+
diff --git a/results/classifier/004/other/68897003 b/results/classifier/004/other/68897003
new file mode 100644
index 00000000..5076ebda
--- /dev/null
+++ b/results/classifier/004/other/68897003
@@ -0,0 +1,724 @@
+other: 0.714
+assembly: 0.697
+graphic: 0.694
+semantic: 0.671
+device: 0.647
+instruction: 0.641
+network: 0.614
+KVM: 0.598
+socket: 0.585
+boot: 0.569
+mistranslation: 0.535
+vnc: 0.525
+
+[Qemu-devel] [BUG] VM abort after migration
+
+Hi guys,
+
+We found a QEMU core dump in our testing environment: the assertion
+'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered, and
+bus->irq_count[i] was '-1'.
+
+Our analysis shows that it happened after VM migration, and we think
+it was caused by the following sequence:
+
+*Migration Source*
+1. Save the state of bus pci.0, including irq_count[x] ( =0 , old ).
+2. Save E1000:
+   e1000_pre_save
+    e1000_mit_timer
+     set_interrupt_cause
+      pci_set_irq --> update pci_dev->irq_state to 1 and
+                  update bus->irq_count[x] to 1 ( new )
+    The irq_state is then sent to the destination.
+
+*Migration Dest*
+1. The received irq_count[x] of pci.0 is 0, but the irq_state of e1000 is 1.
+2. If the e1000 needs to change its IRQ line, it calls pci_irq_handler();
+  the irq_state may change to 0, and bus->irq_count[x] then becomes
+  -1.
+3. On VM reboot, the assertion is triggered.
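+
+The sequence above can be replayed with a small toy model. This is only a
+sketch: the toy_* names are hypothetical stand-ins rather than QEMU's real
+structures, and it assumes the IRQ handler adds the level change (new level
+minus the device's stored irq_state) to the bus counter.
+

```c
#include <assert.h>

/* Toy stand-ins for the PCI bus and device state; not QEMU's real structs. */
struct toy_bus { int irq_count; };
struct toy_dev { int irq_state; struct toy_bus *bus; };

/* Assumed handler behaviour: apply only the change in level to the bus. */
static void toy_set_irq(struct toy_dev *dev, int level)
{
    int change = level - dev->irq_state;

    assert(level == 0 || level == 1);
    if (change == 0) {
        return;
    }
    dev->irq_state = level;
    dev->bus->irq_count += change;
}

/* Replays the reported ordering; returns the destination's final counter. */
static int toy_replay_migration(void)
{
    struct toy_bus src_bus = { 0 };
    struct toy_dev src_dev = { 0, &src_bus };

    int saved_count = src_bus.irq_count;  /* source step 1: bus saved first */
    toy_set_irq(&src_dev, 1);             /* source step 2: pre_save raises IRQ */
    int saved_state = src_dev.irq_state;  /* device saved with irq_state == 1 */

    /* Destination loads the stale counter (0) next to the new state (1). */
    struct toy_bus dst_bus = { saved_count };
    struct toy_dev dst_dev = { saved_state, &dst_bus };

    toy_set_irq(&dst_dev, 0);             /* dest step 2: line is lowered */
    return dst_bus.irq_count;             /* nonzero here trips the reset assert */
}
```

+
+In this model the destination counter ends at -1, which is the value seen in
+the failed assertion.
+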
+
+We also found others who hit a similar problem:
+[1]
+https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
+[2]
+https://bugs.launchpad.net/qemu/+bug/1702621
+Are there any patches that fix this problem?
+Can we save the pcibus state after all the PCI devices are saved?
+
+Thanks,
+Longpeng(Mike)
+
+* longpeng (address@hidden) wrote:
+>
+Hi guys,
+>
+>
+We found a qemu core in our testing environment, the assertion
+>
+'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
+>
+the bus->irq_count[i] is '-1'.
+>
+>
+Through analysis, it was happened after VM migration and we think
+>
+it was caused by the following sequence:
+>
+>
+*Migration Source*
+>
+1. save bus pci.0 state, including irq_count[x] ( =0 , old )
+>
+2. save E1000:
+>
+e1000_pre_save
+>
+e1000_mit_timer
+>
+set_interrupt_cause
+>
+pci_set_irq --> update pci_dev->irq_state to 1 and
+>
+update bus->irq_count[x] to 1 ( new )
+>
+the irq_state sent to dest.
+>
+>
+*Migration Dest*
+>
+1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1.
+>
+2. If the e1000 need change irqline , it would call to pci_irq_handler(),
+>
+the irq_state maybe change to 0 and bus->irq_count[x] will become
+>
+-1 in this situation.
+>
+3. do VM reboot then the assertion will be triggered.
+>
+>
+We also found some guys faced the similar problem:
+>
+[1]
+https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
+>
+[2]
+https://bugs.launchpad.net/qemu/+bug/1702621
+>
+>
+Is there some patches to fix this problem ?
+I don't remember any.
+
+>
+Can we save pcibus state after all the pci devs are saved ?
+Does this problem only happen with e1000? I think so.
+If it's only e1000 I think we should fix it - I think once the VM is
+stopped for doing the device migration it shouldn't be raising
+interrupts.
+
+Dave
+
+>
+Thanks,
+>
+Longpeng(Mike)
+--
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+
+On 2019/7/8 ä¸å5:47, Dr. David Alan Gilbert wrote:
+* longpeng (address@hidden) wrote:
+Hi guys,
+
+We found a qemu core in our testing environment, the assertion
+'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
+the bus->irq_count[i] is '-1'.
+
+Through analysis, it was happened after VM migration and we think
+it was caused by the following sequence:
+
+*Migration Source*
+1. save bus pci.0 state, including irq_count[x] ( =0 , old )
+2. save E1000:
+    e1000_pre_save
+     e1000_mit_timer
+      set_interrupt_cause
+       pci_set_irq --> update pci_dev->irq_state to 1 and
+                   update bus->irq_count[x] to 1 ( new )
+     the irq_state sent to dest.
+
+*Migration Dest*
+1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1.
+2. If the e1000 need change irqline , it would call to pci_irq_handler(),
+   the irq_state maybe change to 0 and bus->irq_count[x] will become
+   -1 in this situation.
+3. do VM reboot then the assertion will be triggered.
+
+We also found some guys faced the similar problem:
+[1]
+https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
+[2]
+https://bugs.launchpad.net/qemu/+bug/1702621
+Is there some patches to fix this problem ?
+I don't remember any.
+Can we save pcibus state after all the pci devs are saved ?
+Does this problem only happen with e1000? I think so.
+If it's only e1000 I think we should fix it - I think once the VM is
+stopped for doing the device migration it shouldn't be raising
+interrupts.
+I wonder whether we can simply fix this by not setting ICS in pre_save()
+but scheduling the mit timer unconditionally in post_load().
+Thanks
+Dave
+Thanks,
+Longpeng(Mike)
+--
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+
+On 2019/7/10 11:25, Jason Wang wrote:
+>
+>
+On 2019/7/8 ä¸å5:47, Dr. David Alan Gilbert wrote:
+>
+> * longpeng (address@hidden) wrote:
+>
+>> Hi guys,
+>
+>>
+>
+>> We found a qemu core in our testing environment, the assertion
+>
+>> 'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
+>
+>> the bus->irq_count[i] is '-1'.
+>
+>>
+>
+>> Through analysis, it was happened after VM migration and we think
+>
+>> it was caused by the following sequence:
+>
+>>
+>
+>> *Migration Source*
+>
+>> 1. save bus pci.0 state, including irq_count[x] ( =0 , old )
+>
+>> 2. save E1000:
+>
+>>     e1000_pre_save
+>
+>>      e1000_mit_timer
+>
+>>       set_interrupt_cause
+>
+>>        pci_set_irq --> update pci_dev->irq_state to 1 and
+>
+>>                    update bus->irq_count[x] to 1 ( new )
+>
+>>      the irq_state sent to dest.
+>
+>>
+>
+>> *Migration Dest*
+>
+>> 1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1.
+>
+>> 2. If the e1000 need change irqline , it would call to pci_irq_handler(),
+>
+>>    the irq_state maybe change to 0 and bus->irq_count[x] will become
+>
+>>    -1 in this situation.
+>
+>> 3. do VM reboot then the assertion will be triggered.
+>
+>>
+>
+>> We also found some guys faced the similar problem:
+>
+>> [1]
+https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
+>
+>> [2]
+https://bugs.launchpad.net/qemu/+bug/1702621
+>
+>>
+>
+>> Is there some patches to fix this problem ?
+>
+> I don't remember any.
+>
+>
+>
+>> Can we save pcibus state after all the pci devs are saved ?
+>
+> Does this problem only happen with e1000? I think so.
+>
+> If it's only e1000 I think we should fix it - I think once the VM is
+>
+> stopped for doing the device migration it shouldn't be raising
+>
+> interrupts.
+>
+>
+>
+I wonder maybe we can simply fix this by no setting ICS on pre_save() but
+>
+scheduling mit timer unconditionally in post_load().
+>
+I also think this is an e1000 bug, because we have found more core dumps with
+the same stack frames these days.
+
+I'm not familiar with e1000, so I hope someone can fix it. Thanks. :)
+
+>
+Thanks
+>
+>
+>
+>
+>
+> Dave
+>
+>
+>
+>> Thanks,
+>
+>> Longpeng(Mike)
+>
+> --
+>
+> Dr. David Alan Gilbert / address@hidden / Manchester, UK
+>
+>
+.
+>
+-- 
+Regards,
+Longpeng(Mike)
+
+On 2019/7/10 ä¸å11:36, Longpeng (Mike) wrote:
+On 2019/7/10 11:25, Jason Wang wrote:
+On 2019/7/8 ä¸å5:47, Dr. David Alan Gilbert wrote:
+* longpeng (address@hidden) wrote:
+Hi guys,
+
+We found a qemu core in our testing environment, the assertion
+'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
+the bus->irq_count[i] is '-1'.
+
+Through analysis, it was happened after VM migration and we think
+it was caused by the following sequence:
+
+*Migration Source*
+1. save bus pci.0 state, including irq_count[x] ( =0 , old )
+2. save E1000:
+    e1000_pre_save
+     e1000_mit_timer
+      set_interrupt_cause
+       pci_set_irq --> update pci_dev->irq_state to 1 and
+                   update bus->irq_count[x] to 1 ( new )
+     the irq_state sent to dest.
+
+*Migration Dest*
+1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1.
+2. If the e1000 need change irqline , it would call to pci_irq_handler(),
+   the irq_state maybe change to 0 and bus->irq_count[x] will become
+   -1 in this situation.
+3. do VM reboot then the assertion will be triggered.
+
+We also found some guys faced the similar problem:
+[1]
+https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
+[2]
+https://bugs.launchpad.net/qemu/+bug/1702621
+Is there some patches to fix this problem ?
+I don't remember any.
+Can we save pcibus state after all the pci devs are saved ?
+Does this problem only happen with e1000? I think so.
+If it's only e1000 I think we should fix it - I think once the VM is
+stopped for doing the device migration it shouldn't be raising
+interrupts.
+I wonder maybe we can simply fix this by no setting ICS on pre_save() but
+scheduling mit timer unconditionally in post_load().
+I also think this is a bug of e1000 because we find more cores with the same
+frame thease days.
+
+I'm not familiar with e1000 so hope someone could fix it, thanks. :)
+I drafted a patch (attached); please test it.
+
+Thanks
+Thanks
+Dave
+Thanks,
+Longpeng(Mike)
+--
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+.
+0001-e1000-don-t-raise-interrupt-in-pre_save.patch
+Description:
+Text Data
+
+On 2019/7/10 11:57, Jason Wang wrote:
+>
+>
+On 2019/7/10 ä¸å11:36, Longpeng (Mike) wrote:
+>
+> On 2019/7/10 11:25, Jason Wang wrote:
+>
+>> On 2019/7/8 ä¸å5:47, Dr. David Alan Gilbert wrote:
+>
+>>> * longpeng (address@hidden) wrote:
+>
+>>>> Hi guys,
+>
+>>>>
+>
+>>>> We found a qemu core in our testing environment, the assertion
+>
+>>>> 'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
+>
+>>>> the bus->irq_count[i] is '-1'.
+>
+>>>>
+>
+>>>> Through analysis, it was happened after VM migration and we think
+>
+>>>> it was caused by the following sequence:
+>
+>>>>
+>
+>>>> *Migration Source*
+>
+>>>> 1. save bus pci.0 state, including irq_count[x] ( =0 , old )
+>
+>>>> 2. save E1000:
+>
+>>>>     e1000_pre_save
+>
+>>>>      e1000_mit_timer
+>
+>>>>       set_interrupt_cause
+>
+>>>>        pci_set_irq --> update pci_dev->irq_state to 1 and
+>
+>>>>                    update bus->irq_count[x] to 1 ( new )
+>
+>>>>      the irq_state sent to dest.
+>
+>>>>
+>
+>>>> *Migration Dest*
+>
+>>>> 1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is
+>
+>>>> 1.
+>
+>>>> 2. If the e1000 need change irqline , it would call to pci_irq_handler(),
+>
+>>>>    the irq_state maybe change to 0 and bus->irq_count[x] will become
+>
+>>>>    -1 in this situation.
+>
+>>>> 3. do VM reboot then the assertion will be triggered.
+>
+>>>>
+>
+>>>> We also found some guys faced the similar problem:
+>
+>>>> [1]
+https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
+>
+>>>> [2]
+https://bugs.launchpad.net/qemu/+bug/1702621
+>
+>>>>
+>
+>>>> Is there some patches to fix this problem ?
+>
+>>> I don't remember any.
+>
+>>>
+>
+>>>> Can we save pcibus state after all the pci devs are saved ?
+>
+>>> Does this problem only happen with e1000? I think so.
+>
+>>> If it's only e1000 I think we should fix it - I think once the VM is
+>
+>>> stopped for doing the device migration it shouldn't be raising
+>
+>>> interrupts.
+>
+>>
+>
+>> I wonder maybe we can simply fix this by no setting ICS on pre_save() but
+>
+>> scheduling mit timer unconditionally in post_load().
+>
+>>
+>
+> I also think this is a bug of e1000 because we find more cores with the same
+>
+> frame thease days.
+>
+>
+>
+> I'm not familiar with e1000 so hope someone could fix it, thanks. :)
+>
+>
+>
+>
+Draft a path in attachment, please test.
+>
+Thanks. We'll test it for a few weeks and then give you feedback. :)
+
+>
+Thanks
+>
+>
+>
+>> Thanks
+>
+>>
+>
+>>
+>
+>>> Dave
+>
+>>>
+>
+>>>> Thanks,
+>
+>>>> Longpeng(Mike)
+>
+>>> --
+>
+>>> Dr. David Alan Gilbert / address@hidden / Manchester, UK
+>
+>> .
+>
+>>
+-- 
+Regards,
+Longpeng(Mike)
+
+On 2019/7/10 11:57, Jason Wang wrote:
+>
+>
+On 2019/7/10 ä¸å11:36, Longpeng (Mike) wrote:
+>
+> On 2019/7/10 11:25, Jason Wang wrote:
+>
+>> On 2019/7/8 ä¸å5:47, Dr. David Alan Gilbert wrote:
+>
+>>> * longpeng (address@hidden) wrote:
+>
+>>>> Hi guys,
+>
+>>>>
+>
+>>>> We found a qemu core in our testing environment, the assertion
+>
+>>>> 'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
+>
+>>>> the bus->irq_count[i] is '-1'.
+>
+>>>>
+>
+>>>> Through analysis, it was happened after VM migration and we think
+>
+>>>> it was caused by the following sequence:
+>
+>>>>
+>
+>>>> *Migration Source*
+>
+>>>> 1. save bus pci.0 state, including irq_count[x] ( =0 , old )
+>
+>>>> 2. save E1000:
+>
+>>>>     e1000_pre_save
+>
+>>>>      e1000_mit_timer
+>
+>>>>       set_interrupt_cause
+>
+>>>>        pci_set_irq --> update pci_dev->irq_state to 1 and
+>
+>>>>                    update bus->irq_count[x] to 1 ( new )
+>
+>>>>      the irq_state sent to dest.
+>
+>>>>
+>
+>>>> *Migration Dest*
+>
+>>>> 1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is
+>
+>>>> 1.
+>
+>>>> 2. If the e1000 need change irqline , it would call to pci_irq_handler(),
+>
+>>>>    the irq_state maybe change to 0 and bus->irq_count[x] will become
+>
+>>>>    -1 in this situation.
+>
+>>>> 3. do VM reboot then the assertion will be triggered.
+>
+>>>>
+>
+>>>> We also found some guys faced the similar problem:
+>
+>>>> [1]
+https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
+>
+>>>> [2]
+https://bugs.launchpad.net/qemu/+bug/1702621
+>
+>>>>
+>
+>>>> Is there some patches to fix this problem ?
+>
+>>> I don't remember any.
+>
+>>>
+>
+>>>> Can we save pcibus state after all the pci devs are saved ?
+>
+>>> Does this problem only happen with e1000? I think so.
+>
+>>> If it's only e1000 I think we should fix it - I think once the VM is
+>
+>>> stopped for doing the device migration it shouldn't be raising
+>
+>>> interrupts.
+>
+>>
+>
+>> I wonder maybe we can simply fix this by no setting ICS on pre_save() but
+>
+>> scheduling mit timer unconditionally in post_load().
+>
+>>
+>
+> I also think this is a bug of e1000 because we find more cores with the same
+>
+> frame thease days.
+>
+>
+>
+> I'm not familiar with e1000 so hope someone could fix it, thanks. :)
+>
+>
+>
+>
+Draft a path in attachment, please test.
+>
+Hi Jason,
+
+We've tested the patch for about two weeks and everything went well. Thanks!
+
+Feel free to add my:
+Reported-and-tested-by: Longpeng <address@hidden>
+
+>
+Thanks
+>
+>
+>
+>> Thanks
+>
+>>
+>
+>>
+>
+>>> Dave
+>
+>>>
+>
+>>>> Thanks,
+>
+>>>> Longpeng(Mike)
+>
+>>> --
+>
+>>> Dr. David Alan Gilbert / address@hidden / Manchester, UK
+>
+>> .
+>
+>>
+-- 
+Regards,
+Longpeng(Mike)
+
+On 2019/7/27 ä¸å2:10, Longpeng (Mike) wrote:
+On 2019/7/10 11:57, Jason Wang wrote:
+On 2019/7/10 ä¸å11:36, Longpeng (Mike) wrote:
+On 2019/7/10 11:25, Jason Wang wrote:
+On 2019/7/8 ä¸å5:47, Dr. David Alan Gilbert wrote:
+* longpeng (address@hidden) wrote:
+Hi guys,
+
+We found a qemu core in our testing environment, the assertion
+'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
+the bus->irq_count[i] is '-1'.
+
+Through analysis, it was happened after VM migration and we think
+it was caused by the following sequence:
+
+*Migration Source*
+1. save bus pci.0 state, including irq_count[x] ( =0 , old )
+2. save E1000:
+     e1000_pre_save
+      e1000_mit_timer
+       set_interrupt_cause
+        pci_set_irq --> update pci_dev->irq_state to 1 and
+                    update bus->irq_count[x] to 1 ( new )
+      the irq_state sent to dest.
+
+*Migration Dest*
+1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1.
+2. If the e1000 need change irqline , it would call to pci_irq_handler(),
+    the irq_state maybe change to 0 and bus->irq_count[x] will become
+    -1 in this situation.
+3. do VM reboot then the assertion will be triggered.
+
+We also found some guys faced the similar problem:
+[1]
+https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
+[2]
+https://bugs.launchpad.net/qemu/+bug/1702621
+Is there some patches to fix this problem ?
+I don't remember any.
+Can we save pcibus state after all the pci devs are saved ?
+Does this problem only happen with e1000? I think so.
+If it's only e1000 I think we should fix it - I think once the VM is
+stopped for doing the device migration it shouldn't be raising
+interrupts.
+I wonder maybe we can simply fix this by no setting ICS on pre_save() but
+scheduling mit timer unconditionally in post_load().
+I also think this is a bug of e1000 because we find more cores with the same
+frame thease days.
+
+I'm not familiar with e1000 so hope someone could fix it, thanks. :)
+Draft a path in attachment, please test.
+Hi Jason,
+
+We've tested the patch for about two weeks, everything went well, thanks!
+
+Feel free to add my:
+Reported-and-tested-by: Longpeng <address@hidden>
+Applied.
+
+Thanks
+Thanks
+Thanks
+Dave
+Thanks,
+Longpeng(Mike)
+--
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+.
+
diff --git a/results/classifier/004/other/70021271 b/results/classifier/004/other/70021271
new file mode 100644
index 00000000..b8e3e535
--- /dev/null
+++ b/results/classifier/004/other/70021271
@@ -0,0 +1,7456 @@
+other: 0.963
+graphic: 0.958
+KVM: 0.949
+semantic: 0.946
+mistranslation: 0.929
+assembly: 0.910
+vnc: 0.900
+device: 0.887
+instruction: 0.880
+socket: 0.873
+network: 0.873
+boot: 0.872
+
+[Qemu-devel] [BUG]Unassigned mem write during pci device hot-plug
+
+Hi all,
+
+In our test, we configured a VM with several pci-bridges and a virtio-net NIC
+attached to bus 4.
+After the VM starts up, we ping this NIC from the host to check whether it is
+working normally. Then we hot-add PCI devices to this VM on bus 0.
+We found that the virtio-net NIC on bus 4 occasionally stops working (cannot
+connect), because kicking the virtio backend fails with the error below:
+    Unassigned mem write 00000000fc803004 = 0x1
+
+memory-region: pci_bridge_pci
+  0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci
+    00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
+      00000000fc800000-00000000fc800fff (prio 0, RW): virtio-pci-common
+      00000000fc801000-00000000fc801fff (prio 0, RW): virtio-pci-isr
+      00000000fc802000-00000000fc802fff (prio 0, RW): virtio-pci-device
+      00000000fc803000-00000000fc803fff (prio 0, RW): virtio-pci-notify  <- io mem unassigned
+      ...
+
+We caught an unexpected address change when this problem happened, as shown
+below:
+Before pci_bridge_update_mappings:
+      00000000fc000000-00000000fc1fffff (prio 1, RW): alias pci_bridge_pref_mem 
+@pci_bridge_pci 00000000fc000000-00000000fc1fffff
+      00000000fc200000-00000000fc3fffff (prio 1, RW): alias pci_bridge_pref_mem 
+@pci_bridge_pci 00000000fc200000-00000000fc3fffff
+      00000000fc400000-00000000fc5fffff (prio 1, RW): alias pci_bridge_pref_mem 
+@pci_bridge_pci 00000000fc400000-00000000fc5fffff
+      00000000fc600000-00000000fc7fffff (prio 1, RW): alias pci_bridge_pref_mem 
+@pci_bridge_pci 00000000fc600000-00000000fc7fffff
+      00000000fc800000-00000000fc9fffff (prio 1, RW): alias pci_bridge_pref_mem 
+@pci_bridge_pci 00000000fc800000-00000000fc9fffff <- correct Address Space
+      00000000fca00000-00000000fcbfffff (prio 1, RW): alias pci_bridge_pref_mem 
+@pci_bridge_pci 00000000fca00000-00000000fcbfffff
+      00000000fcc00000-00000000fcdfffff (prio 1, RW): alias pci_bridge_pref_mem 
+@pci_bridge_pci 00000000fcc00000-00000000fcdfffff
+      00000000fce00000-00000000fcffffff (prio 1, RW): alias pci_bridge_pref_mem 
+@pci_bridge_pci 00000000fce00000-00000000fcffffff
+
+After pci_bridge_update_mappings:
+      00000000fda00000-00000000fdbfffff (prio 1, RW): alias pci_bridge_mem 
+@pci_bridge_pci 00000000fda00000-00000000fdbfffff
+      00000000fdc00000-00000000fddfffff (prio 1, RW): alias pci_bridge_mem 
+@pci_bridge_pci 00000000fdc00000-00000000fddfffff
+      00000000fde00000-00000000fdffffff (prio 1, RW): alias pci_bridge_mem 
+@pci_bridge_pci 00000000fde00000-00000000fdffffff
+      00000000fe000000-00000000fe1fffff (prio 1, RW): alias pci_bridge_mem 
+@pci_bridge_pci 00000000fe000000-00000000fe1fffff
+      00000000fe200000-00000000fe3fffff (prio 1, RW): alias pci_bridge_mem 
+@pci_bridge_pci 00000000fe200000-00000000fe3fffff
+      00000000fe400000-00000000fe5fffff (prio 1, RW): alias pci_bridge_mem 
+@pci_bridge_pci 00000000fe400000-00000000fe5fffff
+      00000000fe600000-00000000fe7fffff (prio 1, RW): alias pci_bridge_mem 
+@pci_bridge_pci 00000000fe600000-00000000fe7fffff
+      00000000fe800000-00000000fe9fffff (prio 1, RW): alias pci_bridge_mem 
+@pci_bridge_pci 00000000fe800000-00000000fe9fffff
+      fffffffffc800000-fffffffffc800000 (prio 1, RW): alias pci_bridge_pref_mem 
+@pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional Address Space
+
+We have figured out why the address takes this value. According to the PCI
+spec, a PCI driver can determine the size of a BAR by first writing 0xffffffff
+to the register and then reading the value back from it.
+QEMU does not handle this value specially when processing the PCI config write;
+the function call stack is:
+pci_bridge_dev_write_config
+-> pci_bridge_write_config
+-> pci_default_write_config (we update the config[address] value here to
+fffffffffc800000, which should be 0xfc800000)
+-> pci_bridge_update_mappings
+    -> pci_bridge_region_del(br, br->windows);
+    -> pci_bridge_region_init
+        -> pci_bridge_init_alias
+           (here, through pci_bridge_get_base, we use the wrong value
+           fffffffffc800000)
+    -> memory_region_transaction_commit
+
+So, as we can see, QEMU uses the wrong base address to update the memory
+regions. Although the base address is updated to the correct value once the
+PCI driver in the VM writes the original value back, the virtio NIC on bus 4
+may still send network packets concurrently, while the wrong memory region
+address is in effect.
+
+We have tried skipping the memory region update in QEMU when we detect a PCI
+write of the value 0xffffffff, and it does work, but
+this does not seem to be an elegant fix.
+
+diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
+index b2e50c3..84b405d 100644
+--- a/hw/pci/pci_bridge.c
++++ b/hw/pci/pci_bridge.c
+@@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d,
+     pci_default_write_config(d, address, val, len);
+-    if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
++    if ( (val != 0xffffffff) &&
++        (ranges_overlap(address, len, PCI_COMMAND, 2) ||
+         /* io base/limit */
+         ranges_overlap(address, len, PCI_IO_BASE, 2) ||
+@@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d,
+         ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
+         /* vga enable */
+-        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
++        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) {
+         pci_bridge_update_mappings(s);
+     }
+
+Thanks,
+Xu
+
+On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
+>
+Hi all,
+>
+>
+>
+>
+In our test, we configured VM with several pci-bridges and a virtio-net nic
+>
+been attached with bus 4,
+>
+>
+After VM is startup, We ping this nic from host to judge if it is working
+>
+normally. Then, we hot add pci devices to this VM with bus 0.
+>
+>
+We  found the virtio-net NIC in bus 4 is not working (can not connect)
+>
+occasionally, as it kick virtio backend failure with error below:
+>
+>
+Unassigned mem write 00000000fc803004 = 0x1
+Thanks for the report. Which guest was used to produce this problem?
+
+-- 
+MST
+
+On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
+>
+> Hi all,
+>
+>
+>
+>
+>
+>
+>
+> In our test, we configured VM with several pci-bridges and a
+>
+> virtio-net nic been attached with bus 4,
+>
+>
+>
+> After VM is startup, We ping this nic from host to judge if it is
+>
+> working normally. Then, we hot add pci devices to this VM with bus 0.
+>
+>
+>
+> We  found the virtio-net NIC in bus 4 is not working (can not connect)
+>
+> occasionally, as it kick virtio backend failure with error below:
+>
+>
+>
+>     Unassigned mem write 00000000fc803004 = 0x1
+>
+>
+Thanks for the report. Which guest was used to produce this problem?
+>
+>
+--
+>
+MST
+I saw this problem when I hot-plugged a VFIO device into a CentOS 7.4 guest.
+After that I compiled the latest Linux kernel, and it has the same problem.
+
+Thanks,
+Xu
+
+On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
+>
+Hi all,
+>
+>
+>
+>
+In our test, we configured VM with several pci-bridges and a virtio-net nic
+>
+been attached with bus 4,
+>
+>
+After VM is startup, We ping this nic from host to judge if it is working
+>
+normally. Then, we hot add pci devices to this VM with bus 0.
+>
+>
+We  found the virtio-net NIC in bus 4 is not working (can not connect)
+>
+occasionally, as it kick virtio backend failure with error below:
+>
+>
+Unassigned mem write 00000000fc803004 = 0x1
+>
+>
+>
+>
+memory-region: pci_bridge_pci
+>
+>
+0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci
+>
+>
+00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
+>
+>
+00000000fc800000-00000000fc800fff (prio 0, RW): virtio-pci-common
+>
+>
+00000000fc801000-00000000fc801fff (prio 0, RW): virtio-pci-isr
+>
+>
+00000000fc802000-00000000fc802fff (prio 0, RW): virtio-pci-device
+>
+>
+00000000fc803000-00000000fc803fff (prio 0, RW): virtio-pci-notify  <- io
+>
+mem unassigned
+>
+>
+...
+>
+>
+>
+>
+We caught an exceptional address changing while this problem happened, show as
+>
+follow:
+>
+>
+Before pci_bridge_update_mappings:
+>
+>
+00000000fc000000-00000000fc1fffff (prio 1, RW): alias
+>
+pci_bridge_pref_mem
+>
+@pci_bridge_pci 00000000fc000000-00000000fc1fffff
+>
+>
+00000000fc200000-00000000fc3fffff (prio 1, RW): alias
+>
+pci_bridge_pref_mem
+>
+@pci_bridge_pci 00000000fc200000-00000000fc3fffff
+>
+>
+00000000fc400000-00000000fc5fffff (prio 1, RW): alias
+>
+pci_bridge_pref_mem
+>
+@pci_bridge_pci 00000000fc400000-00000000fc5fffff
+>
+>
+00000000fc600000-00000000fc7fffff (prio 1, RW): alias
+>
+pci_bridge_pref_mem
+>
+@pci_bridge_pci 00000000fc600000-00000000fc7fffff
+>
+>
+00000000fc800000-00000000fc9fffff (prio 1, RW): alias
+>
+pci_bridge_pref_mem
+>
+@pci_bridge_pci 00000000fc800000-00000000fc9fffff <- correct Adress Spce
+>
+>
+00000000fca00000-00000000fcbfffff (prio 1, RW): alias
+>
+pci_bridge_pref_mem
+>
+@pci_bridge_pci 00000000fca00000-00000000fcbfffff
+>
+>
+00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
+>
+pci_bridge_pref_mem
+>
+@pci_bridge_pci 00000000fcc00000-00000000fcdfffff
+>
+>
+00000000fce00000-00000000fcffffff (prio 1, RW): alias
+>
+pci_bridge_pref_mem
+>
+@pci_bridge_pci 00000000fce00000-00000000fcffffff
+>
+>
+>
+>
+After pci_bridge_update_mappings:
+>
+>
+00000000fda00000-00000000fdbfffff (prio 1, RW): alias pci_bridge_mem
+>
+@pci_bridge_pci 00000000fda00000-00000000fdbfffff
+>
+>
+00000000fdc00000-00000000fddfffff (prio 1, RW): alias pci_bridge_mem
+>
+@pci_bridge_pci 00000000fdc00000-00000000fddfffff
+>
+>
+00000000fde00000-00000000fdffffff (prio 1, RW): alias pci_bridge_mem
+>
+@pci_bridge_pci 00000000fde00000-00000000fdffffff
+>
+>
+00000000fe000000-00000000fe1fffff (prio 1, RW): alias pci_bridge_mem
+>
+@pci_bridge_pci 00000000fe000000-00000000fe1fffff
+>
+>
+00000000fe200000-00000000fe3fffff (prio 1, RW): alias pci_bridge_mem
+>
+@pci_bridge_pci 00000000fe200000-00000000fe3fffff
+>
+>
+00000000fe400000-00000000fe5fffff (prio 1, RW): alias pci_bridge_mem
+>
+@pci_bridge_pci 00000000fe400000-00000000fe5fffff
+>
+>
+00000000fe600000-00000000fe7fffff (prio 1, RW): alias pci_bridge_mem
+>
+@pci_bridge_pci 00000000fe600000-00000000fe7fffff
+>
+>
+00000000fe800000-00000000fe9fffff (prio 1, RW): alias pci_bridge_mem
+>
+@pci_bridge_pci 00000000fe800000-00000000fe9fffff
+>
+>
+fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
+>
+pci_bridge_pref_mem
+>
+@pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional Adress
+>
+Space
+This one is empty though right?
+
+>
+>
+>
+We have figured out why this address becomes this value,  according to pci
+>
+spec,  pci driver can get BAR address size by writing 0xffffffff to
+>
+>
+the pci register firstly, and then read back the value from this register.
+OK, however, as you show below, the BAR being sized is the BAR
+of a bridge. Are you then adding a bridge device by hotplug?
+
+
+
+>
+We didn't handle this value  specially while process pci write in qemu, the
+>
+function call stack is:
+>
+>
+Pci_bridge_dev_write_config
+>
+>
+-> pci_bridge_write_config
+>
+>
+-> pci_default_write_config (we update the config[address] value here to
+>
+fffffffffc800000, which should be 0xfc800000 )
+>
+>
+-> pci_bridge_update_mappings
+>
+>
+->pci_bridge_region_del(br, br->windows);
+>
+>
+-> pci_bridge_region_init
+>
+>
+->
+>
+pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong value
+>
+fffffffffc800000)
+>
+>
+->
+>
+memory_region_transaction_commit
+>
+>
+>
+>
+So, as we can see, we use the wrong base address in qemu to update the memory
+>
+regions, though, we update the base address to
+>
+>
+The correct value after pci driver in VM write the original value back, the
+>
+virtio NIC in bus 4 may still sends net packets concurrently with
+>
+>
+The wrong memory region address.
+>
+>
+>
+>
+We have tried to skip the memory region update action in qemu while detect pci
+>
+write with 0xffffffff value, and it does work, but
+>
+>
+This seems to be not gently.
+For sure. But I'm still puzzled as to why Linux tries to
+size the BAR of the bridge while a device behind it is
+in use.
+
+Can you please post your QEMU command line?
+
+
+
+>
+>
+>
+diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
+>
+>
+index b2e50c3..84b405d 100644
+>
+>
+--- a/hw/pci/pci_bridge.c
+>
+>
++++ b/hw/pci/pci_bridge.c
+>
+>
+@@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d,
+>
+>
+pci_default_write_config(d, address, val, len);
+>
+>
+-    if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
+>
+>
++    if ( (val != 0xffffffff) &&
+>
+>
++        (ranges_overlap(address, len, PCI_COMMAND, 2) ||
+>
+>
+/* io base/limit */
+>
+>
+ranges_overlap(address, len, PCI_IO_BASE, 2) ||
+>
+>
+@@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d,
+>
+>
+ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
+>
+>
+/* vga enable */
+>
+>
+-        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
+>
+>
++        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) {
+>
+>
+pci_bridge_update_mappings(s);
+>
+>
+}
+>
+>
+>
+>
+Thinks,
+>
+>
+Xu
+>
+
+On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
+>
+> Hi all,
+>
+>
+>
+>
+>
+>
+>
+> In our test, we configured VM with several pci-bridges and a
+>
+> virtio-net nic been attached with bus 4,
+>
+>
+>
+> After VM is startup, We ping this nic from host to judge if it is
+>
+> working normally. Then, we hot add pci devices to this VM with bus 0.
+>
+>
+>
+> We  found the virtio-net NIC in bus 4 is not working (can not connect)
+>
+> occasionally, as it kick virtio backend failure with error below:
+>
+>
+>
+>     Unassigned mem write 00000000fc803004 = 0x1
+>
+>
+>
+>
+>
+>
+>
+> memory-region: pci_bridge_pci
+>
+>
+>
+>   0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci
+>
+>
+>
+>     00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
+>
+>
+>
+>       00000000fc800000-00000000fc800fff (prio 0, RW):
+>
+> virtio-pci-common
+>
+>
+>
+>       00000000fc801000-00000000fc801fff (prio 0, RW): virtio-pci-isr
+>
+>
+>
+>       00000000fc802000-00000000fc802fff (prio 0, RW):
+>
+> virtio-pci-device
+>
+>
+>
+>       00000000fc803000-00000000fc803fff (prio 0, RW):
+>
+> virtio-pci-notify  <- io mem unassigned
+>
+>
+>
+>       ...
+>
+>
+>
+>
+>
+>
+>
+> We caught an exceptional address changing while this problem happened,
+>
+> show as
+>
+> follow:
+>
+>
+>
+> Before pci_bridge_update_mappings:
+>
+>
+>
+>       00000000fc000000-00000000fc1fffff (prio 1, RW): alias
+>
+> pci_bridge_pref_mem @pci_bridge_pci 00000000fc000000-00000000fc1fffff
+>
+>
+>
+>       00000000fc200000-00000000fc3fffff (prio 1, RW): alias
+>
+> pci_bridge_pref_mem @pci_bridge_pci 00000000fc200000-00000000fc3fffff
+>
+>
+>
+>       00000000fc400000-00000000fc5fffff (prio 1, RW): alias
+>
+> pci_bridge_pref_mem @pci_bridge_pci 00000000fc400000-00000000fc5fffff
+>
+>
+>
+>       00000000fc600000-00000000fc7fffff (prio 1, RW): alias
+>
+> pci_bridge_pref_mem @pci_bridge_pci 00000000fc600000-00000000fc7fffff
+>
+>
+>
+>       00000000fc800000-00000000fc9fffff (prio 1, RW): alias
+>
+> pci_bridge_pref_mem @pci_bridge_pci 00000000fc800000-00000000fc9fffff
+>
+> <- correct Adress Spce
+>
+>
+>
+>       00000000fca00000-00000000fcbfffff (prio 1, RW): alias
+>
+> pci_bridge_pref_mem @pci_bridge_pci 00000000fca00000-00000000fcbfffff
+>
+>
+>
+>       00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
+>
+> pci_bridge_pref_mem @pci_bridge_pci 00000000fcc00000-00000000fcdfffff
+>
+>
+>
+>       00000000fce00000-00000000fcffffff (prio 1, RW): alias
+>
+> pci_bridge_pref_mem @pci_bridge_pci 00000000fce00000-00000000fcffffff
+>
+>
+>
+>
+>
+>
+>
+> After pci_bridge_update_mappings:
+>
+>
+>
+>       00000000fda00000-00000000fdbfffff (prio 1, RW): alias
+>
+> pci_bridge_mem @pci_bridge_pci 00000000fda00000-00000000fdbfffff
+>
+>
+>
+>       00000000fdc00000-00000000fddfffff (prio 1, RW): alias
+>
+> pci_bridge_mem @pci_bridge_pci 00000000fdc00000-00000000fddfffff
+>
+>
+>
+>       00000000fde00000-00000000fdffffff (prio 1, RW): alias
+>
+> pci_bridge_mem @pci_bridge_pci 00000000fde00000-00000000fdffffff
+>
+>
+>
+>       00000000fe000000-00000000fe1fffff (prio 1, RW): alias
+>
+> pci_bridge_mem @pci_bridge_pci 00000000fe000000-00000000fe1fffff
+>
+>
+>
+>       00000000fe200000-00000000fe3fffff (prio 1, RW): alias
+>
+> pci_bridge_mem @pci_bridge_pci 00000000fe200000-00000000fe3fffff
+>
+>
+>
+>       00000000fe400000-00000000fe5fffff (prio 1, RW): alias
+>
+> pci_bridge_mem @pci_bridge_pci 00000000fe400000-00000000fe5fffff
+>
+>
+>
+>       00000000fe600000-00000000fe7fffff (prio 1, RW): alias
+>
+> pci_bridge_mem @pci_bridge_pci 00000000fe600000-00000000fe7fffff
+>
+>
+>
+>       00000000fe800000-00000000fe9fffff (prio 1, RW): alias
+>
+> pci_bridge_mem @pci_bridge_pci 00000000fe800000-00000000fe9fffff
+>
+>
+>
+>       fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
+>
+> pci_bridge_pref_mem
+>
+> @pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional Adress
+>
+Space
+>
+>
+This one is empty though right?
+>
+>
+>
+>
+>
+>
+> We have figured out why this address becomes this value,  according to
+>
+> pci spec,  pci driver can get BAR address size by writing 0xffffffff
+>
+> to
+>
+>
+>
+> the pci register firstly, and then read back the value from this register.
+>
+>
+>
+OK however as you show below the BAR being sized is the BAR if a bridge. Are
+>
+you then adding a bridge device by hotplug?
+No, I simply hot-plugged a VFIO device into bus 0. Another interesting
+observation is that
+if I hot-plug the device into another bus, this doesn't happen.
+ 
+>
+>
+>
+> We didn't handle this value  specially while process pci write in
+>
+> qemu, the function call stack is:
+>
+>
+>
+> Pci_bridge_dev_write_config
+>
+>
+>
+> -> pci_bridge_write_config
+>
+>
+>
+> -> pci_default_write_config (we update the config[address] value here
+>
+> -> to
+>
+> fffffffffc800000, which should be 0xfc800000 )
+>
+>
+>
+> -> pci_bridge_update_mappings
+>
+>
+>
+>                 ->pci_bridge_region_del(br, br->windows);
+>
+>
+>
+> -> pci_bridge_region_init
+>
+>
+>
+>                                                                 ->
+>
+> pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong
+>
+> value
+>
+> fffffffffc800000)
+>
+>
+>
+>                                                 ->
+>
+> memory_region_transaction_commit
+>
+>
+>
+>
+>
+>
+>
+> So, as we can see, we use the wrong base address in qemu to update the
+>
+> memory regions, though, we update the base address to
+>
+>
+>
+> The correct value after pci driver in VM write the original value
+>
+> back, the virtio NIC in bus 4 may still sends net packets concurrently
+>
+> with
+>
+>
+>
+> The wrong memory region address.
+>
+>
+>
+>
+>
+>
+>
+> We have tried to skip the memory region update action in qemu while
+>
+> detect pci write with 0xffffffff value, and it does work, but
+>
+>
+>
+> This seems to be not gently.
+>
+>
+For sure. But I'm still puzzled as to why does Linux try to size the BAR of
+>
+the
+>
+bridge while a device behind it is used.
+>
+>
+Can you pls post your QEMU command line?
+My QEMU command line:
+/root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S -object 
+secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-194-Linux/master-key.aes
+ -machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu 
+host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m 
+size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp 
+20,sockets=20,cores=1,threads=1 -numa node,nodeid=0,cpus=0-4,mem=1024 -numa 
+node,nodeid=1,cpus=5-9,mem=1024 -numa node,nodeid=2,cpus=10-14,mem=1024 -numa 
+node,nodeid=3,cpus=15-19,mem=1024 -uuid 34a588c7-b0f2-4952-b39c-47fae3411439 
+-no-user-config -nodefaults -chardev 
+socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/monitor.sock,server,nowait
+ -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet 
+-global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on -device 
+pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device 
+pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device 
+pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device 
+pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device 
+pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device 
+piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device 
+usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device 
+nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device 
+virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device 
+virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device 
+virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device 
+virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device 
+virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive 
+file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=drive-virtio-disk0,cache=none
+ -device 
+virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
+ -drive if=none,id=drive-ide0-1-1,readonly=on,cache=none -device 
+ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev 
+tap,fd=35,id=hostnet0 -device 
+virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=pci.4,addr=0x1 
+-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
+-device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device 
+cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device 
+virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg timestamp=on
+
+I am also very curious about this issue. In the Linux kernel code, maybe the
+double check in the function pci_bridge_check_ranges triggers this problem.
+
+
+>
+>
+>
+>
+>
+>
+>
+>
+> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
+>
+>
+>
+> index b2e50c3..84b405d 100644
+>
+>
+>
+> --- a/hw/pci/pci_bridge.c
+>
+>
+>
+> +++ b/hw/pci/pci_bridge.c
+>
+>
+>
+> @@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d,
+>
+>
+>
+>      pci_default_write_config(d, address, val, len);
+>
+>
+>
+> -    if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
+>
+>
+>
+> +    if ( (val != 0xffffffff) &&
+>
+>
+>
+> +        (ranges_overlap(address, len, PCI_COMMAND, 2) ||
+>
+>
+>
+>          /* io base/limit */
+>
+>
+>
+>          ranges_overlap(address, len, PCI_IO_BASE, 2) ||
+>
+>
+>
+> @@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d,
+>
+>
+>
+>          ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
+>
+>
+>
+>          /* vga enable */
+>
+>
+>
+> -        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
+>
+>
+>
+> +        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) {
+>
+>
+>
+>          pci_bridge_update_mappings(s);
+>
+>
+>
+>      }
+>
+>
+>
+>
+>
+>
+>
+> Thinks,
+>
+>
+>
+> Xu
+>
+>
+
+On Mon, Dec 10, 2018 at 03:12:53AM +0000, xuyandong wrote:
+>
+On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
+>
+> > Hi all,
+>
+> >
+>
+> >
+>
+> >
+>
+> > In our test, we configured VM with several pci-bridges and a
+>
+> > virtio-net nic been attached with bus 4,
+>
+> >
+>
+> > After VM is startup, We ping this nic from host to judge if it is
+>
+> > working normally. Then, we hot add pci devices to this VM with bus 0.
+>
+> >
+>
+> > We  found the virtio-net NIC in bus 4 is not working (can not connect)
+>
+> > occasionally, as it kick virtio backend failure with error below:
+>
+> >
+>
+> >     Unassigned mem write 00000000fc803004 = 0x1
+>
+> >
+>
+> >
+>
+> >
+>
+> > memory-region: pci_bridge_pci
+>
+> >
+>
+> >   0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci
+>
+> >
+>
+> >     00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
+>
+> >
+>
+> >       00000000fc800000-00000000fc800fff (prio 0, RW):
+>
+> > virtio-pci-common
+>
+> >
+>
+> >       00000000fc801000-00000000fc801fff (prio 0, RW): virtio-pci-isr
+>
+> >
+>
+> >       00000000fc802000-00000000fc802fff (prio 0, RW):
+>
+> > virtio-pci-device
+>
+> >
+>
+> >       00000000fc803000-00000000fc803fff (prio 0, RW):
+>
+> > virtio-pci-notify  <- io mem unassigned
+>
+> >
+>
+> >       ...
+>
+> >
+>
+> >
+>
+> >
+>
+> > We caught an exceptional address changing while this problem happened,
+>
+> > show as
+>
+> > follow:
+>
+> >
+>
+> > Before pci_bridge_update_mappings:
+>
+> >
+>
+> >       00000000fc000000-00000000fc1fffff (prio 1, RW): alias
+>
+> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc000000-00000000fc1fffff
+>
+> >
+>
+> >       00000000fc200000-00000000fc3fffff (prio 1, RW): alias
+>
+> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc200000-00000000fc3fffff
+>
+> >
+>
+> >       00000000fc400000-00000000fc5fffff (prio 1, RW): alias
+>
+> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc400000-00000000fc5fffff
+>
+> >
+>
+> >       00000000fc600000-00000000fc7fffff (prio 1, RW): alias
+>
+> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc600000-00000000fc7fffff
+>
+> >
+>
+> >       00000000fc800000-00000000fc9fffff (prio 1, RW): alias
+>
+> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc800000-00000000fc9fffff
+>
+> > <- correct Adress Spce
+>
+> >
+>
+> >       00000000fca00000-00000000fcbfffff (prio 1, RW): alias
+>
+> > pci_bridge_pref_mem @pci_bridge_pci 00000000fca00000-00000000fcbfffff
+>
+> >
+>
+> >       00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
+>
+> > pci_bridge_pref_mem @pci_bridge_pci 00000000fcc00000-00000000fcdfffff
+>
+> >
+>
+> >       00000000fce00000-00000000fcffffff (prio 1, RW): alias
+>
+> > pci_bridge_pref_mem @pci_bridge_pci 00000000fce00000-00000000fcffffff
+>
+> >
+>
+> >
+>
+> >
+>
+> > After pci_bridge_update_mappings:
+>
+> >
+>
+> >       00000000fda00000-00000000fdbfffff (prio 1, RW): alias
+>
+> > pci_bridge_mem @pci_bridge_pci 00000000fda00000-00000000fdbfffff
+>
+> >
+>
+> >       00000000fdc00000-00000000fddfffff (prio 1, RW): alias
+>
+> > pci_bridge_mem @pci_bridge_pci 00000000fdc00000-00000000fddfffff
+>
+> >
+>
+> >       00000000fde00000-00000000fdffffff (prio 1, RW): alias
+>
+> > pci_bridge_mem @pci_bridge_pci 00000000fde00000-00000000fdffffff
+>
+> >
+>
+> >       00000000fe000000-00000000fe1fffff (prio 1, RW): alias
+>
+> > pci_bridge_mem @pci_bridge_pci 00000000fe000000-00000000fe1fffff
+>
+> >
+>
+> >       00000000fe200000-00000000fe3fffff (prio 1, RW): alias
+>
+> > pci_bridge_mem @pci_bridge_pci 00000000fe200000-00000000fe3fffff
+>
+> >
+>
+> >       00000000fe400000-00000000fe5fffff (prio 1, RW): alias
+>
+> > pci_bridge_mem @pci_bridge_pci 00000000fe400000-00000000fe5fffff
+>
+> >
+>
+> >       00000000fe600000-00000000fe7fffff (prio 1, RW): alias
+>
+> > pci_bridge_mem @pci_bridge_pci 00000000fe600000-00000000fe7fffff
+>
+> >
+>
+> >       00000000fe800000-00000000fe9fffff (prio 1, RW): alias
+>
+> > pci_bridge_mem @pci_bridge_pci 00000000fe800000-00000000fe9fffff
+>
+> >
+>
+> >       fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
+>
+> > pci_bridge_pref_mem
+>
+> > @pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional Adress
+>
+> Space
+>
+>
+>
+> This one is empty though right?
+>
+>
+>
+> >
+>
+> >
+>
+> > We have figured out why this address becomes this value,  according to
+>
+> > pci spec,  pci driver can get BAR address size by writing 0xffffffff
+>
+> > to
+>
+> >
+>
+> > the pci register firstly, and then read back the value from this register.
+>
+>
+>
+>
+>
+> OK however as you show below the BAR being sized is the BAR if a bridge. Are
+>
+> you then adding a bridge device by hotplug?
+>
+>
+No, I just simply hot plugged a VFIO device to Bus 0, another interesting
+>
+phenomenon is
+>
+If I hot plug the device to other bus, this doesn't happened.
+>
+>
+>
+>
+>
+>
+> > We didn't handle this value  specially while process pci write in
+>
+> > qemu, the function call stack is:
+>
+> >
+>
+> > Pci_bridge_dev_write_config
+>
+> >
+>
+> > -> pci_bridge_write_config
+>
+> >
+>
+> > -> pci_default_write_config (we update the config[address] value here
+>
+> > -> to
+>
+> > fffffffffc800000, which should be 0xfc800000 )
+>
+> >
+>
+> > -> pci_bridge_update_mappings
+>
+> >
+>
+> >                 ->pci_bridge_region_del(br, br->windows);
+>
+> >
+>
+> > -> pci_bridge_region_init
+>
+> >
+>
+> >                                                                 ->
+>
+> > pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong
+>
+> > value
+>
+> > fffffffffc800000)
+>
+> >
+>
+> >                                                 ->
+>
+> > memory_region_transaction_commit
+>
+> >
+>
+> >
+>
+> >
+>
+> > So, as we can see, we use the wrong base address in qemu to update the
+>
+> > memory regions, though, we update the base address to
+>
+> >
+>
+> > The correct value after pci driver in VM write the original value
+>
+> > back, the virtio NIC in bus 4 may still sends net packets concurrently
+>
+> > with
+>
+> >
+>
+> > The wrong memory region address.
+>
+> >
+>
+> >
+>
+> >
+>
+> > We have tried to skip the memory region update action in qemu while
+>
+> > detect pci write with 0xffffffff value, and it does work, but
+>
+> >
+>
+> > This seems to be not gently.
+>
+>
+>
+> For sure. But I'm still puzzled as to why does Linux try to size the BAR of
+>
+> the
+>
+> bridge while a device behind it is used.
+>
+>
+>
+> Can you pls post your QEMU command line?
+>
+>
+My QEMU command line:
+>
+/root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S -object
+>
+secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-194-Linux/master-key.aes
+>
+-machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
+>
+host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
+>
+size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp
+>
+20,sockets=20,cores=1,threads=1 -numa node,nodeid=0,cpus=0-4,mem=1024 -numa
+>
+node,nodeid=1,cpus=5-9,mem=1024 -numa node,nodeid=2,cpus=10-14,mem=1024 -numa
+>
+node,nodeid=3,cpus=15-19,mem=1024 -uuid 34a588c7-b0f2-4952-b39c-47fae3411439
+>
+-no-user-config -nodefaults -chardev
+>
+socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/monitor.sock,server,nowait
+>
+-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet
+>
+-global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on -device
+>
+pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
+>
+pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
+>
+pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
+>
+pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
+>
+pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
+>
+piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
+>
+usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
+>
+nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
+>
+virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
+>
+virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
+>
+virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
+>
+virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
+>
+virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
+>
+file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=drive-virtio-disk0,cache=none
+>
+-device
+>
+virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
+>
+-drive if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
+>
+ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev
+>
+tap,fd=35,id=hostnet0 -device
+>
+virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=pci.4,addr=0x1
+>
+-chardev pty,id=charserial0 -device
+>
+isa-serial,chardev=charserial0,id=serial0 -device
+>
+usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
+>
+cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
+>
+virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg timestamp=on
+>
+>
+I am also very curious about this issue, in the linux kernel code, maybe
+>
+double check in function pci_bridge_check_ranges triggered this problem.
+If you can get the stack trace in Linux when it tries to write this
+0xffffffff value, that would be quite helpful.
+
+
+>
+>
+>
+>
+>
+>
+>
+>
+> >
+>
+> >
+>
+> > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
+>
+> >
+>
+> > index b2e50c3..84b405d 100644
+>
+> >
+>
+> > --- a/hw/pci/pci_bridge.c
+>
+> >
+>
+> > +++ b/hw/pci/pci_bridge.c
+>
+> >
+>
+> > @@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d,
+>
+> >
+>
+> >      pci_default_write_config(d, address, val, len);
+>
+> >
+>
+> > -    if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
+>
+> >
+>
+> > +    if ( (val != 0xffffffff) &&
+>
+> >
+>
+> > +        (ranges_overlap(address, len, PCI_COMMAND, 2) ||
+>
+> >
+>
+> >          /* io base/limit */
+>
+> >
+>
+> >          ranges_overlap(address, len, PCI_IO_BASE, 2) ||
+>
+> >
+>
+> > @@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d,
+>
+> >
+>
+> >          ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
+>
+> >
+>
+> >          /* vga enable */
+>
+> >
+>
+> > -        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
+>
+> >
+>
+> > +        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) {
+>
+> >
+>
+> >          pci_bridge_update_mappings(s);
+>
+> >
+>
+> >      }
+>
+> >
+>
+> >
+>
+> >
+>
+> > Thinks,
+>
+> >
+>
+> > Xu
+>
+> >
+
+On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
+>
+> > > Hi all,
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > > In our test, we configured VM with several pci-bridges and a
+>
+> > > virtio-net nic been attached with bus 4,
+>
+> > >
+>
+> > > After VM is startup, We ping this nic from host to judge if it is
+>
+> > > working normally. Then, we hot add pci devices to this VM with bus 0.
+>
+> > >
+>
+> > > We  found the virtio-net NIC in bus 4 is not working (can not
+>
+> > > connect) occasionally, as it kick virtio backend failure with error
+>
+> > > below:
+>
+> > >
+>
+> > >     Unassigned mem write 00000000fc803004 = 0x1
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > > memory-region: pci_bridge_pci
+>
+> > >
+>
+> > >   0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci
+>
+> > >
+>
+> > >     00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
+>
+> > >
+>
+> > >       00000000fc800000-00000000fc800fff (prio 0, RW):
+>
+> > > virtio-pci-common
+>
+> > >
+>
+> > >       00000000fc801000-00000000fc801fff (prio 0, RW):
+>
+> > > virtio-pci-isr
+>
+> > >
+>
+> > >       00000000fc802000-00000000fc802fff (prio 0, RW):
+>
+> > > virtio-pci-device
+>
+> > >
+>
+> > >       00000000fc803000-00000000fc803fff (prio 0, RW):
+>
+> > > virtio-pci-notify  <- io mem unassigned
+>
+> > >
+>
+> > >       ...
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > > We caught an exceptional address changing while this problem
+>
+> > > happened, show as
+>
+> > > follow:
+>
+> > >
+>
+> > > Before pci_bridge_update_mappings:
+>
+> > >
+>
+> > >       00000000fc000000-00000000fc1fffff (prio 1, RW): alias
+>
+> > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > 00000000fc000000-00000000fc1fffff
+>
+> > >
+>
+> > >       00000000fc200000-00000000fc3fffff (prio 1, RW): alias
+>
+> > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > 00000000fc200000-00000000fc3fffff
+>
+> > >
+>
+> > >       00000000fc400000-00000000fc5fffff (prio 1, RW): alias
+>
+> > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > 00000000fc400000-00000000fc5fffff
+>
+> > >
+>
+> > >       00000000fc600000-00000000fc7fffff (prio 1, RW): alias
+>
+> > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > 00000000fc600000-00000000fc7fffff
+>
+> > >
+>
+> > >       00000000fc800000-00000000fc9fffff (prio 1, RW): alias
+>
+> > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > 00000000fc800000-00000000fc9fffff
+>
+> > > <- correct Address Space
+>
+> > >
+>
+> > >       00000000fca00000-00000000fcbfffff (prio 1, RW): alias
+>
+> > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > 00000000fca00000-00000000fcbfffff
+>
+> > >
+>
+> > >       00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
+>
+> > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > 00000000fcc00000-00000000fcdfffff
+>
+> > >
+>
+> > >       00000000fce00000-00000000fcffffff (prio 1, RW): alias
+>
+> > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > 00000000fce00000-00000000fcffffff
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > > After pci_bridge_update_mappings:
+>
+> > >
+>
+> > >       00000000fda00000-00000000fdbfffff (prio 1, RW): alias
+>
+> > > pci_bridge_mem @pci_bridge_pci 00000000fda00000-00000000fdbfffff
+>
+> > >
+>
+> > >       00000000fdc00000-00000000fddfffff (prio 1, RW): alias
+>
+> > > pci_bridge_mem @pci_bridge_pci 00000000fdc00000-00000000fddfffff
+>
+> > >
+>
+> > >       00000000fde00000-00000000fdffffff (prio 1, RW): alias
+>
+> > > pci_bridge_mem @pci_bridge_pci 00000000fde00000-00000000fdffffff
+>
+> > >
+>
+> > >       00000000fe000000-00000000fe1fffff (prio 1, RW): alias
+>
+> > > pci_bridge_mem @pci_bridge_pci 00000000fe000000-00000000fe1fffff
+>
+> > >
+>
+> > >       00000000fe200000-00000000fe3fffff (prio 1, RW): alias
+>
+> > > pci_bridge_mem @pci_bridge_pci 00000000fe200000-00000000fe3fffff
+>
+> > >
+>
+> > >       00000000fe400000-00000000fe5fffff (prio 1, RW): alias
+>
+> > > pci_bridge_mem @pci_bridge_pci 00000000fe400000-00000000fe5fffff
+>
+> > >
+>
+> > >       00000000fe600000-00000000fe7fffff (prio 1, RW): alias
+>
+> > > pci_bridge_mem @pci_bridge_pci 00000000fe600000-00000000fe7fffff
+>
+> > >
+>
+> > >       00000000fe800000-00000000fe9fffff (prio 1, RW): alias
+>
+> > > pci_bridge_mem @pci_bridge_pci 00000000fe800000-00000000fe9fffff
+>
+> > >
+>
+> > >       fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
+>
+pci_bridge_pref_mem
+>
+> > > @pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional
+>
+> > > Address
+>
+> > Space
+>
+> >
+>
+> > This one is empty though right?
+>
+> >
+>
+> > >
+>
+> > >
+>
+> > > We have figured out why this address becomes this value,
+>
+> > > according to pci spec,  pci driver can get BAR address size by
+>
+> > > writing 0xffffffff to
+>
+> > >
+>
+> > > the pci register firstly, and then read back the value from this
+>
+> > > register.
+>
+> >
+>
+> >
+>
+> > OK however as you show below the BAR being sized is the BAR of a
+>
+> > bridge. Are you then adding a bridge device by hotplug?
+>
+>
+>
+> No, I just simply hot plugged a VFIO device to Bus 0, another
+>
+> interesting phenomenon is If I hot plug the device to other bus, this
+>
+> doesn't
+>
+happen.
+>
+>
+>
+> >
+>
+> >
+>
+> > > We didn't handle this value  specially while process pci write in
+>
+> > > qemu, the function call stack is:
+>
+> > >
+>
+> > > Pci_bridge_dev_write_config
+>
+> > >
+>
+> > > -> pci_bridge_write_config
+>
+> > >
+>
+> > > -> pci_default_write_config (we update the config[address] value
+>
+> > > -> here to
+>
+> > > fffffffffc800000, which should be 0xfc800000 )
+>
+> > >
+>
+> > > -> pci_bridge_update_mappings
+>
+> > >
+>
+> > >                 ->pci_bridge_region_del(br, br->windows);
+>
+> > >
+>
+> > > -> pci_bridge_region_init
+>
+> > >
+>
+> > >                                                                 ->
+>
+> > > pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong
+>
+> > > value
+>
+> > > fffffffffc800000)
+>
+> > >
+>
+> > >                                                 ->
+>
+> > > memory_region_transaction_commit
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > > So, as we can see, we use the wrong base address in qemu to update
+>
+> > > the memory regions, though, we update the base address to
+>
+> > >
+>
+> > > The correct value after pci driver in VM write the original value
+>
+> > > back, the virtio NIC in bus 4 may still sends net packets
+>
+> > > concurrently with
+>
+> > >
+>
+> > > The wrong memory region address.
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > > We have tried to skip the memory region update action in qemu
+>
+> > > while detect pci write with 0xffffffff value, and it does work,
+>
+> > > but
+>
+> > >
+>
+> > > This does not seem like a clean solution.
+>
+> >
+>
+> > For sure. But I'm still puzzled as to why does Linux try to size the
+>
+> > BAR of the bridge while a device behind it is used.
+>
+> >
+>
+> > Can you pls post your QEMU command line?
+>
+>
+>
+> My QEMU command line:
+>
+> /root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S
+>
+> -object
+>
+> secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-194-
+>
+> Linux/master-key.aes -machine
+>
+> pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
+>
+> host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
+>
+> size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp
+>
+> 20,sockets=20,cores=1,threads=1 -numa node,nodeid=0,cpus=0-4,mem=1024
+>
+> -numa node,nodeid=1,cpus=5-9,mem=1024 -numa
+>
+> node,nodeid=2,cpus=10-14,mem=1024 -numa
+>
+> node,nodeid=3,cpus=15-19,mem=1024 -uuid
+>
+> 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -nodefaults
+>
+> -chardev
+>
+> socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/moni
+>
+> tor.sock,server,nowait -mon
+>
+> chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet
+>
+> -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on
+>
+> -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
+>
+> pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
+>
+> pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
+>
+> pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
+>
+> pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
+>
+> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
+>
+> usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
+>
+> nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
+>
+> virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
+>
+> virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
+>
+> virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
+>
+> virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
+>
+> virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
+>
+> file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=drive-v
+>
+> irtio-disk0,cache=none -device
+>
+> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id
+>
+> =virtio-disk0,bootindex=1 -drive
+>
+> if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
+>
+> ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev
+>
+> tap,fd=35,id=hostnet0 -device
+>
+> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=pci.4
+>
+> ,addr=0x1 -chardev pty,id=charserial0 -device
+>
+> isa-serial,chardev=charserial0,id=serial0 -device
+>
+> usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
+>
+> cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
+>
+> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg timestamp=on
+>
+>
+>
+> I am also very curious about this issue, in the linux kernel code, maybe
+>
+> double
+>
+check in function pci_bridge_check_ranges triggered this problem.
+>
+>
+If you can get the stacktrace in Linux when it tries to write this fffff
+>
+value, that
+>
+would be quite helpful.
+>
+After I added mdelay(100) to the function pci_bridge_check_ranges, the problem
+became easier to reproduce. Below is my modification to the kernel:
+diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
+index cb389277..86e232d 100644
+--- a/drivers/pci/setup-bus.c
++++ b/drivers/pci/setup-bus.c
+@@ -27,7 +27,7 @@
+ #include <linux/slab.h>
+ #include <linux/acpi.h>
+ #include "pci.h"
+-
++#include <linux/delay.h>
+ unsigned int pci_flags;
+ 
+ struct pci_dev_resource {
+@@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
+                pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
+                                               0xffffffff);
+                pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, &tmp);
++               mdelay(100);
++               printk(KERN_ERR "sleep\n");
++                dump_stack();
+                if (!tmp)
+                        b_res[2].flags &= ~IORESOURCE_MEM_64;
+                pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
+
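For readers following the thread, the transient state discussed above can be sketched in a few lines of C. This is a minimal model, not QEMU or kernel code: the struct `pref_window_regs` and all helper names are hypothetical. The decoding assumes the standard PCI-to-PCI bridge layout, where bits 15:4 of the 16-bit base/limit registers hold address bits 31:20 and separate registers hold the upper 32 bits. The register values are chosen to match the fc800000-fc9fffff window reported earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified copy of a bridge's prefetchable window registers. */
struct pref_window_regs {
    uint16_t base_lo;       /* PCI_PREF_MEMORY_BASE: bits 15:4 = address bits 31:20 */
    uint16_t limit_lo;      /* PCI_PREF_MEMORY_LIMIT: same encoding */
    uint32_t base_upper32;  /* PCI_PREF_BASE_UPPER32 */
    uint32_t limit_upper32; /* PCI_PREF_LIMIT_UPPER32 */
};

/* Decode the 64-bit window base from the two base registers. */
uint64_t pref_base(const struct pref_window_regs *r)
{
    return ((uint64_t)r->base_upper32 << 32) |
           ((uint64_t)(r->base_lo & 0xfff0u) << 16);
}

/* Decode the 64-bit window limit; the low 20 bits are implied to be all ones. */
uint64_t pref_limit(const struct pref_window_regs *r)
{
    return ((uint64_t)r->limit_upper32 << 32) |
           ((uint64_t)(r->limit_lo & 0xfff0u) << 16) | 0xfffffu;
}

/* A window whose base lies above its limit decodes no addresses at all. */
int window_is_empty(const struct pref_window_regs *r)
{
    return pref_base(r) > pref_limit(r);
}

/*
 * Replay the guest's probe of PCI_PREF_BASE_UPPER32: write 0xffffffff,
 * observe, then restore. Returns the base that a remap triggered in the
 * middle of the probe would use, and optionally reports whether the window
 * is empty at that instant.
 */
uint64_t base_seen_during_probe(struct pref_window_regs *r, int *empty)
{
    uint32_t saved = r->base_upper32;
    uint64_t transient;

    r->base_upper32 = 0xffffffffu;   /* probe write from the guest */
    transient = pref_base(r);        /* value used if mappings update now */
    if (empty != NULL) {
        *empty = window_is_empty(r);
    }
    r->base_upper32 = saved;         /* guest writes the original value back */
    return transient;
}
```

With `base_lo = 0xfc81`, `limit_lo = 0xfc91` and both upper registers zero, the decoded window is fc800000-fc9fffff. During the probe the base decodes as fffffffffc800000 while the limit is unchanged, so the window is empty until the original value is written back, which is consistent with the "Unassigned mem write" seen while the NIC behind the bridge is still active.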
+After hot plugging, we get the following log:
+
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:14.0: BAR 0: assigned [mem 
+0xc2360000-0xc237ffff 64bit pref]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:14.0: BAR 3: assigned [mem 
+0xc2328000-0xc232bfff 64bit pref]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+Dec 11 09:28:18 uefi-linux kernel: sleep
+Dec 11 09:28:18 uefi-linux kernel: CPU: 16 PID: 502 Comm: kworker/u40:1 Not 
+tainted 4.11.0-rc3+ #11
+Dec 11 09:28:18 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + 
+PIIX, 1996), BIOS 0.0.0 02/06/2015
+Dec 11 09:28:18 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn
+Dec 11 09:28:18 uefi-linux kernel: Call Trace:
+Dec 11 09:28:18 uefi-linux kernel: dump_stack+0x63/0x87
+Dec 11 09:28:18 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960
+Dec 11 09:28:18 uefi-linux kernel: ? dev_printk+0x4d/0x50
+Dec 11 09:28:18 uefi-linux kernel: enable_slot+0x140/0x2f0
+Dec 11 09:28:18 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80
+Dec 11 09:28:18 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
+Dec 11 09:28:18 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120
+Dec 11 09:28:18 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0
+Dec 11 09:28:18 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0
+Dec 11 09:28:18 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3
+Dec 11 09:28:18 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29
+Dec 11 09:28:18 uefi-linux kernel: process_one_work+0x165/0x410
+Dec 11 09:28:18 uefi-linux kernel: worker_thread+0x137/0x4c0
+Dec 11 09:28:18 uefi-linux kernel: kthread+0x101/0x140
+Dec 11 09:28:18 uefi-linux kernel: ? rescuer_thread+0x380/0x380
+Dec 11 09:28:18 uefi-linux kernel: ? kthread_park+0x90/0x90
+Dec 11 09:28:18 uefi-linux kernel: ret_from_fork+0x2c/0x40
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+Dec 11 09:28:18 uefi-linux kernel: sleep
+Dec 11 09:28:18 uefi-linux kernel: CPU: 16 PID: 502 Comm: kworker/u40:1 Not 
+tainted 4.11.0-rc3+ #11
+Dec 11 09:28:18 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + 
+PIIX, 1996), BIOS 0.0.0 02/06/2015
+Dec 11 09:28:18 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn
+Dec 11 09:28:18 uefi-linux kernel: Call Trace:
+Dec 11 09:28:18 uefi-linux kernel: dump_stack+0x63/0x87
+Dec 11 09:28:18 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960
+Dec 11 09:28:18 uefi-linux kernel: ? dev_printk+0x4d/0x50
+Dec 11 09:28:18 uefi-linux kernel: enable_slot+0x140/0x2f0
+Dec 11 09:28:18 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80
+Dec 11 09:28:18 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
+Dec 11 09:28:18 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120
+Dec 11 09:28:18 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0
+Dec 11 09:28:18 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0
+Dec 11 09:28:18 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3
+Dec 11 09:28:18 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29
+Dec 11 09:28:18 uefi-linux kernel: process_one_work+0x165/0x410
+Dec 11 09:28:18 uefi-linux kernel: worker_thread+0x137/0x4c0
+Dec 11 09:28:18 uefi-linux kernel: kthread+0x101/0x140
+Dec 11 09:28:18 uefi-linux kernel: ? rescuer_thread+0x380/0x380
+Dec 11 09:28:18 uefi-linux kernel: ? kthread_park+0x90/0x90
+Dec 11 09:28:18 uefi-linux kernel: ret_from_fork+0x2c/0x40
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+Dec 11 09:28:19 uefi-linux kernel: sleep
+Dec 11 09:28:19 uefi-linux kernel: CPU: 17 PID: 502 Comm: kworker/u40:1 Not 
+tainted 4.11.0-rc3+ #11
+Dec 11 09:28:19 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + 
+PIIX, 1996), BIOS 0.0.0 02/06/2015
+Dec 11 09:28:19 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn
+Dec 11 09:28:19 uefi-linux kernel: Call Trace:
+Dec 11 09:28:19 uefi-linux kernel: dump_stack+0x63/0x87
+Dec 11 09:28:19 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960
+Dec 11 09:28:19 uefi-linux kernel: ? dev_printk+0x4d/0x50
+Dec 11 09:28:19 uefi-linux kernel: enable_slot+0x140/0x2f0
+Dec 11 09:28:19 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80
+Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
+Dec 11 09:28:19 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120
+Dec 11 09:28:19 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0
+Dec 11 09:28:19 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0
+Dec 11 09:28:19 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3
+Dec 11 09:28:19 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29
+Dec 11 09:28:19 uefi-linux kernel: process_one_work+0x165/0x410
+Dec 11 09:28:19 uefi-linux kernel: worker_thread+0x137/0x4c0
+Dec 11 09:28:19 uefi-linux kernel: kthread+0x101/0x140
+Dec 11 09:28:19 uefi-linux kernel: ? rescuer_thread+0x380/0x380
+Dec 11 09:28:19 uefi-linux kernel: ? kthread_park+0x90/0x90
+Dec 11 09:28:19 uefi-linux kernel: ret_from_fork+0x2c/0x40
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+Dec 11 09:28:19 uefi-linux kernel: sleep
+Dec 11 09:28:19 uefi-linux kernel: CPU: 17 PID: 502 Comm: kworker/u40:1 Not 
+tainted 4.11.0-rc3+ #11
+Dec 11 09:28:19 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + 
+PIIX, 1996), BIOS 0.0.0 02/06/2015
+Dec 11 09:28:19 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn
+Dec 11 09:28:19 uefi-linux kernel: Call Trace:
+Dec 11 09:28:19 uefi-linux kernel: dump_stack+0x63/0x87
+Dec 11 09:28:19 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960
+Dec 11 09:28:19 uefi-linux kernel: ? pci_conf1_read+0xba/0x100
+Dec 11 09:28:19 uefi-linux kernel: __pci_bus_size_bridges+0xe9/0x960
+Dec 11 09:28:19 uefi-linux kernel: ? dev_printk+0x4d/0x50
+Dec 11 09:28:19 uefi-linux kernel: ? pcibios_allocate_rom_resources+0x45/0x80
+Dec 11 09:28:19 uefi-linux kernel: enable_slot+0x140/0x2f0
+Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
+Dec 11 09:28:19 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80
+Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
+Dec 11 09:28:19 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120
+Dec 11 09:28:19 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0
+Dec 11 09:28:19 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0
+Dec 11 09:28:19 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3
+Dec 11 09:28:19 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29
+Dec 11 09:28:19 uefi-linux kernel: process_one_work+0x165/0x410
+Dec 11 09:28:19 uefi-linux kernel: worker_thread+0x137/0x4c0
+Dec 11 09:28:19 uefi-linux kernel: kthread+0x101/0x140
+Dec 11 09:28:19 uefi-linux kernel: ? rescuer_thread+0x380/0x380
+Dec 11 09:28:19 uefi-linux kernel: ? kthread_park+0x90/0x90
+Dec 11 09:28:19 uefi-linux kernel: ret_from_fork+0x2c/0x40
+Dec 11 09:28:19 uefi-linux kernel: sleep
+Dec 11 09:28:19 uefi-linux kernel: CPU: 17 PID: 502 Comm: kworker/u40:1 Not 
+tainted 4.11.0-rc3+ #11
+Dec 11 09:28:19 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + 
+PIIX, 1996), BIOS 0.0.0 02/06/2015
+Dec 11 09:28:19 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn
+Dec 11 09:28:19 uefi-linux kernel: Call Trace:
+Dec 11 09:28:19 uefi-linux kernel: dump_stack+0x63/0x87
+Dec 11 09:28:19 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960
+Dec 11 09:28:19 uefi-linux kernel: ? dev_printk+0x4d/0x50
+Dec 11 09:28:19 uefi-linux kernel: enable_slot+0x140/0x2f0
+Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
+Dec 11 09:28:19 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80
+Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
+Dec 11 09:28:19 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120
+Dec 11 09:28:19 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0
+Dec 11 09:28:19 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0
+Dec 11 09:28:19 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3
+Dec 11 09:28:19 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29
+Dec 11 09:28:19 uefi-linux kernel: process_one_work+0x165/0x410
+Dec 11 09:28:19 uefi-linux kernel: worker_thread+0x137/0x4c0
+Dec 11 09:28:19 uefi-linux kernel: kthread+0x101/0x140
+Dec 11 09:28:19 uefi-linux kernel: ? rescuer_thread+0x380/0x380
+Dec 11 09:28:19 uefi-linux kernel: ? kthread_park+0x90/0x90
+Dec 11 09:28:19 uefi-linux kernel: ret_from_fork+0x2c/0x40
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
+isa0060/serio1/input0 lost sync at byte 1
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
+isa0060/serio1/input0 - driver resynced.
+Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
+isa0060/serio1/input0 lost sync at byte 1
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
+isa0060/serio1/input0 - driver resynced.
+Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
+isa0060/serio1/input0 lost sync at byte 1
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
+isa0060/serio1/input0 - driver resynced.
+Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
+isa0060/serio1/input0 lost sync at byte 1
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
+isa0060/serio1/input0 - driver resynced.
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
+isa0060/serio1/input0 lost sync at byte 1
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
+isa0060/serio1/input0 - driver resynced.
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
+isa0060/serio1/input0 lost sync at byte 1
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
+isa0060/serio1/input0 - driver resynced.
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
+isa0060/serio1/input0 lost sync at byte 1
+Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
+isa0060/serio1/input0 - driver resynced.
+Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
+isa0060/serio1/input0 lost sync at byte 1
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
+isa0060/serio1/input0 - driver resynced.
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
+isa0060/serio1/input0 lost sync at byte 1
+Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
+isa0060/serio1/input0 lost sync at byte 1
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
+isa0060/serio1/input0 - driver resynced.
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:22 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
+Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
+0xf000-0xffff]
+Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2800000-0xc29fffff]
+Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref]
+Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
+Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
+0xe000-0xefff]
+Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2600000-0xc27fffff]
+Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
+0xc2d00000-0xc2efffff 64bit pref]
+Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
+Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
+0xd000-0xdfff]
+Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2400000-0xc25fffff]
+Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
+0xc2f00000-0xc30fffff 64bit pref]
+Dec 11 09:28:22 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
+Dec 11 09:28:22 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
+0xc000-0xcfff]
+Dec 11 09:28:22 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
+0xc2000000-0xc21fffff]
+
+>
+>
+>
+> >
+>
+> >
+>
+> >
+>
+> > >
+>
+> > >
+>
+> > > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
+>
+> > >
+>
+> > > index b2e50c3..84b405d 100644
+>
+> > >
+>
+> > > --- a/hw/pci/pci_bridge.c
+>
+> > >
+>
+> > > +++ b/hw/pci/pci_bridge.c
+>
+> > >
+>
+> > > @@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d,
+>
+> > >
+>
+> > >      pci_default_write_config(d, address, val, len);
+>
+> > >
+>
+> > > -    if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
+>
+> > >
+>
+> > > +    if ( (val != 0xffffffff) &&
+>
+> > >
+>
+> > > +        (ranges_overlap(address, len, PCI_COMMAND, 2) ||
+>
+> > >
+>
+> > >          /* io base/limit */
+>
+> > >
+>
+> > >          ranges_overlap(address, len, PCI_IO_BASE, 2) ||
+>
+> > >
+>
+> > > @@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d,
+>
+> > >
+>
+> > >          ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
+>
+> > >
+>
+> > >          /* vga enable */
+>
+> > >
+>
+> > > -        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
+>
+> > >
+>
+> > > +        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) {
+>
+> > >
+>
+> > >          pci_bridge_update_mappings(s);
+>
+> > >
+>
+> > >      }
+>
+> > >
+>
+> > >
+>
+> > >
+>
+> > > Thanks,
+>
+> > >
+>
+> > > Xu
+>
+> > >
+
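As a rough illustration of the guard in the quoted patch, the stand-alone C sketch below models which config-space writes would trigger a window remap. It is a simplified model under stated assumptions, not QEMU code: `overlap` is a hypothetical stand-in for QEMU's `ranges_overlap`, the `CFG_*` names are local to this sketch (the offsets are the standard PCI type 1 header offsets), and `skip_all_ones` is an illustrative switch for the `val != 0xffffffff` check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard PCI type 1 (bridge) config offsets; names are local to this sketch. */
enum {
    CFG_COMMAND           = 0x04,
    CFG_IO_BASE           = 0x1c,
    CFG_MEMORY_BASE       = 0x20,
    CFG_PREF_BASE_UPPER32 = 0x28,
    CFG_BRIDGE_CONTROL    = 0x3e,
};

/* Hypothetical stand-in for ranges_overlap(): do [first1, first1+len1) and
 * [first2, first2+len2) share at least one byte? */
static bool overlap(uint32_t first1, uint32_t len1,
                    uint32_t first2, uint32_t len2)
{
    return first1 < first2 + len2 && first2 < first1 + len1;
}

/* Returns true when a config write would lead to the bridge windows being
 * recomputed. With skip_all_ones set, an all-ones 32-bit value (as used by
 * the guest when probing a register) is ignored, as in the quoted patch. */
static bool write_triggers_remap(uint32_t address, uint32_t val, uint32_t len,
                                 bool skip_all_ones)
{
    if (skip_all_ones && val == 0xffffffffu) {
        return false;
    }
    return overlap(address, len, CFG_COMMAND, 2) ||
           overlap(address, len, CFG_IO_BASE, 2) ||
           /* memory base/limit, prefetchable base/limit, upper halves */
           overlap(address, len, CFG_MEMORY_BASE, 20) ||
           overlap(address, len, CFG_BRIDGE_CONTROL, 2);
}
```

In this model a 4-byte write of 0xffffffff to `CFG_PREF_BASE_UPPER32` triggers a remap without the guard and is skipped with it. A narrower probe such as a 2-byte write of 0xffff would not be caught by a comparison against 0xffffffff.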
+On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote:
+>
+On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
+>
+> > > > Hi all,
+>
+> > > >
+>
+> > > >
+>
+> > > >
+>
+> > > > In our test, we configured VM with several pci-bridges and a
+>
+> > > > virtio-net nic been attached with bus 4,
+>
+> > > >
+>
+> > > > After VM is startup, We ping this nic from host to judge if it is
+>
+> > > > working normally. Then, we hot add pci devices to this VM with bus 0.
+>
+> > > >
+>
+> > > > We  found the virtio-net NIC in bus 4 is not working (can not
+>
+> > > > connect) occasionally, as it kick virtio backend failure with error
+>
+> > > > below:
+>
+> > > >
+>
+> > > >     Unassigned mem write 00000000fc803004 = 0x1
+>
+> > > >
+>
+> > > >
+>
+> > > >
+>
+> > > > memory-region: pci_bridge_pci
+>
+> > > >
+>
+> > > >   0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci
+>
+> > > >
+>
+> > > >     00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
+>
+> > > >
+>
+> > > >       00000000fc800000-00000000fc800fff (prio 0, RW):
+>
+> > > > virtio-pci-common
+>
+> > > >
+>
+> > > >       00000000fc801000-00000000fc801fff (prio 0, RW):
+>
+> > > > virtio-pci-isr
+>
+> > > >
+>
+> > > >       00000000fc802000-00000000fc802fff (prio 0, RW):
+>
+> > > > virtio-pci-device
+>
+> > > >
+>
+> > > >       00000000fc803000-00000000fc803fff (prio 0, RW):
+>
+> > > > virtio-pci-notify  <- io mem unassigned
+>
+> > > >
+>
+> > > >       …
+>
+> > > >
+>
+> > > >
+>
+> > > >
+>
+> > > > We caught an exceptional address changing while this problem
+>
+> > > > happened, show as
+>
+> > > > follow:
+>
+> > > >
+>
+> > > > Before pci_bridge_update_mappings:
+>
+> > > >
+>
+> > > >       00000000fc000000-00000000fc1fffff (prio 1, RW): alias
+>
+> > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > 00000000fc000000-00000000fc1fffff
+>
+> > > >
+>
+> > > >       00000000fc200000-00000000fc3fffff (prio 1, RW): alias
+>
+> > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > 00000000fc200000-00000000fc3fffff
+>
+> > > >
+>
+> > > >       00000000fc400000-00000000fc5fffff (prio 1, RW): alias
+>
+> > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > 00000000fc400000-00000000fc5fffff
+>
+> > > >
+>
+> > > >       00000000fc600000-00000000fc7fffff (prio 1, RW): alias
+>
+> > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > 00000000fc600000-00000000fc7fffff
+>
+> > > >
+>
+> > > >       00000000fc800000-00000000fc9fffff (prio 1, RW): alias
+>
+> > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > 00000000fc800000-00000000fc9fffff
+>
+> > > > <- correct Address Space
+>
+> > > >
+>
+> > > >       00000000fca00000-00000000fcbfffff (prio 1, RW): alias
+>
+> > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > 00000000fca00000-00000000fcbfffff
+>
+> > > >
+>
+> > > >       00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
+>
+> > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > 00000000fcc00000-00000000fcdfffff
+>
+> > > >
+>
+> > > >       00000000fce00000-00000000fcffffff (prio 1, RW): alias
+>
+> > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > 00000000fce00000-00000000fcffffff
+>
+> > > >
+>
+> > > >
+>
+> > > >
+>
+> > > > After pci_bridge_update_mappings:
+>
+> > > >
+>
+> > > >       00000000fda00000-00000000fdbfffff (prio 1, RW): alias
+>
+> > > > pci_bridge_mem @pci_bridge_pci 00000000fda00000-00000000fdbfffff
+>
+> > > >
+>
+> > > >       00000000fdc00000-00000000fddfffff (prio 1, RW): alias
+>
+> > > > pci_bridge_mem @pci_bridge_pci 00000000fdc00000-00000000fddfffff
+>
+> > > >
+>
+> > > >       00000000fde00000-00000000fdffffff (prio 1, RW): alias
+>
+> > > > pci_bridge_mem @pci_bridge_pci 00000000fde00000-00000000fdffffff
+>
+> > > >
+>
+> > > >       00000000fe000000-00000000fe1fffff (prio 1, RW): alias
+>
+> > > > pci_bridge_mem @pci_bridge_pci 00000000fe000000-00000000fe1fffff
+>
+> > > >
+>
+> > > >       00000000fe200000-00000000fe3fffff (prio 1, RW): alias
+>
+> > > > pci_bridge_mem @pci_bridge_pci 00000000fe200000-00000000fe3fffff
+>
+> > > >
+>
+> > > >       00000000fe400000-00000000fe5fffff (prio 1, RW): alias
+>
+> > > > pci_bridge_mem @pci_bridge_pci 00000000fe400000-00000000fe5fffff
+>
+> > > >
+>
+> > > >       00000000fe600000-00000000fe7fffff (prio 1, RW): alias
+>
+> > > > pci_bridge_mem @pci_bridge_pci 00000000fe600000-00000000fe7fffff
+>
+> > > >
+>
+> > > >       00000000fe800000-00000000fe9fffff (prio 1, RW): alias
+>
+> > > > pci_bridge_mem @pci_bridge_pci 00000000fe800000-00000000fe9fffff
+>
+> > > >
+>
+> > > >       fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
+>
+> pci_bridge_pref_mem
+>
+> > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional
+>
+> > > > Address
+>
+> > > Space
+>
+> > >
+>
+> > > This one is empty though right?
+>
+> > >
+>
+> > > >
+>
+> > > >
+>
+> > > > We have figured out why this address becomes this value,
+>
+> > > > according to pci spec,  pci driver can get BAR address size by
+>
+> > > > writing 0xffffffff to
+>
+> > > >
+>
+> > > > the pci register firstly, and then read back the value from this
+>
+> > > > register.
+>
+> > >
+>
+> > >
+>
+> > > OK however as you show below the BAR being sized is the BAR if a
+>
+> > > bridge. Are you then adding a bridge device by hotplug?
+>
+> >
+>
+> > No, I just simply hot plugged a VFIO device to Bus 0, another
+>
+> > interesting phenomenon is If I hot plug the device to other bus, this
+>
+> > doesn't
+>
+> happened.
+>
+> >
+>
+> > >
+>
+> > >
+>
+> > > > We didn't handle this value  specially while process pci write in
+>
+> > > > qemu, the function call stack is:
+>
+> > > >
+>
+> > > > Pci_bridge_dev_write_config
+>
+> > > >
+>
+> > > > -> pci_bridge_write_config
+>
+> > > >
+>
+> > > > -> pci_default_write_config (we update the config[address] value
+>
+> > > > -> here to
+>
+> > > > fffffffffc800000, which should be 0xfc800000 )
+>
+> > > >
+>
+> > > > -> pci_bridge_update_mappings
+>
+> > > >
+>
+> > > >                 ->pci_bridge_region_del(br, br->windows);
+>
+> > > >
+>
+> > > > -> pci_bridge_region_init
+>
+> > > >
+>
+> > > >                                                                 ->
+>
+> > > > pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong
+>
+> > > > value
+>
+> > > > fffffffffc800000)
+>
+> > > >
+>
+> > > >                                                 ->
+>
+> > > > memory_region_transaction_commit
+>
+> > > >
+>
+> > > >
+>
+> > > >
+>
+> > > > So, as we can see, we use the wrong base address in qemu to update
+>
+> > > > the memory regions, though, we update the base address to
+>
+> > > >
+>
+> > > > The correct value after pci driver in VM write the original value
+>
+> > > > back, the virtio NIC in bus 4 may still sends net packets
+>
+> > > > concurrently with
+>
+> > > >
+>
+> > > > The wrong memory region address.
+>
+> > > >
+>
+> > > >
+>
+> > > >
+>
+> > > > We have tried to skip the memory region update action in qemu
+>
+> > > > while detect pci write with 0xffffffff value, and it does work,
+>
+> > > > but
+>
+> > > >
+>
+> > > > This seems to be not gently.
+>
+> > >
+>
+> > > For sure. But I'm still puzzled as to why does Linux try to size the
+>
+> > > BAR of the bridge while a device behind it is used.
+>
+> > >
+>
+> > > Can you pls post your QEMU command line?
+>
+> >
+>
+> > My QEMU command line:
+>
+> > /root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S
+>
+> > -object
+>
+> > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-194-
+>
+> > Linux/master-key.aes -machine
+>
+> > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
+>
+> > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
+>
+> > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp
+>
+> > 20,sockets=20,cores=1,threads=1 -numa node,nodeid=0,cpus=0-4,mem=1024
+>
+> > -numa node,nodeid=1,cpus=5-9,mem=1024 -numa
+>
+> > node,nodeid=2,cpus=10-14,mem=1024 -numa
+>
+> > node,nodeid=3,cpus=15-19,mem=1024 -uuid
+>
+> > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -nodefaults
+>
+> > -chardev
+>
+> > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/moni
+>
+> > tor.sock,server,nowait -mon
+>
+> > chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet
+>
+> > -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on
+>
+> > -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
+>
+> > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
+>
+> > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
+>
+> > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
+>
+> > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
+>
+> > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
+>
+> > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
+>
+> > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
+>
+> > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
+>
+> > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
+>
+> > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
+>
+> > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
+>
+> > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
+>
+> > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=drive-v
+>
+> > irtio-disk0,cache=none -device
+>
+> > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id
+>
+> > =virtio-disk0,bootindex=1 -drive
+>
+> > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
+>
+> > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev
+>
+> > tap,fd=35,id=hostnet0 -device
+>
+> > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=pci.4
+>
+> > ,addr=0x1 -chardev pty,id=charserial0 -device
+>
+> > isa-serial,chardev=charserial0,id=serial0 -device
+>
+> > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
+>
+> > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
+>
+> > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg timestamp=on
+>
+> >
+>
+> > I am also very curious about this issue, in the linux kernel code, maybe
+>
+> > double
+>
+> check in function pci_bridge_check_ranges triggered this problem.
+>
+>
+>
+> If you can get the stacktrace in Linux when it tries to write this fffff
+>
+> value, that
+>
+> would be quite helpful.
+>
+>
+>
+>
+After I add mdelay(100) in function pci_bridge_check_ranges, this phenomenon
+>
+is
+>
+easier to reproduce; below is my modification to the kernel:
+>
+diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
+>
+index cb389277..86e232d 100644
+>
+--- a/drivers/pci/setup-bus.c
+>
++++ b/drivers/pci/setup-bus.c
+>
+@@ -27,7 +27,7 @@
+>
+#include <linux/slab.h>
+>
+#include <linux/acpi.h>
+>
+#include "pci.h"
+>
+-
+>
++#include <linux/delay.h>
+>
+unsigned int pci_flags;
+>
+>
+struct pci_dev_resource {
+>
+@@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
+>
+pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
+>
+0xffffffff);
+>
+pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, &tmp);
+>
++               mdelay(100);
+>
++               printk(KERN_ERR "sleep\n");
+>
++                dump_stack();
+>
+if (!tmp)
+>
+b_res[2].flags &= ~IORESOURCE_MEM_64;
+>
+pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
+>
+OK!
+I just sent a Linux patch that should help.
+I would appreciate it if you will give it a try
+and if that helps reply to it with
+a Tested-by: tag.
+
+-- 
+MST
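The value fffffffffc800000 reported in the thread is what one would expect if the bridge's 64-bit prefetchable base is recomputed while the upper 32-bit register temporarily holds 0xffffffff. The C sketch below shows that arithmetic under stated assumptions: `pref_base` and the `PREF_*` macros are names invented for this illustration (the bit layout follows the standard PCI bridge prefetchable base register), and the register value 0xfc81 is an assumed example chosen to match the fc800000 window in the logs.

```c
#include <stdint.h>

/* Bits 15:4 of the 16-bit prefetchable base register hold address bits 31:20;
 * bits 3:0 encode the window type (0x1 means 64-bit addressing). */
#define PREF_RANGE_MASK      0xfff0u
#define PREF_RANGE_TYPE_MASK 0x000fu
#define PREF_RANGE_TYPE_64   0x0001u

/* Hypothetical helper: combine the 16-bit base register and the 32-bit
 * "base upper 32" register into the 64-bit window base address. */
static uint64_t pref_base(uint16_t base_reg, uint32_t upper32)
{
    uint64_t base = (uint64_t)(base_reg & PREF_RANGE_MASK) << 16;

    if ((base_reg & PREF_RANGE_TYPE_MASK) == PREF_RANGE_TYPE_64) {
        base |= (uint64_t)upper32 << 32;
    }
    return base;
}
```

With `base_reg = 0xfc81`, an upper register of 0 gives 0x00000000fc800000, while an upper register of 0xffffffff gives 0xfffffffffc800000. Any device access that arrives between the probe write and the restoring write would therefore see the window at the wrong address.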
+
+On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote:
+>
+> On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
+>
+> > > > > Hi all,
+>
+> > > > >
+>
+> > > > >
+>
+> > > > >
+>
+> > > > > In our test, we configured VM with several pci-bridges and a
+>
+> > > > > virtio-net nic been attached with bus 4,
+>
+> > > > >
+>
+> > > > > After VM is startup, We ping this nic from host to judge if it
+>
+> > > > > is working normally. Then, we hot add pci devices to this VM with
+>
+> > > > > bus
+>
+0.
+>
+> > > > >
+>
+> > > > > We  found the virtio-net NIC in bus 4 is not working (can not
+>
+> > > > > connect) occasionally, as it kick virtio backend failure with error
+>
+> > > > > below:
+>
+> > > > >
+>
+> > > > >     Unassigned mem write 00000000fc803004 = 0x1
+>
+> > > > >
+>
+> > > > >
+>
+> > > > >
+>
+> > > > > memory-region: pci_bridge_pci
+>
+> > > > >
+>
+> > > > >   0000000000000000-ffffffffffffffff (prio 0, RW):
+>
+> > > > > pci_bridge_pci
+>
+> > > > >
+>
+> > > > >     00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
+>
+> > > > >
+>
+> > > > >       00000000fc800000-00000000fc800fff (prio 0, RW):
+>
+> > > > > virtio-pci-common
+>
+> > > > >
+>
+> > > > >       00000000fc801000-00000000fc801fff (prio 0, RW):
+>
+> > > > > virtio-pci-isr
+>
+> > > > >
+>
+> > > > >       00000000fc802000-00000000fc802fff (prio 0, RW):
+>
+> > > > > virtio-pci-device
+>
+> > > > >
+>
+> > > > >       00000000fc803000-00000000fc803fff (prio 0, RW):
+>
+> > > > > virtio-pci-notify  <- io mem unassigned
+>
+> > > > >
+>
+> > > > >       …
+>
+> > > > >
+>
+> > > > >
+>
+> > > > >
+>
+> > > > > We caught an exceptional address changing while this problem
+>
+> > > > > happened, show as
+>
+> > > > > follow:
+>
+> > > > >
+>
+> > > > > Before pci_bridge_update_mappings:
+>
+> > > > >
+>
+> > > > >       00000000fc000000-00000000fc1fffff (prio 1, RW): alias
+>
+> > > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > 00000000fc000000-00000000fc1fffff
+>
+> > > > >
+>
+> > > > >       00000000fc200000-00000000fc3fffff (prio 1, RW): alias
+>
+> > > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > 00000000fc200000-00000000fc3fffff
+>
+> > > > >
+>
+> > > > >       00000000fc400000-00000000fc5fffff (prio 1, RW): alias
+>
+> > > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > 00000000fc400000-00000000fc5fffff
+>
+> > > > >
+>
+> > > > >       00000000fc600000-00000000fc7fffff (prio 1, RW): alias
+>
+> > > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > 00000000fc600000-00000000fc7fffff
+>
+> > > > >
+>
+> > > > >       00000000fc800000-00000000fc9fffff (prio 1, RW): alias
+>
+> > > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > 00000000fc800000-00000000fc9fffff
+>
+> > > > > <- correct Address Space
+>
+> > > > >
+>
+> > > > >       00000000fca00000-00000000fcbfffff (prio 1, RW): alias
+>
+> > > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > 00000000fca00000-00000000fcbfffff
+>
+> > > > >
+>
+> > > > >       00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
+>
+> > > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > 00000000fcc00000-00000000fcdfffff
+>
+> > > > >
+>
+> > > > >       00000000fce00000-00000000fcffffff (prio 1, RW): alias
+>
+> > > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > 00000000fce00000-00000000fcffffff
+>
+> > > > >
+>
+> > > > >
+>
+> > > > >
+>
+> > > > > After pci_bridge_update_mappings:
+>
+> > > > >
+>
+> > > > >       00000000fda00000-00000000fdbfffff (prio 1, RW): alias
+>
+> > > > > pci_bridge_mem @pci_bridge_pci
+>
+> > > > > 00000000fda00000-00000000fdbfffff
+>
+> > > > >
+>
+> > > > >       00000000fdc00000-00000000fddfffff (prio 1, RW): alias
+>
+> > > > > pci_bridge_mem @pci_bridge_pci
+>
+> > > > > 00000000fdc00000-00000000fddfffff
+>
+> > > > >
+>
+> > > > >       00000000fde00000-00000000fdffffff (prio 1, RW): alias
+>
+> > > > > pci_bridge_mem @pci_bridge_pci
+>
+> > > > > 00000000fde00000-00000000fdffffff
+>
+> > > > >
+>
+> > > > >       00000000fe000000-00000000fe1fffff (prio 1, RW): alias
+>
+> > > > > pci_bridge_mem @pci_bridge_pci
+>
+> > > > > 00000000fe000000-00000000fe1fffff
+>
+> > > > >
+>
+> > > > >       00000000fe200000-00000000fe3fffff (prio 1, RW): alias
+>
+> > > > > pci_bridge_mem @pci_bridge_pci
+>
+> > > > > 00000000fe200000-00000000fe3fffff
+>
+> > > > >
+>
+> > > > >       00000000fe400000-00000000fe5fffff (prio 1, RW): alias
+>
+> > > > > pci_bridge_mem @pci_bridge_pci
+>
+> > > > > 00000000fe400000-00000000fe5fffff
+>
+> > > > >
+>
+> > > > >       00000000fe600000-00000000fe7fffff (prio 1, RW): alias
+>
+> > > > > pci_bridge_mem @pci_bridge_pci
+>
+> > > > > 00000000fe600000-00000000fe7fffff
+>
+> > > > >
+>
+> > > > >       00000000fe800000-00000000fe9fffff (prio 1, RW): alias
+>
+> > > > > pci_bridge_mem @pci_bridge_pci
+>
+> > > > > 00000000fe800000-00000000fe9fffff
+>
+> > > > >
+>
+> > > > >       fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
+>
+> > > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > fffffffffc800000-fffffffffc800000   <- Exceptional Address Space
+>
+> > > >
+>
+> > > > This one is empty though right?
+>
+> > > >
+>
+> > > > >
+>
+> > > > >
+>
+> > > > > We have figured out why this address becomes this value,
+>
+> > > > > according to pci spec,  pci driver can get BAR address size by
+>
+> > > > > writing 0xffffffff to
+>
+> > > > >
+>
+> > > > > the pci register firstly, and then read back the value from this
+>
+> > > > > register.
+>
+> > > >
+>
+> > > >
+>
+> > > > OK however as you show below the BAR being sized is the BAR if a
+>
+> > > > bridge. Are you then adding a bridge device by hotplug?
+>
+> > >
+>
+> > > No, I just simply hot plugged a VFIO device to Bus 0, another
+>
+> > > interesting phenomenon is If I hot plug the device to other bus,
+>
+> > > this doesn't happen.
+>
+> > >
+>
+> > > >
+>
+> > > >
+>
+> > > > > We didn't handle this value  specially while process pci write
+>
+> > > > > in qemu, the function call stack is:
+>
+> > > > >
+>
+> > > > > Pci_bridge_dev_write_config
+>
+> > > > >
+>
+> > > > > -> pci_bridge_write_config
+>
+> > > > >
+>
+> > > > > -> pci_default_write_config (we update the config[address]
+>
+> > > > > -> value here to
+>
+> > > > > fffffffffc800000, which should be 0xfc800000 )
+>
+> > > > >
+>
+> > > > > -> pci_bridge_update_mappings
+>
+> > > > >
+>
+> > > > >                 ->pci_bridge_region_del(br, br->windows);
+>
+> > > > >
+>
+> > > > > -> pci_bridge_region_init
+>
+> > > > >
+>
+> > > > >
+>
+> > > > > -> pci_bridge_init_alias (here pci_bridge_get_base, we use the
+>
+> > > > > wrong value
+>
+> > > > > fffffffffc800000)
+>
+> > > > >
+>
+> > > > >                                                 ->
+>
+> > > > > memory_region_transaction_commit
+>
+> > > > >
+>
+> > > > >
+>
+> > > > >
+>
+> > > > > So, as we can see, we use the wrong base address in qemu to
+>
+> > > > > update the memory regions, though, we update the base address
+>
+> > > > > to
+>
+> > > > >
+>
+> > > > > The correct value after pci driver in VM write the original
+>
+> > > > > value back, the virtio NIC in bus 4 may still sends net
+>
+> > > > > packets concurrently with
+>
+> > > > >
+>
+> > > > > The wrong memory region address.
+>
+> > > > >
+>
+> > > > >
+>
+> > > > >
+>
+> > > > > We have tried to skip the memory region update action in qemu
+>
+> > > > > while detect pci write with 0xffffffff value, and it does
+>
+> > > > > work, but
+>
+> > > > >
+>
+> > > > > This seems to be not gently.
+>
+> > > >
+>
+> > > > For sure. But I'm still puzzled as to why does Linux try to size
+>
+> > > > the BAR of the bridge while a device behind it is used.
+>
+> > > >
+>
+> > > > Can you pls post your QEMU command line?
+>
+> > >
+>
+> > > My QEMU command line:
+>
+> > > /root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S
+>
+> > > -object
+>
+> > > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-
+>
+> > > 194-
+>
+> > > Linux/master-key.aes -machine
+>
+> > > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
+>
+> > > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
+>
+> > > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp
+>
+> > > 20,sockets=20,cores=1,threads=1 -numa
+>
+> > > node,nodeid=0,cpus=0-4,mem=1024 -numa
+>
+> > > node,nodeid=1,cpus=5-9,mem=1024 -numa
+>
+> > > node,nodeid=2,cpus=10-14,mem=1024 -numa
+>
+> > > node,nodeid=3,cpus=15-19,mem=1024 -uuid
+>
+> > > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -nodefaults
+>
+> > > -chardev
+>
+> > > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/
+>
+> > > moni
+>
+> > > tor.sock,server,nowait -mon
+>
+> > > chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet
+>
+> > > -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot
+>
+> > > strict=on -device
+>
+> > > pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
+>
+> > > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
+>
+> > > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
+>
+> > > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
+>
+> > > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
+>
+> > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
+>
+> > > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
+>
+> > > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
+>
+> > > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
+>
+> > > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
+>
+> > > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
+>
+> > > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
+>
+> > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
+>
+> > > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=dri
+>
+> > > ve-v
+>
+> > > irtio-disk0,cache=none -device
+>
+> > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk
+>
+> > > 0,id
+>
+> > > =virtio-disk0,bootindex=1 -drive
+>
+> > > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
+>
+> > > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev
+>
+> > > tap,fd=35,id=hostnet0 -device
+>
+> > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=p
+>
+> > > ci.4
+>
+> > > ,addr=0x1 -chardev pty,id=charserial0 -device
+>
+> > > isa-serial,chardev=charserial0,id=serial0 -device
+>
+> > > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
+>
+> > > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
+>
+> > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg
+>
+> > > timestamp=on
+>
+> > >
+>
+> > > I am also very curious about this issue, in the linux kernel code,
+>
+> > > maybe double
+>
+> > check in function pci_bridge_check_ranges triggered this problem.
+>
+> >
+>
+> > If you can get the stacktrace in Linux when it tries to write this
+>
+> > fffff value, that would be quite helpful.
+>
+> >
+>
+>
+>
+> After I add mdelay(100) in function pci_bridge_check_ranges, this
+>
+> phenomenon is easier to reproduce, below is my modify in kernel:
+>
+> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index
+>
+> cb389277..86e232d 100644
+>
+> --- a/drivers/pci/setup-bus.c
+>
+> +++ b/drivers/pci/setup-bus.c
+>
+> @@ -27,7 +27,7 @@
+>
+>  #include <linux/slab.h>
+>
+>  #include <linux/acpi.h>
+>
+>  #include "pci.h"
+>
+> -
+>
+> +#include <linux/delay.h>
+>
+>  unsigned int pci_flags;
+>
+>
+>
+>  struct pci_dev_resource {
+>
+> @@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct pci_bus
+>
+*bus)
+>
+>                 pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
+>
+>                                                0xffffffff);
+>
+>                 pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32,
+>
+> &tmp);
+>
+> +               mdelay(100);
+>
+> +               printk(KERN_ERR "sleep\n");
+>
+> +                dump_stack();
+>
+>                 if (!tmp)
+>
+>                         b_res[2].flags &= ~IORESOURCE_MEM_64;
+>
+>                 pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
+>
+>
+>
+>
+OK!
+>
+I just sent a Linux patch that should help.
+>
+I would appreciate it if you will give it a try and if that helps reply to it
+>
+with a
+>
+Tested-by: tag.
+>
+I tested this patch and it works fine on my machine.
+
+But I have another question: if we only fix this problem in the kernel, Linux
+versions that have already been released will still not work well on the
+virtualization platform.
+Is there a way to fix this problem in the backend?
+
+>
+--
+>
+MST
+
+On Tue, Dec 11, 2018 at 02:55:43AM +0000, xuyandong wrote:
+>
+On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote:
+>
+> > On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
+>
+> > > > > > Hi all,
+>
+> > > > > >
+>
+> > > > > >
+>
+> > > > > >
+>
+> > > > > > In our test, we configured VM with several pci-bridges and a
+>
+> > > > > > virtio-net nic been attached with bus 4,
+>
+> > > > > >
+>
+> > > > > > After VM is startup, We ping this nic from host to judge if it
+>
+> > > > > > is working normally. Then, we hot add pci devices to this VM with
+>
+> > > > > > bus 0.
+>
+> > > > > >
+>
+> > > > > > We  found the virtio-net NIC in bus 4 is not working (can not
+>
+> > > > > > connect) occasionally, as it kick virtio backend failure with
+>
+> > > > > > error below:
+>
+> > > > > >
+>
+> > > > > >     Unassigned mem write 00000000fc803004 = 0x1
+>
+> > > > > >
+>
+> > > > > >
+>
+> > > > > >
+>
+> > > > > > memory-region: pci_bridge_pci
+>
+> > > > > >
+>
+> > > > > >   0000000000000000-ffffffffffffffff (prio 0, RW):
+>
+> > > > > > pci_bridge_pci
+>
+> > > > > >
+>
+> > > > > >     00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
+>
+> > > > > >
+>
+> > > > > >       00000000fc800000-00000000fc800fff (prio 0, RW):
+>
+> > > > > > virtio-pci-common
+>
+> > > > > >
+>
+> > > > > >       00000000fc801000-00000000fc801fff (prio 0, RW):
+>
+> > > > > > virtio-pci-isr
+>
+> > > > > >
+>
+> > > > > >       00000000fc802000-00000000fc802fff (prio 0, RW):
+>
+> > > > > > virtio-pci-device
+>
+> > > > > >
+>
+> > > > > >       00000000fc803000-00000000fc803fff (prio 0, RW):
+>
+> > > > > > virtio-pci-notify  <- io mem unassigned
+>
+> > > > > >
+>
+> > > > > >       …
+>
+> > > > > >
+>
+> > > > > >
+>
+> > > > > >
+>
+> > > > > > We caught an exceptional address changing while this problem
+>
+> > > > > > happened, show as
+>
+> > > > > > follow:
+>
+> > > > > >
+>
+> > > > > > Before pci_bridge_update_mappings:
+>
+> > > > > >
+>
+> > > > > >       00000000fc000000-00000000fc1fffff (prio 1, RW): alias
+>
+> > > > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > 00000000fc000000-00000000fc1fffff
+>
+> > > > > >
+>
+> > > > > >       00000000fc200000-00000000fc3fffff (prio 1, RW): alias
+>
+> > > > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > 00000000fc200000-00000000fc3fffff
+>
+> > > > > >
+>
+> > > > > >       00000000fc400000-00000000fc5fffff (prio 1, RW): alias
+>
+> > > > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > 00000000fc400000-00000000fc5fffff
+>
+> > > > > >
+>
+> > > > > >       00000000fc600000-00000000fc7fffff (prio 1, RW): alias
+>
+> > > > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > 00000000fc600000-00000000fc7fffff
+>
+> > > > > >
+>
+> > > > > >       00000000fc800000-00000000fc9fffff (prio 1, RW): alias
+>
+> > > > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > 00000000fc800000-00000000fc9fffff
+>
+> > > > > > <- correct Address Space
+>
+> > > > > >
+>
+> > > > > >       00000000fca00000-00000000fcbfffff (prio 1, RW): alias
+>
+> > > > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > 00000000fca00000-00000000fcbfffff
+>
+> > > > > >
+>
+> > > > > >       00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
+>
+> > > > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > 00000000fcc00000-00000000fcdfffff
+>
+> > > > > >
+>
+> > > > > >       00000000fce00000-00000000fcffffff (prio 1, RW): alias
+>
+> > > > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > 00000000fce00000-00000000fcffffff
+>
+> > > > > >
+>
+> > > > > >
+>
+> > > > > >
+>
+> > > > > > After pci_bridge_update_mappings:
+>
+> > > > > >
+>
+> > > > > >       00000000fda00000-00000000fdbfffff (prio 1, RW): alias
+>
+> > > > > > pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > 00000000fda00000-00000000fdbfffff
+>
+> > > > > >
+>
+> > > > > >       00000000fdc00000-00000000fddfffff (prio 1, RW): alias
+>
+> > > > > > pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > 00000000fdc00000-00000000fddfffff
+>
+> > > > > >
+>
+> > > > > >       00000000fde00000-00000000fdffffff (prio 1, RW): alias
+>
+> > > > > > pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > 00000000fde00000-00000000fdffffff
+>
+> > > > > >
+>
+> > > > > >       00000000fe000000-00000000fe1fffff (prio 1, RW): alias
+>
+> > > > > > pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > 00000000fe000000-00000000fe1fffff
+>
+> > > > > >
+>
+> > > > > >       00000000fe200000-00000000fe3fffff (prio 1, RW): alias
+>
+> > > > > > pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > 00000000fe200000-00000000fe3fffff
+>
+> > > > > >
+>
+> > > > > >       00000000fe400000-00000000fe5fffff (prio 1, RW): alias
+>
+> > > > > > pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > 00000000fe400000-00000000fe5fffff
+>
+> > > > > >
+>
+> > > > > >       00000000fe600000-00000000fe7fffff (prio 1, RW): alias
+>
+> > > > > > pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > 00000000fe600000-00000000fe7fffff
+>
+> > > > > >
+>
+> > > > > >       00000000fe800000-00000000fe9fffff (prio 1, RW): alias
+>
+> > > > > > pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > 00000000fe800000-00000000fe9fffff
+>
+> > > > > >
+>
+> > > > > >       fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
+>
+> > > > > > pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > fffffffffc800000-fffffffffc800000   <- Exceptional Address Space
+>
+> > > > >
+>
+> > > > > This one is empty though right?
+>
+> > > > >
+>
+> > > > > >
+>
+> > > > > >
+>
+> > > > > > We have figured out why this address becomes this value,
+>
+> > > > > > according to pci spec,  pci driver can get BAR address size by
+>
+> > > > > > writing 0xffffffff to
+>
+> > > > > >
+>
+> > > > > > the pci register firstly, and then read back the value from this
+>
+> > > > > > register.
+>
+> > > > >
+>
+> > > > >
+>
+> > > > > OK however as you show below the BAR being sized is the BAR if a
+>
+> > > > > bridge. Are you then adding a bridge device by hotplug?
+>
+> > > >
+>
+> > > > No, I just simply hot plugged a VFIO device to Bus 0, another
+>
+> > > > interesting phenomenon is If I hot plug the device to other bus,
+>
+> > > > this doesn't happen.
+>
+> > > >
+>
+> > > > >
+>
+> > > > >
+>
+> > > > > > We didn't handle this value  specially while process pci write
+>
+> > > > > > in qemu, the function call stack is:
+>
+> > > > > >
+>
+> > > > > > Pci_bridge_dev_write_config
+>
+> > > > > >
+>
+> > > > > > -> pci_bridge_write_config
+>
+> > > > > >
+>
+> > > > > > -> pci_default_write_config (we update the config[address]
+>
+> > > > > > -> value here to
+>
+> > > > > > fffffffffc800000, which should be 0xfc800000 )
+>
+> > > > > >
+>
+> > > > > > -> pci_bridge_update_mappings
+>
+> > > > > >
+>
+> > > > > >                 ->pci_bridge_region_del(br, br->windows);
+>
+> > > > > >
+>
+> > > > > > -> pci_bridge_region_init
+>
+> > > > > >
+>
+> > > > > >
+>
+> > > > > > -> pci_bridge_init_alias (here pci_bridge_get_base, we use the
+>
+> > > > > > wrong value
+>
+> > > > > > fffffffffc800000)
+>
+> > > > > >
+>
+> > > > > >                                                 ->
+>
+> > > > > > memory_region_transaction_commit
+>
+> > > > > >
+>
+> > > > > >
+>
+> > > > > >
+>
+> > > > > > So, as we can see, we use the wrong base address in qemu to
+>
+> > > > > > update the memory regions, though, we update the base address
+>
+> > > > > > to
+>
+> > > > > >
+>
+> > > > > > The correct value after pci driver in VM write the original
+>
+> > > > > > value back, the virtio NIC in bus 4 may still sends net
+>
+> > > > > > packets concurrently with
+>
+> > > > > >
+>
+> > > > > > The wrong memory region address.
+>
+> > > > > >
+>
+> > > > > >
+>
+> > > > > >
+>
+> > > > > > We have tried to skip the memory region update action in qemu
+>
+> > > > > > while detect pci write with 0xffffffff value, and it does
+>
+> > > > > > work, but
+>
+> > > > > >
+>
+> > > > > > This seems to be not gently.
+>
+> > > > >
+>
+> > > > > For sure. But I'm still puzzled as to why does Linux try to size
+>
+> > > > > the BAR of the bridge while a device behind it is used.
+>
+> > > > >
+>
+> > > > > Can you pls post your QEMU command line?
+>
+> > > >
+>
+> > > > My QEMU command line:
+>
+> > > > /root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S
+>
+> > > > -object
+>
+> > > > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-
+>
+> > > > 194-
+>
+> > > > Linux/master-key.aes -machine
+>
+> > > > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
+>
+> > > > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
+>
+> > > > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp
+>
+> > > > 20,sockets=20,cores=1,threads=1 -numa
+>
+> > > > node,nodeid=0,cpus=0-4,mem=1024 -numa
+>
+> > > > node,nodeid=1,cpus=5-9,mem=1024 -numa
+>
+> > > > node,nodeid=2,cpus=10-14,mem=1024 -numa
+>
+> > > > node,nodeid=3,cpus=15-19,mem=1024 -uuid
+>
+> > > > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -nodefaults
+>
+> > > > -chardev
+>
+> > > > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/
+>
+> > > > moni
+>
+> > > > tor.sock,server,nowait -mon
+>
+> > > > chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet
+>
+> > > > -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot
+>
+> > > > strict=on -device
+>
+> > > > pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
+>
+> > > > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
+>
+> > > > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
+>
+> > > > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
+>
+> > > > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
+>
+> > > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
+>
+> > > > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
+>
+> > > > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
+>
+> > > > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
+>
+> > > > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
+>
+> > > > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
+>
+> > > > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
+>
+> > > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
+>
+> > > > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=dri
+>
+> > > > ve-v
+>
+> > > > irtio-disk0,cache=none -device
+>
+> > > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk
+>
+> > > > 0,id
+>
+> > > > =virtio-disk0,bootindex=1 -drive
+>
+> > > > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
+>
+> > > > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev
+>
+> > > > tap,fd=35,id=hostnet0 -device
+>
+> > > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=p
+>
+> > > > ci.4
+>
+> > > > ,addr=0x1 -chardev pty,id=charserial0 -device
+>
+> > > > isa-serial,chardev=charserial0,id=serial0 -device
+>
+> > > > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
+>
+> > > > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
+>
+> > > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg
+>
+> > > > timestamp=on
+>
+> > > >
+>
+> > > > I am also very curious about this issue, in the linux kernel code,
+>
+> > > > maybe double
+>
+> > > check in function pci_bridge_check_ranges triggered this problem.
+>
+> > >
+>
+> > > If you can get the stacktrace in Linux when it tries to write this
+>
+> > > fffff value, that would be quite helpful.
+>
+> > >
+>
+> >
+>
+> > After I add mdelay(100) in function pci_bridge_check_ranges, this
+>
+> > phenomenon is easier to reproduce, below is my modify in kernel:
+>
+> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index
+>
+> > cb389277..86e232d 100644
+>
+> > --- a/drivers/pci/setup-bus.c
+>
+> > +++ b/drivers/pci/setup-bus.c
+>
+> > @@ -27,7 +27,7 @@
+>
+> >  #include <linux/slab.h>
+>
+> >  #include <linux/acpi.h>
+>
+> >  #include "pci.h"
+>
+> > -
+>
+> > +#include <linux/delay.h>
+>
+> >  unsigned int pci_flags;
+>
+> >
+>
+> >  struct pci_dev_resource {
+>
+> > @@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct pci_bus
+>
+> *bus)
+>
+> >                 pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
+>
+> >                                                0xffffffff);
+>
+> >                 pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32,
+>
+> > &tmp);
+>
+> > +               mdelay(100);
+>
+> > +               printk(KERN_ERR "sleep\n");
+>
+> > +                dump_stack();
+>
+> >                 if (!tmp)
+>
+> >                         b_res[2].flags &= ~IORESOURCE_MEM_64;
+>
+> >                 pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
+>
+> >
+>
+>
+>
+> OK!
+>
+> I just sent a Linux patch that should help.
+>
+> I would appreciate it if you will give it a try and if that helps reply to
+>
+> it with a
+>
+> Tested-by: tag.
+>
+>
+>
+>
+I tested this patch and it works fine on my machine.
+>
+>
+But I have another question, if we only fix this problem in the kernel, the
+>
+Linux
+>
+version that has been released does not work well on the virtualization
+>
+platform.
+>
+Is there a way to fix this problem in the backend?
+There could be a way to work around this.
+Does the patch below help?
+
+diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
+index 236a20eaa8..7834cac4b0 100644
+--- a/hw/i386/acpi-build.c
++++ b/hw/i386/acpi-build.c
+@@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml *parent_scope, PCIBus *bus,
+ 
+         aml_append(method, aml_store(aml_int(bsel_val), aml_name("BNUM")));
+         aml_append(method,
+-            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device Check */)
++            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* Device Check Light */)
+         );
+         aml_append(method,
+             aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject Request */)
+
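For context on why a workaround is wanted at all, the bogus window base quoted in this thread (fffffffffc800000 where 0xfc800000 was expected) can be reproduced with a few lines of arithmetic. The sketch below is a simplified model, not the actual QEMU or kernel code: the helper name pref_window_base and the register value 0xfc81 are assumptions chosen to match the addresses in the report. It only relies on the standard bridge layout, in which the 16-bit prefetchable memory base register carries address bits 31:20 and PCI_PREF_BASE_UPPER32 carries bits 63:32.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): assemble the 64-bit
 * prefetchable window base of a PCI-to-PCI bridge from its two registers.
 *   pref_mem_base      16-bit register; bits 15:4 hold address bits 31:20,
 *                      bits 3:0 are the type field (1 = 64-bit capable)
 *   pref_base_upper32  32-bit register holding address bits 63:32
 */
static uint64_t pref_window_base(uint16_t pref_mem_base,
                                 uint32_t pref_base_upper32)
{
    uint64_t low = (uint64_t)(pref_mem_base & 0xfff0u) << 16;

    return ((uint64_t)pref_base_upper32 << 32) | low;
}

/* Self-check mirroring the two states discussed in the thread. */
static void pref_window_base_self_check(void)
{
    /* Assumed register value: window at 0xfc800000, 64-bit capable. */
    const uint16_t pref_mem_base = 0xfc81;

    /* Normal state: upper half is zero, base is the expected address. */
    assert(pref_window_base(pref_mem_base, 0x00000000u) ==
           0x00000000fc800000ULL);

    /* Transient state while the guest probes with 0xffffffff. */
    assert(pref_window_base(pref_mem_base, 0xffffffffu) ==
           0xfffffffffc800000ULL);
}
```

If the mappings are rebuilt between the probe write and the write that restores the original value, the window briefly sits at the second address, which would be consistent with the "Unassigned mem write" seen for the NIC behind the bridge.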
+>
+On Tue, Dec 11, 2018 at 02:55:43AM +0000, xuyandong wrote:
+>
+> On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote:
+>
+> > > On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
+>
+> > > > > > > Hi all,
+>
+> > > > > > >
+>
+> > > > > > >
+>
+> > > > > > >
+>
+> > > > > > > In our test, we configured VM with several pci-bridges and
+>
+> > > > > > > a virtio-net nic been attached with bus 4,
+>
+> > > > > > >
+>
+> > > > > > > After VM is startup, We ping this nic from host to judge
+>
+> > > > > > > if it is working normally. Then, we hot add pci devices to
+>
+> > > > > > > this VM with bus 0.
+>
+> > > > > > >
+>
+> > > > > > > We  found the virtio-net NIC in bus 4 is not working (can
+>
+> > > > > > > not
+>
+> > > > > > > connect) occasionally, as it kick virtio backend failure with
+>
+> > > > > > > error below:
+>
+> > > > > > >
+>
+> > > > > > >     Unassigned mem write 00000000fc803004 = 0x1
+>
+> > > > > > >
+>
+> > > > > > >
+>
+> > > > > > >
+>
+> > > > > > > memory-region: pci_bridge_pci
+>
+> > > > > > >
+>
+> > > > > > >   0000000000000000-ffffffffffffffff (prio 0, RW):
+>
+> > > > > > > pci_bridge_pci
+>
+> > > > > > >
+>
+> > > > > > >     00000000fc800000-00000000fc803fff (prio 1, RW):
+>
+> > > > > > > virtio-pci
+>
+> > > > > > >
+>
+> > > > > > >       00000000fc800000-00000000fc800fff (prio 0, RW):
+>
+> > > > > > > virtio-pci-common
+>
+> > > > > > >
+>
+> > > > > > >       00000000fc801000-00000000fc801fff (prio 0, RW):
+>
+> > > > > > > virtio-pci-isr
+>
+> > > > > > >
+>
+> > > > > > >       00000000fc802000-00000000fc802fff (prio 0, RW):
+>
+> > > > > > > virtio-pci-device
+>
+> > > > > > >
+>
+> > > > > > >       00000000fc803000-00000000fc803fff (prio 0, RW):
+>
+> > > > > > > virtio-pci-notify  <- io mem unassigned
+>
+> > > > > > >
+>
+> > > > > > >       …
+>
+> > > > > > >
+>
+> > > > > > >
+>
+> > > > > > >
+>
+> > > > > > > We caught an exceptional address changing while this
+>
+> > > > > > > problem happened, show as
+>
+> > > > > > > follow:
+>
+> > > > > > >
+>
+> > > > > > > Before pci_bridge_update_mappings:
+>
+> > > > > > >
+>
+> > > > > > >       00000000fc000000-00000000fc1fffff (prio 1, RW):
+>
+> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > > 00000000fc000000-00000000fc1fffff
+>
+> > > > > > >
+>
+> > > > > > >       00000000fc200000-00000000fc3fffff (prio 1, RW):
+>
+> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > > 00000000fc200000-00000000fc3fffff
+>
+> > > > > > >
+>
+> > > > > > >       00000000fc400000-00000000fc5fffff (prio 1, RW):
+>
+> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > > 00000000fc400000-00000000fc5fffff
+>
+> > > > > > >
+>
+> > > > > > >       00000000fc600000-00000000fc7fffff (prio 1, RW):
+>
+> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > > 00000000fc600000-00000000fc7fffff
+>
+> > > > > > >
+>
+> > > > > > >       00000000fc800000-00000000fc9fffff (prio 1, RW):
+>
+> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > > 00000000fc800000-00000000fc9fffff
+>
+> > > > > > > <- correct Address Space
+>
+> > > > > > >
+>
+> > > > > > >       00000000fca00000-00000000fcbfffff (prio 1, RW):
+>
+> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > > 00000000fca00000-00000000fcbfffff
+>
+> > > > > > >
+>
+> > > > > > >       00000000fcc00000-00000000fcdfffff (prio 1, RW):
+>
+> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > > 00000000fcc00000-00000000fcdfffff
+>
+> > > > > > >
+>
+> > > > > > >       00000000fce00000-00000000fcffffff (prio 1, RW):
+>
+> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > > 00000000fce00000-00000000fcffffff
+>
+> > > > > > >
+>
+> > > > > > >
+>
+> > > > > > >
+>
+> > > > > > > After pci_bridge_update_mappings:
+>
+> > > > > > >
+>
+> > > > > > >       00000000fda00000-00000000fdbfffff (prio 1, RW):
+>
+> > > > > > > alias pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > > 00000000fda00000-00000000fdbfffff
+>
+> > > > > > >
+>
+> > > > > > >       00000000fdc00000-00000000fddfffff (prio 1, RW):
+>
+> > > > > > > alias pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > > 00000000fdc00000-00000000fddfffff
+>
+> > > > > > >
+>
+> > > > > > >       00000000fde00000-00000000fdffffff (prio 1, RW):
+>
+> > > > > > > alias pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > > 00000000fde00000-00000000fdffffff
+>
+> > > > > > >
+>
+> > > > > > >       00000000fe000000-00000000fe1fffff (prio 1, RW):
+>
+> > > > > > > alias pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > > 00000000fe000000-00000000fe1fffff
+>
+> > > > > > >
+>
+> > > > > > >       00000000fe200000-00000000fe3fffff (prio 1, RW):
+>
+> > > > > > > alias pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > > 00000000fe200000-00000000fe3fffff
+>
+> > > > > > >
+>
+> > > > > > >       00000000fe400000-00000000fe5fffff (prio 1, RW):
+>
+> > > > > > > alias pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > > 00000000fe400000-00000000fe5fffff
+>
+> > > > > > >
+>
+> > > > > > >       00000000fe600000-00000000fe7fffff (prio 1, RW):
+>
+> > > > > > > alias pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > > 00000000fe600000-00000000fe7fffff
+>
+> > > > > > >
+>
+> > > > > > >       00000000fe800000-00000000fe9fffff (prio 1, RW):
+>
+> > > > > > > alias pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > > 00000000fe800000-00000000fe9fffff
+>
+> > > > > > >
+>
+> > > > > > >       fffffffffc800000-fffffffffc800000 (prio 1, RW):
+>
+> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > > fffffffffc800000-fffffffffc800000   <- Exceptional Address Space
+>
+> > > > > >
+>
+> > > > > > This one is empty though right?
+>
+> > > > > >
+>
+> > > > > > >
+>
+> > > > > > >
+>
+> > > > > > > We have figured out why this address becomes this value,
+>
+> > > > > > > according to pci spec,  pci driver can get BAR address
+>
+> > > > > > > size by writing 0xffffffff to
+>
+> > > > > > >
+>
+> > > > > > > the pci register firstly, and then read back the value from this
+>
+> > > > > > > register.
+>
+> > > > > >
+>
+> > > > > >
+>
+> > > > > > OK however as you show below the BAR being sized is the BAR
+>
+> > > > > > if a bridge. Are you then adding a bridge device by hotplug?
+>
+> > > > >
+>
+> > > > > No, I just simply hot plugged a VFIO device to Bus 0, another
+>
+> > > > > interesting phenomenon is If I hot plug the device to other
+>
+> > > > > bus, this doesn't happen.
+>
+> > > > >
+>
+> > > > > >
+>
+> > > > > >
+>
+> > > > > > > We didn't handle this value  specially while process pci
+>
+> > > > > > > write in qemu, the function call stack is:
+>
+> > > > > > >
+>
+> > > > > > > Pci_bridge_dev_write_config
+>
+> > > > > > >
+>
+> > > > > > > -> pci_bridge_write_config
+>
+> > > > > > >
+>
+> > > > > > > -> pci_default_write_config (we update the config[address]
+>
+> > > > > > > -> value here to
+>
+> > > > > > > fffffffffc800000, which should be 0xfc800000 )
+>
+> > > > > > >
+>
+> > > > > > > -> pci_bridge_update_mappings
+>
+> > > > > > >
+>
+> > > > > > >                 ->pci_bridge_region_del(br, br->windows);
+>
+> > > > > > >
+>
+> > > > > > > -> pci_bridge_region_init
+>
+> > > > > > >
+>
+> > > > > > >
+>
+> > > > > > > -> pci_bridge_init_alias (here pci_bridge_get_base, we use
+>
+> > > > > > > -> the
+>
+> > > > > > > wrong value
+>
+> > > > > > > fffffffffc800000)
+>
+> > > > > > >
+>
+> > > > > > >                                                 ->
+>
+> > > > > > > memory_region_transaction_commit
+>
+> > > > > > >
+>
+> > > > > > >
+>
+> > > > > > >
+>
+> > > > > > > So, as we can see, we use the wrong base address in qemu
+>
+> > > > > > > to update the memory regions, though, we update the base
+>
+> > > > > > > address to
+>
+> > > > > > >
+>
+> > > > > > > The correct value after pci driver in VM write the
+>
+> > > > > > > original value back, the virtio NIC in bus 4 may still
+>
+> > > > > > > sends net packets concurrently with
+>
+> > > > > > >
+>
+> > > > > > > The wrong memory region address.
+>
+> > > > > > >
+>
+> > > > > > >
+>
+> > > > > > >
+>
+> > > > > > > We have tried to skip the memory region update action in
+>
+> > > > > > > qemu while detect pci write with 0xffffffff value, and it
+>
+> > > > > > > does work, but
+>
+> > > > > > >
+>
+> > > > > > > This does not seem like an elegant fix.
+>
+> > > > > >
+>
+> > > > > > For sure. But I'm still puzzled as to why does Linux try to
+>
+> > > > > > size the BAR of the bridge while a device behind it is used.
+>
+> > > > > >
+>
+> > > > > > Can you pls post your QEMU command line?
+>
+> > > > >
+>
+> > > > > My QEMU command line:
+>
+> > > > > /root/xyd/qemu-system-x86_64 -name
+>
+> > > > > guest=Linux,debug-threads=on -S -object
+>
+> > > > > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/dom
+>
+> > > > > ain-
+>
+> > > > > 194-
+>
+> > > > > Linux/master-key.aes -machine
+>
+> > > > > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
+>
+> > > > > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
+>
+> > > > > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off
+>
+> > > > > -smp
+>
+> > > > > 20,sockets=20,cores=1,threads=1 -numa
+>
+> > > > > node,nodeid=0,cpus=0-4,mem=1024 -numa
+>
+> > > > > node,nodeid=1,cpus=5-9,mem=1024 -numa
+>
+> > > > > node,nodeid=2,cpus=10-14,mem=1024 -numa
+>
+> > > > > node,nodeid=3,cpus=15-19,mem=1024 -uuid
+>
+> > > > > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config
+>
+> > > > > -nodefaults -chardev
+>
+> > > > > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Li
+>
+> > > > > nux/
+>
+> > > > > moni
+>
+> > > > > tor.sock,server,nowait -mon
+>
+> > > > > chardev=charmonitor,id=monitor,mode=control -rtc base=utc
+>
+> > > > > -no-hpet -global kvm-pit.lost_tick_policy=delay -no-shutdown
+>
+> > > > > -boot strict=on -device
+>
+> > > > > pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
+>
+> > > > > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
+>
+> > > > > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
+>
+> > > > > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
+>
+> > > > > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
+>
+> > > > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
+>
+> > > > > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
+>
+> > > > > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
+>
+> > > > > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
+>
+> > > > > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
+>
+> > > > > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
+>
+> > > > > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
+>
+> > > > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
+>
+> > > > > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id
+>
+> > > > > =dri
+>
+> > > > > ve-v
+>
+> > > > > irtio-disk0,cache=none -device
+>
+> > > > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-
+>
+> > > > > disk
+>
+> > > > > 0,id
+>
+> > > > > =virtio-disk0,bootindex=1 -drive
+>
+> > > > > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
+>
+> > > > > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1
+>
+> > > > > -netdev
+>
+> > > > > tap,fd=35,id=hostnet0 -device
+>
+> > > > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,b
+>
+> > > > > us=p
+>
+> > > > > ci.4
+>
+> > > > > ,addr=0x1 -chardev pty,id=charserial0 -device
+>
+> > > > > isa-serial,chardev=charserial0,id=serial0 -device
+>
+> > > > > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
+>
+> > > > > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
+>
+> > > > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg
+>
+> > > > > timestamp=on
+>
+> > > > >
+>
+> > > > > I am also very curious about this issue, in the linux kernel
+>
+> > > > > code, maybe double
+>
+> > > > check in function pci_bridge_check_ranges triggered this problem.
+>
+> > > >
+>
+> > > > If you can get the stacktrace in Linux when it tries to write
+>
+> > > > this fffff value, that would be quite helpful.
+>
+> > > >
+>
+> > >
+>
+> > > After I add mdelay(100) in function pci_bridge_check_ranges, this
+>
+> > > phenomenon is easier to reproduce, below is my modification to the kernel:
+>
+> > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
+>
+> > > index cb389277..86e232d 100644
+>
+> > > --- a/drivers/pci/setup-bus.c
+>
+> > > +++ b/drivers/pci/setup-bus.c
+>
+> > > @@ -27,7 +27,7 @@
+>
+> > >  #include <linux/slab.h>
+>
+> > >  #include <linux/acpi.h>
+>
+> > >  #include "pci.h"
+>
+> > > -
+>
+> > > +#include <linux/delay.h>
+>
+> > >  unsigned int pci_flags;
+>
+> > >
+>
+> > >  struct pci_dev_resource {
+>
+> > > @@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct
+>
+> > > pci_bus
+>
+> > *bus)
+>
+> > >                 pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
+>
+> > >                                                0xffffffff);
+>
+> > >                 pci_read_config_dword(bridge,
+>
+> > > PCI_PREF_BASE_UPPER32, &tmp);
+>
+> > > +               mdelay(100);
+>
+> > > +               printk(KERN_ERR "sleep\n");
+>
+> > > +                dump_stack();
+>
+> > >                 if (!tmp)
+>
+> > >                         b_res[2].flags &= ~IORESOURCE_MEM_64;
+>
+> > >                 pci_write_config_dword(bridge,
+>
+> > > PCI_PREF_BASE_UPPER32,
+>
+> > >
+>
+> >
+>
+> > OK!
+>
+> > I just sent a Linux patch that should help.
+>
+> > I would appreciate it if you will give it a try and if that helps
+>
+> > reply to it with a
+>
+> > Tested-by: tag.
+>
+> >
+>
+>
+>
+> I tested this patch and it works fine on my machine.
+>
+>
+>
+> But I have another question, if we only fix this problem in the
+>
+> kernel, the Linux version that has been released does not work well on the
+>
+virtualization platform.
+>
+> Is there a way to fix this problem in the backend?
+>
+> There could be a way to work around this.
+> Does below help?
+
+I am sorry to tell you, I tested this patch and it doesn't work fine.
+
+>
+> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index
+> 236a20eaa8..7834cac4b0 100644
+> --- a/hw/i386/acpi-build.c
+> +++ b/hw/i386/acpi-build.c
+> @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml
+> *parent_scope, PCIBus *bus,
+>
+>          aml_append(method, aml_store(aml_int(bsel_val), aml_name("BNUM")));
+>          aml_append(method,
+> -            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device Check
+> */)
+> +            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* Device
+> + Check Light */)
+>          );
+>          aml_append(method,
+>              aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject Request
+> */)
+
+On Tue, Dec 11, 2018 at 03:51:09AM +0000, xuyandong wrote:
+> > There could be a way to work around this.
+> > Does below help?
+>
+> I am sorry to tell you, I tested this patch and it doesn't work fine.
+
+What happens?
+
+> >
+> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index
+> > 236a20eaa8..7834cac4b0 100644
+> > --- a/hw/i386/acpi-build.c
+> > +++ b/hw/i386/acpi-build.c
+> > @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml
+> > *parent_scope, PCIBus *bus,
+> >
+> >          aml_append(method, aml_store(aml_int(bsel_val), aml_name("BNUM")));
+> >          aml_append(method,
+> > -            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device Check
+> > */)
+> > +            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* Device
+> > + Check Light */)
+> >          );
+> >          aml_append(method,
+> >              aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject Request
+> > */)
+
+On Tue, Dec 11, 2018 at 03:51:09AM +0000, xuyandong wrote:
+>
+> On Tue, Dec 11, 2018 at 02:55:43AM +0000, xuyandong wrote:
+>
+> > On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote:
+>
+> > > > On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
+>
+> > > > > > > > Hi all,
+>
+> > > > > > > >
+>
+> > > > > > > >
+>
+> > > > > > > >
+>
+> > > > > > > > In our test, we configured VM with several pci-bridges and
+>
+> > > > > > > > a virtio-net nic been attached with bus 4,
+>
+> > > > > > > >
+>
+> > > > > > > > After VM is startup, We ping this nic from host to judge
+>
+> > > > > > > > if it is working normally. Then, we hot add pci devices to
+>
+> > > > > > > > this VM with bus
+>
+> > > 0.
+>
+> > > > > > > >
+>
+> > > > > > > > We  found the virtio-net NIC in bus 4 is not working (can
+>
+> > > > > > > > not
+>
+> > > > > > > > connect) occasionally, as it kick virtio backend failure with
+>
+> > > > > > > > error
+>
+> below:
+>
+> > > > > > > >
+>
+> > > > > > > >     Unassigned mem write 00000000fc803004 = 0x1
+>
+> > > > > > > >
+>
+> > > > > > > >
+>
+> > > > > > > >
+>
+> > > > > > > > memory-region: pci_bridge_pci
+>
+> > > > > > > >
+>
+> > > > > > > >   0000000000000000-ffffffffffffffff (prio 0, RW):
+>
+> > > > > > > > pci_bridge_pci
+>
+> > > > > > > >
+>
+> > > > > > > >     00000000fc800000-00000000fc803fff (prio 1, RW):
+>
+> > > > > > > > virtio-pci
+>
+> > > > > > > >
+>
+> > > > > > > >       00000000fc800000-00000000fc800fff (prio 0, RW):
+>
+> > > > > > > > virtio-pci-common
+>
+> > > > > > > >
+>
+> > > > > > > >       00000000fc801000-00000000fc801fff (prio 0, RW):
+>
+> > > > > > > > virtio-pci-isr
+>
+> > > > > > > >
+>
+> > > > > > > >       00000000fc802000-00000000fc802fff (prio 0, RW):
+>
+> > > > > > > > virtio-pci-device
+>
+> > > > > > > >
+>
+> > > > > > > >       00000000fc803000-00000000fc803fff (prio 0, RW):
+>
+> > > > > > > > virtio-pci-notify  <- io mem unassigned
+>
+> > > > > > > >
+>
+> > > > > > > >       …
+>
+> > > > > > > >
+>
+> > > > > > > >
+>
+> > > > > > > >
+>
+> > > > > > > > We caught an exceptional address changing while this
+>
+> > > > > > > > problem happened, show as
+>
+> > > > > > > > follow:
+>
+> > > > > > > >
+>
+> > > > > > > > Before pci_bridge_update_mappings:
+>
+> > > > > > > >
+>
+> > > > > > > >       00000000fc000000-00000000fc1fffff (prio 1, RW):
+>
+> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > > > 00000000fc000000-00000000fc1fffff
+>
+> > > > > > > >
+>
+> > > > > > > >       00000000fc200000-00000000fc3fffff (prio 1, RW):
+>
+> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > > > 00000000fc200000-00000000fc3fffff
+>
+> > > > > > > >
+>
+> > > > > > > >       00000000fc400000-00000000fc5fffff (prio 1, RW):
+>
+> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > > > 00000000fc400000-00000000fc5fffff
+>
+> > > > > > > >
+>
+> > > > > > > >       00000000fc600000-00000000fc7fffff (prio 1, RW):
+>
+> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > > > 00000000fc600000-00000000fc7fffff
+>
+> > > > > > > >
+>
+> > > > > > > >       00000000fc800000-00000000fc9fffff (prio 1, RW):
+>
+> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > > > 00000000fc800000-00000000fc9fffff
+>
+> > > > > > > > <- correct Address Space
+>
+> > > > > > > >
+>
+> > > > > > > >       00000000fca00000-00000000fcbfffff (prio 1, RW):
+>
+> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > > > 00000000fca00000-00000000fcbfffff
+>
+> > > > > > > >
+>
+> > > > > > > >       00000000fcc00000-00000000fcdfffff (prio 1, RW):
+>
+> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > > > 00000000fcc00000-00000000fcdfffff
+>
+> > > > > > > >
+>
+> > > > > > > >       00000000fce00000-00000000fcffffff (prio 1, RW):
+>
+> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
+>
+> > > > > > > > 00000000fce00000-00000000fcffffff
+>
+> > > > > > > >
+>
+> > > > > > > >
+>
+> > > > > > > >
+>
+> > > > > > > > After pci_bridge_update_mappings:
+>
+> > > > > > > >
+>
+> > > > > > > >       00000000fda00000-00000000fdbfffff (prio 1, RW):
+>
+> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > > > 00000000fda00000-00000000fdbfffff
+>
+> > > > > > > >
+>
+> > > > > > > >       00000000fdc00000-00000000fddfffff (prio 1, RW):
+>
+> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > > > 00000000fdc00000-00000000fddfffff
+>
+> > > > > > > >
+>
+> > > > > > > >       00000000fde00000-00000000fdffffff (prio 1, RW):
+>
+> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > > > 00000000fde00000-00000000fdffffff
+>
+> > > > > > > >
+>
+> > > > > > > >       00000000fe000000-00000000fe1fffff (prio 1, RW):
+>
+> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > > > 00000000fe000000-00000000fe1fffff
+>
+> > > > > > > >
+>
+> > > > > > > >       00000000fe200000-00000000fe3fffff (prio 1, RW):
+>
+> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > > > 00000000fe200000-00000000fe3fffff
+>
+> > > > > > > >
+>
+> > > > > > > >       00000000fe400000-00000000fe5fffff (prio 1, RW):
+>
+> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > > > 00000000fe400000-00000000fe5fffff
+>
+> > > > > > > >
+>
+> > > > > > > >       00000000fe600000-00000000fe7fffff (prio 1, RW):
+>
+> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > > > 00000000fe600000-00000000fe7fffff
+>
+> > > > > > > >
+>
+> > > > > > > >       00000000fe800000-00000000fe9fffff (prio 1, RW):
+>
+> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
+>
+> > > > > > > > 00000000fe800000-00000000fe9fffff
+>
+> > > > > > > >
+>
+> > > > > > > >       fffffffffc800000-fffffffffc800000 (prio 1, RW):
+>
+> > > > > > > > alias
+>
+> > > > > pci_bridge_pref_mem
+>
+> > > > > > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000   <-
+>
+> > > > > > > > Exceptional
+>
+> > > Address
+>
+> > > > > > > Space
+>
+> > > > > > >
+>
+> > > > > > > This one is empty though right?
+>
+> > > > > > >
+>
+> > > > > > > >
+>
+> > > > > > > >
+>
+> > > > > > > > We have figured out why this address becomes this value,
+>
+> > > > > > > > according to pci spec,  pci driver can get BAR address
+>
+> > > > > > > > size by writing 0xffffffff to
+>
+> > > > > > > >
+>
+> > > > > > > > the pci register firstly, and then read back the value from
+>
+> > > > > > > > this
+>
+> register.
+>
+> > > > > > >
+>
+> > > > > > >
+>
+> > > > > > > OK however as you show below the BAR being sized is the BAR
+>
+> > > > > > > if a bridge. Are you then adding a bridge device by hotplug?
+>
+> > > > > >
+>
+> > > > > > No, I just simply hot plugged a VFIO device to Bus 0, another
+>
+> > > > > > interesting phenomenon is If I hot plug the device to other
+>
+> > > > > > bus, this doesn't
+>
+> > > > > happen.
+>
+> > > > > >
+>
+> > > > > > >
+>
+> > > > > > >
+>
+> > > > > > > > We didn't handle this value  specially while process pci
+>
+> > > > > > > > write in qemu, the function call stack is:
+>
+> > > > > > > >
+>
+> > > > > > > > Pci_bridge_dev_write_config
+>
+> > > > > > > >
+>
+> > > > > > > > -> pci_bridge_write_config
+>
+> > > > > > > >
+>
+> > > > > > > > -> pci_default_write_config (we update the config[address]
+>
+> > > > > > > > -> value here to
+>
+> > > > > > > > fffffffffc800000, which should be 0xfc800000 )
+>
+> > > > > > > >
+>
+> > > > > > > > -> pci_bridge_update_mappings
+>
+> > > > > > > >
+>
+> > > > > > > >                 ->pci_bridge_region_del(br, br->windows);
+>
+> > > > > > > >
+>
+> > > > > > > > -> pci_bridge_region_init
+>
+> > > > > > > >
+>
+> > > > > > > >
+>
+> > > > > > > > -> pci_bridge_init_alias (here pci_bridge_get_base, we use
+>
+> > > > > > > > -> the
+>
+> > > > > > > > wrong value
+>
+> > > > > > > > fffffffffc800000)
+>
+> > > > > > > >
+>
+> > > > > > > >                                                 ->
+>
+> > > > > > > > memory_region_transaction_commit
+>
+> > > > > > > >
+>
+> > > > > > > >
+>
+> > > > > > > >
+>
+> > > > > > > > So, as we can see, we use the wrong base address in qemu
+>
+> > > > > > > > to update the memory regions, though, we update the base
+>
+> > > > > > > > address to
+>
+> > > > > > > >
+>
+> > > > > > > > The correct value after pci driver in VM write the
+>
+> > > > > > > > original value back, the virtio NIC in bus 4 may still
+>
+> > > > > > > > sends net packets concurrently with
+>
+> > > > > > > >
+>
+> > > > > > > > The wrong memory region address.
+>
+> > > > > > > >
+>
+> > > > > > > >
+>
+> > > > > > > >
+>
+> > > > > > > > We have tried to skip the memory region update action in
+>
+> > > > > > > > qemu while detect pci write with 0xffffffff value, and it
+>
+> > > > > > > > does work, but
+>
+> > > > > > > >
+>
+> > > > > > > > This does not seem like an elegant fix.
+>
+> > > > > > >
+>
+> > > > > > > For sure. But I'm still puzzled as to why does Linux try to
+>
+> > > > > > > size the BAR of the bridge while a device behind it is used.
+>
+> > > > > > >
+>
+> > > > > > > Can you pls post your QEMU command line?
+>
+> > > > > >
+>
+> > > > > > My QEMU command line:
+>
+> > > > > > /root/xyd/qemu-system-x86_64 -name
+>
+> > > > > > guest=Linux,debug-threads=on -S -object
+>
+> > > > > > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/dom
+>
+> > > > > > ain-
+>
+> > > > > > 194-
+>
+> > > > > > Linux/master-key.aes -machine
+>
+> > > > > > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
+>
+> > > > > > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
+>
+> > > > > > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off
+>
+> > > > > > -smp
+>
+> > > > > > 20,sockets=20,cores=1,threads=1 -numa
+>
+> > > > > > node,nodeid=0,cpus=0-4,mem=1024 -numa
+>
+> > > > > > node,nodeid=1,cpus=5-9,mem=1024 -numa
+>
+> > > > > > node,nodeid=2,cpus=10-14,mem=1024 -numa
+>
+> > > > > > node,nodeid=3,cpus=15-19,mem=1024 -uuid
+>
+> > > > > > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config
+>
+> > > > > > -nodefaults -chardev
+>
+> > > > > > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Li
+>
+> > > > > > nux/
+>
+> > > > > > moni
+>
+> > > > > > tor.sock,server,nowait -mon
+>
+> > > > > > chardev=charmonitor,id=monitor,mode=control -rtc base=utc
+>
+> > > > > > -no-hpet -global kvm-pit.lost_tick_policy=delay -no-shutdown
+>
+> > > > > > -boot strict=on -device
+>
+> > > > > > pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
+>
+> > > > > > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
+>
+> > > > > > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
+>
+> > > > > > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
+>
+> > > > > > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
+>
+> > > > > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
+>
+> > > > > > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
+>
+> > > > > > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
+>
+> > > > > > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
+>
+> > > > > > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
+>
+> > > > > > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
+>
+> > > > > > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
+>
+> > > > > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
+>
+> > > > > > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id
+>
+> > > > > > =dri
+>
+> > > > > > ve-v
+>
+> > > > > > irtio-disk0,cache=none -device
+>
+> > > > > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-
+>
+> > > > > > disk
+>
+> > > > > > 0,id
+>
+> > > > > > =virtio-disk0,bootindex=1 -drive
+>
+> > > > > > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
+>
+> > > > > > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1
+>
+> > > > > > -netdev
+>
+> > > > > > tap,fd=35,id=hostnet0 -device
+>
+> > > > > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,b
+>
+> > > > > > us=p
+>
+> > > > > > ci.4
+>
+> > > > > > ,addr=0x1 -chardev pty,id=charserial0 -device
+>
+> > > > > > isa-serial,chardev=charserial0,id=serial0 -device
+>
+> > > > > > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
+>
+> > > > > > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
+>
+> > > > > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg
+>
+> > > > > > timestamp=on
+>
+> > > > > >
+>
+> > > > > > I am also very curious about this issue, in the linux kernel
+>
+> > > > > > code, maybe double
+>
+> > > > > check in function pci_bridge_check_ranges triggered this problem.
+>
+> > > > >
+>
+> > > > > If you can get the stacktrace in Linux when it tries to write
+>
+> > > > > this fffff value, that would be quite helpful.
+>
+> > > > >
+>
+> > > >
+>
+> > > > After I add mdelay(100) in function pci_bridge_check_ranges, this
+>
+> > > > phenomenon is easier to reproduce, below is my modification to the kernel:
+>
+> > > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
+>
+> > > > index cb389277..86e232d 100644
+>
+> > > > --- a/drivers/pci/setup-bus.c
+>
+> > > > +++ b/drivers/pci/setup-bus.c
+>
+> > > > @@ -27,7 +27,7 @@
+>
+> > > >  #include <linux/slab.h>
+>
+> > > >  #include <linux/acpi.h>
+>
+> > > >  #include "pci.h"
+>
+> > > > -
+>
+> > > > +#include <linux/delay.h>
+>
+> > > >  unsigned int pci_flags;
+>
+> > > >
+>
+> > > >  struct pci_dev_resource {
+>
+> > > > @@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct
+>
+> > > > pci_bus
+>
+> > > *bus)
+>
+> > > >                 pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
+>
+> > > >                                                0xffffffff);
+>
+> > > >                 pci_read_config_dword(bridge,
+>
+> > > > PCI_PREF_BASE_UPPER32, &tmp);
+>
+> > > > +               mdelay(100);
+>
+> > > > +               printk(KERN_ERR "sleep\n");
+>
+> > > > +                dump_stack();
+>
+> > > >                 if (!tmp)
+>
+> > > >                         b_res[2].flags &= ~IORESOURCE_MEM_64;
+>
+> > > >                 pci_write_config_dword(bridge,
+>
+> > > > PCI_PREF_BASE_UPPER32,
+>
+> > > >
+>
+> > >
+>
+> > > OK!
+>
+> > > I just sent a Linux patch that should help.
+>
+> > > I would appreciate it if you will give it a try and if that helps
+>
+> > > reply to it with a
+>
+> > > Tested-by: tag.
+>
+> > >
+>
+> >
+>
+> > I tested this patch and it works fine on my machine.
+>
+> >
+>
+> > But I have another question, if we only fix this problem in the
+>
+> > kernel, the Linux version that has been released does not work well on the
+>
+> virtualization platform.
+>
+> > Is there a way to fix this problem in the backend?
+>
+>
+>
+> There could be a way to work around this.
+>
+> Does below help?
+>
+>
+I am sorry to tell you, I tested this patch and it doesn't work fine.
+>
+>
+>
+>
+> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index
+>
+> 236a20eaa8..7834cac4b0 100644
+>
+> --- a/hw/i386/acpi-build.c
+>
+> +++ b/hw/i386/acpi-build.c
+>
+> @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml
+>
+> *parent_scope, PCIBus *bus,
+>
+>
+>
+>          aml_append(method, aml_store(aml_int(bsel_val), aml_name("BNUM")));
+>
+>          aml_append(method,
+>
+> -            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device Check
+>
+> */)
+>
+> +            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* Device
+>
+> + Check Light */)
+>
+>          );
+>
+>          aml_append(method,
+>
+>              aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject Request
+>
+> */)
+Oh I see, another bug:
+
+        case ACPI_NOTIFY_DEVICE_CHECK_LIGHT:
+                acpi_handle_debug(handle, "ACPI_NOTIFY_DEVICE_CHECK_LIGHT 
+event\n");
+                /* TBD: Exactly what does 'light' mean? */
+                break;
+
+And then e.g. acpi_generic_hotplug_event(struct acpi_device *adev, u32 type)
+and friends all just ignore this event type.
+
+
+
+-- 
+MST
+
+>
+> > > > > > > > > Hi all,
+>
+> > > > > > > > >
+>
+> > > > > > > > >
+>
+> > > > > > > > >
+>
+> > > > > > > > > In our test, we configured VM with several pci-bridges
+>
+> > > > > > > > > and a virtio-net nic been attached with bus 4,
+>
+> > > > > > > > >
+>
+> > > > > > > > > After VM is startup, We ping this nic from host to
+>
+> > > > > > > > > judge if it is working normally. Then, we hot add pci
+>
+> > > > > > > > > devices to this VM with bus
+>
+> > > > 0.
+>
+> > > > > > > > >
+>
+> > > > > > > > > We  found the virtio-net NIC in bus 4 is not working
+>
+> > > > > > > > > (can not
+>
+> > > > > > > > > connect) occasionally, as it kick virtio backend
+>
+> > > > > > > > > failure with error
+>
+> > > But I have another question, if we only fix this problem in the
+>
+> > > kernel, the Linux version that has been released does not work
+>
+> > > well on the
+>
+> > virtualization platform.
+>
+> > > Is there a way to fix this problem in the backend?
+>
+> >
+>
+> > There could be a way to work around this.
+>
+> > Does below help?
+>
+>
+>
+> I am sorry to tell you, I tested this patch and it doesn't work fine.
+>
+>
+>
+> >
+>
+> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index
+>
+> > 236a20eaa8..7834cac4b0 100644
+>
+> > --- a/hw/i386/acpi-build.c
+>
+> > +++ b/hw/i386/acpi-build.c
+>
+> > @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml
+>
+> > *parent_scope, PCIBus *bus,
+>
+> >
+>
+> >          aml_append(method, aml_store(aml_int(bsel_val),
+>
+aml_name("BNUM")));
+>
+> >          aml_append(method,
+>
+> > -            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device
+>
+> > Check
+>
+*/)
+>
+> > +            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /*
+>
+> > + Device Check Light */)
+>
+> >          );
+>
+> >          aml_append(method,
+>
+> >              aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject
+>
+> > Request */)
+>
+>
+> Oh I see, another bug:
+>
+>         case ACPI_NOTIFY_DEVICE_CHECK_LIGHT:
+>                 acpi_handle_debug(handle, "ACPI_NOTIFY_DEVICE_CHECK_LIGHT
+> event\n");
+>                 /* TBD: Exactly what does 'light' mean? */
+>                 break;
+>
+> And then e.g. acpi_generic_hotplug_event(struct acpi_device *adev, u32 type)
+> and friends all just ignore this event type.
+>
+> --
+> MST
+
+Hi Michael,
+
+If we want to fix this problem on the backend, it is not enough to consider
+only PCI device hot plugging: I found that running a command like
+"echo 1 > /sys/bus/pci/rescan" in the guest also reproduces the problem very
+easily.
+
+From the perspective of device emulation, when the guest writes 0xffffffff to
+the BAR, it only wants to read back the size of the region, not to actually
+update the address space.
+So I made the following patch to avoid updating the PCI mappings in that case.
+
+Do you think this makes sense?
+
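For reference, the size probe described above can be sketched as a small self-contained C model. This is only an illustration under stated assumptions: `toy_bar`, `toy_bar_write`, `toy_bar_read` and `toy_bar_probe_size` are hypothetical names, not QEMU or Linux code, and the model is an ordinary 32-bit memory BAR rather than the bridge's PCI_PREF_BASE_UPPER32 register touched by the kernel diff quoted earlier in the thread. It shows why the register briefly holds a bogus address between the all-ones write and the restore.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 32-bit memory BAR; not QEMU's data structure. */
struct toy_bar {
    uint32_t value;  /* current register contents */
    uint32_t size;   /* region size in bytes, a power of two, at least 16 */
};

/* Address bits below log2(size) are hard-wired to zero, so a write only
 * sticks in the upper bits. The low flag bits are kept at zero here. */
static void toy_bar_write(struct toy_bar *bar, uint32_t val)
{
    bar->value = val & ~(bar->size - 1);
}

static uint32_t toy_bar_read(const struct toy_bar *bar)
{
    return bar->value;
}

/* Guest-side probe: save, write all ones, read back, restore. */
static uint32_t toy_bar_probe_size(struct toy_bar *bar)
{
    uint32_t saved = toy_bar_read(bar);
    uint32_t readback;

    toy_bar_write(bar, 0xffffffffu);
    /* Until the restore below, the register holds a meaningless address.
     * An emulator that remaps on every write exposes this window. */
    readback = toy_bar_read(bar);
    toy_bar_write(bar, saved);

    /* Mask off the four low flag bits, invert, add one. */
    return ~(readback & ~0xfu) + 1;
}
```

With a 2 MiB region based at 0xfc800000, `toy_bar_probe_size` returns 0x200000 and the register holds 0xfc800000 again afterwards.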
+[PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR
+
+When the guest writes 0xffffffff to a BAR, it only wants to read back the size
+of the region, not to actually update the address space.
+So when the guest writes 0xffffffff to a BAR, we need to avoid calling
+pci_update_mappings or pci_bridge_update_mappings.
+
+Signed-off-by: xuyandong <address@hidden>
+---
+ hw/pci/pci.c        | 6 ++++--
+ hw/pci/pci_bridge.c | 8 +++++---
+ 2 files changed, 9 insertions(+), 5 deletions(-)
+
+diff --git a/hw/pci/pci.c b/hw/pci/pci.c
+index 56b13b3..ef368e1 100644
+--- a/hw/pci/pci.c
++++ b/hw/pci/pci.c
+@@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t 
+addr, uint32_t val_in, int
+ {
+     int i, was_irq_disabled = pci_irq_disabled(d);
+     uint32_t val = val_in;
++    uint64_t barmask = (1 << l*8) - 1;
+ 
+     for (i = 0; i < l; val >>= 8, ++i) {
+         uint8_t wmask = d->wmask[addr + i];
+@@ -1369,9 +1370,10 @@ void pci_default_write_config(PCIDevice *d, uint32_t 
+addr, uint32_t val_in, int
+         d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
+         d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
+     }
+-    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
++    if ((val_in != barmask &&
++       (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
+         ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
+-        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
++        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) ||
+         range_covers_byte(addr, l, PCI_COMMAND))
+         pci_update_mappings(d);
+ 
+diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
+index ee9dff2..f2bad79 100644
+--- a/hw/pci/pci_bridge.c
++++ b/hw/pci/pci_bridge.c
+@@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d,
+     PCIBridge *s = PCI_BRIDGE(d);
+     uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
+     uint16_t newctl;
++    uint64_t barmask = (1 << len * 8) - 1;
+ 
+     pci_default_write_config(d, address, val, len);
+ 
+     if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
+ 
+-        /* io base/limit */
+-        ranges_overlap(address, len, PCI_IO_BASE, 2) ||
++        (val != barmask &&
++       /* io base/limit */
++        (ranges_overlap(address, len, PCI_IO_BASE, 2) ||
+ 
+         /* memory base/limit, prefetchable base/limit and
+            io base/limit upper 16 */
+-        ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
++        ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) ||
+ 
+         /* vga enable */
+         ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
+-- 
+1.8.3.1
+
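One detail of the patch above may be worth checking: `barmask` is computed as `(1 << l*8) - 1`, where the literal `1` has type int. For a 4-byte write the shift count is 32, which C leaves undefined when int is 32 bits wide. A minimal sketch of a widened computation follows; `all_ones_for_len` is a hypothetical helper name, not an existing QEMU function, and it assumes config writes of 1 to 4 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the all-ones value for a config write of len bytes.
 * The constant is widened to 64 bits before shifting, so len == 4 shifts a
 * 64-bit value by 32, which is well defined. */
static uint64_t all_ones_for_len(int len)
{
    assert(len >= 1 && len <= 4);
    return (UINT64_C(1) << (len * 8)) - 1;
}
```

For a 4-byte write this yields 0xffffffff, which is the value the comparison in the patch is meant to detect.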
+On Mon, Jan 07, 2019 at 02:37:17PM +0000, xuyandong wrote:
+>
+> > > > > > > > > > Hi all,
+>
+> > > > > > > > > >
+>
+> > > > > > > > > >
+>
+> > > > > > > > > >
+>
+> > > > > > > > > > In our test, we configured VM with several pci-bridges
+>
+> > > > > > > > > > and a virtio-net nic been attached with bus 4,
+>
+> > > > > > > > > >
+>
+> > > > > > > > > > After VM is startup, We ping this nic from host to
+>
+> > > > > > > > > > judge if it is working normally. Then, we hot add pci
+>
+> > > > > > > > > > devices to this VM with bus
+>
+> > > > > 0.
+>
+> > > > > > > > > >
+>
+> > > > > > > > > > We  found the virtio-net NIC in bus 4 is not working
+>
+> > > > > > > > > > (can not
+>
+> > > > > > > > > > connect) occasionally, as it kick virtio backend
+>
+> > > > > > > > > > failure with error
+>
+>
+> > > > But I have another question, if we only fix this problem in the
+>
+> > > > kernel, the Linux version that has been released does not work
+>
+> > > > well on the
+>
+> > > virtualization platform.
+>
+> > > > Is there a way to fix this problem in the backend?
+>
+> > >
+>
+> > > There could be a way to work around this.
+>
+> > > Does below help?
+>
+> >
+>
+> > I am sorry to tell you, I tested this patch and it doesn't work fine.
+>
+> >
+>
+> > >
+>
+> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index
+>
+> > > 236a20eaa8..7834cac4b0 100644
+>
+> > > --- a/hw/i386/acpi-build.c
+>
+> > > +++ b/hw/i386/acpi-build.c
+>
+> > > @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml
+>
+> > > *parent_scope, PCIBus *bus,
+>
+> > >
+>
+> > >          aml_append(method, aml_store(aml_int(bsel_val),
+>
+> aml_name("BNUM")));
+>
+> > >          aml_append(method,
+>
+> > > -            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device
+>
+> > > Check
+>
+> */)
+>
+> > > +            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /*
+>
+> > > + Device Check Light */)
+>
+> > >          );
+>
+> > >          aml_append(method,
+>
+> > >              aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject
+>
+> > > Request */)
+>
+>
+>
+>
+>
+> Oh I see, another bug:
+>
+>
+>
+>         case ACPI_NOTIFY_DEVICE_CHECK_LIGHT:
+>
+>                 acpi_handle_debug(handle, "ACPI_NOTIFY_DEVICE_CHECK_LIGHT
+>
+> event\n");
+>
+>                 /* TBD: Exactly what does 'light' mean? */
+>
+>                 break;
+>
+>
+>
+> And then e.g. acpi_generic_hotplug_event(struct acpi_device *adev, u32 type)
+>
+> and friends all just ignore this event type.
+>
+>
+>
+>
+>
+>
+>
+> --
+>
+> MST
+>
+>
+Hi Michael,
+>
+>
+If we want to fix this problem on the backend, it is not enough to consider
+>
+only PCI
+>
+device hot plugging, because I found that if we use a command like
+>
+"echo 1 > /sys/bus/pci/rescan" in guest, this problem is very easy to
+>
+reproduce.
+>
+>
+From the perspective of device emulation, when guest writes 0xffffffff to the
+>
+BAR,
+>
+guest just want to get the size of the region but not really updating the
+>
+address space.
+>
+So I made the following patch to avoid  update pci mapping.
+>
+>
+Do you think this make sense?
+>
+>
+[PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR
+>
+>
+When guest writes 0xffffffff to the BAR, guest just want to get the size of
+>
+the region
+>
+but not really updating the address space.
+>
+So when guest writes 0xffffffff to BAR, we need avoid pci_update_mappings
+>
+or pci_bridge_update_mappings.
+>
+>
+Signed-off-by: xuyandong <address@hidden>
+I see how that will address the common case; however, there are a bunch of
+issues here.  First of all, it's easy to trigger the update by some other
+action like VM migration.  More importantly, it's just possible that the
+guest actually does want to set the low 32 bits of the address to all
+ones.  For example, that is clearly listed as a way to disable all
+devices behind the bridge in the PCI-to-PCI bridge spec.
+
+Given that upstream is dragging its feet, I'm open to adding a flag
+that will help keep guests going as a temporary measure.
+We will need to think about ways to restrict this as much as
+we can.
+
+
+>
+---
+>
+hw/pci/pci.c        | 6 ++++--
+>
+hw/pci/pci_bridge.c | 8 +++++---
+>
+2 files changed, 9 insertions(+), 5 deletions(-)
+>
+>
+diff --git a/hw/pci/pci.c b/hw/pci/pci.c
+>
+index 56b13b3..ef368e1 100644
+>
+--- a/hw/pci/pci.c
+>
++++ b/hw/pci/pci.c
+>
+@@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t
+>
+addr, uint32_t val_in, int
+>
+{
+>
+int i, was_irq_disabled = pci_irq_disabled(d);
+>
+uint32_t val = val_in;
+>
++    uint64_t barmask = (1 << l*8) - 1;
+>
+>
+for (i = 0; i < l; val >>= 8, ++i) {
+>
+uint8_t wmask = d->wmask[addr + i];
+>
+@@ -1369,9 +1370,10 @@ void pci_default_write_config(PCIDevice *d, uint32_t
+>
+addr, uint32_t val_in, int
+>
+d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
+>
+d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
+>
+}
+>
+-    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
+>
++    if ((val_in != barmask &&
+>
++     (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
+>
+ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
+>
+-        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
+>
++        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) ||
+>
+range_covers_byte(addr, l, PCI_COMMAND))
+>
+pci_update_mappings(d);
+>
+>
+diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
+>
+index ee9dff2..f2bad79 100644
+>
+--- a/hw/pci/pci_bridge.c
+>
++++ b/hw/pci/pci_bridge.c
+>
+@@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d,
+>
+PCIBridge *s = PCI_BRIDGE(d);
+>
+uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
+>
+uint16_t newctl;
+>
++    uint64_t barmask = (1 << len * 8) - 1;
+>
+>
+pci_default_write_config(d, address, val, len);
+>
+>
+if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
+>
+>
+-        /* io base/limit */
+>
+-        ranges_overlap(address, len, PCI_IO_BASE, 2) ||
+>
++        (val != barmask &&
+>
++     /* io base/limit */
+>
++        (ranges_overlap(address, len, PCI_IO_BASE, 2) ||
+>
+>
+/* memory base/limit, prefetchable base/limit and
+>
+io base/limit upper 16 */
+>
+-        ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
+>
++        ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) ||
+>
+>
+/* vga enable */
+>
+ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
+>
+--
+>
+1.8.3.1
+>
+>
+>
+
+>
+-----Original Message-----
+>
+From: Michael S. Tsirkin [
+mailto:address@hidden
+>
+Sent: Monday, January 07, 2019 11:06 PM
+>
+To: xuyandong <address@hidden>
+>
+Cc: address@hidden; Paolo Bonzini <address@hidden>; qemu-
+>
+address@hidden; Zhanghailiang <address@hidden>;
+>
+wangxin (U) <address@hidden>; Huangweidong (C)
+>
+<address@hidden>
+>
+Subject: Re: [BUG]Unassigned mem write during pci device hot-plug
+>
+>
+On Mon, Jan 07, 2019 at 02:37:17PM +0000, xuyandong wrote:
+>
+> > > > > > > > > > > Hi all,
+>
+> > > > > > > > > > >
+>
+> > > > > > > > > > >
+>
+> > > > > > > > > > >
+>
+> > > > > > > > > > > In our test, we configured VM with several
+>
+> > > > > > > > > > > pci-bridges and a virtio-net nic been attached
+>
+> > > > > > > > > > > with bus 4,
+>
+> > > > > > > > > > >
+>
+> > > > > > > > > > > After VM is startup, We ping this nic from host to
+>
+> > > > > > > > > > > judge if it is working normally. Then, we hot add
+>
+> > > > > > > > > > > pci devices to this VM with bus
+>
+> > > > > > 0.
+>
+> > > > > > > > > > >
+>
+> > > > > > > > > > > We  found the virtio-net NIC in bus 4 is not
+>
+> > > > > > > > > > > working (can not
+>
+> > > > > > > > > > > connect) occasionally, as it kick virtio backend
+>
+> > > > > > > > > > > failure with error
+>
+>
+>
+> > > > > But I have another question, if we only fix this problem in
+>
+> > > > > the kernel, the Linux version that has been released does not
+>
+> > > > > work well on the
+>
+> > > > virtualization platform.
+>
+> > > > > Is there a way to fix this problem in the backend?
+>
+>
+>
+> Hi Michael,
+>
+>
+>
+> If we want to fix this problem on the backend, it is not enough to
+>
+> consider only PCI device hot plugging, because I found that if we use
+>
+> a command like "echo 1 > /sys/bus/pci/rescan" in guest, this problem is very
+>
+easy to reproduce.
+>
+>
+>
+> From the perspective of device emulation, when guest writes 0xffffffff
+>
+> to the BAR, guest just want to get the size of the region but not really
+>
+updating the address space.
+>
+> So I made the following patch to avoid  update pci mapping.
+>
+>
+>
+> Do you think this make sense?
+>
+>
+>
+> [PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR
+>
+>
+>
+> When guest writes 0xffffffff to the BAR, guest just want to get the
+>
+> size of the region but not really updating the address space.
+>
+> So when guest writes 0xffffffff to BAR, we need avoid
+>
+> pci_update_mappings or pci_bridge_update_mappings.
+>
+>
+>
+> Signed-off-by: xuyandong <address@hidden>
+>
+>
+I see how that will address the common case however there are a bunch of
+>
+issues here.  First of all it's easy to trigger the update by some other
+>
+action like
+>
+VM migration.  More importantly it's just possible that guest actually does
+>
+want
+>
+to set the low 32 bit of the address to all ones.  For example, that is
+>
+clearly
+>
+listed as a way to disable all devices behind the bridge in the pci to pci
+>
+bridge
+>
+spec.
+OK, I see. What if I only skip the update when the guest writes 0xFFFFFFFF to the
+Prefetchable Base Upper 32 Bits register,
+to work around the kernel double-check problem?
+Do you think there is still a risk?
+
+>
+>
+Given that upstream is dragging its feet, I'm open to adding a flag that will help
+>
+keep guests going as a temporary measure.
+>
+We will need to think about ways to restrict this as much as we can.
+>
+>
+>
+> ---
+>
+>  hw/pci/pci.c        | 6 ++++--
+>
+>  hw/pci/pci_bridge.c | 8 +++++---
+>
+>  2 files changed, 9 insertions(+), 5 deletions(-)
+>
+>
+>
+> diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 56b13b3..ef368e1 100644
+>
+> --- a/hw/pci/pci.c
+>
+> +++ b/hw/pci/pci.c
+>
+> @@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d,
+>
+> uint32_t addr, uint32_t val_in, int  {
+>
+>      int i, was_irq_disabled = pci_irq_disabled(d);
+>
+>      uint32_t val = val_in;
+>
+> +    uint64_t barmask = (1 << l*8) - 1;
+>
+>
+>
+>      for (i = 0; i < l; val >>= 8, ++i) {
+>
+>          uint8_t wmask = d->wmask[addr + i]; @@ -1369,9 +1370,10 @@
+>
+> void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in,
+>
+int
+>
+>          d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val &
+>
+> wmask);
+>
+>          d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear
+>
+> */
+>
+>      }
+>
+> -    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
+>
+> +    if ((val_in != barmask &&
+>
+> +   (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
+>
+>          ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
+>
+> -        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
+>
+> +        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) ||
+>
+>          range_covers_byte(addr, l, PCI_COMMAND))
+>
+>          pci_update_mappings(d);
+>
+>
+>
+> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c index
+>
+> ee9dff2..f2bad79 100644
+>
+> --- a/hw/pci/pci_bridge.c
+>
+> +++ b/hw/pci/pci_bridge.c
+>
+> @@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d,
+>
+>      PCIBridge *s = PCI_BRIDGE(d);
+>
+>      uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
+>
+>      uint16_t newctl;
+>
+> +    uint64_t barmask = (1 << len * 8) - 1;
+>
+>
+>
+>      pci_default_write_config(d, address, val, len);
+>
+>
+>
+>      if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
+>
+>
+>
+> -        /* io base/limit */
+>
+> -        ranges_overlap(address, len, PCI_IO_BASE, 2) ||
+>
+> +        (val != barmask &&
+>
+> +   /* io base/limit */
+>
+> +        (ranges_overlap(address, len, PCI_IO_BASE, 2) ||
+>
+>
+>
+>          /* memory base/limit, prefetchable base/limit and
+>
+>             io base/limit upper 16 */
+>
+> -        ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
+>
+> +        ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) ||
+>
+>
+>
+>          /* vga enable */
+>
+>          ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
+>
+> --
+>
+> 1.8.3.1
+>
+>
+>
+>
+>
+>
+
+On Mon, Jan 07, 2019 at 03:28:36PM +0000, xuyandong wrote:
+>
+>
+>
+> -----Original Message-----
+>
+> From: Michael S. Tsirkin [
+mailto:address@hidden
+>
+> Sent: Monday, January 07, 2019 11:06 PM
+>
+> To: xuyandong <address@hidden>
+>
+> Cc: address@hidden; Paolo Bonzini <address@hidden>; qemu-
+>
+> address@hidden; Zhanghailiang <address@hidden>;
+>
+> wangxin (U) <address@hidden>; Huangweidong (C)
+>
+> <address@hidden>
+>
+> Subject: Re: [BUG]Unassigned mem write during pci device hot-plug
+>
+>
+>
+> On Mon, Jan 07, 2019 at 02:37:17PM +0000, xuyandong wrote:
+>
+> > > > > > > > > > > > Hi all,
+>
+> > > > > > > > > > > >
+>
+> > > > > > > > > > > >
+>
+> > > > > > > > > > > >
+>
+> > > > > > > > > > > > In our test, we configured VM with several
+>
+> > > > > > > > > > > > pci-bridges and a virtio-net nic been attached
+>
+> > > > > > > > > > > > with bus 4,
+>
+> > > > > > > > > > > >
+>
+> > > > > > > > > > > > After VM is startup, We ping this nic from host to
+>
+> > > > > > > > > > > > judge if it is working normally. Then, we hot add
+>
+> > > > > > > > > > > > pci devices to this VM with bus
+>
+> > > > > > > 0.
+>
+> > > > > > > > > > > >
+>
+> > > > > > > > > > > > We  found the virtio-net NIC in bus 4 is not
+>
+> > > > > > > > > > > > working (can not
+>
+> > > > > > > > > > > > connect) occasionally, as it kick virtio backend
+>
+> > > > > > > > > > > > failure with error
+>
+> >
+>
+> > > > > > But I have another question, if we only fix this problem in
+>
+> > > > > > the kernel, the Linux version that has been released does not
+>
+> > > > > > work well on the
+>
+> > > > > virtualization platform.
+>
+> > > > > > Is there a way to fix this problem in the backend?
+>
+> >
+>
+> > Hi Michael,
+>
+> >
+>
+> > If we want to fix this problem on the backend, it is not enough to
+>
+> > consider only PCI device hot plugging, because I found that if we use
+>
+> > a command like "echo 1 > /sys/bus/pci/rescan" in guest, this problem is
+>
+> > very
+>
+> easy to reproduce.
+>
+> >
+>
+> > From the perspective of device emulation, when guest writes 0xffffffff
+>
+> > to the BAR, guest just want to get the size of the region but not really
+>
+> updating the address space.
+>
+> > So I made the following patch to avoid  update pci mapping.
+>
+> >
+>
+> > Do you think this make sense?
+>
+> >
+>
+> > [PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR
+>
+> >
+>
+> > When guest writes 0xffffffff to the BAR, guest just want to get the
+>
+> > size of the region but not really updating the address space.
+>
+> > So when guest writes 0xffffffff to BAR, we need avoid
+>
+> > pci_update_mappings or pci_bridge_update_mappings.
+>
+> >
+>
+> > Signed-off-by: xuyandong <address@hidden>
+>
+>
+>
+> I see how that will address the common case however there are a bunch of
+>
+> issues here.  First of all it's easy to trigger the update by some other
+>
+> action like
+>
+> VM migration.  More importantly it's just possible that guest actually does
+>
+> want
+>
+> to set the low 32 bit of the address to all ones.  For example, that is
+>
+> clearly
+>
+> listed as a way to disable all devices behind the bridge in the pci to pci
+>
+> bridge
+>
+> spec.
+>
+>
+OK, I see. What if I only skip the update when the guest writes 0xFFFFFFFF to the
+>
+Prefetchable Base Upper 32 Bits register,
+>
+to work around the kernel double-check problem?
+>
+Do you think there is still a risk?
+Well, the risk is non-zero, since the spec says such a write should disable all
+accesses. Just an idea: why not add an option to disable the upper 32 bits?
+That is ugly and limits the address space, but it is spec compliant.
+
+>
+>
+>
+> Given that upstream is dragging its feet, I'm open to adding a flag that will
+>
+> help
+>
+> keep guests going as a temporary measure.
+>
+> We will need to think about ways to restrict this as much as we can.
+>
+>
+>
+>
+>
+> > ---
+>
+> >  hw/pci/pci.c        | 6 ++++--
+>
+> >  hw/pci/pci_bridge.c | 8 +++++---
+>
+> >  2 files changed, 9 insertions(+), 5 deletions(-)
+>
+> >
+>
+> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 56b13b3..ef368e1 100644
+>
+> > --- a/hw/pci/pci.c
+>
+> > +++ b/hw/pci/pci.c
+>
+> > @@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d,
+>
+> > uint32_t addr, uint32_t val_in, int  {
+>
+> >      int i, was_irq_disabled = pci_irq_disabled(d);
+>
+> >      uint32_t val = val_in;
+>
+> > +    uint64_t barmask = (1 << l*8) - 1;
+>
+> >
+>
+> >      for (i = 0; i < l; val >>= 8, ++i) {
+>
+> >          uint8_t wmask = d->wmask[addr + i]; @@ -1369,9 +1370,10 @@
+>
+> > void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t
+>
+> > val_in,
+>
+> int
+>
+> >          d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val &
+>
+> > wmask);
+>
+> >          d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to
+>
+> > Clear */
+>
+> >      }
+>
+> > -    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
+>
+> > +    if ((val_in != barmask &&
+>
+> > + (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
+>
+> >          ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
+>
+> > -        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
+>
+> > +        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) ||
+>
+> >          range_covers_byte(addr, l, PCI_COMMAND))
+>
+> >          pci_update_mappings(d);
+>
+> >
+>
+> > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c index
+>
+> > ee9dff2..f2bad79 100644
+>
+> > --- a/hw/pci/pci_bridge.c
+>
+> > +++ b/hw/pci/pci_bridge.c
+>
+> > @@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d,
+>
+> >      PCIBridge *s = PCI_BRIDGE(d);
+>
+> >      uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
+>
+> >      uint16_t newctl;
+>
+> > +    uint64_t barmask = (1 << len * 8) - 1;
+>
+> >
+>
+> >      pci_default_write_config(d, address, val, len);
+>
+> >
+>
+> >      if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
+>
+> >
+>
+> > -        /* io base/limit */
+>
+> > -        ranges_overlap(address, len, PCI_IO_BASE, 2) ||
+>
+> > +        (val != barmask &&
+>
+> > + /* io base/limit */
+>
+> > +        (ranges_overlap(address, len, PCI_IO_BASE, 2) ||
+>
+> >
+>
+> >          /* memory base/limit, prefetchable base/limit and
+>
+> >             io base/limit upper 16 */
+>
+> > -        ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
+>
+> > +        ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) ||
+>
+> >
+>
+> >          /* vga enable */
+>
+> >          ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
+>
+> > --
+>
+> > 1.8.3.1
+>
+> >
+>
+> >
+>
+> >
+
+>
+-----Original Message-----
+>
+From: xuyandong
+>
+Sent: Monday, January 07, 2019 10:37 PM
+>
+To: 'Michael S. Tsirkin' <address@hidden>
+>
+Cc: address@hidden; Paolo Bonzini <address@hidden>; qemu-
+>
+address@hidden; Zhanghailiang <address@hidden>;
+>
+wangxin (U) <address@hidden>; Huangweidong (C)
+>
+<address@hidden>
+>
+Subject: RE: [BUG]Unassigned mem write during pci device hot-plug
+>
+>
+> > > > > > > > > > Hi all,
+>
+> > > > > > > > > >
+>
+> > > > > > > > > >
+>
+> > > > > > > > > >
+>
+> > > > > > > > > > In our test, we configured VM with several
+>
+> > > > > > > > > > pci-bridges and a virtio-net nic been attached with
+>
+> > > > > > > > > > bus 4,
+>
+> > > > > > > > > >
+>
+> > > > > > > > > > After VM is startup, We ping this nic from host to
+>
+> > > > > > > > > > judge if it is working normally. Then, we hot add
+>
+> > > > > > > > > > pci devices to this VM with bus
+>
+> > > > > 0.
+>
+> > > > > > > > > >
+>
+> > > > > > > > > > We  found the virtio-net NIC in bus 4 is not working
+>
+> > > > > > > > > > (can not
+>
+> > > > > > > > > > connect) occasionally, as it kick virtio backend
+>
+> > > > > > > > > > failure with error
+>
+>
+> > > > But I have another question, if we only fix this problem in the
+>
+> > > > kernel, the Linux version that has been released does not work
+>
+> > > > well on the
+>
+> > > virtualization platform.
+>
+> > > > Is there a way to fix this problem in the backend?
+>
+> > >
+>
+> > > There could we a way to work around this.
+>
+> > > Does below help?
+>
+> >
+>
+> > I am sorry to tell you, I tested this patch and it doesn't work fine.
+>
+> >
+>
+> > >
+>
+> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index
+>
+> > > 236a20eaa8..7834cac4b0 100644
+>
+> > > --- a/hw/i386/acpi-build.c
+>
+> > > +++ b/hw/i386/acpi-build.c
+>
+> > > @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml
+>
+> > > *parent_scope, PCIBus *bus,
+>
+> > >
+>
+> > >          aml_append(method, aml_store(aml_int(bsel_val),
+>
+> aml_name("BNUM")));
+>
+> > >          aml_append(method,
+>
+> > > -            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device
+>
+Check
+>
+> */)
+>
+> > > +            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /*
+>
+> > > + Device Check Light */)
+>
+> > >          );
+>
+> > >          aml_append(method,
+>
+> > >              aml_call2("DVNT", aml_name("PCID"), aml_int(3)/*
+>
+> > > Eject Request */)
+>
+>
+>
+>
+>
+> Oh I see, another bug:
+>
+>
+>
+>         case ACPI_NOTIFY_DEVICE_CHECK_LIGHT:
+>
+>                 acpi_handle_debug(handle,
+>
+> "ACPI_NOTIFY_DEVICE_CHECK_LIGHT event\n");
+>
+>                 /* TBD: Exactly what does 'light' mean? */
+>
+>                 break;
+>
+>
+>
+> And then e.g. acpi_generic_hotplug_event(struct acpi_device *adev, u32
+>
+> type) and friends all just ignore this event type.
+>
+>
+>
+>
+>
+>
+>
+> --
+>
+> MST
+>
+>
+Hi Michael,
+>
+>
+If we want to fix this problem on the backend, it is not enough to consider
+>
+only
+>
+PCI device hot plugging, because I found that if we use a command like "echo
+>
+1 >
+>
+/sys/bus/pci/rescan" in guest, this problem is very easy to reproduce.
+>
+>
+From the perspective of device emulation, when guest writes 0xffffffff to the
+>
+BAR, guest just want to get the size of the region but not really updating the
+>
+address space.
+>
+So I made the following patch to avoid  update pci mapping.
+>
+>
+Do you think this make sense?
+>
+>
+[PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR
+>
+>
+When guest writes 0xffffffff to the BAR, guest just want to get the size of
+>
+the
+>
+region but not really updating the address space.
+>
+So when guest writes 0xffffffff to BAR, we need avoid pci_update_mappings or
+>
+pci_bridge_update_mappings.
+>
+>
+Signed-off-by: xuyandong <address@hidden>
+>
+---
+>
+hw/pci/pci.c        | 6 ++++--
+>
+hw/pci/pci_bridge.c | 8 +++++---
+>
+2 files changed, 9 insertions(+), 5 deletions(-)
+>
+>
+diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 56b13b3..ef368e1 100644
+>
+--- a/hw/pci/pci.c
+>
++++ b/hw/pci/pci.c
+>
+@@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t
+>
+addr, uint32_t val_in, int  {
+>
+int i, was_irq_disabled = pci_irq_disabled(d);
+>
+uint32_t val = val_in;
+>
++    uint64_t barmask = (1 << l*8) - 1;
+>
+>
+for (i = 0; i < l; val >>= 8, ++i) {
+>
+uint8_t wmask = d->wmask[addr + i]; @@ -1369,9 +1370,10 @@ void
+>
+pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, int
+>
+d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
+>
+d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
+>
+}
+>
+-    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
+>
++    if ((val_in != barmask &&
+>
++     (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
+>
+ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
+>
+-        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
+>
++        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) ||
+>
+range_covers_byte(addr, l, PCI_COMMAND))
+>
+pci_update_mappings(d);
+>
+>
+diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c index ee9dff2..f2bad79
+>
+100644
+>
+--- a/hw/pci/pci_bridge.c
+>
++++ b/hw/pci/pci_bridge.c
+>
+@@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d,
+>
+PCIBridge *s = PCI_BRIDGE(d);
+>
+uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
+>
+uint16_t newctl;
+>
++    uint64_t barmask = (1 << len * 8) - 1;
+>
+>
+pci_default_write_config(d, address, val, len);
+>
+>
+if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
+>
+>
+-        /* io base/limit */
+>
+-        ranges_overlap(address, len, PCI_IO_BASE, 2) ||
+>
++        (val != barmask &&
+>
++     /* io base/limit */
+>
++        (ranges_overlap(address, len, PCI_IO_BASE, 2) ||
+>
+>
+/* memory base/limit, prefetchable base/limit and
+>
+io base/limit upper 16 */
+>
+-        ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
+>
++        ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) ||
+>
+>
+/* vga enable */
+>
+ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
+>
+--
+>
+1.8.3.1
+>
+>
+Sorry, please ignore the patch above.
+
+Here is the patch I want to post:
+
+diff --git a/hw/pci/pci.c b/hw/pci/pci.c
+index 56b13b3..38a300f 100644
+--- a/hw/pci/pci.c
++++ b/hw/pci/pci.c
+@@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t 
+addr, uint32_t val_in, int
+ {
+     int i, was_irq_disabled = pci_irq_disabled(d);
+     uint32_t val = val_in;
++    uint64_t barmask = ((uint64_t)1 << l*8) - 1;
+ 
+     for (i = 0; i < l; val >>= 8, ++i) {
+         uint8_t wmask = d->wmask[addr + i];
+@@ -1369,9 +1370,10 @@ void pci_default_write_config(PCIDevice *d, uint32_t 
+addr, uint32_t val_in, int
+         d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
+         d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
+     }
+-    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
++    if ((val_in != barmask &&
++       (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
+         ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
+-        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
++        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) ||
+         range_covers_byte(addr, l, PCI_COMMAND))
+         pci_update_mappings(d);
+ 
+diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
+index ee9dff2..b8f7d48 100644
+--- a/hw/pci/pci_bridge.c
++++ b/hw/pci/pci_bridge.c
+@@ -253,20 +253,22 @@ void pci_bridge_write_config(PCIDevice *d,
+     PCIBridge *s = PCI_BRIDGE(d);
+     uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
+     uint16_t newctl;
++    uint64_t barmask = ((uint64_t)1 << len * 8) - 1;
+ 
+     pci_default_write_config(d, address, val, len);
+ 
+     if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
+ 
+-        /* io base/limit */
+-        ranges_overlap(address, len, PCI_IO_BASE, 2) ||
++        /* vga enable */
++        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2) ||
+ 
+-        /* memory base/limit, prefetchable base/limit and
+-           io base/limit upper 16 */
+-        ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
++        (val != barmask &&
++        /* io base/limit */
++         (ranges_overlap(address, len, PCI_IO_BASE, 2) ||
+ 
+-        /* vga enable */
+-        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
++         /* memory base/limit, prefetchable base/limit and
++            io base/limit upper 16 */
++         ranges_overlap(address, len, PCI_MEMORY_BASE, 20)))) {
+         pci_bridge_update_mappings(s);
+     }
+ 
+-- 
+1.8.3.1
+
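A note on the mask computation in the thread above: the first posting of the patch computes `(1 << l*8) - 1`, where the literal `1` is an `int`. For a 4-byte write the shift count is then 32, which is undefined behavior when `int` is 32 bits wide; the reposted patch widens to `uint64_t` before shifting. The following is a minimal sketch of that corrected computation, not code from QEMU. The helper names `all_ones_mask` and `is_all_ones_write` are hypothetical, and, as the review points out, an all-ones write is not always a sizing probe, so this only illustrates the arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of QEMU: the all-ones value for a
 * config-space write of len bytes (len is 1, 2 or 4). */
static uint64_t all_ones_mask(int len)
{
    assert(len == 1 || len == 2 || len == 4);
    /* Widen to 64 bits before shifting: with a plain int 1, len == 4
     * would shift by 32, which is undefined for a 32-bit int. */
    return ((uint64_t)1 << (len * 8)) - 1;
}

/* Hypothetical helper: true when every byte of the write is 0xff. */
static bool is_all_ones_write(uint32_t val, int len)
{
    return val == all_ones_mask(len);
}
```

For a 4-byte write, `all_ones_mask(4)` yields 0xffffffff, so `is_all_ones_write(0xffffffff, 4)` is true while `is_all_ones_write(0xffff, 4)` is false.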
diff --git a/results/classifier/004/other/70416488 b/results/classifier/004/other/70416488
new file mode 100644
index 00000000..6c0da5fd
--- /dev/null
+++ b/results/classifier/004/other/70416488
@@ -0,0 +1,1187 @@
+other: 0.980
+semantic: 0.975
+graphic: 0.972
+assembly: 0.953
+instruction: 0.947
+device: 0.945
+mistranslation: 0.942
+vnc: 0.910
+KVM: 0.908
+boot: 0.897
+network: 0.881
+socket: 0.870
+
+[Bug Report] smmuv3 event 0x10 report when running virtio-blk-pci
+
+Hi All,
+
+When I tested mainline QEMU (commit 7b87a25f49), it reported smmuv3 event 0x10
+while the kernel was booting.
+
+The QEMU command I used is as follows:
+
+qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
+-kernel Image -initrd minifs.cpio.gz \
+-enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
+-append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
+-device 
+pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 
+\
+-device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
+-device 
+virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
+-drive file=/home/boot.img,if=none,id=drive0,format=raw
+
+smmuv3 event 0x10 log:
+[...]
+[    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
+[    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
+[    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
+[    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks (1.07 
+GB/1.00 GiB)
+[    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+[    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
+[    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+[    1.968381] clk: Disabling unused clocks
+[    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+[    1.968990] PM: genpd: Disabling unused power domains
+[    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+[    1.969814] ALSA device list:
+[    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+[    1.970471]   No soundcards found.
+[    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+[    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+[    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+[    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+[    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+[    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+[    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+[    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+[    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+[    1.975005] Freeing unused kernel memory: 10112K
+[    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+[    1.975442] Run init as init process
+
+One more data point: if "maxcpus=3" is removed from the kernel command
+line,
+the problem does not occur.
+
+I am not sure whether this is a bug in the vSMMU. I would appreciate it if
+anyone
+who knows about this issue could take a look at it.
+
+Thanks,
+Zhou
+
+On Mon, 9 Sept 2024 at 15:22, Zhou Wang via <qemu-devel@nongnu.org> wrote:
+>
+>
+Hi All,
+>
+>
+When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
+>
+during kernel booting up.
+Does it still do this if you either:
+ (1) use the v9.1.0 release (commit fd1952d814da)
+ (2) use "-machine virt-9.1" instead of "-machine virt"
+
+?
+
+My suspicion is that this will have started happening now that
+we expose an SMMU with two-stage translation support to the guest
+in the "virt" machine type (which we do not if you either
+use virt-9.1 or in the v9.1.0 release).
+
+I've cc'd Eric (smmuv3 maintainer) and Mostafa (author of
+the two-stage support).
+
+>
+qemu command which I use is as below:
+>
+>
+qemu-system-aarch64 -machine
+>
+virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
+>
+-kernel Image -initrd minifs.cpio.gz \
+>
+-enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
+>
+-append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
+>
+-device
+>
+pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2
+>
+\
+>
+-device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
+>
+-device
+>
+virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
+>
+-drive file=/home/boot.img,if=none,id=drive0,format=raw
+>
+>
+smmuv3 event 0x10 log:
+>
+[...]
+>
+[    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
+>
+[    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
+>
+[    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
+>
+[    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks
+>
+(1.07 GB/1.00 GiB)
+>
+[    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+>
+[    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
+>
+[    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+>
+[    1.968381] clk: Disabling unused clocks
+>
+[    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+>
+[    1.968990] PM: genpd: Disabling unused power domains
+>
+[    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+[    1.969814] ALSA device list:
+>
+[    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+[    1.970471]   No soundcards found.
+>
+[    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+>
+[    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+>
+[    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+>
+[    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+[    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+[    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+>
+[    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+>
+[    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+>
+[    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+[    1.975005] Freeing unused kernel memory: 10112K
+>
+[    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+[    1.975442] Run init as init process
+>
+>
+Another information is that if "maxcpus=3" is removed from the kernel command
+>
+line,
+>
+it will be OK.
+>
+>
+I am not sure if there is a bug about vsmmu. It will be very appreciated if
+>
+anyone
+>
+know this issue or can take a look at it.
+thanks
+-- PMM
+
+On 2024/9/9 22:31, Peter Maydell wrote:
+>
+On Mon, 9 Sept 2024 at 15:22, Zhou Wang via <qemu-devel@nongnu.org> wrote:
+>
+>
+>
+> Hi All,
+>
+>
+>
+> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
+>
+> during kernel booting up.
+>
+>
+Does it still do this if you either:
+>
+(1) use the v9.1.0 release (commit fd1952d814da)
+>
+(2) use "-machine virt-9.1" instead of "-machine virt"
+I tested above two cases, the problem is still there.
+
+>
+>
+?
+>
+>
+My suspicion is that this will have started happening now that
+>
+we expose an SMMU with two-stage translation support to the guest
+>
+in the "virt" machine type (which we do not if you either
+>
+use virt-9.1 or in the v9.1.0 release).
+>
+>
+I've cc'd Eric (smmuv3 maintainer) and Mostafa (author of
+>
+the two-stage support).
+>
+>
+> qemu command which I use is as below:
+>
+>
+>
+> qemu-system-aarch64 -machine
+>
+> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
+>
+> -kernel Image -initrd minifs.cpio.gz \
+>
+> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
+>
+> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
+>
+> -device
+>
+> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2
+>
+>  \
+>
+> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
+>
+> -device
+>
+> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
+>
+> -drive file=/home/boot.img,if=none,id=drive0,format=raw
+>
+>
+>
+> smmuv3 event 0x10 log:
+>
+> [...]
+>
+> [    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
+>
+> [    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
+>
+> [    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
+>
+> [    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks
+>
+> (1.07 GB/1.00 GiB)
+>
+> [    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+>
+> [    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
+>
+> [    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+>
+> [    1.968381] clk: Disabling unused clocks
+>
+> [    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+>
+> [    1.968990] PM: genpd: Disabling unused power domains
+>
+> [    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+> [    1.969814] ALSA device list:
+>
+> [    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+> [    1.970471]   No soundcards found.
+>
+> [    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+>
+> [    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+>
+> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+>
+> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+> [    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+> [    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+>
+> [    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+>
+> [    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+>
+> [    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+> [    1.975005] Freeing unused kernel memory: 10112K
+>
+> [    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+> [    1.975442] Run init as init process
+>
+>
+>
+> Another information is that if "maxcpus=3" is removed from the kernel
+>
+> command line,
+>
+> it will be OK.
+>
+>
+>
+> I am not sure if there is a bug about vsmmu. It will be very appreciated if
+>
+> anyone
+>
+> know this issue or can take a look at it.
+>
+>
+thanks
+>
+-- PMM
+>
+.
+
+Hi Zhou,
+On 9/10/24 03:24, Zhou Wang via wrote:
+>
+On 2024/9/9 22:31, Peter Maydell wrote:
+>
+> On Mon, 9 Sept 2024 at 15:22, Zhou Wang via <qemu-devel@nongnu.org> wrote:
+>
+>> Hi All,
+>
+>>
+>
+>> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
+>
+>> during kernel booting up.
+>
+> Does it still do this if you either:
+>
+>  (1) use the v9.1.0 release (commit fd1952d814da)
+>
+>  (2) use "-machine virt-9.1" instead of "-machine virt"
+>
+I tested above two cases, the problem is still there.
+
+Thank you for reporting. I am able to reproduce it, and the maxcpus kernel
+option is indeed what triggers the issue; it works without it. I will come
+back to you asap.
+
+Eric
+>
+>
+> ?
+>
+>
+>
+> My suspicion is that this will have started happening now that
+>
+> we expose an SMMU with two-stage translation support to the guest
+>
+> in the "virt" machine type (which we do not if you either
+>
+> use virt-9.1 or in the v9.1.0 release).
+>
+>
+>
+> I've cc'd Eric (smmuv3 maintainer) and Mostafa (author of
+>
+> the two-stage support).
+>
+>
+>
+>> qemu command which I use is as below:
+>
+>>
+>
+>> qemu-system-aarch64 -machine
+>
+>> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
+>
+>> -kernel Image -initrd minifs.cpio.gz \
+>
+>> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
+>
+>> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
+>
+>> -device
+>
+>> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2
+>
+>>  \
+>
+>> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
+>
+>> -device
+>
+>> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
+>
+>> -drive file=/home/boot.img,if=none,id=drive0,format=raw
+>
+>>
+>
+>> smmuv3 event 0x10 log:
+>
+>> [...]
+>
+>> [    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
+>
+>> [    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
+>
+>> [    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
+>
+>> [    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks
+>
+>> (1.07 GB/1.00 GiB)
+>
+>> [    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+>
+>> [    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
+>
+>> [    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+>
+>> [    1.968381] clk: Disabling unused clocks
+>
+>> [    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+>
+>> [    1.968990] PM: genpd: Disabling unused power domains
+>
+>> [    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+>> [    1.969814] ALSA device list:
+>
+>> [    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+>> [    1.970471]   No soundcards found.
+>
+>> [    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+>
+>> [    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+>
+>> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+>
+>> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+>> [    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+>> [    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+>
+>> [    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+>
+>> [    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+>
+>> [    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+>> [    1.975005] Freeing unused kernel memory: 10112K
+>
+>> [    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+>> [    1.975442] Run init as init process
+>
+>>
+>
+>> Another information is that if "maxcpus=3" is removed from the kernel
+>
+>> command line,
+>
+>> it will be OK.
+>
+>>
+>
+>> I am not sure if there is a bug about vsmmu. It will be very appreciated if
+>
+>> anyone
+>
+>> know this issue or can take a look at it.
+>
+> thanks
+>
+> -- PMM
+>
+> .
+
+Hi,
+
+On 9/10/24 03:24, Zhou Wang via wrote:
+>
+On 2024/9/9 22:31, Peter Maydell wrote:
+>
+> On Mon, 9 Sept 2024 at 15:22, Zhou Wang via <qemu-devel@nongnu.org> wrote:
+>
+>> Hi All,
+>
+>>
+>
+>> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
+>
+>> during kernel booting up.
+>
+> Does it still do this if you either:
+>
+>  (1) use the v9.1.0 release (commit fd1952d814da)
+>
+>  (2) use "-machine virt-9.1" instead of "-machine virt"
+>
+I tested above two cases, the problem is still there.
+
+I have not made much progress yet, but I see it comes with these QEMU
+traces:
+
+smmuv3-iommu-memory-region-0-0 translation failed for iova=0x0
+(SMMU_EVT_F_TRANSLATION)
+../..
+qemu-system-aarch64: virtio-blk failed to set guest notifier (-22),
+ensure -accel kvm is set.
+qemu-system-aarch64: virtio_bus_start_ioeventfd: failed. Fallback to
+userspace (slower).
+
+The PCIe host bridge seems to cause that translation failure at iova=0.
+
+Also virtio-iommu has the same issue:
+qemu-system-aarch64: virtio_iommu_translate no mapping for 0x0 for sid=1024
+qemu-system-aarch64: virtio-blk failed to set guest notifier (-22),
+ensure -accel kvm is set.
+qemu-system-aarch64: virtio_bus_start_ioeventfd: failed. Fallback to
+userspace (slower).
+
+Only happens with maxcpus=3. Note the virtio-blk-pci is not protected by
+the vIOMMU in your case.
+
+Thanks
+
+Eric
+
+Hi Zhou,
+
+On Mon, Sep 9, 2024 at 3:22 PM Zhou Wang via <qemu-devel@nongnu.org> wrote:
+>
+>
+Hi All,
+>
+>
+When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
+>
+during kernel booting up.
+>
+>
+qemu command which I use is as below:
+>
+>
+qemu-system-aarch64 -machine
+>
+virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
+>
+-kernel Image -initrd minifs.cpio.gz \
+>
+-enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
+>
+-append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
+>
+-device
+>
+pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2
+>
+\
+>
+-device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
+>
+-device
+>
+virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
+>
+-drive file=/home/boot.img,if=none,id=drive0,format=raw
+>
+>
+smmuv3 event 0x10 log:
+>
+[...]
+>
+[    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
+>
+[    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
+>
+[    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
+>
+[    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks
+>
+(1.07 GB/1.00 GiB)
+>
+[    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+>
+[    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
+>
+[    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+>
+[    1.968381] clk: Disabling unused clocks
+>
+[    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+>
+[    1.968990] PM: genpd: Disabling unused power domains
+>
+[    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+[    1.969814] ALSA device list:
+>
+[    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+[    1.970471]   No soundcards found.
+>
+[    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+>
+[    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+>
+[    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+>
+[    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+[    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+[    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+>
+[    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+>
+[    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+>
+[    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+[    1.975005] Freeing unused kernel memory: 10112K
+>
+[    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+[    1.975442] Run init as init process
+>
+>
+Another information is that if "maxcpus=3" is removed from the kernel command
+>
+line,
+>
+it will be OK.
+>
+That's interesting, not sure how that would be related.
+
+>
+I am not sure if there is a bug about vsmmu. It will be very appreciated if
+>
+anyone
+>
+know this issue or can take a look at it.
+>
+Can you please provide logs after adding "-d trace:smmu*" to the QEMU invocation?
+
+Also, if possible, can you please tell me which Linux kernel version
+you are using? I will see if I can repro.
+
+Thanks,
+Mostafa
+
+>
+Thanks,
+>
+Zhou
+>
+>
+>
+
+On 2024/9/9 22:47, Mostafa Saleh wrote:
+>
+Hi Zhou,
+>
+>
+On Mon, Sep 9, 2024 at 3:22 PM Zhou Wang via <qemu-devel@nongnu.org> wrote:
+>
+>
+>
+> Hi All,
+>
+>
+>
+> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
+>
+> during kernel booting up.
+>
+>
+>
+> qemu command which I use is as below:
+>
+>
+>
+> qemu-system-aarch64 -machine
+>
+> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
+>
+> -kernel Image -initrd minifs.cpio.gz \
+>
+> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
+>
+> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
+>
+> -device
+>
+> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2
+>
+>  \
+>
+> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
+>
+> -device
+>
+> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
+>
+> -drive file=/home/boot.img,if=none,id=drive0,format=raw
+>
+>
+>
+> smmuv3 event 0x10 log:
+>
+> [...]
+>
+> [    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
+>
+> [    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
+>
+> [    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
+>
+> [    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks
+>
+> (1.07 GB/1.00 GiB)
+>
+> [    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+>
+> [    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
+>
+> [    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+>
+> [    1.968381] clk: Disabling unused clocks
+>
+> [    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+>
+> [    1.968990] PM: genpd: Disabling unused power domains
+>
+> [    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+> [    1.969814] ALSA device list:
+>
+> [    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+> [    1.970471]   No soundcards found.
+>
+> [    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+>
+> [    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+>
+> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+>
+> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+> [    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+> [    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+>
+> [    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+>
+> [    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+>
+> [    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+> [    1.975005] Freeing unused kernel memory: 10112K
+>
+> [    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+> [    1.975442] Run init as init process
+>
+>
+>
+> Another information is that if "maxcpus=3" is removed from the kernel
+>
+> command line,
+>
+> it will be OK.
+>
+>
+>
+>
+That's interesting, not sure how that would be related.
+>
+>
+> I am not sure if there is a bug about vsmmu. It will be very appreciated if
+>
+> anyone
+>
+> know this issue or can take a look at it.
+>
+>
+>
+>
+Can you please provide logs with adding "-d trace:smmu*" to qemu invocation.
+Sure. Please see the attached log (using the QEMU commit and command above).
+
+>
+>
+Also if possible, can you please provide which Linux kernel version
+>
+you are using, I will see if I can repro.
+I just use the latest mainline kernel (commit b831f83e40a2) with defconfig.
+
+Thanks,
+Zhou
+
+>
+>
+Thanks,
+>
+Mostafa
+>
+>
+> Thanks,
+>
+> Zhou
+>
+>
+>
+>
+>
+>
+>
+>
+.
+Attachment: qemu_boot_log.txt (text document)
+
+On Tue, Sep 10, 2024 at 2:51 AM Zhou Wang <wangzhou1@hisilicon.com> wrote:
+>
+>
+On 2024/9/9 22:47, Mostafa Saleh wrote:
+>
+> Hi Zhou,
+>
+>
+>
+> On Mon, Sep 9, 2024 at 3:22 PM Zhou Wang via <qemu-devel@nongnu.org> wrote:
+>
+>>
+>
+>> Hi All,
+>
+>>
+>
+>> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event
+>
+>> 0x10
+>
+>> during kernel booting up.
+>
+>>
+>
+>> qemu command which I use is as below:
+>
+>>
+>
+>> qemu-system-aarch64 -machine
+>
+>> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
+>
+>> -kernel Image -initrd minifs.cpio.gz \
+>
+>> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
+>
+>> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
+>
+>> -device
+>
+>> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2
+>
+>>  \
+>
+>> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1
+>
+>> \
+>
+>> -device
+>
+>> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
+>
+>> -drive file=/home/boot.img,if=none,id=drive0,format=raw
+>
+>>
+>
+>> smmuv3 event 0x10 log:
+>
+>> [...]
+>
+>> [    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
+>
+>> [    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
+>
+>> [    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
+>
+>> [    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks
+>
+>> (1.07 GB/1.00 GiB)
+>
+>> [    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+>
+>> [    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
+>
+>> [    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+>
+>> [    1.968381] clk: Disabling unused clocks
+>
+>> [    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+>
+>> [    1.968990] PM: genpd: Disabling unused power domains
+>
+>> [    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+>> [    1.969814] ALSA device list:
+>
+>> [    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+>> [    1.970471]   No soundcards found.
+>
+>> [    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+>
+>> [    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+>
+>> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+>
+>> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+>> [    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+>> [    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
+>
+>> [    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
+>
+>> [    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
+>
+>> [    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+>> [    1.975005] Freeing unused kernel memory: 10112K
+>
+>> [    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
+>
+>> [    1.975442] Run init as init process
+>
+>>
+>
+>> Another information is that if "maxcpus=3" is removed from the kernel
+>
+>> command line,
+>
+>> it will be OK.
+>
+>>
+>
+>
+>
+> That's interesting, not sure how that would be related.
+>
+>
+>
+>> I am not sure if there is a bug about vsmmu. It will be very appreciated
+>
+>> if anyone
+>
+>> know this issue or can take a look at it.
+>
+>>
+>
+>
+>
+> Can you please provide logs with adding "-d trace:smmu*" to qemu invocation.
+>
+>
+Sure. Please see the attached log(using above qemu commit and command).
+>
+Thanks a lot. It seems the SMMUv3 does indeed receive a translation
+request with address 0x0, which causes this event.
+I don't see any kind of modification (alignment) of the address in this path.
+So my hunch is that it's not related to the SMMUv3 and that the initiator is
+issuing bogus addresses.
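The event record in the guest log can be decoded by hand to check this reading. The following is a minimal sketch, assuming the field layout used by the Linux arm-smmu-v3 driver (event ID in bits [7:0] and StreamID in bits [63:32] of the first 64-bit word, input address in the third word) and assuming the StreamID equals the PCI requester ID on this machine. The struct and function names are hypothetical and are not taken from QEMU or Linux.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the fields of interest in one event record. */
struct smmu_event {
    uint8_t  id;         /* event type, 0x10 is F_TRANSLATION */
    uint32_t sid;        /* StreamID of the device that issued the access */
    uint64_t input_addr; /* address that failed to translate */
};

/* Decode the four 64-bit words that the guest kernel prints per event.
 * Assumed layout: word 0 bits [7:0] = event ID, bits [63:32] = StreamID,
 * word 2 = input address. */
static struct smmu_event decode_event(const uint64_t evt[4])
{
    struct smmu_event e;

    e.id = (uint8_t)(evt[0] & 0xff);
    e.sid = (uint32_t)(evt[0] >> 32);
    e.input_addr = evt[2];
    return e;
}

/* Split a StreamID as a PCI requester ID: bus[15:8], device[7:3], fn[2:0]. */
static unsigned sid_bus(uint32_t sid) { return (sid >> 8) & 0xff; }
static unsigned sid_dev(uint32_t sid) { return (sid >> 3) & 0x1f; }
static unsigned sid_fn(uint32_t sid)  { return sid & 0x7; }
```

Under these assumptions, the record 0x0000020000000010 / 0x0000020000000000 / 0x0 / 0x0 from the log decodes to event 0x10 with StreamID 0x200 and input address 0x0, which would correspond to the device at 02:00.0.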
+
+>
+>
+>
+> Also if possible, can you please provide which Linux kernel version
+>
+> you are using, I will see if I can repro.
+>
+>
+I just use the latest mainline kernel(commit b831f83e40a2) with defconfig.
+>
+I see. I can't repro in my setup, which has no "--enable-kvm" and uses
+"-cpu max" instead of "-cpu host".
+I will try other options and see if I can repro.
+
+Thanks,
+Mostafa
+>
+Thanks,
+>
+Zhou
+>
+>
+>
+>
+> Thanks,
+>
+> Mostafa
+>
+>
+>
+>> Thanks,
+>
+>> Zhou
+>
+>>
+>
+>>
+>
+>>
+>
+>
+>
+> .
+
diff --git a/results/classifier/004/other/74715356 b/results/classifier/004/other/74715356
new file mode 100644
index 00000000..8725e032
--- /dev/null
+++ b/results/classifier/004/other/74715356
@@ -0,0 +1,134 @@
+other: 0.927
+semantic: 0.916
+instruction: 0.910
+assembly: 0.905
+device: 0.900
+graphic: 0.894
+boot: 0.881
+mistranslation: 0.870
+KVM: 0.863
+vnc: 0.850
+socket: 0.843
+network: 0.838
+
+[Bug] x86 EFLAGS refresh is not happening correctly
+
+Hello,
+I'm posting this here instead of opening an issue because it is not clear to me whether this is a bug.
+The issue is located in the function "cpu_compute_eflags" in target/i386/cpu.h
+(https://gitlab.com/qemu-project/qemu/-/blob/master/target/i386/cpu.h#L2071).
+This function is executed in a context outside the CPU loop.
+It is used to synchronize the TCG internal eflags registers (CC_OP, CC_SRC, etc.) with the CPU eflags field upon loop exit.
+It does:
+    eflags |= cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
+Shouldn't it be:
+    eflags = cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
+as eflags is entirely reevaluated by "cpu_cc_compute_all"?
+Thanks,
+Kind regards,
+Stevie
+
+On 05/08/21 11:51, Stevie Lavern wrote:
+> Shouldn't it be:
+> eflags = cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
+> as eflags is entirely reevaluated by "cpu_cc_compute_all" ?
+
+No, both are wrong.  env->eflags contains flags other than the
+arithmetic flags (OF/SF/ZF/AF/PF/CF) and those have to be preserved.
+
+The right code is in helper_read_eflags.  You can move it into
+cpu_compute_eflags, and make helper_read_eflags use it.
+
+Paolo
+
+On 05/08/21 13:24, Paolo Bonzini wrote:
+> On 05/08/21 11:51, Stevie Lavern wrote:
+>> Shouldn't it be:
+>> eflags = cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
+>> as eflags is entirely reevaluated by "cpu_cc_compute_all" ?
+>
+> No, both are wrong.  env->eflags contains flags other than the
+> arithmetic flags (OF/SF/ZF/AF/PF/CF) and those have to be preserved.
+>
+> The right code is in helper_read_eflags.  You can move it into
+> cpu_compute_eflags, and make helper_read_eflags use it.
+
+Ah, actually the two are really the same, the TF/VM bits do not apply to
+cpu_compute_eflags so it's correct.
+
+What seems wrong is migration of the EFLAGS register.  There should be
+code in cpu_pre_save and cpu_post_load to special-case it and setup
+CC_DST/CC_OP as done in cpu_load_eflags.
+
+Also, cpu_load_eflags should assert that update_mask does not include
+any of the arithmetic flags.
+
+Paolo
+
+Thanks for your reply!
+It's still a bit cryptic for me.
+I think I need to clarify that I'm using a custom x86_64 user mode, based on the Linux user mode, that I'm developing (unfortunately I cannot share the code), with modifications in the translation loop (I've added CPU loop exits on specific instructions which are not control flow instructions).
+If my understanding is correct, in the user-mode case 'cpu_compute_eflags' is called directly by 'x86_cpu_exec_exit' with the intention of synchronizing the CPU env->eflags field with its real value (represented by the CC_* fields).
+I'm not sure how 'cpu_pre_save' and 'cpu_post_load' are involved in this case.
+
+As you said in your first email, 'helper_read_eflags' seems to be the correct way to go.
+Here are some details about my current experimentation and understanding of this "issue".
+With the current implementation
+    eflags |= cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
+if I exit the loop with a CC_OP different from CC_OP_EFLAGS, I found that the resulting env->eflags may be invalid.
+In my test case, the loop was exiting with eflags = 0x44 and CC_OP = CC_OP_SUBL with CC_DST=1, CC_SRC=258, CC_SRC2=0.
+While 'cpu_cc_compute_all' computes the correct flags (ZF:0, PF:0), the result will still be 0x44 (ZF:1, PF:1) due to the 'or' operation, thus leading to an incorrect eflags value being loaded into the CPU env.
+In my case, after loop reentry, it led to an invalid branch being taken.
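The numbers in this report can be reproduced with a small standalone model. This is a minimal sketch, not QEMU code: the helper names are hypothetical, the flag computation is a simplified 32-bit SUB only, and the direction flag is left out. It assumes, as in the report, that the saved eflags value still carries stale ZF/PF bits (0x44) when the loop exits.

```c
#include <assert.h>
#include <stdint.h>

/* x86 arithmetic flag bits in EFLAGS. */
#define CC_C 0x0001u
#define CC_P 0x0004u
#define CC_A 0x0010u
#define CC_Z 0x0040u
#define CC_S 0x0080u
#define CC_O 0x0800u
#define CC_ARITH (CC_C | CC_P | CC_A | CC_Z | CC_S | CC_O)

/* Hypothetical helper: flags of a 32-bit SUB, given the result (dst) and
 * the subtrahend (src2), i.e. the CC_DST/CC_SRC pair from the report. */
static uint32_t sub32_flags(uint32_t dst, uint32_t src2)
{
    uint32_t src1 = dst + src2; /* recover the minuend */
    uint32_t flags = 0;
    unsigned ones = 0;

    /* Count set bits in the low byte of the result for the parity flag. */
    for (uint32_t b = dst & 0xffu; b != 0; b >>= 1) {
        ones += b & 1u;
    }
    if (src1 < src2) {
        flags |= CC_C;
    }
    if ((ones & 1u) == 0) {
        flags |= CC_P; /* PF is set for an even number of set bits */
    }
    flags |= (dst ^ src1 ^ src2) & CC_A;
    if (dst == 0) {
        flags |= CC_Z;
    }
    if (dst >> 31) {
        flags |= CC_S;
    }
    if (((src1 ^ src2) & (src1 ^ dst)) >> 31) {
        flags |= CC_O;
    }
    return flags;
}

/* OR-merge: stale arithmetic bits already present in eflags survive. */
static uint32_t merge_or(uint32_t eflags, uint32_t computed)
{
    return eflags | computed;
}

/* Masked merge: clear the arithmetic bits first, keep all other bits. */
static uint32_t merge_masked(uint32_t eflags, uint32_t computed)
{
    return (eflags & ~CC_ARITH) | computed;
}
```

With CC_DST=1 and CC_SRC=258 the computed flags are 0, so the OR-merge leaves 0x44 in place while the masked merge yields 0. Whether stale bits can be present at all depends on how env->eflags is maintained; the sketch only illustrates the case described above.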
+Thanks for your time!
+Regards
+Stevie
+
+On Thu, Aug 5, 2021 at 1:33 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
+> On 05/08/21 13:24, Paolo Bonzini wrote:
+>> On 05/08/21 11:51, Stevie Lavern wrote:
+>>>
+>>> Shouldn't it be:
+>>> eflags = cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
+>>> as eflags is entirely reevaluated by "cpu_cc_compute_all" ?
+>>
+>> No, both are wrong.  env->eflags contains flags other than the
+>> arithmetic flags (OF/SF/ZF/AF/PF/CF) and those have to be preserved.
+>>
+>> The right code is in helper_read_eflags.  You can move it into
+>> cpu_compute_eflags, and make helper_read_eflags use it.
+>
+> Ah, actually the two are really the same, the TF/VM bits do not apply to
+> cpu_compute_eflags so it's correct.
+>
+> What seems wrong is migration of the EFLAGS register.  There should be
+> code in cpu_pre_save and cpu_post_load to special-case it and setup
+> CC_DST/CC_OP as done in cpu_load_eflags.
+>
+> Also, cpu_load_eflags should assert that update_mask does not include
+> any of the arithmetic flags.
+>
+> Paolo
+
diff --git a/results/classifier/004/other/79834768 b/results/classifier/004/other/79834768
new file mode 100644
index 00000000..a773a498
--- /dev/null
+++ b/results/classifier/004/other/79834768
@@ -0,0 +1,417 @@
+other: 0.943
+graphic: 0.933
+semantic: 0.920
+assembly: 0.918
+device: 0.915
+socket: 0.912
+instruction: 0.911
+vnc: 0.885
+boot: 0.880
+mistranslation: 0.877
+KVM: 0.840
+network: 0.830
+
+[Qemu-devel] [BUG] Windows 7 got stuck easily while run PCMark10 application
+
+Hi,
+
+We hit a bug in our tests while running PCMark 10 in a Windows 7 VM:
+the VM got stuck and the wall clock hung after PCMark 10 had been running
+for several minutes.
+The bug is quite easy to reproduce with upstream KVM and QEMU.
+
+We found that KVM cannot inject any RTC irq into the VM after it hangs. It
+fails to deliver the irq in ioapic_set_irq() because the RTC irq is still
+pending in ioapic->irr.
+
+static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
+                  int irq_level, bool line_status)
+{
+... ...
+         if (!irq_level) {
+                  ioapic->irr &= ~mask;
+                  ret = 1;
+                  goto out;
+         }
+... ...
+         if ((edge && old_irr == ioapic->irr) ||
+             (!edge && entry.fields.remote_irr)) {
+                  ret = 0;
+                  goto out;
+         }
+
+According to the RTC spec, after the RTC raises a high-level irq, the OS reads
+CMOS register C to clear the irq flag and pull the irq line down.
+
+In QEMU, this read is emulated in cmos_ioport_read(). Before reading, the
+guest OS first issues a write that selects which register will be read; we
+record the selected register in s->cmos_index.
+
+But in our test, we found a possible situation in which a vcpu fails to read
+RTC_REG_C and so fails to clear the irq. This can happen while two vcpus are
+writing/reading registers at the same time. For example, vcpu0 is trying to
+read RTC_REG_C, so it writes RTC_REG_C first and s->cmos_index becomes
+RTC_REG_C. But before vcpu0 reads register C, vcpu1 wants to read RTC_YEAR
+and changes s->cmos_index to RTC_YEAR with its own write.
+The next operation of vcpu0 then reads RTC_YEAR instead. In this case, we
+miss the call to qemu_irq_lower(s->irq) that clears the irq. After this, KVM
+never injects an RTC irq again, and the Windows VM hangs.
+
+static void cmos_ioport_write(void *opaque, hwaddr addr,
+                              uint64_t data, unsigned size)
+{
+    RTCState *s = opaque;
+
+    if ((addr & 1) == 0) {
+        s->cmos_index = data & 0x7f;
+    }
+... ...
+static uint64_t cmos_ioport_read(void *opaque, hwaddr addr,
+                                 unsigned size)
+{
+    RTCState *s = opaque;
+    int ret;
+    if ((addr & 1) == 0) {
+        return 0xff;
+    } else {
+        switch(s->cmos_index) {
+
+According to the CMOS spec, "any write to PORT 0070h should be followed by an
+action to PORT 0071h or the RTC will be left in an unknown state", but it seems
+that we cannot ensure this sequence in qemu/kvm.
+
+Any ideas ?
+
+Thanks,
+Hailiang
+
+Please see the trace of kvm_pio:
+
+       CPU 1/KVM-15567 [003] .... 209311.762579: kvm_pio: pio_read at 0x70 size 1 count 1 val 0xff
+       CPU 1/KVM-15567 [003] .... 209311.762582: kvm_pio: pio_write at 0x70 size 1 count 1 val 0x89
+       CPU 1/KVM-15567 [003] .... 209311.762590: kvm_pio: pio_read at 0x71 size 1 count 1 val 0x17
+       CPU 0/KVM-15566 [005] .... 209311.762611: kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc
+       CPU 1/KVM-15567 [003] .... 209311.762615: kvm_pio: pio_read at 0x70 size 1 count 1 val 0xff
+       CPU 1/KVM-15567 [003] .... 209311.762619: kvm_pio: pio_write at 0x70 size 1 count 1 val 0x88
+       CPU 1/KVM-15567 [003] .... 209311.762627: kvm_pio: pio_read at 0x71 size 1 count 1 val 0x12
+       CPU 0/KVM-15566 [005] .... 209311.762632: kvm_pio: pio_read at 0x71 size 1 count 1 val 0x12
+       CPU 1/KVM-15567 [003] .... 209311.762633: kvm_pio: pio_read at 0x70 size 1 count 1 val 0xff
+       CPU 0/KVM-15566 [005] .... 209311.762634: kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc    <-- first write to 0x70, cmos_index = 0xc & 0x7f = 0xc
+       CPU 1/KVM-15567 [003] .... 209311.762636: kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86   <-- second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6, overwriting the cmos_index set by the first write
+       CPU 0/KVM-15566 [005] .... 209311.762641: kvm_pio: pio_read at 0x71 size 1 count 1 val 0x6     <-- vcpu0 reads 0x6 because cmos_index is 0x6 now
+       CPU 1/KVM-15567 [003] .... 209311.762644: kvm_pio: pio_read at 0x71 size 1 count 1 val 0x6     <-- vcpu1 reads 0x6
+       CPU 1/KVM-15567 [003] .... 209311.762649: kvm_pio: pio_read at 0x70 size 1 count 1 val 0xff
+       CPU 1/KVM-15567 [003] .... 209311.762669: kvm_pio: pio_write at 0x70 size 1 count 1 val 0x87
+       CPU 1/KVM-15567 [003] .... 209311.762678: kvm_pio: pio_read at 0x71 size 1 count 1 val 0x1
+       CPU 1/KVM-15567 [003] .... 209311.762683: kvm_pio: pio_read at 0x70 size 1 count 1 val 0xff
+       CPU 1/KVM-15567 [003] .... 209311.762686: kvm_pio: pio_write at 0x70 size 1 count 1 val 0x84
+       CPU 1/KVM-15567 [003] .... 209311.762693: kvm_pio: pio_read at 0x71 size 1 count 1 val 0x10
+       CPU 1/KVM-15567 [003] .... 209311.762699: kvm_pio: pio_read at 0x70 size 1 count 1 val 0xff
+       CPU 1/KVM-15567 [003] .... 209311.762702: kvm_pio: pio_write at 0x70 size 1 count 1 val 0x82
+       CPU 1/KVM-15567 [003] .... 209311.762709: kvm_pio: pio_read at 0x71 size 1 count 1 val 0x25
+       CPU 1/KVM-15567 [003] .... 209311.762714: kvm_pio: pio_read at 0x70 size 1 count 1 val 0xff
+       CPU 1/KVM-15567 [003] .... 209311.762717: kvm_pio: pio_write at 0x70 size 1 count 1 val 0x80
+
+
+Regards,
+-Gonglei
+
+From: Zhanghailiang
+Sent: Friday, December 01, 2017 3:03 AM
+To: address@hidden; address@hidden; Paolo Bonzini
+Cc: Huangweidong (C); Gonglei (Arei); wangxin (U); Xiexiangyou
+Subject: [BUG] Windows 7 got stuck easily while run PCMark10 application
+
+On 01/12/2017 08:08, Gonglei (Arei) wrote:
+> First write to 0x70, cmos_index = 0xc & 0x7f = 0xc:
+>        CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc
+> Second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6:
+>        CPU 1/KVM-15567 kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86
+> vcpu0 read 0x6 because cmos_index is 0x6 now:
+>        CPU 0/KVM-15566 kvm_pio: pio_read at 0x71 size 1 count 1 val 0x6
+> vcpu1 read 0x6:
+>        CPU 1/KVM-15567 kvm_pio: pio_read at 0x71 size 1 count 1 val 0x6
+
+This seems to be a Windows bug.  The easiest workaround that I
+can think of is to clear the interrupts already when 0xc is written,
+without waiting for the read (because REG_C can only be read).
+
+What do you think?
+
+Thanks,
+
+Paolo
+
+I also think it's a Windows bug; the problem is that it doesn't occur on the
+Xen platform. There is also other work that needs to be done when reading
+REG_C, so I wrote that patch.
+
+Thanks,
+Gonglei
+From: Paolo Bonzini
+To: Gonglei, Zhanghailiang, qemu-devel, Michael S. Tsirkin
+Cc: Huangweidong, wangxin, Xiexiangyou
+Date: 2017-12-02 01:10:08
+Subject: Re: [BUG] Windows 7 got stuck easily while run PCMark10 application
+
+On 01/12/2017 18:45, Gonglei (Arei) wrote:
+> I also think it's windows bug, the problem is that it doesn't occur on
+> xen platform.
+
+It's a race, it may just be that RTC PIO is faster in Xen because it's
+implemented in the hypervisor.
+
+I will try reporting it to Microsoft.
+
+Thanks,
+
+Paolo
+
+On 2017/12/2 2:37, Paolo Bonzini wrote:
+> On 01/12/2017 18:45, Gonglei (Arei) wrote:
+> > I also think it's windows bug, the problem is that it doesn't occur on
+> > xen platform.
+>
+> It's a race, it may just be that RTC PIO is faster in Xen because it's
+> implemented in the hypervisor.
+
+No. Xen does not have this problem because it injects the RTC irq without
+checking whether the previous irq has been cleared, whereas KVM does perform
+that check.
+
+static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
+        int irq_level, bool line_status)
+{
+   ... ...
+    if (!irq_level) {
+        /* Clear the RTC irq in irr; otherwise no further RTC irq can be injected. */
+        ioapic->irr &= ~mask;
+        ret = 1;
+        goto out;
+    }
+
+I agree that we should move the clearing of the RTC irq from
+cmos_ioport_read() to cmos_ioport_write() to make sure it is always done.
+
+Thanks,
+Hailiang
+
+> I will try reporting it to Microsoft.
+>
+> Thanks,
+>
+> Paolo
+
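The index/data port race described in the RTC thread above can be sketched in a
few lines of C. This is a toy model, not QEMU code: the toy_rtc struct, the
toy_* functions and the clear_on_select switch are hypothetical names invented
for illustration. Only the 0x7f mask, the register indexes 0x0c and 0x06, and
the order of the four port accesses are taken from the thread and its kvm_pio
trace. The sketch replays that order and reports whether the interrupt line is
still raised afterwards.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_RTC_REG_C 0x0c  /* index of register C, as in the thread */

/* Toy model of the shared index (0x70) and data (0x71) port pair. */
struct toy_rtc {
    uint8_t cmos_index;    /* last index written to port 0x70 */
    bool irq_raised;       /* models the RTC interrupt line */
    bool clear_on_select;  /* hypothetical switch for the proposed workaround */
};

/* Write to the index port: remember which register is read next. */
static void toy_index_write(struct toy_rtc *s, uint8_t data)
{
    s->cmos_index = data & 0x7f;
    if (s->clear_on_select && s->cmos_index == TOY_RTC_REG_C) {
        /* Workaround idea from the thread: lower the line on select. */
        s->irq_raised = false;
    }
}

/* Read from the data port: only a read of register C lowers the line. */
static uint8_t toy_data_read(struct toy_rtc *s)
{
    if (s->cmos_index == TOY_RTC_REG_C) {
        s->irq_raised = false;
    }
    return 0;  /* register contents are not modelled */
}

/* Replay the interleaving from the trace; true means the line is stuck. */
static bool toy_replay_race(bool clear_on_select)
{
    struct toy_rtc s = {
        .cmos_index = 0,
        .irq_raised = true,
        .clear_on_select = clear_on_select,
    };

    toy_index_write(&s, 0x0c);  /* vcpu0 selects register C */
    toy_index_write(&s, 0x86);  /* vcpu1 selects 0x86 & 0x7f = 0x06 */
    (void)toy_data_read(&s);    /* vcpu0 reads, but gets register 0x06 */
    (void)toy_data_read(&s);    /* vcpu1 reads register 0x06 */

    return s.irq_raised;
}
```

Under these assumptions, toy_replay_race(false) leaves the line raised, which
matches the reported hang, while toy_replay_race(true) models the workaround
suggested in the thread (clearing the interrupt when 0xc is written to the
index port) and leaves the line lowered.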
diff --git a/results/classifier/004/other/81775929 b/results/classifier/004/other/81775929
new file mode 100644
index 00000000..1688f663
--- /dev/null
+++ b/results/classifier/004/other/81775929
@@ -0,0 +1,243 @@
+other: 0.877
+assembly: 0.849
+vnc: 0.825
+semantic: 0.825
+graphic: 0.818
+instruction: 0.816
+KVM: 0.815
+mistranslation: 0.811
+socket: 0.810
+device: 0.777
+network: 0.759
+boot: 0.742
+
+[Qemu-devel] [BUG] Monitor QMP is broken ?
+
+Hello!
+
+ I have updated my qemu to the recent version and it seems to have lost
+compatibility with libvirt. The error message is:
+--- cut ---
+internal error: unable to execute QEMU command 'qmp_capabilities': QMP input
+object member 'id' is unexpected
+--- cut ---
+ What does it mean? Is it intentional or not?
+
+Kind regards,
+Pavel Fedin
+Expert Engineer
+Samsung Electronics Research center Russia
+
+Hello! 
+
+> I have updated my qemu to the recent version and it seems to have lost
+> compatibility with libvirt. The error message is:
+> --- cut ---
+> internal error: unable to execute QEMU command 'qmp_capabilities': QMP input
+> object member 'id' is unexpected
+> --- cut ---
+> What does it mean? Is it intentional or not?
+
+I have found the problem. It is caused by commit
+65207c59d99f2260c5f1d3b9c491146616a522aa. libvirt does not seem to use the
+removed asynchronous interface, but it still feeds in JSON with the 'id' field
+set to something. So I think the related fragment in the qmp_check_input_obj()
+function should be brought back.
+Kind regards,
+Pavel Fedin
+Expert Engineer
+Samsung Electronics Research center Russia
+
+On Fri, Jun 05, 2015 at 04:58:46PM +0300, Pavel Fedin wrote:
+>  Hello!
+>
+> >  I have updated my qemu to the recent version and it seems to have lost
+> > compatibility with libvirt. The error message is:
+> > --- cut ---
+> > internal error: unable to execute QEMU command 'qmp_capabilities': QMP
+> > input object member 'id' is unexpected
+> > --- cut ---
+> >  What does it mean? Is it intentional or not?
+>
+>  I have found the problem. It is caused by commit
+> 65207c59d99f2260c5f1d3b9c491146616a522aa. libvirt does not seem to use the
+> removed asynchronous interface but it still feeds in JSONs with 'id' field
+> set to something. So i think the related fragment in qmp_check_input_obj()
+> function should be brought back
+
+If QMP is rejecting the 'id' parameter that is a regression bug.
+
+[quote]
+The QMP spec says
+
+2.3 Issuing Commands
+--------------------
+
+The format for command execution is:
+
+{ "execute": json-string, "arguments": json-object, "id": json-value }
+
+ Where,
+
+- The "execute" member identifies the command to be executed by the Server
+- The "arguments" member is used to pass any arguments required for the
+  execution of the command, it is optional when no arguments are
+  required. Each command documents what contents will be considered
+  valid when handling the json-argument
+- The "id" member is a transaction identification associated with the
+  command execution, it is optional and will be part of the response if
+  provided. The "id" member can be any json-value, although most
+  clients merely use a json-number incremented for each successive
+  command
+
+
+2.4 Commands Responses
+----------------------
+
+There are two possible responses which the Server will issue as the result
+of a command execution: success or error.
+
+2.4.1 success
+-------------
+
+The format of a success response is:
+
+{ "return": json-value, "id": json-value }
+
+ Where,
+
+- The "return" member contains the data returned by the command, which
+  is defined on a per-command basis (usually a json-object or
+  json-array of json-objects, but sometimes a json-number, json-string,
+  or json-array of json-strings); it is an empty json-object if the
+  command does not return data
+- The "id" member contains the transaction identification associated
+  with the command execution if issued by the Client
+
+[/quote]
+
+And as such, libvirt chose to /always/ send an 'id' parameter in all
+commands it issues.
+
+We don't however validate the id in the reply, though arguably we
+should have done so.
+
+Regards,
+Daniel
+-- 
+|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
+|: http://libvirt.org              -o-             http://virt-manager.org :|
+|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
+|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
+
+"Daniel P. Berrange" <address@hidden> writes:
+
+> On Fri, Jun 05, 2015 at 04:58:46PM +0300, Pavel Fedin wrote:
+> >  Hello!
+> >
+> > >  I have updated my qemu to the recent version and it seems to have
+> > > lost compatibility with libvirt. The error message is:
+> > > --- cut ---
+> > > internal error: unable to execute QEMU command 'qmp_capabilities':
+> > > QMP input object member 'id' is unexpected
+> > > --- cut ---
+> > >  What does it mean? Is it intentional or not?
+> >
+> >  I have found the problem. It is caused by commit
+> > 65207c59d99f2260c5f1d3b9c491146616a522aa. libvirt does not seem to
+> > use the removed asynchronous interface but it still feeds in JSONs with
+> > 'id' field set to something. So i think the related fragment in
+> > qmp_check_input_obj() function should be brought back
+>
+> If QMP is rejecting the 'id' parameter that is a regression bug.
+
+It is definitely a regression, my fault, and I'll get it fixed a.s.a.p.
+
+[...]
+
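The rule quoted from the QMP spec in the thread above (a command object carries
"execute", optionally "arguments" and optionally "id") can be sketched as a
check over the top-level member names. This is a hypothetical illustration, not
the qmp_check_input_obj() code: toy_qmp_members_ok and its array-of-names input
are invented here, and treating "execute" as mandatory is an assumption read
from the quoted spec text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the top-level member names of a command object are
 * acceptable under the quoted spec text: "execute" must be present
 * (assumption), "arguments" and "id" are optional, anything else is
 * rejected.
 */
static bool toy_qmp_members_ok(const char *const *names, size_t count)
{
    bool has_execute = false;

    for (size_t i = 0; i < count; i++) {
        if (strcmp(names[i], "execute") == 0) {
            has_execute = true;
        } else if (strcmp(names[i], "arguments") != 0 &&
                   strcmp(names[i], "id") != 0) {
            return false;  /* unexpected member */
        }
    }
    return has_execute;
}
```

With such a check, a command that carries an "id" member, as libvirt sends
according to the thread, is accepted, while an unknown member is still
rejected.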
diff --git a/results/classifier/004/other/85542195 b/results/classifier/004/other/85542195
new file mode 100644
index 00000000..314b8f44
--- /dev/null
+++ b/results/classifier/004/other/85542195
@@ -0,0 +1,128 @@
+other: 0.944
+semantic: 0.941
+assembly: 0.941
+graphic: 0.938
+device: 0.936
+instruction: 0.935
+boot: 0.932
+vnc: 0.923
+mistranslation: 0.907
+socket: 0.905
+network: 0.899
+KVM: 0.898
+
+[Qemu-devel] [Bug in qemu-system-ppc running Mac OS 9 on Windows 10]
+
+Hi all,
+
+I've been experiencing issues when installing Mac OS 9.x using
+qemu-system-ppc.exe in Windows 10. After booting from CD image,
+partitioning a fresh disk image often hangs Qemu. When using a
+pre-partitioned disk image, the OS installation process halts
+somewhere during the process. The issues can be resolved by setting
+qemu-system-ppc.exe to run in Windows 7 compatibility mode.
+AFAIK all Qemu builds for Windows since Mac OS 9 became available as
+guest are affected.
+The issue is reproducible by installing Qemu for Windows from Stephan
+Weil on Windows 10 and boot/install Mac OS 9.x
+
+Best regards and thanks for looking into this,
+Howard
+
+On Nov 25, 2016, at 9:26 AM, address@hidden wrote:
+> Hi all,
+>
+> I've been experiencing issues when installing Mac OS 9.x using
+> qemu-system-ppc.exe in Windows 10. After booting from CD image,
+> partitioning a fresh disk image often hangs Qemu. When using a
+> pre-partitioned disk image, the OS installation process halts
+> somewhere during the process. The issues can be resolved by setting
+> qemu-system-ppc.exe to run in Windows 7 compatibility mode.
+> AFAIK all Qemu builds for Windows since Mac OS 9 became available as
+> guest are affected.
+> The issue is reproducible by installing Qemu for Windows from Stephan
+> Weil on Windows 10 and boot/install Mac OS 9.x
+>
+> Best regards and thanks for looking into this,
+> Howard
+
+I assume there was some kind of behavior change in some of the
+Windows APIs between Windows 7 and Windows 10; that is my guess as to
+why the compatibility mode works. Could you run 'make check' on your
+system, once in Windows 7 and once in Windows 10? Maybe the tests
+will tell us something. I'm hoping that one of the tests succeeds in
+Windows 7 and fails in Windows 10. That would help us pinpoint what
+the problem is.
+
+What I mean by running in Windows 7 is setting the mingw environment to
+run in Windows 7 compatibility mode (if possible). If you have Windows 7
+on another partition you could boot from, that would be better.
+
+Good luck.
+
+p.s. use 'make check -k' to allow all the tests to run (even if one
+or more of the tests fails).
+
+> > Hi all,
+> >
+> > I've been experiencing issues when installing Mac OS 9.x using
+> > qemu-system-ppc.exe in Windows 10. After booting from CD image,
+> > partitioning a fresh disk image often hangs Qemu. When using a
+> > pre-partitioned disk image, the OS installation process halts
+> > somewhere during the process. The issues can be resolved by setting
+> > qemu-system-ppc.exe to run in Windows 7 compatibility mode.
+> > AFAIK all Qemu builds for Windows since Mac OS 9 became available as
+> > guest are affected.
+> > The issue is reproducible by installing Qemu for Windows from Stephan
+> > Weil on Windows 10 and boot/install Mac OS 9.x
+> >
+> > Best regards and thanks for looking into this,
+> > Howard
+>
+> I assume there was some kind of behavior change for some of the Windows API
+> between Windows 7 and Windows 10, that is my guess as to why the
+> compatibility mode works. Could you run 'make check' on your system, once in
+> Windows 7 and once in Windows 10. Maybe the tests will tell us something.
+> I'm hoping that one of the tests succeeds in Windows 7 and fails in Windows
+> 10. That would help us pinpoint what the problem is.
+>
+> What I mean by run in Windows 7 is set the mingw environment to run in
+> Windows 7 compatibility mode (if possible). If you have Windows 7 on another
+> partition you could boot from, that would be better.
+>
+> Good luck.
+>
+> p.s. use 'make check -k' to allow all the tests to run (even if one or more
+> of the tests fails).
+
+Hi,
+
+Thank you for your suggestion, but I have no means to run the check you
+suggest. I cross-compile from Linux.
+
+Best regards,
+Howard
+
diff --git a/results/classifier/004/other/88225572 b/results/classifier/004/other/88225572
new file mode 100644
index 00000000..e65837b5
--- /dev/null
+++ b/results/classifier/004/other/88225572
@@ -0,0 +1,2908 @@
+other: 0.987
+assembly: 0.981
+semantic: 0.976
+graphic: 0.974
+instruction: 0.974
+device: 0.970
+boot: 0.969
+vnc: 0.958
+socket: 0.955
+network: 0.950
+mistranslation: 0.942
+KVM: 0.924
+
+[BUG qemu 4.0] segfault when unplugging virtio-blk-pci device
+
+Hi,
+
+I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
+think it's because io completion hits use-after-free when device is
+already gone. Is this a known bug that has been fixed? (I went through
+the git log but didn't find anything obvious).
+
+gdb backtrace is:
+
+Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
+Program terminated with signal 11, Segmentation fault.
+#0  object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
+903        return obj->class;
+(gdb) bt
+#0  object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
+#1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
+    vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
+#2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
+    opaque=0x558a2f2fd420, ret=0)
+    at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
+#3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
+    at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
+#4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
+    i1=<optimized out>) at /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
+#5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
+#6  0x00007fff9ed75780 in ?? ()
+#7  0x0000000000000000 in ?? ()
+
+It seems like qemu was completing a discard/write_zero request, but
+parent BusState was already freed & set to NULL.
+
+Do we need to drain all pending requests before unrealizing the virtio-blk
+device, as the following patch proposed?
+https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
+
+If more info is needed, please let me know.
+
+Thanks,
+Eryu
+
+On Tue, 31 Dec 2019 18:34:34 +0800
+Eryu Guan <address@hidden> wrote:
+
+Maybe this will help:
+https://patchwork.kernel.org/patch/11213047/
+
+On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
+> may be this will help:
+> https://patchwork.kernel.org/patch/11213047/
+
+Yeah, this looks promising! I'll try it out (though it's a one-time
+crash for me). Thanks!
+
+Eryu
+
+On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
+> Yeah, this looks promising! I'll try it out (though it's a one-time
+> crash for me). Thanks!
+
+After applying this patch, I don't see the original segfault and
+backtrace, but I see this crash:
+
+[Thread debugging using libthread_db enabled]
+Using host libthread_db library "/lib64/libthread_db.so.1".
+Core was generated by `/usr/local/libexec/qemu-kvm -name 
+sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
+Program terminated with signal 11, Segmentation fault.
+#0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, 
+addr=0, val=<optimized out>, size=<optimized out>) at 
+/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
+1324        VirtIOPCIProxy *proxy = 
+VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
+Missing separate debuginfos, use: debuginfo-install 
+glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 
+libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 
+libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 
+pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
+(gdb) bt
+#0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, 
+addr=0, val=<optimized out>, size=<optimized out>) at 
+/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
+#1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, 
+addr=<optimized out>, value=<optimized out>, size=<optimized out>, 
+shift=<optimized out>, mask=<optimized out>, attrs=...) at 
+/usr/src/debug/qemu-4.0/memory.c:502
+#2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, 
+value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, access_size_min=<optimized 
+out>, access_size_max=<optimized out>, access_fn=0x561216835ac0 
+<memory_region_write_accessor>, mr=0x56121846d340, attrs=...)
+    at /usr/src/debug/qemu-4.0/memory.c:568
+#3  0x0000561216837c66 in memory_region_dispatch_write 
+(mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, 
+attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
+#4  0x00005612167e036f in flatview_write_continue (fv=fv@entry=0x56121852edd0, 
+addr=addr@entry=841813602304, attrs=..., buf=buf@entry=0x7fce7dd97028 <Address 
+0x7fce7dd97028 out of bounds>, len=len@entry=2, addr1=<optimized out>, 
+l=<optimized out>, mr=0x56121846d340)
+    at /usr/src/debug/qemu-4.0/exec.c:3279
+#5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0, addr=841813602304, 
+attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=2) at 
+/usr/src/debug/qemu-4.0/exec.c:3318
+#6  0x00005612167e4a1b in address_space_write (as=<optimized out>, 
+addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at 
+/usr/src/debug/qemu-4.0/exec.c:3408
+#7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>, addr=<optimized 
+out>, attrs=..., attrs@entry=..., buf=buf@entry=0x7fce7dd97028 <Address 
+0x7fce7dd97028 out of bounds>, len=<optimized out>, is_write=<optimized out>) 
+at /usr/src/debug/qemu-4.0/exec.c:3419
+#8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at 
+/usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
+#9  0x000056121682255e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x56121849aa00) 
+at /usr/src/debug/qemu-4.0/cpus.c:1281
+#10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at 
+/usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
+#11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
+#12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
+
+And I searched and found
+https://bugzilla.redhat.com/show_bug.cgi?id=1706759
+, which has the same
+backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
+blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
+bug.
+
+But I can still hit the bug even after applying the commit. Am I missing
+anything?
+
+Thanks,
+Eryu
+>
+Eryu
+
+On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
+>
+>
+On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
+>
+> On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
+>
+> > On Tue, 31 Dec 2019 18:34:34 +0800
+>
+> > Eryu Guan <address@hidden> wrote:
+>
+> >
+>
+> > > Hi,
+>
+> > >
+>
+> > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
+>
+> > > think it's because io completion hits use-after-free when device is
+>
+> > > already gone. Is this a known bug that has been fixed? (I went through
+>
+> > > the git log but didn't find anything obvious).
+>
+> > >
+>
+> > > gdb backtrace is:
+>
+> > >
+>
+> > > Core was generated by `/usr/local/libexec/qemu-kvm -name
+>
+> > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
+>
+> > > Program terminated with signal 11, Segmentation fault.
+>
+> > > #0 object_get_class (obj=obj@entry=0x0) at
+>
+> > > /usr/src/debug/qemu-4.0/qom/object.c:903
+>
+> > > 903        return obj->class;
+>
+> > > (gdb) bt
+>
+> > > #0  object_get_class (obj=obj@entry=0x0) at
+>
+> > > /usr/src/debug/qemu-4.0/qom/object.c:903
+>
+> > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
+>
+> > >     vector=<optimized out>) at
+>
+> > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
+>
+> > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
+>
+> > >     opaque=0x558a2f2fd420, ret=0)
+>
+> > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
+>
+> > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
+>
+> > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
+>
+> > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
+>
+> > >     i1=<optimized out>) at
+>
+> > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
+>
+> > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
+>
+> > > #6  0x00007fff9ed75780 in ?? ()
+>
+> > > #7  0x0000000000000000 in ?? ()
+>
+> > >
+>
+> > > It seems like qemu was completing a discard/write_zero request, but
+>
+> > > parent BusState was already freed & set to NULL.
+>
+> > >
+>
+> > > Do we need to drain all pending request before unrealizing virtio-blk
+>
+> > > device? Like the following patch proposed?
+>
+> > >
+>
+> > >
+https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
+>
+> > >
+>
+> > > If more info is needed, please let me know.
+>
+> >
+>
+> > maybe this will help:
+https://patchwork.kernel.org/patch/11213047/
+>
+>
+>
+> Yeah, this looks promising! I'll try it out (though it's a one-time
+>
+> crash for me). Thanks!
+>
+>
+After applying this patch, I don't see the original segfault and
+>
+backtrace, but I see this crash
+>
+>
+[Thread debugging using libthread_db enabled]
+>
+Using host libthread_db library "/lib64/libthread_db.so.1".
+>
+Core was generated by `/usr/local/libexec/qemu-kvm -name
+>
+sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
+>
+Program terminated with signal 11, Segmentation fault.
+>
+#0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
+>
+addr=0, val=<optimized out>, size=<optimized out>) at
+>
+/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
+>
+1324        VirtIOPCIProxy *proxy =
+>
+VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
+>
+Missing separate debuginfos, use: debuginfo-install
+>
+glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
+>
+libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
+>
+libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64
+>
+pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
+>
+(gdb) bt
+>
+#0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
+>
+addr=0, val=<optimized out>, size=<optimized out>) at
+>
+/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
+>
+#1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>,
+>
+addr=<optimized out>, value=<optimized out>, size=<optimized out>,
+>
+shift=<optimized out>, mask=<optimized out>, attrs=...) at
+>
+/usr/src/debug/qemu-4.0/memory.c:502
+>
+#2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0,
+>
+value=value@entry=0x7fcdeab1b8a8, size=size@entry=2,
+>
+access_size_min=<optimized out>, access_size_max=<optimized out>,
+>
+access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340,
+>
+attrs=...)
+>
+at /usr/src/debug/qemu-4.0/memory.c:568
+>
+#3  0x0000561216837c66 in memory_region_dispatch_write
+>
+(mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
+>
+attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
+>
+#4  0x00005612167e036f in flatview_write_continue
+>
+(fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=...,
+>
+buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
+>
+len=len@entry=2, addr1=<optimized out>, l=<optimized out>, mr=0x56121846d340)
+>
+at /usr/src/debug/qemu-4.0/exec.c:3279
+>
+#5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
+>
+addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out
+>
+of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
+>
+#6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
+>
+addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at
+>
+/usr/src/debug/qemu-4.0/exec.c:3408
+>
+#7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
+>
+addr=<optimized out>, attrs=..., attrs@entry=...,
+>
+buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
+>
+len=<optimized out>, is_write=<optimized out>) at
+>
+/usr/src/debug/qemu-4.0/exec.c:3419
+>
+#8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at
+>
+/usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
+>
+#9  0x000056121682255e in qemu_kvm_cpu_thread_fn
+>
+(arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
+>
+#10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at
+>
+/usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
+>
+#11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
+>
+#12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
+>
+>
+And I searched and found
+>
+https://bugzilla.redhat.com/show_bug.cgi?id=1706759
+, which has the same
+>
+backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
+>
+blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
+>
+bug.
+>
+>
+But I can still hit the bug even after applying the commit. Am I missing
+>
+anything?
+Hi Eryu,
+This backtrace seems to be caused by this bug (there were two bugs in
+1706759):
+https://bugzilla.redhat.com/show_bug.cgi?id=1708480
+Although the solution hasn't been tested on virtio-blk yet, you may
+want to apply this patch:
+https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
+Let me know if this works.
+
+Best regards, Julia Suvorova.
+
+On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
+>
+On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
+>
+>
+>
+> On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
+>
+> > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
+>
+> > > On Tue, 31 Dec 2019 18:34:34 +0800
+>
+> > > Eryu Guan <address@hidden> wrote:
+>
+> > >
+>
+> > > > Hi,
+>
+> > > >
+>
+> > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
+>
+> > > > think it's because io completion hits use-after-free when device is
+>
+> > > > already gone. Is this a known bug that has been fixed? (I went through
+>
+> > > > the git log but didn't find anything obvious).
+>
+> > > >
+>
+> > > > gdb backtrace is:
+>
+> > > >
+>
+> > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
+>
+> > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
+>
+> > > > Program terminated with signal 11, Segmentation fault.
+>
+> > > > #0 object_get_class (obj=obj@entry=0x0) at
+>
+> > > > /usr/src/debug/qemu-4.0/qom/object.c:903
+>
+> > > > 903        return obj->class;
+>
+> > > > (gdb) bt
+>
+> > > > #0  object_get_class (obj=obj@entry=0x0) at
+>
+> > > > /usr/src/debug/qemu-4.0/qom/object.c:903
+>
+> > > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
+>
+> > > >     vector=<optimized out>) at
+>
+> > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
+>
+> > > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
+>
+> > > >     opaque=0x558a2f2fd420, ret=0)
+>
+> > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
+>
+> > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
+>
+> > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
+>
+> > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
+>
+> > > >     i1=<optimized out>) at
+>
+> > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
+>
+> > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
+>
+> > > > #6  0x00007fff9ed75780 in ?? ()
+>
+> > > > #7  0x0000000000000000 in ?? ()
+>
+> > > >
+>
+> > > > It seems like qemu was completing a discard/write_zero request, but
+>
+> > > > parent BusState was already freed & set to NULL.
+>
+> > > >
+>
+> > > > Do we need to drain all pending request before unrealizing virtio-blk
+>
+> > > > device? Like the following patch proposed?
+>
+> > > >
+>
+> > > >
+https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
+>
+> > > >
+>
+> > > > If more info is needed, please let me know.
+>
+> > >
+>
+> > > maybe this will help:
+https://patchwork.kernel.org/patch/11213047/
+>
+> >
+>
+> > Yeah, this looks promising! I'll try it out (though it's a one-time
+>
+> > crash for me). Thanks!
+>
+>
+>
+> After applying this patch, I don't see the original segfault and
+>
+> backtrace, but I see this crash
+>
+>
+>
+> [Thread debugging using libthread_db enabled]
+>
+> Using host libthread_db library "/lib64/libthread_db.so.1".
+>
+> Core was generated by `/usr/local/libexec/qemu-kvm -name
+>
+> sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
+>
+> Program terminated with signal 11, Segmentation fault.
+>
+> #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
+>
+> addr=0, val=<optimized out>, size=<optimized out>) at
+>
+> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
+>
+> 1324        VirtIOPCIProxy *proxy =
+>
+> VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
+>
+> Missing separate debuginfos, use: debuginfo-install
+>
+> glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
+>
+> libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
+>
+> libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64
+>
+> pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
+>
+> (gdb) bt
+>
+> #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
+>
+> addr=0, val=<optimized out>, size=<optimized out>) at
+>
+> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
+>
+> #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>,
+>
+> addr=<optimized out>, value=<optimized out>, size=<optimized out>,
+>
+> shift=<optimized out>, mask=<optimized out>, attrs=...) at
+>
+> /usr/src/debug/qemu-4.0/memory.c:502
+>
+> #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0,
+>
+> value=value@entry=0x7fcdeab1b8a8, size=size@entry=2,
+>
+> access_size_min=<optimized out>, access_size_max=<optimized out>,
+>
+> access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340,
+>
+> attrs=...)
+>
+>     at /usr/src/debug/qemu-4.0/memory.c:568
+>
+> #3  0x0000561216837c66 in memory_region_dispatch_write
+>
+> (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
+>
+> attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
+>
+> #4  0x00005612167e036f in flatview_write_continue
+>
+> (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=...,
+>
+> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
+>
+> len=len@entry=2, addr1=<optimized out>, l=<optimized out>,
+>
+> mr=0x56121846d340)
+>
+>     at /usr/src/debug/qemu-4.0/exec.c:3279
+>
+> #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
+>
+> addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028
+>
+> out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
+>
+> #6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
+>
+> addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>)
+>
+> at /usr/src/debug/qemu-4.0/exec.c:3408
+>
+> #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
+>
+> addr=<optimized out>, attrs=..., attrs@entry=...,
+>
+> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
+>
+> len=<optimized out>, is_write=<optimized out>) at
+>
+> /usr/src/debug/qemu-4.0/exec.c:3419
+>
+> #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at
+>
+> /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
+>
+> #9  0x000056121682255e in qemu_kvm_cpu_thread_fn
+>
+> (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
+>
+> #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at
+>
+> /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
+>
+> #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
+>
+> #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
+>
+>
+>
+> And I searched and found
+>
+>
+https://bugzilla.redhat.com/show_bug.cgi?id=1706759
+, which has the same
+>
+> backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
+>
+> blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
+>
+> bug.
+>
+>
+>
+> But I can still hit the bug even after applying the commit. Am I missing
+>
+> anything?
+>
+>
+Hi Eryu,
+>
+This backtrace seems to be caused by this bug (there were two bugs in
+>
+1706759):
+https://bugzilla.redhat.com/show_bug.cgi?id=1708480
+>
+Although the solution hasn't been tested on virtio-blk yet, you may
+>
+want to apply this patch:
+>
+https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
+>
+Let me know if this works.
+Will try it out, thanks a lot!
+
+Eryu
+
+On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
+>
+On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
+>
+>
+>
+> On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
+>
+> > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
+>
+> > > On Tue, 31 Dec 2019 18:34:34 +0800
+>
+> > > Eryu Guan <address@hidden> wrote:
+>
+> > >
+>
+> > > > Hi,
+>
+> > > >
+>
+> > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
+>
+> > > > think it's because io completion hits use-after-free when device is
+>
+> > > > already gone. Is this a known bug that has been fixed? (I went through
+>
+> > > > the git log but didn't find anything obvious).
+>
+> > > >
+>
+> > > > gdb backtrace is:
+>
+> > > >
+>
+> > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
+>
+> > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
+>
+> > > > Program terminated with signal 11, Segmentation fault.
+>
+> > > > #0 object_get_class (obj=obj@entry=0x0) at
+>
+> > > > /usr/src/debug/qemu-4.0/qom/object.c:903
+>
+> > > > 903        return obj->class;
+>
+> > > > (gdb) bt
+>
+> > > > #0  object_get_class (obj=obj@entry=0x0) at
+>
+> > > > /usr/src/debug/qemu-4.0/qom/object.c:903
+>
+> > > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
+>
+> > > >     vector=<optimized out>) at
+>
+> > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
+>
+> > > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
+>
+> > > >     opaque=0x558a2f2fd420, ret=0)
+>
+> > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
+>
+> > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
+>
+> > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
+>
+> > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
+>
+> > > >     i1=<optimized out>) at
+>
+> > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
+>
+> > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
+>
+> > > > #6  0x00007fff9ed75780 in ?? ()
+>
+> > > > #7  0x0000000000000000 in ?? ()
+>
+> > > >
+>
+> > > > It seems like qemu was completing a discard/write_zero request, but
+>
+> > > > parent BusState was already freed & set to NULL.
+>
+> > > >
+>
+> > > > Do we need to drain all pending request before unrealizing virtio-blk
+>
+> > > > device? Like the following patch proposed?
+>
+> > > >
+>
+> > > >
+https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
+>
+> > > >
+>
+> > > > If more info is needed, please let me know.
+>
+> > >
+>
+> > > maybe this will help:
+https://patchwork.kernel.org/patch/11213047/
+>
+> >
+>
+> > Yeah, this looks promising! I'll try it out (though it's a one-time
+>
+> > crash for me). Thanks!
+>
+>
+>
+> After applying this patch, I don't see the original segfault and
+>
+> backtrace, but I see this crash
+>
+>
+>
+> [Thread debugging using libthread_db enabled]
+>
+> Using host libthread_db library "/lib64/libthread_db.so.1".
+>
+> Core was generated by `/usr/local/libexec/qemu-kvm -name
+>
+> sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
+>
+> Program terminated with signal 11, Segmentation fault.
+>
+> #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
+>
+> addr=0, val=<optimized out>, size=<optimized out>) at
+>
+> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
+>
+> 1324        VirtIOPCIProxy *proxy =
+>
+> VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
+>
+> Missing separate debuginfos, use: debuginfo-install
+>
+> glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
+>
+> libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
+>
+> libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64
+>
+> pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
+>
+> (gdb) bt
+>
+> #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
+>
+> addr=0, val=<optimized out>, size=<optimized out>) at
+>
+> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
+>
+> #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>,
+>
+> addr=<optimized out>, value=<optimized out>, size=<optimized out>,
+>
+> shift=<optimized out>, mask=<optimized out>, attrs=...) at
+>
+> /usr/src/debug/qemu-4.0/memory.c:502
+>
+> #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0,
+>
+> value=value@entry=0x7fcdeab1b8a8, size=size@entry=2,
+>
+> access_size_min=<optimized out>, access_size_max=<optimized out>,
+>
+> access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340,
+>
+> attrs=...)
+>
+>     at /usr/src/debug/qemu-4.0/memory.c:568
+>
+> #3  0x0000561216837c66 in memory_region_dispatch_write
+>
+> (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
+>
+> attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
+>
+> #4  0x00005612167e036f in flatview_write_continue
+>
+> (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=...,
+>
+> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
+>
+> len=len@entry=2, addr1=<optimized out>, l=<optimized out>,
+>
+> mr=0x56121846d340)
+>
+>     at /usr/src/debug/qemu-4.0/exec.c:3279
+>
+> #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
+>
+> addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028
+>
+> out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
+>
+> #6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
+>
+> addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>)
+>
+> at /usr/src/debug/qemu-4.0/exec.c:3408
+>
+> #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
+>
+> addr=<optimized out>, attrs=..., attrs@entry=...,
+>
+> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
+>
+> len=<optimized out>, is_write=<optimized out>) at
+>
+> /usr/src/debug/qemu-4.0/exec.c:3419
+>
+> #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at
+>
+> /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
+>
+> #9  0x000056121682255e in qemu_kvm_cpu_thread_fn
+>
+> (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
+>
+> #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at
+>
+> /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
+>
+> #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
+>
+> #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
+>
+>
+>
+> And I searched and found
+>
+>
+https://bugzilla.redhat.com/show_bug.cgi?id=1706759
+, which has the same
+>
+> backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
+>
+> blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
+>
+> bug.
+>
+>
+>
+> But I can still hit the bug even after applying the commit. Am I missing
+>
+> anything?
+>
+>
+Hi Eryu,
+>
+This backtrace seems to be caused by this bug (there were two bugs in
+>
+1706759):
+https://bugzilla.redhat.com/show_bug.cgi?id=1708480
+>
+Although the solution hasn't been tested on virtio-blk yet, you may
+>
+want to apply this patch:
+>
+https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
+>
+Let me know if this works.
+Unfortunately, I still see the same segfault & backtrace after applying
+commit 421afd2fe8dd ("virtio: reset region cache when on queue
+deletion")
+
+Anything I can help to debug?
+
+Thanks,
+Eryu
+
+On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
+>
+On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
+>
+> On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
+>
+> >
+>
+> > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
+>
+> > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
+>
+> > > > On Tue, 31 Dec 2019 18:34:34 +0800
+>
+> > > > Eryu Guan <address@hidden> wrote:
+>
+> > > >
+>
+> > > > > Hi,
+>
+> > > > >
+>
+> > > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox,
+>
+> > > > > I
+>
+> > > > > think it's because io completion hits use-after-free when device is
+>
+> > > > > already gone. Is this a known bug that has been fixed? (I went
+>
+> > > > > through
+>
+> > > > > the git log but didn't find anything obvious).
+>
+> > > > >
+>
+> > > > > gdb backtrace is:
+>
+> > > > >
+>
+> > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
+>
+> > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
+>
+> > > > > Program terminated with signal 11, Segmentation fault.
+>
+> > > > > #0 object_get_class (obj=obj@entry=0x0) at
+>
+> > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
+>
+> > > > > 903        return obj->class;
+>
+> > > > > (gdb) bt
+>
+> > > > > #0  object_get_class (obj=obj@entry=0x0) at
+>
+> > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
+>
+> > > > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
+>
+> > > > >     vector=<optimized out>) at
+>
+> > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
+>
+> > > > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
+>
+> > > > >     opaque=0x558a2f2fd420, ret=0)
+>
+> > > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
+>
+> > > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
+>
+> > > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
+>
+> > > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
+>
+> > > > >     i1=<optimized out>) at
+>
+> > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
+>
+> > > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
+>
+> > > > > #6  0x00007fff9ed75780 in ?? ()
+>
+> > > > > #7  0x0000000000000000 in ?? ()
+>
+> > > > >
+>
+> > > > > It seems like qemu was completing a discard/write_zero request, but
+>
+> > > > > parent BusState was already freed & set to NULL.
+>
+> > > > >
+>
+> > > > > Do we need to drain all pending request before unrealizing
+>
+> > > > > virtio-blk
+>
+> > > > > device? Like the following patch proposed?
+>
+> > > > >
+>
+> > > > >
+https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
+>
+> > > > >
+>
+> > > > > If more info is needed, please let me know.
+>
+> > > >
+>
+> > > > maybe this will help:
+https://patchwork.kernel.org/patch/11213047/
+>
+> > >
+>
+> > > Yeah, this looks promising! I'll try it out (though it's a one-time
+>
+> > > crash for me). Thanks!
+>
+> >
+>
+> > After applying this patch, I don't see the original segfault and
+>
+> > backtrace, but I see this crash
+>
+> >
+>
+> > [Thread debugging using libthread_db enabled]
+>
+> > Using host libthread_db library "/lib64/libthread_db.so.1".
+>
+> > Core was generated by `/usr/local/libexec/qemu-kvm -name
+>
+> > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
+>
+> > Program terminated with signal 11, Segmentation fault.
+>
+> > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
+>
+> > addr=0, val=<optimized out>, size=<optimized out>) at
+>
+> > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
+>
+> > 1324        VirtIOPCIProxy *proxy =
+>
+> > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
+>
+> > Missing separate debuginfos, use: debuginfo-install
+>
+> > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
+>
+> > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
+>
+> > libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64
+>
+> > pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
+>
+> > (gdb) bt
+>
+> > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
+>
+> > addr=0, val=<optimized out>, size=<optimized out>) at
+>
+> > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
+>
+> > #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized
+>
+> > out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>,
+>
+> > shift=<optimized out>, mask=<optimized out>, attrs=...) at
+>
+> > /usr/src/debug/qemu-4.0/memory.c:502
+>
+> > #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0,
+>
+> > value=value@entry=0x7fcdeab1b8a8, size=size@entry=2,
+>
+> > access_size_min=<optimized out>, access_size_max=<optimized out>,
+>
+> > access_fn=0x561216835ac0 <memory_region_write_accessor>,
+>
+> > mr=0x56121846d340, attrs=...)
+>
+> >     at /usr/src/debug/qemu-4.0/memory.c:568
+>
+> > #3  0x0000561216837c66 in memory_region_dispatch_write
+>
+> > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
+>
+> > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
+>
+> > #4  0x00005612167e036f in flatview_write_continue
+>
+> > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=...,
+>
+> > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
+>
+> > len=len@entry=2, addr1=<optimized out>, l=<optimized out>,
+>
+> > mr=0x56121846d340)
+>
+> >     at /usr/src/debug/qemu-4.0/exec.c:3279
+>
+> > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
+>
+> > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028
+>
+> > out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
+>
+> > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
+>
+> > addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized
+>
+> > out>) at /usr/src/debug/qemu-4.0/exec.c:3408
+>
+> > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
+>
+> > addr=<optimized out>, attrs=..., attrs@entry=...,
+>
+> > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
+>
+> > len=<optimized out>, is_write=<optimized out>) at
+>
+> > /usr/src/debug/qemu-4.0/exec.c:3419
+>
+> > #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at
+>
+> > /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
+>
+> > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn
+>
+> > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
+>
+> > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at
+>
+> > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
+>
+> > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
+>
+> > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
+>
+> >
+>
+> > And I searched and found
+>
+> >
+https://bugzilla.redhat.com/show_bug.cgi?id=1706759
+, which has the same
+>
+> > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
+>
+> > blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
+>
+> > bug.
+>
+> >
+>
+> > But I can still hit the bug even after applying the commit. Am I missing
+>
+> > anything?
+>
+>
+>
+> Hi Eryu,
+>
+> This backtrace seems to be caused by this bug (there were two bugs in
+>
+> 1706759):
+https://bugzilla.redhat.com/show_bug.cgi?id=1708480
+>
+> Although the solution hasn't been tested on virtio-blk yet, you may
+>
+> want to apply this patch:
+>
+>
+https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
+>
+> Let me know if this works.
+>
+>
+Unfortunately, I still see the same segfault & backtrace after applying
+>
+commit 421afd2fe8dd ("virtio: reset region cache when on queue
+>
+deletion")
+>
+>
+Anything I can help to debug?
+Please post the QEMU command line and the QMP commands used to remove the
+device.
+
+The backtrace shows a vcpu thread submitting a request.  The device
+seems to be partially destroyed.  That's surprising because the monitor
+and the vcpu thread should use the QEMU global mutex to avoid race
+conditions.  Maybe seeing the QMP commands will make it clearer...
+
+Stefan
+signature.asc
+Description:
+PGP signature
+
+On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote:
+>
+On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
+>
+> On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
+>
+> > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
+>
+> > >
+>
+> > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
+>
+> > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
+>
+> > > > > On Tue, 31 Dec 2019 18:34:34 +0800
+>
+> > > > > Eryu Guan <address@hidden> wrote:
+>
+> > > > >
+>
+> > > > > > Hi,
+>
+> > > > > >
+>
+> > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata
+>
+> > > > > > sandbox, I
+>
+> > > > > > think it's because io completion hits use-after-free when device
+>
+> > > > > > is
+>
+> > > > > > already gone. Is this a known bug that has been fixed? (I went
+>
+> > > > > > through
+>
+> > > > > > the git log but didn't find anything obvious).
+>
+> > > > > >
+>
+> > > > > > gdb backtrace is:
+>
+> > > > > >
+>
+> > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
+>
+> > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
+>
+> > > > > > Program terminated with signal 11, Segmentation fault.
+>
+> > > > > > #0 object_get_class (obj=obj@entry=0x0) at
+>
+> > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
+>
+> > > > > > 903        return obj->class;
+>
+> > > > > > (gdb) bt
+>
+> > > > > > #0  object_get_class (obj=obj@entry=0x0) at
+>
+> > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
+>
+> > > > > > #1  0x0000558a2c009e9b in virtio_notify_vector
+>
+> > > > > > (vdev=0x558a2e7751d0,
+>
+> > > > > >     vector=<optimized out>) at
+>
+> > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
+>
+> > > > > > #2  0x0000558a2bfdcb1e in
+>
+> > > > > > virtio_blk_discard_write_zeroes_complete (
+>
+> > > > > >     opaque=0x558a2f2fd420, ret=0)
+>
+> > > > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
+>
+> > > > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
+>
+> > > > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
+>
+> > > > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized
+>
+> > > > > > out>,
+>
+> > > > > >     i1=<optimized out>) at
+>
+> > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
+>
+> > > > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
+>
+> > > > > > #6  0x00007fff9ed75780 in ?? ()
+>
+> > > > > > #7  0x0000000000000000 in ?? ()
+>
+> > > > > >
+>
+> > > > > > It seems like qemu was completing a discard/write_zero request,
+>
+> > > > > > but
+>
+> > > > > > parent BusState was already freed & set to NULL.
+>
+> > > > > >
+>
+> > > > > > Do we need to drain all pending request before unrealizing
+>
+> > > > > > virtio-blk
+>
+> > > > > > device? Like the following patch proposed?
+>
+> > > > > >
+>
+> > > > > >
+https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
+>
+> > > > > >
+>
+> > > > > > If more info is needed, please let me know.
+>
+> > > > >
+>
+> > > > > may be this will help:
+https://patchwork.kernel.org/patch/11213047/
+>
+> > > >
+>
+> > > > Yeah, this looks promising! I'll try it out (though it's a one-time
+>
+> > > > crash for me). Thanks!
+>
+> > >
+>
+> > > After applying this patch, I don't see the original segfault and
+>
+> > > backtrace, but I see this crash
+>
+> > >
+>
+> > > [Thread debugging using libthread_db enabled]
+>
+> > > Using host libthread_db library "/lib64/libthread_db.so.1".
+>
+> > > Core was generated by `/usr/local/libexec/qemu-kvm -name
+>
+> > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
+>
+> > > Program terminated with signal 11, Segmentation fault.
+>
+> > > #0  0x0000561216a57609 in virtio_pci_notify_write
+>
+> > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized
+>
+> > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
+>
+> > > 1324        VirtIOPCIProxy *proxy =
+>
+> > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
+>
+> > > Missing separate debuginfos, use: debuginfo-install
+>
+> > > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
+>
+> > > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
+>
+> > > libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64
+>
+> > > pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
+>
+> > > (gdb) bt
+>
+> > > #0  0x0000561216a57609 in virtio_pci_notify_write
+>
+> > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized
+>
+> > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
+>
+> > > #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized
+>
+> > > out>, addr=<optimized out>, value=<optimized out>, size=<optimized
+>
+> > > out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at
+>
+> > > /usr/src/debug/qemu-4.0/memory.c:502
+>
+> > > #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0,
+>
+> > > value=value@entry=0x7fcdeab1b8a8, size=size@entry=2,
+>
+> > > access_size_min=<optimized out>, access_size_max=<optimized out>,
+>
+> > > access_fn=0x561216835ac0 <memory_region_write_accessor>,
+>
+> > > mr=0x56121846d340, attrs=...)
+>
+> > >     at /usr/src/debug/qemu-4.0/memory.c:568
+>
+> > > #3  0x0000561216837c66 in memory_region_dispatch_write
+>
+> > > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
+>
+> > > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
+>
+> > > #4  0x00005612167e036f in flatview_write_continue
+>
+> > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=...,
+>
+> > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
+>
+> > > len=len@entry=2, addr1=<optimized out>, l=<optimized out>,
+>
+> > > mr=0x56121846d340)
+>
+> > >     at /usr/src/debug/qemu-4.0/exec.c:3279
+>
+> > > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
+>
+> > > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address
+>
+> > > 0x7fce7dd97028 out of bounds>, len=2) at
+>
+> > > /usr/src/debug/qemu-4.0/exec.c:3318
+>
+> > > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
+>
+> > > addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized
+>
+> > > out>) at /usr/src/debug/qemu-4.0/exec.c:3408
+>
+> > > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
+>
+> > > addr=<optimized out>, attrs=..., attrs@entry=...,
+>
+> > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
+>
+> > > len=<optimized out>, is_write=<optimized out>) at
+>
+> > > /usr/src/debug/qemu-4.0/exec.c:3419
+>
+> > > #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00)
+>
+> > > at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
+>
+> > > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn
+>
+> > > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
+>
+> > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at
+>
+> > > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
+>
+> > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
+>
+> > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
+>
+> > >
+>
+> > > And I searched and found
+>
+> > >
+https://bugzilla.redhat.com/show_bug.cgi?id=1706759
+, which has the same
+>
+> > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
+>
+> > > blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
+>
+> > > bug.
+>
+> > >
+>
+> > > But I can still hit the bug even after applying the commit. Do I miss
+>
+> > > anything?
+>
+> >
+>
+> > Hi Eryu,
+>
+> > This backtrace seems to be caused by this bug (there were two bugs in
+>
+> > 1706759):
+https://bugzilla.redhat.com/show_bug.cgi?id=1708480
+>
+> > Although the solution hasn't been tested on virtio-blk yet, you may
+>
+> > want to apply this patch:
+>
+> >
+https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
+>
+> > Let me know if this works.
+>
+>
+>
+> Unfortunately, I still see the same segfault & backtrace after applying
+>
+> commit 421afd2fe8dd ("virtio: reset region cache when on queue
+>
+> deletion")
+>
+>
+>
+> Anything I can help to debug?
+>
+>
+Please post the QEMU command-line and the QMP commands used to remove the
+>
+device.
+It's a normal kata instance using virtio-fs as rootfs.
+
+/usr/local/libexec/qemu-kvm -name 
+sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \
+ -uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine 
+q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \
+ -cpu host -qmp 
+unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait
+ \
+ -qmp 
+unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait
+ \
+ -m 2048M,slots=10,maxmem=773893M -device 
+pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \
+ -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device 
+virtconsole,chardev=charconsole0,id=console0 \
+ -chardev 
+socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait
+ \
+ -device 
+virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 \
+ -chardev 
+socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait
+ \
+ -device nvdimm,id=nv0,memdev=mem0 -object 
+memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456
+ \
+ -object rng-random,id=rng0,filename=/dev/urandom -device 
+virtio-rng,rng=rng0,romfile= \
+ -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \
+ -chardev 
+socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait
+ \
+ -chardev 
+socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock
+ \
+ -device 
+vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M 
+-netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \
+ -device 
+driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile=
+ \
+ -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults 
+-nographic -daemonize \
+ -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on 
+-numa node,memdev=dimm1 -kernel /usr/local/share/kernel \
+ -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 
+i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 
+console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 
+root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro ro 
+rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=96 
+agent.use_vsock=false init=/usr/lib/systemd/systemd 
+systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service 
+systemd.mask=systemd-networkd.socket \
+ -pidfile 
+/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid 
+\
+ -smp 1,cores=1,threads=1,sockets=96,maxcpus=96
+
+QMP command to delete device (the device id is just an example, not the
+one that caused the crash):
+
+"{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}"
+
+which has been hot plugged by:
+"{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}"
+"{\"return\": {}}"
+"{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}"
+"{\"return\": {}}"
+
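+The hot-plug and unplug sequence above can be sketched as a small script
+that builds the same three QMP messages. This is only an illustration: the
+helper names (qmp_command, hotplug_commands, unplug_command) are
+hypothetical and not part of QEMU or kata; the argument values are copied
+from the commands quoted above, and nothing is sent to a QMP socket.

```python
import json


def qmp_command(execute, arguments):
    """Serialize one QMP command (hypothetical helper, not a QEMU API)."""
    return json.dumps({"execute": execute, "arguments": arguments},
                      sort_keys=True)


def hotplug_commands(drive_id, filename, bus="pci-bridge-0", addr="01"):
    """Build blockdev-add followed by device_add for a virtio-blk-pci disk."""
    node_name = "drive-" + drive_id
    device_id = "virtio-drive-" + drive_id
    blockdev_add = qmp_command("blockdev-add", {
        "cache": {"direct": True, "no-flush": False},
        "driver": "raw",
        "file": {"driver": "file", "filename": filename},
        "node-name": node_name,
    })
    device_add = qmp_command("device_add", {
        "addr": addr,
        "bus": bus,
        "drive": node_name,
        "driver": "virtio-blk-pci",
        "id": device_id,
        "romfile": "",
        "share-rw": "on",
    })
    # The block node must exist before the device that references it.
    return [blockdev_add, device_add]


def unplug_command(drive_id):
    """Build the device_del request that starts the hot unplug."""
    return qmp_command("device_del", {"id": "virtio-drive-" + drive_id})


if __name__ == "__main__":
    for line in hotplug_commands("5967abfb917c8da6", "/dev/dm-18"):
        print(line)
    print(unplug_command("5967abfb917c8da6"))
```

+Keeping the node name and the device id derived from one drive id makes it
+easier to match a device_del request to the earlier blockdev-add and
+device_add when reading the QMP log.
+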
+>
+>
+The backtrace shows a vcpu thread submitting a request.  The device
+>
+seems to be partially destroyed.  That's surprising because the monitor
+>
+and the vcpu thread should use the QEMU global mutex to avoid race
+>
+conditions.  Maybe seeing the QMP commands will make it clearer...
+>
+>
+Stefan
+Thanks!
+
+Eryu
+
+On Tue, Jan 14, 2020 at 10:50:58AM +0800, Eryu Guan wrote:
+>
+On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote:
+>
+> On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
+>
+> > On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
+>
+> > > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
+>
+> > > >
+>
+> > > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
+>
+> > > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
+>
+> > > > > > On Tue, 31 Dec 2019 18:34:34 +0800
+>
+> > > > > > Eryu Guan <address@hidden> wrote:
+>
+> > > > > >
+>
+> > > > > > > Hi,
+>
+> > > > > > >
+>
+> > > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata
+>
+> > > > > > > sandbox, I
+>
+> > > > > > > think it's because io completion hits use-after-free when
+>
+> > > > > > > device is
+>
+> > > > > > > already gone. Is this a known bug that has been fixed? (I went
+>
+> > > > > > > through
+>
+> > > > > > > the git log but didn't find anything obvious).
+>
+> > > > > > >
+>
+> > > > > > > gdb backtrace is:
+>
+> > > > > > >
+>
+> > > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
+>
+> > > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
+>
+> > > > > > > Program terminated with signal 11, Segmentation fault.
+>
+> > > > > > > #0 object_get_class (obj=obj@entry=0x0) at
+>
+> > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
+>
+> > > > > > > 903        return obj->class;
+>
+> > > > > > > (gdb) bt
+>
+> > > > > > > #0  object_get_class (obj=obj@entry=0x0) at
+>
+> > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
+>
+> > > > > > > #1  0x0000558a2c009e9b in virtio_notify_vector
+>
+> > > > > > > (vdev=0x558a2e7751d0,
+>
+> > > > > > >     vector=<optimized out>) at
+>
+> > > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
+>
+> > > > > > > #2  0x0000558a2bfdcb1e in
+>
+> > > > > > > virtio_blk_discard_write_zeroes_complete (
+>
+> > > > > > >     opaque=0x558a2f2fd420, ret=0)
+>
+> > > > > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
+>
+> > > > > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
+>
+> > > > > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
+>
+> > > > > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized
+>
+> > > > > > > out>,
+>
+> > > > > > >     i1=<optimized out>) at
+>
+> > > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
+>
+> > > > > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
+>
+> > > > > > > #6  0x00007fff9ed75780 in ?? ()
+>
+> > > > > > > #7  0x0000000000000000 in ?? ()
+>
+> > > > > > >
+>
+> > > > > > > It seems like qemu was completing a discard/write_zero request,
+>
+> > > > > > > but
+>
+> > > > > > > parent BusState was already freed & set to NULL.
+>
+> > > > > > >
+>
+> > > > > > > Do we need to drain all pending request before unrealizing
+>
+> > > > > > > virtio-blk
+>
+> > > > > > > device? Like the following patch proposed?
+>
+> > > > > > >
+>
+> > > > > > >
+https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
+>
+> > > > > > >
+>
+> > > > > > > If more info is needed, please let me know.
+>
+> > > > > >
+>
+> > > > > > may be this will help:
+>
+> > > > > >
+https://patchwork.kernel.org/patch/11213047/
+>
+> > > > >
+>
+> > > > > Yeah, this looks promising! I'll try it out (though it's a one-time
+>
+> > > > > crash for me). Thanks!
+>
+> > > >
+>
+> > > > After applying this patch, I don't see the original segfault and
+>
+> > > > backtrace, but I see this crash
+>
+> > > >
+>
+> > > > [Thread debugging using libthread_db enabled]
+>
+> > > > Using host libthread_db library "/lib64/libthread_db.so.1".
+>
+> > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
+>
+> > > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
+>
+> > > > Program terminated with signal 11, Segmentation fault.
+>
+> > > > #0  0x0000561216a57609 in virtio_pci_notify_write
+>
+> > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized
+>
+> > > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
+>
+> > > > 1324        VirtIOPCIProxy *proxy =
+>
+> > > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
+>
+> > > > Missing separate debuginfos, use: debuginfo-install
+>
+> > > > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
+>
+> > > > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
+>
+> > > > libstdc++-4.8.5-28.alios7.1.x86_64
+>
+> > > > numactl-libs-2.0.9-5.1.alios7.x86_64 pixman-0.32.6-3.1.alios7.x86_64
+>
+> > > > zlib-1.2.7-16.2.alios7.x86_64
+>
+> > > > (gdb) bt
+>
+> > > > #0  0x0000561216a57609 in virtio_pci_notify_write
+>
+> > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized
+>
+> > > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
+>
+> > > > #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized
+>
+> > > > out>, addr=<optimized out>, value=<optimized out>, size=<optimized
+>
+> > > > out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at
+>
+> > > > /usr/src/debug/qemu-4.0/memory.c:502
+>
+> > > > #2  0x0000561216833c5d in access_with_adjusted_size
+>
+> > > > (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8,
+>
+> > > > size=size@entry=2, access_size_min=<optimized out>,
+>
+> > > > access_size_max=<optimized out>, access_fn=0x561216835ac0
+>
+> > > > <memory_region_write_accessor>, mr=0x56121846d340, attrs=...)
+>
+> > > >     at /usr/src/debug/qemu-4.0/memory.c:568
+>
+> > > > #3  0x0000561216837c66 in memory_region_dispatch_write
+>
+> > > > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
+>
+> > > > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
+>
+> > > > #4  0x00005612167e036f in flatview_write_continue
+>
+> > > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=...,
+>
+> > > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
+>
+> > > > len=len@entry=2, addr1=<optimized out>, l=<optimized out>,
+>
+> > > > mr=0x56121846d340)
+>
+> > > >     at /usr/src/debug/qemu-4.0/exec.c:3279
+>
+> > > > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
+>
+> > > > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address
+>
+> > > > 0x7fce7dd97028 out of bounds>, len=2) at
+>
+> > > > /usr/src/debug/qemu-4.0/exec.c:3318
+>
+> > > > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
+>
+> > > > addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized
+>
+> > > > out>) at /usr/src/debug/qemu-4.0/exec.c:3408
+>
+> > > > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
+>
+> > > > addr=<optimized out>, attrs=..., attrs@entry=...,
+>
+> > > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
+>
+> > > > len=<optimized out>, is_write=<optimized out>) at
+>
+> > > > /usr/src/debug/qemu-4.0/exec.c:3419
+>
+> > > > #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00)
+>
+> > > > at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
+>
+> > > > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn
+>
+> > > > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
+>
+> > > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at
+>
+> > > > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
+>
+> > > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
+>
+> > > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
+>
+> > > >
+>
+> > > > And I searched and found
+>
+> > > >
+https://bugzilla.redhat.com/show_bug.cgi?id=1706759
+, which has the
+>
+> > > > same
+>
+> > > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
+>
+> > > > blk_drain() to virtio_blk_device_unrealize()") is to fix this
+>
+> > > > particular
+>
+> > > > bug.
+>
+> > > >
+>
+> > > > But I can still hit the bug even after applying the commit. Do I miss
+>
+> > > > anything?
+>
+> > >
+>
+> > > Hi Eryu,
+>
+> > > This backtrace seems to be caused by this bug (there were two bugs in
+>
+> > > 1706759):
+https://bugzilla.redhat.com/show_bug.cgi?id=1708480
+>
+> > > Although the solution hasn't been tested on virtio-blk yet, you may
+>
+> > > want to apply this patch:
+>
+> > >
+>
+> > >
+https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
+>
+> > > Let me know if this works.
+>
+> >
+>
+> > Unfortunately, I still see the same segfault & backtrace after applying
+>
+> > commit 421afd2fe8dd ("virtio: reset region cache when on queue
+>
+> > deletion")
+>
+> >
+>
+> > Anything I can help to debug?
+>
+>
+>
+> Please post the QEMU command-line and the QMP commands used to remove the
+>
+> device.
+>
+>
+It's a normal kata instance using virtio-fs as rootfs.
+>
+>
+/usr/local/libexec/qemu-kvm -name
+>
+sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \
+>
+-uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine
+>
+q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \
+>
+-cpu host -qmp
+>
+unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait
+>
+\
+>
+-qmp
+>
+unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait
+>
+\
+>
+-m 2048M,slots=10,maxmem=773893M -device
+>
+pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \
+>
+-device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device
+>
+virtconsole,chardev=charconsole0,id=console0 \
+>
+-chardev
+>
+socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait
+>
+\
+>
+-device
+>
+virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 \
+>
+-chardev
+>
+socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait
+>
+\
+>
+-device nvdimm,id=nv0,memdev=mem0 -object
+>
+memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456
+>
+\
+>
+-object rng-random,id=rng0,filename=/dev/urandom -device
+>
+virtio-rng,rng=rng0,romfile= \
+>
+-device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \
+>
+-chardev
+>
+socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait
+>
+\
+>
+-chardev
+>
+socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock
+>
+\
+>
+-device
+>
+vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M
+>
+-netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \
+>
+-device
+>
+driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile=
+>
+\
+>
+-global kvm-pit.lost_tick_policy=discard -vga none -no-user-config
+>
+-nodefaults -nographic -daemonize \
+>
+-object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on
+>
+-numa node,memdev=dimm1 -kernel /usr/local/share/kernel \
+>
+-append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1
+>
+i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k
+>
+console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0
+>
+pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro
+>
+ro rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=96
+>
+agent.use_vsock=false init=/usr/lib/systemd/systemd
+>
+systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service
+>
+systemd.mask=systemd-networkd.socket \
+>
+-pidfile
+>
+/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid
+>
+\
+>
+-smp 1,cores=1,threads=1,sockets=96,maxcpus=96
+>
+>
+QMP command to delete device (the device id is just an example, not the
+>
+one that caused the crash):
+>
+>
+"{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}"
+>
+>
+which has been hot plugged by:
+>
+"{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}"
+>
+"{\"return\": {}}"
+>
+"{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}"
+>
+"{\"return\": {}}"
+Thanks.  I wasn't able to reproduce this crash with qemu.git/master.
+
+One thing that is strange about the latest backtrace you posted: QEMU is
+dispatching the memory access instead of using the ioeventfd code path
+that virtio-blk-pci normally takes when a virtqueue is notified.  I
+guess this means ioeventfd has already been disabled due to the hot
+unplug.
+
+Could you try with machine type "i440fx" instead of "q35"?  I wonder if
+pci-bridge/shpc is part of the problem.
+
+Stefan
+
+On Tue, Jan 14, 2020 at 04:16:24PM +0000, Stefan Hajnoczi wrote:
+>
+On Tue, Jan 14, 2020 at 10:50:58AM +0800, Eryu Guan wrote:
+>
+> On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote:
+>
+> > On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
+>
+> > > On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
+>
+> > > > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
+>
+> > > > >
+>
+> > > > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
+>
+> > > > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
+>
+> > > > > > > On Tue, 31 Dec 2019 18:34:34 +0800
+>
+> > > > > > > Eryu Guan <address@hidden> wrote:
+>
+> > > > > > >
+>
+> > > > > > > > Hi,
+>
+> > > > > > > >
+>
+> > > > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata
+>
+> > > > > > > > sandbox, I
+>
+> > > > > > > > think it's because io completion hits use-after-free when
+>
+> > > > > > > > device is
+>
+> > > > > > > > already gone. Is this a known bug that has been fixed? (I
+>
+> > > > > > > > went through
+>
+> > > > > > > > the git log but didn't find anything obvious).
+>
+> > > > > > > >
+>
+> > > > > > > > gdb backtrace is:
+>
+> > > > > > > >
+>
+> > > > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
+>
+> > > > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
+>
+> > > > > > > > Program terminated with signal 11, Segmentation fault.
+>
+> > > > > > > > #0 object_get_class (obj=obj@entry=0x0) at
+>
+> > > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
+>
+> > > > > > > > 903        return obj->class;
+>
+> > > > > > > > (gdb) bt
+>
+> > > > > > > > #0  object_get_class (obj=obj@entry=0x0) at
+>
+> > > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
+>
+> > > > > > > > #1  0x0000558a2c009e9b in virtio_notify_vector
+>
+> > > > > > > > (vdev=0x558a2e7751d0,
+>
+> > > > > > > >     vector=<optimized out>) at
+>
+> > > > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
+>
+> > > > > > > > #2  0x0000558a2bfdcb1e in
+>
+> > > > > > > > virtio_blk_discard_write_zeroes_complete (
+>
+> > > > > > > >     opaque=0x558a2f2fd420, ret=0)
+>
+> > > > > > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
+>
+> > > > > > > > #3  0x0000558a2c261c7e in blk_aio_complete
+>
+> > > > > > > > (acb=0x558a2eed7420)
+>
+> > > > > > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
+>
+> > > > > > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized
+>
+> > > > > > > > out>,
+>
+> > > > > > > >     i1=<optimized out>) at
+>
+> > > > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
+>
+> > > > > > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
+>
+> > > > > > > > #6  0x00007fff9ed75780 in ?? ()
+>
+> > > > > > > > #7  0x0000000000000000 in ?? ()
+>
+> > > > > > > >
+>
+> > > > > > > > It seems like qemu was completing a discard/write_zero
+>
+> > > > > > > > request, but
+>
+> > > > > > > > parent BusState was already freed & set to NULL.
+>
+> > > > > > > >
+>
+> > > > > > > > Do we need to drain all pending request before unrealizing
+>
+> > > > > > > > virtio-blk
+>
+> > > > > > > > device? Like the following patch proposed?
+>
+> > > > > > > >
+>
+> > > > > > > >
+https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
+>
+> > > > > > > >
+>
+> > > > > > > > If more info is needed, please let me know.
+>
+> > > > > > >
+>
+> > > > > > > may be this will help:
+>
+> > > > > > >
+https://patchwork.kernel.org/patch/11213047/
+>
+> > > > > >
+>
+> > > > > > Yeah, this looks promising! I'll try it out (though it's a
+>
+> > > > > > one-time
+>
+> > > > > > crash for me). Thanks!
+>
+> > > > >
+>
+> > > > > After applying this patch, I don't see the original segfault and
+>
+> > > > > backtrace, but I see this crash
+>
+> > > > >
+>
+> > > > > [Thread debugging using libthread_db enabled]
+>
+> > > > > Using host libthread_db library "/lib64/libthread_db.so.1".
+>
+> > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
+>
+> > > > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
+>
+> > > > > Program terminated with signal 11, Segmentation fault.
+>
+> > > > > #0  0x0000561216a57609 in virtio_pci_notify_write
+>
+> > > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>,
+>
+> > > > > size=<optimized out>) at
+>
+> > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
+>
+> > > > > 1324        VirtIOPCIProxy *proxy =
+>
+> > > > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
+>
+> > > > > Missing separate debuginfos, use: debuginfo-install
+>
+> > > > > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
+>
+> > > > > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
+>
+> > > > > libstdc++-4.8.5-28.alios7.1.x86_64
+>
+> > > > > numactl-libs-2.0.9-5.1.alios7.x86_64
+>
+> > > > > pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
+>
+> > > > > (gdb) bt
+>
+> > > > > #0  0x0000561216a57609 in virtio_pci_notify_write
+>
+> > > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>,
+>
+> > > > > size=<optimized out>) at
+>
+> > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
+>
+> > > > > #1  0x0000561216835b22 in memory_region_write_accessor
+>
+> > > > > (mr=<optimized out>, addr=<optimized out>, value=<optimized out>,
+>
+> > > > > size=<optimized out>, shift=<optimized out>, mask=<optimized out>,
+>
+> > > > > attrs=...) at /usr/src/debug/qemu-4.0/memory.c:502
+>
+> > > > > #2  0x0000561216833c5d in access_with_adjusted_size
+>
+> > > > > (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8,
+>
+> > > > > size=size@entry=2, access_size_min=<optimized out>,
+>
+> > > > > access_size_max=<optimized out>, access_fn=0x561216835ac0
+>
+> > > > > <memory_region_write_accessor>, mr=0x56121846d340, attrs=...)
+>
+> > > > >     at /usr/src/debug/qemu-4.0/memory.c:568
+>
+> > > > > #3  0x0000561216837c66 in memory_region_dispatch_write
+>
+> > > > > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
+>
+> > > > > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
+>
+> > > > > #4  0x00005612167e036f in flatview_write_continue
+>
+> > > > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304,
+>
+> > > > > attrs=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out
+>
+> > > > > of bounds>, len=len@entry=2, addr1=<optimized out>, l=<optimized
+>
+> > > > > out>, mr=0x56121846d340)
+>
+> > > > >     at /usr/src/debug/qemu-4.0/exec.c:3279
+>
+> > > > > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
+>
+> > > > > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address
+>
+> > > > > 0x7fce7dd97028 out of bounds>, len=2) at
+>
+> > > > > /usr/src/debug/qemu-4.0/exec.c:3318
+>
+> > > > > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
+>
+> > > > > addr=<optimized out>, attrs=..., buf=<optimized out>,
+>
+> > > > > len=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3408
+>
+> > > > > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
+>
+> > > > > addr=<optimized out>, attrs=..., attrs@entry=...,
+>
+> > > > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of
+>
+> > > > > bounds>, len=<optimized out>, is_write=<optimized out>) at
+>
+> > > > > /usr/src/debug/qemu-4.0/exec.c:3419
+>
+> > > > > #8  0x0000561216849da1 in kvm_cpu_exec
+>
+> > > > > (cpu=cpu@entry=0x56121849aa00) at
+>
+> > > > > /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
+>
+> > > > > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn
+>
+> > > > > (arg=arg@entry=0x56121849aa00) at
+>
+> > > > > /usr/src/debug/qemu-4.0/cpus.c:1281
+>
+> > > > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>)
+>
+> > > > > at /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
+>
+> > > > > #11 0x00007fce7bef6e25 in start_thread () from
+>
+> > > > > /lib64/libpthread.so.0
+>
+> > > > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
+>
+> > > > >
+>
+> > > > > And I searched and found
+>
+> > > > >
+https://bugzilla.redhat.com/show_bug.cgi?id=1706759
+, which has the
+>
+> > > > > same
+>
+> > > > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk:
+>
+> > > > > Add
+>
+> > > > > blk_drain() to virtio_blk_device_unrealize()") is to fix this
+>
+> > > > > particular
+>
+> > > > > bug.
+>
+> > > > >
+>
+> > > > > But I can still hit the bug even after applying the commit. Do I
+>
+> > > > > miss
+>
+> > > > > anything?
+>
+> > > >
+>
+> > > > Hi Eryu,
+>
+> > > > This backtrace seems to be caused by this bug (there were two bugs in
+>
+> > > > 1706759):
+https://bugzilla.redhat.com/show_bug.cgi?id=1708480
+>
+> > > > Although the solution hasn't been tested on virtio-blk yet, you may
+>
+> > > > want to apply this patch:
+>
+> > > >
+>
+> > > >
+https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
+>
+> > > > Let me know if this works.
+>
+> > >
+>
+> > > Unfortunately, I still see the same segfault & backtrace after applying
+>
+> > > commit 421afd2fe8dd ("virtio: reset region cache when on queue
+>
+> > > deletion")
+>
+> > >
+>
+> > > Anything I can help to debug?
+>
+> >
+>
+> > Please post the QEMU command-line and the QMP commands use to remove the
+>
+> > device.
+>
+>
+>
+> It's a normal kata instance using virtio-fs as rootfs.
+>
+>
+>
+> /usr/local/libexec/qemu-kvm -name
+>
+> sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \
+>
+>  -uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine
+>
+> q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \
+>
+>  -cpu host -qmp
+>
+> unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait
+>
+>  \
+>
+>  -qmp
+>
+> unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait
+>
+>  \
+>
+>  -m 2048M,slots=10,maxmem=773893M -device
+>
+> pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \
+>
+>  -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device
+>
+> virtconsole,chardev=charconsole0,id=console0 \
+>
+>  -chardev
+>
+> socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait
+>
+>  \
+>
+>  -device
+>
+> virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10
+>
+>  \
+>
+>  -chardev
+>
+> socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait
+>
+>  \
+>
+>  -device nvdimm,id=nv0,memdev=mem0 -object
+>
+> memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456
+>
+>  \
+>
+>  -object rng-random,id=rng0,filename=/dev/urandom -device
+>
+> virtio-rng,rng=rng0,romfile= \
+>
+>  -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \
+>
+>  -chardev
+>
+> socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait
+>
+>  \
+>
+>  -chardev
+>
+> socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock
+>
+>  \
+>
+>  -device
+>
+> vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M
+>
+>  -netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \
+>
+>  -device
+>
+> driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile=
+>
+>  \
+>
+>  -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config
+>
+> -nodefaults -nographic -daemonize \
+>
+>  -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on
+>
+> -numa node,memdev=dimm1 -kernel /usr/local/share/kernel \
+>
+>  -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1
+>
+> i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp
+>
+> reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests
+>
+> net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1
+>
+> rootflags=dax,data=ordered,errors=remount-ro ro rootfstype=ext4 quiet
+>
+> systemd.show_status=false panic=1 nr_cpus=96 agent.use_vsock=false
+>
+> init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target
+>
+> systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket \
+>
+>  -pidfile
+>
+> /run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid
+>
+>  \
+>
+>  -smp 1,cores=1,threads=1,sockets=96,maxcpus=96
+>
+>
+>
+> QMP command to delete device (the device id is just an example, not the
+>
+> one caused the crash):
+>
+>
+>
+> "{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}"
+>
+>
+>
+> which has been hot plugged by:
+>
+> "{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}"
+>
+> "{\"return\": {}}"
+>
+> "{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}"
+>
+> "{\"return\": {}}"
+>
+>
+> Thanks.  I wasn't able to reproduce this crash with qemu.git/master.
+>
+> One thing that is strange about the latest backtrace you posted: QEMU is
+> dispatching the memory access instead of using the ioeventfd code path
+> that virtio-blk-pci normally takes when a virtqueue is notified.  I
+> guess this means ioeventfd has already been disabled due to the hot
+> unplug.
+>
+> Could you try with machine type "i440fx" instead of "q35"?  I wonder if
+> pci-bridge/shpc is part of the problem.
+
+Sure, will try it. But it may take some time, as the test bed is busy
+with other testing tasks. I'll report back once I have the results.
+
+Thanks,
+Eryu
+
diff --git a/results/classifier/004/other/88281850 b/results/classifier/004/other/88281850
new file mode 100644
index 00000000..10bfcf8b
--- /dev/null
+++ b/results/classifier/004/other/88281850
@@ -0,0 +1,289 @@
+other: 0.983
+instruction: 0.978
+graphic: 0.974
+network: 0.973
+assembly: 0.972
+device: 0.970
+semantic: 0.968
+boot: 0.967
+socket: 0.966
+mistranslation: 0.948
+vnc: 0.945
+KVM: 0.881
+
+[Bug] Take more 150s to boot qemu on ARM64
+
+Hi all,
+I encountered an issue with kernel 5.19-rc1 on an ARM64 board: it takes
+about 150s between starting the qemu command and the start of the
+Linux kernel boot ("EFI stub: Booting Linux Kernel...").
+With kernel 5.18-rc4 it takes only about 5s. I ran git bisect on the kernel
+code and it points to c2445d387850 ("srcu: Add contention check to
+call_srcu() srcu_data ->lock acquisition").
+The qemu command I run (qemu version 6.2.92) is:
+
+./qemu-system-aarch64 -m 4G,slots=4,maxmem=8g \
+--trace "kvm*" \
+-cpu host \
+-machine virt,accel=kvm,gic-version=3  \
+-machine smp.cpus=2,smp.sockets=2 \
+-no-reboot \
+-nographic \
+-monitor unix:/home/cx/qmp-test,server,nowait \
+-bios /home/cx/boot/QEMU_EFI.fd \
+-kernel /home/cx/boot/Image  \
+-device pcie-root-port,port=0x8,chassis=1,id=net1,bus=pcie.0,multifunction=on,addr=0x1 \
+-device vfio-pci,host=7d:01.3,id=net0 \
+-device virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=4  \
+-drive file=/home/cx/boot/boot_ubuntu.img,if=none,id=drive0 \
+-append "rdinit=init console=ttyAMA0 root=/dev/vda rootfstype=ext4 rw " \
+-net none \
+-D /home/cx/qemu_log.txt
+I am not familiar with the RCU code and don't know how it causes the issue.
+Do you have any idea about this issue?
+
+Best regards,
+
+Xiang Chen
+
+On Mon, Jun 13, 2022 at 08:26:34PM +0800, chenxiang (M) wrote:
+>
+Hi all,
+>
+>
+I encounter a issue with kernel 5.19-rc1 on a ARM64 board:  it takes about
+>
+150s between beginning to run qemu command and beginng to boot Linux kernel
+>
+("EFI stub: Booting Linux Kernel...").
+>
+>
+But in kernel 5.18-rc4, it only takes about 5s. I git bisect the kernel code
+>
+and it finds c2445d387850 ("srcu: Add contention check to call_srcu()
+>
+srcu_data ->lock acquisition").
+>
+>
+The qemu (qemu version is 6.2.92) command i run is :
+>
+>
+./qemu-system-aarch64 -m 4G,slots=4,maxmem=8g \
+>
+--trace "kvm*" \
+>
+-cpu host \
+>
+-machine virt,accel=kvm,gic-version=3  \
+>
+-machine smp.cpus=2,smp.sockets=2 \
+>
+-no-reboot \
+>
+-nographic \
+>
+-monitor unix:/home/cx/qmp-test,server,nowait \
+>
+-bios /home/cx/boot/QEMU_EFI.fd \
+>
+-kernel /home/cx/boot/Image  \
+>
+-device
+>
+pcie-root-port,port=0x8,chassis=1,id=net1,bus=pcie.0,multifunction=on,addr=0x1
+>
+\
+>
+-device vfio-pci,host=7d:01.3,id=net0 \
+>
+-device virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=4  \
+>
+-drive file=/home/cx/boot/boot_ubuntu.img,if=none,id=drive0 \
+>
+-append "rdinit=init console=ttyAMA0 root=/dev/vda rootfstype=ext4 rw " \
+>
+-net none \
+>
+-D /home/cx/qemu_log.txt
+>
+>
+I am not familiar with rcu code, and don't know how it causes the issue. Do
+>
+you have any idea about this issue?
+Please see the discussion here:
+https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
+Though that report requires ACPI to be forced on to get the
+delay, which results in more than 9,000 back-to-back calls to
+synchronize_srcu_expedited().  I cannot reproduce this on my setup, even
+with an artificial tight loop invoking synchronize_srcu_expedited(),
+but then again I don't have ARM hardware.
+
+My current guess is that the following patch will help, but with larger
+values for SRCU_MAX_NODELAY_PHASE.  Here "larger" might well be up in the
+hundreds, or perhaps even larger.
+
+If you get a chance to experiment with this, could you please reply
+to the discussion at the above URL?  (Or let me know, and I can CC
+you on the next message in that thread.)
+
+                                                Thanx, Paul
+
+------------------------------------------------------------------------
+
+diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
+index 50ba70f019dea..0db7873f4e95b 100644
+--- a/kernel/rcu/srcutree.c
++++ b/kernel/rcu/srcutree.c
+@@ -513,7 +513,7 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
+ 
+ #define SRCU_INTERVAL          1       // Base delay if no expedited GPs pending.
+ #define SRCU_MAX_INTERVAL      10      // Maximum incremental delay from slow readers.
+-#define SRCU_MAX_NODELAY_PHASE 1       // Maximum per-GP-phase consecutive no-delay instances.
++#define SRCU_MAX_NODELAY_PHASE 3       // Maximum per-GP-phase consecutive no-delay instances.
+ #define SRCU_MAX_NODELAY       100     // Maximum consecutive no-delay instances.
+ 
+ /*
+@@ -522,16 +522,22 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
+  */
+ static unsigned long srcu_get_delay(struct srcu_struct *ssp)
+ {
++       unsigned long gpstart;
++       unsigned long j;
+        unsigned long jbase = SRCU_INTERVAL;
+ 
+        if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
+                jbase = 0;
+-       if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)))
+-               jbase += jiffies - READ_ONCE(ssp->srcu_gp_start);
+-       if (!jbase) {
+-               WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
+-               if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
+-                       jbase = 1;
++       if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) {
++               j = jiffies - 1;
++               gpstart = READ_ONCE(ssp->srcu_gp_start);
++               if (time_after(j, gpstart))
++                       jbase += j - gpstart;
++               if (!jbase) {
++                       WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
++                       if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
++                               jbase = 1;
++               }
+        }
+        return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase;
+ }
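+
+A rough user-space model of the patched srcu_get_delay() may help when
+experimenting with SRCU_MAX_NODELAY_PHASE.  This is only a sketch: the
+names model_get_delay, exp_pending, gp_in_progress, now and gp_start are
+made up for illustration and stand in for the kernel's srcu_struct fields
+and jiffies, and after() mimics the kernel's wrap-safe time_after().
+

```c
#include <stdbool.h>

#define SRCU_INTERVAL          1   /* Base delay if no expedited GPs pending. */
#define SRCU_MAX_INTERVAL      10  /* Maximum incremental delay. */
#define SRCU_MAX_NODELAY_PHASE 3   /* Value used in the patch above. */

/* Wrap-safe "a is later than b", modeled on the kernel's time_after(). */
static bool after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * Hypothetical model of the patched delay computation.
 * exp_pending    - an expedited grace period has been requested
 * gp_in_progress - a grace period is currently running
 * now, gp_start  - stand-ins for jiffies and ssp->srcu_gp_start
 * n_exp_nodelay  - counter of consecutive no-delay passes
 */
static unsigned long model_get_delay(bool exp_pending, bool gp_in_progress,
                                     unsigned long now, unsigned long gp_start,
                                     unsigned long *n_exp_nodelay)
{
    unsigned long jbase = SRCU_INTERVAL;

    if (exp_pending) {
        jbase = 0;
    }
    if (gp_in_progress) {
        unsigned long j = now - 1;

        /* Only count time once more than one tick has elapsed. */
        if (after(j, gp_start)) {
            jbase += j - gp_start;
        }
        if (!jbase) {
            *n_exp_nodelay += 1;
            if (*n_exp_nodelay > SRCU_MAX_NODELAY_PHASE) {
                jbase = 1;
            }
        }
    }
    return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase;
}
```

+
+With an expedited grace period pending and at most one tick elapsed since
+the grace period started, the model returns 0 for the first
+SRCU_MAX_NODELAY_PHASE calls and 1 after that, so a larger value permits
+more back-to-back no-delay passes.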
+
+On 2022/6/13 21:22, Paul E. McKenney wrote:
+On Mon, Jun 13, 2022 at 08:26:34PM +0800, chenxiang (M) wrote:
+Hi all,
+
+I encounter a issue with kernel 5.19-rc1 on a ARM64 board:  it takes about
+150s between beginning to run qemu command and beginng to boot Linux kernel
+("EFI stub: Booting Linux Kernel...").
+
+But in kernel 5.18-rc4, it only takes about 5s. I git bisect the kernel code
+and it finds c2445d387850 ("srcu: Add contention check to call_srcu()
+srcu_data ->lock acquisition").
+
+The qemu (qemu version is 6.2.92) command i run is :
+
+./qemu-system-aarch64 -m 4G,slots=4,maxmem=8g \
+--trace "kvm*" \
+-cpu host \
+-machine virt,accel=kvm,gic-version=3  \
+-machine smp.cpus=2,smp.sockets=2 \
+-no-reboot \
+-nographic \
+-monitor unix:/home/cx/qmp-test,server,nowait \
+-bios /home/cx/boot/QEMU_EFI.fd \
+-kernel /home/cx/boot/Image  \
+-device 
+pcie-root-port,port=0x8,chassis=1,id=net1,bus=pcie.0,multifunction=on,addr=0x1
+\
+-device vfio-pci,host=7d:01.3,id=net0 \
+-device virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=4  \
+-drive file=/home/cx/boot/boot_ubuntu.img,if=none,id=drive0 \
+-append "rdinit=init console=ttyAMA0 root=/dev/vda rootfstype=ext4 rw " \
+-net none \
+-D /home/cx/qemu_log.txt
+
+I am not familiar with rcu code, and don't know how it causes the issue. Do
+you have any idea about this issue?
+Please see the discussion here:
+https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
+Though that report requires ACPI to be forced on to get the
+delay, which results in more than 9,000 back-to-back calls to
+synchronize_srcu_expedited().  I cannot reproduce this on my setup, even
+with an artificial tight loop invoking synchronize_srcu_expedited(),
+but then again I don't have ARM hardware.
+
+My current guess is that the following patch, but with larger values for
+SRCU_MAX_NODELAY_PHASE.  Here "larger" might well be up in the hundreds,
+or perhaps even larger.
+
+If you get a chance to experiment with this, could you please reply
+to the discussion at the above URL?  (Or let me know, and I can CC
+you on the next message in that thread.)
+OK, thanks, I will reply in the thread at the above URL.
+Thanx, Paul
+
+------------------------------------------------------------------------
+
+diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
+index 50ba70f019dea..0db7873f4e95b 100644
+--- a/kernel/rcu/srcutree.c
++++ b/kernel/rcu/srcutree.c
+@@ -513,7 +513,7 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
+ 
+ #define SRCU_INTERVAL          1       // Base delay if no expedited GPs pending.
+ #define SRCU_MAX_INTERVAL      10      // Maximum incremental delay from slow readers.
+-#define SRCU_MAX_NODELAY_PHASE 1       // Maximum per-GP-phase consecutive no-delay instances.
++#define SRCU_MAX_NODELAY_PHASE 3       // Maximum per-GP-phase consecutive no-delay instances.
+ #define SRCU_MAX_NODELAY       100     // Maximum consecutive no-delay instances.
+ 
+ /*
+@@ -522,16 +522,22 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
+  */
+ static unsigned long srcu_get_delay(struct srcu_struct *ssp)
+ {
++       unsigned long gpstart;
++       unsigned long j;
+        unsigned long jbase = SRCU_INTERVAL;
+ 
+        if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
+                jbase = 0;
+-       if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)))
+-               jbase += jiffies - READ_ONCE(ssp->srcu_gp_start);
+-       if (!jbase) {
+-               WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
+-               if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
+-                       jbase = 1;
++       if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) {
++               j = jiffies - 1;
++               gpstart = READ_ONCE(ssp->srcu_gp_start);
++               if (time_after(j, gpstart))
++                       jbase += j - gpstart;
++               if (!jbase) {
++                       WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
++                       if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
++                               jbase = 1;
++               }
+        }
+        return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase;
+ }
+.
+
diff --git a/results/classifier/004/other/92957605 b/results/classifier/004/other/92957605
new file mode 100644
index 00000000..fe9db1c7
--- /dev/null
+++ b/results/classifier/004/other/92957605
@@ -0,0 +1,426 @@
+other: 0.997
+semantic: 0.995
+device: 0.993
+socket: 0.993
+instruction: 0.993
+assembly: 0.992
+boot: 0.992
+network: 0.989
+graphic: 0.986
+KVM: 0.982
+vnc: 0.981
+mistranslation: 0.974
+
+[Qemu-devel] Fwd:  [BUG] Failed to compile using gcc7.1
+
+Hi all,
+I encountered the same problem with gcc 7.1.1 and found Qu's mail on
+this list via a Google search.
+
+I temporarily fixed it by specifying the string length in the snprintf
+directive. I hope this is helpful to other people who encounter the same
+problem.
+
+@@ -1,9 +1,7 @@
+---
+--- a/block/blkdebug.c
+-                 "blkdebug:%s:%s", s->config_file ?: "",
+--- a/block/blkverify.c
+-                 "blkverify:%s:%s",
+--- a/hw/usb/bus.c
+-        snprintf(downstream->path, sizeof(downstream->path), "%s.%d",
+-        snprintf(downstream->path, sizeof(downstream->path), "%d", portnr);
+--
++++ b/block/blkdebug.c
++                 "blkdebug:%.2037s:%.2037s", s->config_file ?: "",
++++ b/block/blkverify.c
++                 "blkverify:%.2038s:%.2038s",
++++ b/hw/usb/bus.c
++        snprintf(downstream->path, sizeof(downstream->path), "%.12s.%d",
++        snprintf(downstream->path, sizeof(downstream->path), "%.12d", portnr);
+
+Tsung-en Hsiao
+
+>
+Qu Wenruo Wrote:
+>
+>
+Hi all,
+>
+>
+After upgrading gcc from 6.3.1 to 7.1.1, qemu can't be compiled with gcc.
+>
+>
+The error is:
+>
+>
+------
+>
+CC      block/blkdebug.o
+>
+block/blkdebug.c: In function 'blkdebug_refresh_filename':
+>
+>
+block/blkdebug.c:693:31: error: '%s' directive output may be truncated
+>
+writing up to 4095 bytes into a region of size 4086
+>
+[-Werror=format-truncation=]
+>
+>
+"blkdebug:%s:%s", s->config_file ?: "",
+>
+^~
+>
+In file included from /usr/include/stdio.h:939:0,
+>
+from /home/adam/qemu/include/qemu/osdep.h:68,
+>
+from block/blkdebug.c:25:
+>
+>
+/usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk' output 11
+>
+or more bytes (assuming 4106) into a destination of size 4096
+>
+>
+return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
+>
+^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+>
+__bos (__s), __fmt, __va_arg_pack ());
+>
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+>
+cc1: all warnings being treated as errors
+>
+make: *** [/home/adam/qemu/rules.mak:69: block/blkdebug.o] Error 1
+>
+------
+>
+>
+It seems that gcc 7 is introducing more restrict check for printf.
+>
+>
+If using clang, although there are some extra warning, it can at least pass
+>
+the compile.
+>
+>
+Thanks,
+>
+Qu
+
+Hi Tsung-en,
+
+On 06/11/2017 04:08 PM, Tsung-en Hsiao wrote:
+Hi all,
+I encountered the same problem on gcc 7.1.1 and found Qu's mail in
+this list from google search.
+
+Temporarily fix it by specifying the string length in snprintf
+directive. Hope this is helpful to other people encountered the same
+problem.
+Thank you for sharing this.
+@@ -1,9 +1,7 @@
+---
+--- a/block/blkdebug.c
+-                 "blkdebug:%s:%s", s->config_file ?: "",
+--- a/block/blkverify.c
+-                 "blkverify:%s:%s",
+--- a/hw/usb/bus.c
+-        snprintf(downstream->path, sizeof(downstream->path), "%s.%d",
+-        snprintf(downstream->path, sizeof(downstream->path), "%d", portnr);
+--
++++ b/block/blkdebug.c
++                 "blkdebug:%.2037s:%.2037s", s->config_file ?: "",
+It is a rather funny way to silence this warning :) Truncating the
+filename until it fits.
+However, I don't think it is the correct way, since there is indeed an
+overflow of bs->exact_filename.
+Apparently exact_filename from "block/block_int.h" is defined to hold a
+pathname:
+char exact_filename[PATH_MAX];
+but is used for more than that (for example in blkdebug.c it might use
+up to 10+2*PATH_MAX chars).
+I suppose it started as a buffer to hold a pathname, then more block
+drivers were added and this buffer ended up being used differently.
+If it is a multi-purpose buffer, one safer option might be to declare it
+as a GString* and use g_string_printf().
+I CC'ed the block folks to get their feedback.
+
+Regards,
+
+Phil.
++++ b/block/blkverify.c
++                 "blkverify:%.2038s:%.2038s",
++++ b/hw/usb/bus.c
++        snprintf(downstream->path, sizeof(downstream->path), "%.12s.%d",
++        snprintf(downstream->path, sizeof(downstream->path), "%.12d", portnr);
+
+Tsung-en Hsiao
+Qu Wenruo Wrote:
+
+Hi all,
+
+After upgrading gcc from 6.3.1 to 7.1.1, qemu can't be compiled with gcc.
+
+The error is:
+
+------
+ CC      block/blkdebug.o
+block/blkdebug.c: In function 'blkdebug_refresh_filename':
+
+block/blkdebug.c:693:31: error: '%s' directive output may be truncated writing 
+up to 4095 bytes into a region of size 4086 [-Werror=format-truncation=]
+
+                 "blkdebug:%s:%s", s->config_file ?: "",
+                              ^~
+In file included from /usr/include/stdio.h:939:0,
+                from /home/adam/qemu/include/qemu/osdep.h:68,
+                from block/blkdebug.c:25:
+
+/usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk' output 11 or 
+more bytes (assuming 4106) into a destination of size 4096
+
+  return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
+         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+       __bos (__s), __fmt, __va_arg_pack ());
+       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+cc1: all warnings being treated as errors
+make: *** [/home/adam/qemu/rules.mak:69: block/blkdebug.o] Error 1
+------
+
+It seems that gcc 7 is introducing more restrict check for printf.
+
+If using clang, although there are some extra warning, it can at least pass the 
+compile.
+
+Thanks,
+Qu
+
+On 2017-06-12 05:19, Philippe Mathieu-Daudé wrote:
+>
+Hi Tsung-en,
+>
+>
+On 06/11/2017 04:08 PM, Tsung-en Hsiao wrote:
+>
+> Hi all,
+>
+> I encountered the same problem on gcc 7.1.1 and found Qu's mail in
+>
+> this list from google search.
+>
+>
+>
+> Temporarily fix it by specifying the string length in snprintf
+>
+> directive. Hope this is helpful to other people encountered the same
+>
+> problem.
+>
+>
+Thank your for sharing this.
+>
+>
+>
+>
+> @@ -1,9 +1,7 @@
+>
+> ---
+>
+> --- a/block/blkdebug.c
+>
+> -                 "blkdebug:%s:%s", s->config_file ?: "",
+>
+> --- a/block/blkverify.c
+>
+> -                 "blkverify:%s:%s",
+>
+> --- a/hw/usb/bus.c
+>
+> -        snprintf(downstream->path, sizeof(downstream->path), "%s.%d",
+>
+> -        snprintf(downstream->path, sizeof(downstream->path), "%d",
+>
+> portnr);
+>
+> --
+>
+> +++ b/block/blkdebug.c
+>
+> +                 "blkdebug:%.2037s:%.2037s", s->config_file ?: "",
+>
+>
+It is a rather funny way to silent this warning :) Truncating the
+>
+filename until it fits.
+>
+>
+However I don't think it is the correct way since there is indeed an
+>
+overflow of bs->exact_filename.
+>
+>
+Apparently exact_filename from "block/block_int.h" is defined to hold a
+>
+pathname:
+>
+char exact_filename[PATH_MAX];
+>
+>
+but is used for more than that (for example in blkdebug.c it might use
+>
+until 10+2*PATH_MAX chars).
+In any case, truncating the filenames will do just as much as truncating
+the result: You'll get an unusable filename.
+
+>
+I suppose it started as a buffer to hold a pathname then more block
+>
+drivers were added and this buffer ended used differently.
+>
+>
+If it is a multi-purpose buffer one safer option might be to declare it
+>
+as a GString* and use g_string_printf().
+What it is supposed to be now is just an information string we can print
+to the user, because strings are nicer than JSON objects. There are some
+commands that take a filename for identifying a block node, but I dream
+we can get rid of them in 3.0...
+
+The right solution is to remove it altogether and have a
+"char *bdrv_filename(BlockDriverState *bs)" function (which generates
+the filename every time it's called). I've been working on this for some
+years now, actually, but it was never pressing enough to get it finished
+(so I never had enough time).
+
+What we can do in the meantime is to not generate a plain filename if it
+won't fit into bs->exact_filename.
+
+(The easiest way to do this probably would be to truncate
+bs->exact_filename back to an empty string if snprintf() returns a value
+greater than or equal to the length of bs->exact_filename.)
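+
+A minimal sketch of that check, assuming a hypothetical helper
+(format_filename_or_clear is not a QEMU function, and the "prefix:a:b"
+layout only mirrors the blkdebug/blkverify format strings quoted above).
+GCC's documentation says level 1 of -Wformat-truncation warns only about
+calls whose return value is unused, so checking the result may also quiet
+the warning; that has not been verified against this code.
+

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format "prefix:a:b" into dst.  If the result does
 * not fit, leave dst as an empty string rather than a truncated (and
 * therefore unusable) filename.
 */
static bool format_filename_or_clear(char *dst, size_t dst_size,
                                     const char *prefix,
                                     const char *a, const char *b)
{
    int ret;

    if (dst_size == 0) {
        return false;
    }

    ret = snprintf(dst, dst_size, "%s:%s:%s", prefix, a, b);

    /*
     * snprintf() returns the length the full output would have had,
     * not counting the terminating NUL; a negative value is an error.
     */
    if (ret < 0 || (size_t)ret >= dst_size) {
        dst[0] = '\0';
        return false;
    }
    return true;
}
```

+
+A caller would pass bs->exact_filename and sizeof(bs->exact_filename) as
+dst and dst_size; an empty string then signals that no plain filename
+could be generated.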
+
+What to do about hw/usb/bus.c I don't know (I guess the best solution
+would be to ignore the warning, but I don't suppose that is going to work).
+
+Max
+
+>
+>
+I CC'ed the block folks to have their feedback.
+>
+>
+Regards,
+>
+>
+Phil.
+>
+>
+> +++ b/block/blkverify.c
+>
+> +                 "blkverify:%.2038s:%.2038s",
+>
+> +++ b/hw/usb/bus.c
+>
+> +        snprintf(downstream->path, sizeof(downstream->path), "%.12s.%d",
+>
+> +        snprintf(downstream->path, sizeof(downstream->path), "%.12d",
+>
+> portnr);
+>
+>
+>
+> Tsung-en Hsiao
+>
+>
+>
+>> Qu Wenruo Wrote:
+>
+>>
+>
+>> Hi all,
+>
+>>
+>
+>> After upgrading gcc from 6.3.1 to 7.1.1, qemu can't be compiled with
+>
+>> gcc.
+>
+>>
+>
+>> The error is:
+>
+>>
+>
+>> ------
+>
+>>  CC      block/blkdebug.o
+>
+>> block/blkdebug.c: In function 'blkdebug_refresh_filename':
+>
+>>
+>
+>> block/blkdebug.c:693:31: error: '%s' directive output may be
+>
+>> truncated writing up to 4095 bytes into a region of size 4086
+>
+>> [-Werror=format-truncation=]
+>
+>>
+>
+>>                  "blkdebug:%s:%s", s->config_file ?: "",
+>
+>>                               ^~
+>
+>> In file included from /usr/include/stdio.h:939:0,
+>
+>>                 from /home/adam/qemu/include/qemu/osdep.h:68,
+>
+>>                 from block/blkdebug.c:25:
+>
+>>
+>
+>> /usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk'
+>
+>> output 11 or more bytes (assuming 4106) into a destination of size 4096
+>
+>>
+>
+>>   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
+>
+>>          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+>
+>>        __bos (__s), __fmt, __va_arg_pack ());
+>
+>>        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+>
+>> cc1: all warnings being treated as errors
+>
+>> make: *** [/home/adam/qemu/rules.mak:69: block/blkdebug.o] Error 1
+>
+>> ------
+>
+>>
+>
+>> It seems that gcc 7 is introducing more restrict check for printf.
+>
+>>
+>
+>> If using clang, although there are some extra warning, it can at
+>
+>> least pass the compile.
+>
+>>
+>
+>> Thanks,
+>
+>> Qu
+>
+>
+
diff --git a/results/classifier/004/other/95154278 b/results/classifier/004/other/95154278
new file mode 100644
index 00000000..d23c902d
--- /dev/null
+++ b/results/classifier/004/other/95154278
@@ -0,0 +1,163 @@
+other: 0.953
+device: 0.951
+graphic: 0.950
+vnc: 0.948
+assembly: 0.944
+instruction: 0.938
+semantic: 0.937
+KVM: 0.916
+socket: 0.913
+network: 0.913
+boot: 0.902
+mistranslation: 0.897
+
+[Qemu-devel] [BUG] checkpatch.pl hangs on target/mips/msa_helper.c
+
+If checkpatch.pl is run (with the "-f" switch) on the file
+target/mips/msa_helper.c, it hangs.
+
+There is a workaround for this particular file:
+
+These lines in msa_helper.c:
+
+        uint## BITS ##_t S = _S, T = _T;                            \
+        uint## BITS ##_t as, at, xs, xt, xd;                        \
+
+should be replaced with:
+
+        uint## BITS ## _t S = _S, T = _T;                           \
+        uint## BITS ## _t as, at, xs, xt, xd;                       \
+
+(a space is added after the second "##" in each line)
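+
+Both spellings yield the same tokens for the C preprocessor, so the
+workaround should not change the generated code.  A minimal sketch (the
+macro and function names below are hypothetical and not taken from
+msa_helper.c):
+

```c
#include <stdint.h>

/* Original spelling: no space after the second "##". */
#define DEFINE_ADD_TIGHT(BITS)                                        \
    static uint## BITS ##_t add_tight_ ## BITS(uint## BITS ##_t a,    \
                                               uint## BITS ##_t b)    \
    {                                                                 \
        return (uint## BITS ##_t)(a + b);                             \
    }

/* Workaround spelling: a space after the second "##". */
#define DEFINE_ADD_SPACED(BITS)                                       \
    static uint## BITS ## _t add_spaced_ ## BITS(uint## BITS ## _t a, \
                                                 uint## BITS ## _t b) \
    {                                                                 \
        return (uint## BITS ## _t)(a + b);                            \
    }

/* Both expand to functions taking and returning uint8_t. */
DEFINE_ADD_TIGHT(8)
DEFINE_ADD_SPACED(8)
```

+
+Whitespace around "##" is not significant to the preprocessor, which is
+why only checkpatch.pl's own parsing is affected by the change.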
+
+The workaround was found by deleting and restoring parts of the code in
+msa_helper.c in binary-search fashion.
+
+I will soon submit this workaround as a patch within a series on misc
+MIPS issues.
+
+I took a look at the checkpatch.pl code, and it looks like the issue is
+fairly complicated to fix, since it occurs in a code segment involving
+intricate logic conditions.
+
+Regards,
+Aleksandar
+
+On Wed, Jul 04, 2018 at 03:35:18PM +0000, Aleksandar Markovic wrote:
+>
+If  checkpatch.pl is applied (using switch "-f") on file
+>
+target/mips/msa_helper.c, it will hang.
+>
+>
+There is a workaround for this particular file:
+>
+>
+These lines in msa_helper.c:
+>
+>
+uint## BITS ##_t S = _S, T = _T;                            \
+>
+uint## BITS ##_t as, at, xs, xt, xd;                        \
+>
+>
+should be replaced with:
+>
+>
+uint## BITS ## _t S = _S, T = _T;                           \
+>
+uint## BITS ## _t as, at, xs, xt, xd;                       \
+>
+>
+(a space is added after the second "##" in each line)
+>
+>
+The workaround is found by partial deleting and undeleting of the code in
+>
+msa_helper.c in binary search fashion.
+>
+>
+This workaround will soon be submitted by me as a patch within a series on
+>
+misc MIPS issues.
+>
+>
+I took a look at checkpatch.pl code, and it looks it is fairly complicated to
+>
+fix the issue, since it happens in the code segment involving intricate logic
+>
+conditions.
+Thanks for figuring this out, Aleksandar.  Not sure if anyone else has
+the appetite to fix checkpatch.pl.
+
+Stefan
+
+On 07/11/2018 09:36 AM, Stefan Hajnoczi wrote:
+>
+On Wed, Jul 04, 2018 at 03:35:18PM +0000, Aleksandar Markovic wrote:
+>
+> If  checkpatch.pl is applied (using switch "-f") on file
+>
+> target/mips/msa_helper.c, it will hang.
+>
+>
+>
+> There is a workaround for this particular file:
+>
+>
+>
+> These lines in msa_helper.c:
+>
+>
+>
+>         uint## BITS ##_t S = _S, T = _T;                            \
+>
+>         uint## BITS ##_t as, at, xs, xt, xd;                        \
+>
+>
+>
+> should be replaced with:
+>
+>
+>
+>         uint## BITS ## _t S = _S, T = _T;                           \
+>
+>         uint## BITS ## _t as, at, xs, xt, xd;                       \
+>
+>
+>
+> (a space is added after the second "##" in each line)
+>
+>
+>
+> The workaround is found by partial deleting and undeleting of the code in
+>
+> msa_helper.c in binary search fashion.
+>
+>
+>
+> This workaround will soon be submitted by me as a patch within a series on
+>
+> misc MIPS issues.
+>
+>
+>
+> I took a look at checkpatch.pl code, and it looks it is fairly complicated
+>
+> to fix the issue, since it happens in the code segment involving intricate
+>
+> logic conditions.
+>
+>
+Thanks for figuring this out, Aleksandar.  Not sure if anyone else has
+>
+the appetite to fix checkpatch.pl.
+Anyone else but Paolo ;P
+http://lists.nongnu.org/archive/html/qemu-devel/2018-07/msg01250.html
+
diff --git a/results/classifier/004/other/96782458 b/results/classifier/004/other/96782458
new file mode 100644
index 00000000..dabee5fb
--- /dev/null
+++ b/results/classifier/004/other/96782458
@@ -0,0 +1,1007 @@
+semantic: 0.984
+other: 0.982
+assembly: 0.982
+boot: 0.980
+socket: 0.976
+vnc: 0.976
+device: 0.974
+instruction: 0.974
+graphic: 0.973
+network: 0.967
+KVM: 0.963
+mistranslation: 0.949
+
+[Qemu-devel] [BUG] Migrate fails between boards with different PMC counts
+
+Hi all,
+
+Recently, I found that migration fails when vPMU is enabled.
+
+Migration of vPMU state was introduced in linux-3.10 + qemu-1.7.
+
+As long as vPMU is enabled, qemu will save / load the
+vmstate_msr_architectural_pmu(msr_global_ctrl) register during the migration.
+But global_ctrl is generated based on cpuid(0xA), and the number of
+general-purpose performance monitoring counters (PMC) can vary according to
+the Intel SDM. The number of PMCs presented to the VM is currently not
+configurable: it depends on the host cpuid, and KVM enables all PMCs by
+default. This causes migration to fail between boards with different PMC
+counts.
+
+The return value of cpuid(0xA) differs between CPUs, according to the Intel
+SDM, 18-10 Vol. 3B:
+
+Note: The number of general-purpose performance monitoring counters (i.e. N in
+Figure 18-9) can vary across processor generations within a processor family,
+across processor families, or could be different depending on the
+configuration chosen at boot time in the BIOS regarding Intel Hyper Threading
+Technology, (e.g. N=2 for 45 nm Intel Atom processors; N=4 for processors
+based on the Nehalem microarchitecture; for processors based on the Sandy
+Bridge microarchitecture, N=4 if Intel Hyper Threading Technology is active
+and N=8 if not active).
+
+Also, I found that N=8 when HT is not active on Broadwell, such as
+CPU E7-8890 v4 @ 2.20GHz
+
+# ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 \
+    -hda /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true \
+    -incoming tcp::8888
+Completed 100 %
+qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff
+qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
+Aborted
+
+So should the number of PMCs presented to the VM be made configurable? Any better ideas?
+
+
+Regards,
+-Zhuang Yanying
+
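To make the failing value in the log above easier to read, here is a small C sketch of the IA32_PERF_GLOBAL_CTRL (MSR 0x38f) bit layout: the low bits enable general-purpose counters and the bits starting at 32 enable fixed-function counters, so 0x7000000ff corresponds to 8 general-purpose plus 3 fixed counters. The function names are hypothetical, the destination count of 4 is an assumption taken from the HT-enabled case in the note, and the subset check only illustrates the layout; KVM's real validation may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build an IA32_PERF_GLOBAL_CTRL value that enables every counter:
 * bits 0..num_gp-1 for general-purpose counters and
 * bits 32..32+num_fixed-1 for fixed-function counters.
 */
static uint64_t global_ctrl_all_enabled(unsigned num_gp, unsigned num_fixed)
{
    assert(num_gp <= 32 && num_fixed <= 32);
    uint64_t gp = (1ULL << num_gp) - 1;
    uint64_t fixed = (1ULL << num_fixed) - 1;
    return gp | (fixed << 32);
}

/* Returns 1 if 'value' only sets enable bits that the destination has. */
static int global_ctrl_fits(uint64_t value, unsigned dst_gp, unsigned dst_fixed)
{
    return (value & ~global_ctrl_all_enabled(dst_gp, dst_fixed)) == 0;
}
```

With these helpers, `global_ctrl_all_enabled(8, 3)` reproduces the value from the log, and `global_ctrl_fits(0x7000000ffULL, 4, 3)` is false because bits 4..7 have no counter behind them on a host with only 4 general-purpose counters.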
+* Zhuangyanying (address@hidden) wrote:
+>
+Hi all,
+>
+>
+Recently, I found migration failed when enable vPMU.
+>
+>
+migrate vPMU state was introduced in linux-3.10 + qemu-1.7.
+>
+>
+As long as enable vPMU, qemu will save / load the
+>
+vmstate_msr_architectural_pmu(msr_global_ctrl) register during the migration.
+>
+But global_ctrl generated based on cpuid(0xA), the number of general-purpose
+>
+performance
+>
+monitoring counters(PMC) can vary according to Intel SDN. The number of PMC
+>
+presented
+>
+to vm, does not support configuration currently, it depend on host cpuid, and
+>
+enable all pmc
+>
+defaultly at KVM. It cause migration to fail between boards with different
+>
+PMC counts.
+>
+>
+The return value of cpuid (0xA) is different dur to cpu, according to Intel
+>
+SDN, 18-10 Vol. 3B:
+>
+>
+Note: The number of general-purpose performance monitoring counters (i.e. N
+>
+in Figure 18-9)
+>
+can vary across processor generations within a processor family, across
+>
+processor families, or
+>
+could be different depending on the configuration chosen at boot time in the
+>
+BIOS regarding
+>
+Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom processors;
+>
+N =4 for processors
+>
+based on the Nehalem microarchitecture; for processors based on the Sandy
+>
+Bridge
+>
+microarchitecture, N = 4 if Intel Hyper Threading Technology is active and
+>
+N=8 if not active).
+>
+>
+Also I found, N=8 if HT is not active based on the broadwell,
+>
+such as CPU E7-8890 v4 @ 2.20GHz
+>
+>
+# ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda
+>
+/data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -incoming
+>
+tcp::8888
+>
+Completed 100 %
+>
+qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff
+>
+qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833:
+>
+kvm_put_msrs:
+>
+Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
+>
+Aborted
+>
+>
+So make number of pmc configurable to vm ? Any better idea ?
+Coincidentally we hit a similar problem a few days ago with -cpu host - it
+took me quite a while to spot that the difference between the machines was
+that the source had hyperthreading disabled.
+
+An option to set the number of counters makes sense to me; but I wonder
+how many other options we need as well.  Also, I'm not sure there's any
+easy way for libvirt etc to figure out how many counters a host supports - it's
+not in /proc/cpuinfo.
+
+Dave
+
+>
+>
+Regards,
+>
+-Zhuang Yanying
+--
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+
+On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote:
+>
+* Zhuangyanying (address@hidden) wrote:
+>
+> Hi all,
+>
+>
+>
+> Recently, I found migration failed when enable vPMU.
+>
+>
+>
+> migrate vPMU state was introduced in linux-3.10 + qemu-1.7.
+>
+>
+>
+> As long as enable vPMU, qemu will save / load the
+>
+> vmstate_msr_architectural_pmu(msr_global_ctrl) register during the
+>
+> migration.
+>
+> But global_ctrl generated based on cpuid(0xA), the number of
+>
+> general-purpose performance
+>
+> monitoring counters(PMC) can vary according to Intel SDN. The number of PMC
+>
+> presented
+>
+> to vm, does not support configuration currently, it depend on host cpuid,
+>
+> and enable all pmc
+>
+> defaultly at KVM. It cause migration to fail between boards with different
+>
+> PMC counts.
+>
+>
+>
+> The return value of cpuid (0xA) is different dur to cpu, according to Intel
+>
+> SDN, 18-10 Vol. 3B:
+>
+>
+>
+> Note: The number of general-purpose performance monitoring counters (i.e. N
+>
+> in Figure 18-9)
+>
+> can vary across processor generations within a processor family, across
+>
+> processor families, or
+>
+> could be different depending on the configuration chosen at boot time in
+>
+> the BIOS regarding
+>
+> Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom
+>
+> processors; N =4 for processors
+>
+> based on the Nehalem microarchitecture; for processors based on the Sandy
+>
+> Bridge
+>
+> microarchitecture, N = 4 if Intel Hyper Threading Technology is active and
+>
+> N=8 if not active).
+>
+>
+>
+> Also I found, N=8 if HT is not active based on the broadwell,
+>
+> such as CPU E7-8890 v4 @ 2.20GHz
+>
+>
+>
+> # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda
+>
+> /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -incoming
+>
+> tcp::8888
+>
+> Completed 100 %
+>
+> qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff
+>
+> qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833:
+>
+> kvm_put_msrs:
+>
+> Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
+>
+> Aborted
+>
+>
+>
+> So make number of pmc configurable to vm ? Any better idea ?
+>
+>
+Coincidentally we hit a similar problem a few days ago with -cpu host  - it
+>
+took me
+>
+quite a while to spot the difference between the machines was the source
+>
+had hyperthreading disabled.
+>
+>
+An option to set the number of counters makes sense to me; but I wonder
+>
+how many other options we need as well.  Also, I'm not sure there's any
+>
+easy way for libvirt etc to figure out how many counters a host supports -
+>
+it's not in /proc/cpuinfo.
+We actually try to avoid /proc/cpuinfo wherever possible. We do direct
+CPUID asm instructions to identify features, and prefer to use
+/sys/devices/system/cpu if that has suitable data.
+
+Where do the PMC counts come from originally? CPUID or something else?
+
+Regards,
+Daniel
+-- 
+|:
+https://berrange.com
+-o-
+https://www.flickr.com/photos/dberrange
+:|
+|:
+https://libvirt.org
+-o-
+https://fstop138.berrange.com
+:|
+|:
+https://entangle-photo.org
+-o-
+https://www.instagram.com/dberrange
+:|
+
+* Daniel P. Berrange (address@hidden) wrote:
+>
+On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote:
+>
+> * Zhuangyanying (address@hidden) wrote:
+>
+> > Hi all,
+>
+> >
+>
+> > Recently, I found migration failed when enable vPMU.
+>
+> >
+>
+> > migrate vPMU state was introduced in linux-3.10 + qemu-1.7.
+>
+> >
+>
+> > As long as enable vPMU, qemu will save / load the
+>
+> > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the
+>
+> > migration.
+>
+> > But global_ctrl generated based on cpuid(0xA), the number of
+>
+> > general-purpose performance
+>
+> > monitoring counters(PMC) can vary according to Intel SDN. The number of
+>
+> > PMC presented
+>
+> > to vm, does not support configuration currently, it depend on host cpuid,
+>
+> > and enable all pmc
+>
+> > defaultly at KVM. It cause migration to fail between boards with
+>
+> > different PMC counts.
+>
+> >
+>
+> > The return value of cpuid (0xA) is different dur to cpu, according to
+>
+> > Intel SDN, 18-10 Vol. 3B:
+>
+> >
+>
+> > Note: The number of general-purpose performance monitoring counters (i.e.
+>
+> > N in Figure 18-9)
+>
+> > can vary across processor generations within a processor family, across
+>
+> > processor families, or
+>
+> > could be different depending on the configuration chosen at boot time in
+>
+> > the BIOS regarding
+>
+> > Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom
+>
+> > processors; N =4 for processors
+>
+> > based on the Nehalem microarchitecture; for processors based on the Sandy
+>
+> > Bridge
+>
+> > microarchitecture, N = 4 if Intel Hyper Threading Technology is active
+>
+> > and N=8 if not active).
+>
+> >
+>
+> > Also I found, N=8 if HT is not active based on the broadwell,
+>
+> > such as CPU E7-8890 v4 @ 2.20GHz
+>
+> >
+>
+> > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda
+>
+> > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -incoming
+>
+> > tcp::8888
+>
+> > Completed 100 %
+>
+> > qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff
+>
+> > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833:
+>
+> > kvm_put_msrs:
+>
+> > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
+>
+> > Aborted
+>
+> >
+>
+> > So make number of pmc configurable to vm ? Any better idea ?
+>
+>
+>
+> Coincidentally we hit a similar problem a few days ago with -cpu host  - it
+>
+> took me
+>
+> quite a while to spot the difference between the machines was the source
+>
+> had hyperthreading disabled.
+>
+>
+>
+> An option to set the number of counters makes sense to me; but I wonder
+>
+> how many other options we need as well.  Also, I'm not sure there's any
+>
+> easy way for libvirt etc to figure out how many counters a host supports -
+>
+> it's not in /proc/cpuinfo.
+>
+>
+We actually try to avoid /proc/cpuinfo whereever possible. We do direct
+>
+CPUID asm instructions to identify features, and prefer to use
+>
+/sys/devices/system/cpu if that has suitable data
+>
+>
+Where do the PMC counts come from originally ? CPUID or something else ?
+Yes, they're bits 8..15 of CPUID leaf 0xa
+
+Dave
+
+>
+Regards,
+>
+Daniel
+>
+--
+>
+|:
+https://berrange.com
+-o-
+https://www.flickr.com/photos/dberrange
+:|
+>
+|:
+https://libvirt.org
+-o-
+https://fstop138.berrange.com
+:|
+>
+|:
+https://entangle-photo.org
+-o-
+https://www.instagram.com/dberrange
+:|
+--
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+
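Dave's answer above (bits 8..15 of CPUID leaf 0xa) can be sketched in C roughly as follows. This is only an illustration, not libvirt or QEMU code: the function names are hypothetical, and the host query assumes a GCC- or Clang-style `<cpuid.h>` on an x86 build.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Extract bits 8..15 of CPUID.0AH:EAX, the number of general-purpose PMCs. */
static unsigned pmc_count_from_eax(uint32_t eax)
{
    return (eax >> 8) & 0xff;
}

/* Query the host; returns -1 if leaf 0xa is unavailable or on non-x86 builds. */
static int host_pmc_count(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0xa, &eax, &ebx, &ecx, &edx)) {
        return -1;
    }
    return (int)pmc_count_from_eax(eax);
#else
    return -1;
#endif
}
```

Splitting the bit extraction from the CPUID call keeps the parsing testable on any machine, including inside a guest where the reported count may be zero.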
+On Mon, Apr 24, 2017 at 11:27:16AM +0100, Dr. David Alan Gilbert wrote:
+>
+* Daniel P. Berrange (address@hidden) wrote:
+>
+> On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote:
+>
+> > * Zhuangyanying (address@hidden) wrote:
+>
+> > > Hi all,
+>
+> > >
+>
+> > > Recently, I found migration failed when enable vPMU.
+>
+> > >
+>
+> > > migrate vPMU state was introduced in linux-3.10 + qemu-1.7.
+>
+> > >
+>
+> > > As long as enable vPMU, qemu will save / load the
+>
+> > > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the
+>
+> > > migration.
+>
+> > > But global_ctrl generated based on cpuid(0xA), the number of
+>
+> > > general-purpose performance
+>
+> > > monitoring counters(PMC) can vary according to Intel SDN. The number of
+>
+> > > PMC presented
+>
+> > > to vm, does not support configuration currently, it depend on host
+>
+> > > cpuid, and enable all pmc
+>
+> > > defaultly at KVM. It cause migration to fail between boards with
+>
+> > > different PMC counts.
+>
+> > >
+>
+> > > The return value of cpuid (0xA) is different dur to cpu, according to
+>
+> > > Intel SDN, 18-10 Vol. 3B:
+>
+> > >
+>
+> > > Note: The number of general-purpose performance monitoring counters
+>
+> > > (i.e. N in Figure 18-9)
+>
+> > > can vary across processor generations within a processor family, across
+>
+> > > processor families, or
+>
+> > > could be different depending on the configuration chosen at boot time
+>
+> > > in the BIOS regarding
+>
+> > > Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom
+>
+> > > processors; N =4 for processors
+>
+> > > based on the Nehalem microarchitecture; for processors based on the
+>
+> > > Sandy Bridge
+>
+> > > microarchitecture, N = 4 if Intel Hyper Threading Technology is active
+>
+> > > and N=8 if not active).
+>
+> > >
+>
+> > > Also I found, N=8 if HT is not active based on the broadwell,
+>
+> > > such as CPU E7-8890 v4 @ 2.20GHz
+>
+> > >
+>
+> > > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda
+>
+> > > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true
+>
+> > > -incoming tcp::8888
+>
+> > > Completed 100 %
+>
+> > > qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff
+>
+> > > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833:
+>
+> > > kvm_put_msrs:
+>
+> > > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
+>
+> > > Aborted
+>
+> > >
+>
+> > > So make number of pmc configurable to vm ? Any better idea ?
+>
+> >
+>
+> > Coincidentally we hit a similar problem a few days ago with -cpu host  -
+>
+> > it took me
+>
+> > quite a while to spot the difference between the machines was the source
+>
+> > had hyperthreading disabled.
+>
+> >
+>
+> > An option to set the number of counters makes sense to me; but I wonder
+>
+> > how many other options we need as well.  Also, I'm not sure there's any
+>
+> > easy way for libvirt etc to figure out how many counters a host supports -
+>
+> > it's not in /proc/cpuinfo.
+>
+>
+>
+> We actually try to avoid /proc/cpuinfo whereever possible. We do direct
+>
+> CPUID asm instructions to identify features, and prefer to use
+>
+> /sys/devices/system/cpu if that has suitable data
+>
+>
+>
+> Where do the PMC counts come from originally ? CPUID or something else ?
+>
+>
+Yes, they're bits 8..15 of CPUID leaf 0xa
+Ok, that's easy enough for libvirt to detect then. More a question of what
+libvirt should then do with this info....
+
+Regards,
+Daniel
+-- 
+|:
+https://berrange.com
+-o-
+https://www.flickr.com/photos/dberrange
+:|
+|:
+https://libvirt.org
+-o-
+https://fstop138.berrange.com
+:|
+|:
+https://entangle-photo.org
+-o-
+https://www.instagram.com/dberrange
+:|
+
+>
+-----Original Message-----
+>
+From: Daniel P. Berrange [
+mailto:address@hidden
+>
+Sent: Monday, April 24, 2017 6:34 PM
+>
+To: Dr. David Alan Gilbert
+>
+Cc: Zhuangyanying; Zhanghailiang; wangxin (U); address@hidden;
+>
+Gonglei (Arei); Huangzhichao; address@hidden
+>
+Subject: Re: [Qemu-devel] [BUG] Migrate failes between boards with different
+>
+PMC counts
+>
+>
+On Mon, Apr 24, 2017 at 11:27:16AM +0100, Dr. David Alan Gilbert wrote:
+>
+> * Daniel P. Berrange (address@hidden) wrote:
+>
+> > On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote:
+>
+> > > * Zhuangyanying (address@hidden) wrote:
+>
+> > > > Hi all,
+>
+> > > >
+>
+> > > > Recently, I found migration failed when enable vPMU.
+>
+> > > >
+>
+> > > > migrate vPMU state was introduced in linux-3.10 + qemu-1.7.
+>
+> > > >
+>
+> > > > As long as enable vPMU, qemu will save / load the
+>
+> > > > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the
+>
+migration.
+>
+> > > > But global_ctrl generated based on cpuid(0xA), the number of
+>
+> > > > general-purpose performance monitoring counters(PMC) can vary
+>
+> > > > according to Intel SDN. The number of PMC presented to vm, does
+>
+> > > > not support configuration currently, it depend on host cpuid, and
+>
+> > > > enable
+>
+all pmc defaultly at KVM. It cause migration to fail between boards with
+>
+different PMC counts.
+>
+> > > >
+>
+> > > > The return value of cpuid (0xA) is different dur to cpu, according to
+>
+> > > > Intel
+>
+SDN, 18-10 Vol. 3B:
+>
+> > > >
+>
+> > > > Note: The number of general-purpose performance monitoring
+>
+> > > > counters (i.e. N in Figure 18-9) can vary across processor
+>
+> > > > generations within a processor family, across processor
+>
+> > > > families, or could be different depending on the configuration
+>
+> > > > chosen at boot time in the BIOS regarding Intel Hyper Threading
+>
+> > > > Technology, (e.g. N=2 for 45 nm Intel Atom processors; N =4 for
+>
+processors based on the Nehalem microarchitecture; for processors based on
+>
+the Sandy Bridge microarchitecture, N = 4 if Intel Hyper Threading Technology
+>
+is active and N=8 if not active).
+>
+> > > >
+>
+> > > > Also I found, N=8 if HT is not active based on the broadwell,
+>
+> > > > such as CPU E7-8890 v4 @ 2.20GHz
+>
+> > > >
+>
+> > > > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m
+>
+> > > > 4096 -hda
+>
+> > > > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true
+>
+> > > > -incoming tcp::8888 Completed 100 %
+>
+> > > > qemu-system-x86_64: error: failed to set MSR 0x38f to
+>
+> > > > 0x7000000ff
+>
+> > > > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833:
+>
+kvm_put_msrs:
+>
+> > > > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
+>
+> > > > Aborted
+>
+> > > >
+>
+> > > > So make number of pmc configurable to vm ? Any better idea ?
+>
+> > >
+>
+> > > Coincidentally we hit a similar problem a few days ago with -cpu
+>
+> > > host  - it took me quite a while to spot the difference between
+>
+> > > the machines was the source had hyperthreading disabled.
+>
+> > >
+>
+> > > An option to set the number of counters makes sense to me; but I
+>
+> > > wonder how many other options we need as well.  Also, I'm not sure
+>
+> > > there's any easy way for libvirt etc to figure out how many
+>
+> > > counters a host supports - it's not in /proc/cpuinfo.
+>
+> >
+>
+> > We actually try to avoid /proc/cpuinfo whereever possible. We do
+>
+> > direct CPUID asm instructions to identify features, and prefer to
+>
+> > use /sys/devices/system/cpu if that has suitable data
+>
+> >
+>
+> > Where do the PMC counts come from originally ? CPUID or something
+>
+else ?
+>
+>
+>
+> Yes, they're bits 8..15 of CPUID leaf 0xa
+>
+>
+Ok, that's easy enough for libvirt to detect then. More a question of what
+>
+libvirt
+>
+should then do this with the info....
+>
+Do you mean doing a validation at the beginning of migration, in
+qemuMigrationBakeCookie() & qemuMigrationEatCookie(), and quitting the
+migration if the PMC numbers are not equal?
+That may be a good enough first version.
+But for a better later version, I think it would be preferable to support
+heterogeneous migration, so we might need to make the PMC number
+configurable, and then we need to modify KVM/qemu as well.
+
+Regards,
+-Zhuang Yanying
+
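The "first edition" check proposed above could look roughly like the C sketch below. It is purely illustrative: `struct pmu_info` and `pmc_counts_allow_migration` are hypothetical names, and nothing here reflects how qemuMigrationBakeCookie() or qemuMigrationEatCookie() actually carry data.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of a host's PMU, as it might travel with a migration. */
struct pmu_info {
    unsigned num_gp_counters;   /* CPUID leaf 0xa, EAX bits 8..15 */
};

/* Policy from the proposal: refuse migration unless the counts are equal. */
static bool pmc_counts_allow_migration(const struct pmu_info *src,
                                       const struct pmu_info *dst)
{
    return src->num_gp_counters == dst->num_gp_counters;
}
```

A configurable PMC count, as discussed for the later version, would replace this strict equality with a comparison against whatever count the guest was started with.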
+* Zhuangyanying (address@hidden) wrote:
+>
+>
+>
+> -----Original Message-----
+>
+> From: Daniel P. Berrange [
+mailto:address@hidden
+>
+> Sent: Monday, April 24, 2017 6:34 PM
+>
+> To: Dr. David Alan Gilbert
+>
+> Cc: Zhuangyanying; Zhanghailiang; wangxin (U); address@hidden;
+>
+> Gonglei (Arei); Huangzhichao; address@hidden
+>
+> Subject: Re: [Qemu-devel] [BUG] Migrate failes between boards with different
+>
+> PMC counts
+>
+>
+>
+> On Mon, Apr 24, 2017 at 11:27:16AM +0100, Dr. David Alan Gilbert wrote:
+>
+> > * Daniel P. Berrange (address@hidden) wrote:
+>
+> > > On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote:
+>
+> > > > * Zhuangyanying (address@hidden) wrote:
+>
+> > > > > Hi all,
+>
+> > > > >
+>
+> > > > > Recently, I found migration failed when enable vPMU.
+>
+> > > > >
+>
+> > > > > migrate vPMU state was introduced in linux-3.10 + qemu-1.7.
+>
+> > > > >
+>
+> > > > > As long as enable vPMU, qemu will save / load the
+>
+> > > > > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the
+>
+> migration.
+>
+> > > > > But global_ctrl generated based on cpuid(0xA), the number of
+>
+> > > > > general-purpose performance monitoring counters(PMC) can vary
+>
+> > > > > according to Intel SDN. The number of PMC presented to vm, does
+>
+> > > > > not support configuration currently, it depend on host cpuid, and
+>
+> > > > > enable
+>
+> all pmc defaultly at KVM. It cause migration to fail between boards with
+>
+> different PMC counts.
+>
+> > > > >
+>
+> > > > > The return value of cpuid (0xA) is different dur to cpu, according
+>
+> > > > > to Intel
+>
+> SDN, 18-10 Vol. 3B:
+>
+> > > > >
+>
+> > > > > Note: The number of general-purpose performance monitoring
+>
+> > > > > counters (i.e. N in Figure 18-9) can vary across processor
+>
+> > > > > generations within a processor family, across processor
+>
+> > > > > families, or could be different depending on the configuration
+>
+> > > > > chosen at boot time in the BIOS regarding Intel Hyper Threading
+>
+> > > > > Technology, (e.g. N=2 for 45 nm Intel Atom processors; N =4 for
+>
+> processors based on the Nehalem microarchitecture; for processors based on
+>
+> the Sandy Bridge microarchitecture, N = 4 if Intel Hyper Threading
+>
+> Technology
+>
+> is active and N=8 if not active).
+>
+> > > > >
+>
+> > > > > Also I found, N=8 if HT is not active based on the broadwell,
+>
+> > > > > such as CPU E7-8890 v4 @ 2.20GHz
+>
+> > > > >
+>
+> > > > > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m
+>
+> > > > > 4096 -hda
+>
+> > > > > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true
+>
+> > > > > -incoming tcp::8888 Completed 100 %
+>
+> > > > > qemu-system-x86_64: error: failed to set MSR 0x38f to
+>
+> > > > > 0x7000000ff
+>
+> > > > > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833:
+>
+> kvm_put_msrs:
+>
+> > > > > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
+>
+> > > > > Aborted
+>
+> > > > >
+>
+> > > > > So make number of pmc configurable to vm ? Any better idea ?
+>
+> > > >
+>
+> > > > Coincidentally we hit a similar problem a few days ago with -cpu
+>
+> > > > host  - it took me quite a while to spot the difference between
+>
+> > > > the machines was the source had hyperthreading disabled.
+>
+> > > >
+>
+> > > > An option to set the number of counters makes sense to me; but I
+>
+> > > > wonder how many other options we need as well.  Also, I'm not sure
+>
+> > > > there's any easy way for libvirt etc to figure out how many
+>
+> > > > counters a host supports - it's not in /proc/cpuinfo.
+>
+> > >
+>
+> > > We actually try to avoid /proc/cpuinfo whereever possible. We do
+>
+> > > direct CPUID asm instructions to identify features, and prefer to
+>
+> > > use /sys/devices/system/cpu if that has suitable data
+>
+> > >
+>
+> > > Where do the PMC counts come from originally ? CPUID or something
+>
+> else ?
+>
+> >
+>
+> > Yes, they're bits 8..15 of CPUID leaf 0xa
+>
+>
+>
+> Ok, that's easy enough for libvirt to detect then. More a question of what
+>
+> libvirt
+>
+> should then do this with the info....
+>
+>
+>
+>
+Do you mean to do a validation at the begining of migration? in
+>
+qemuMigrationBakeCookie() & qemuMigrationEatCookie(), if the PMC numbers are
+>
+not equal, just quit migration?
+>
+It maybe a good enough first edition.
+>
+But for a further better edition, maybe it's better to support Heterogeneous
+>
+migration I think, so we might need to make PMC number configrable, then we
+>
+need to modify KVM/qemu as well.
+Yes agreed; the only thing I wanted to check was that libvirt would have enough
+information to be able to use any feature we added to QEMU.
+
+Dave
+
+>
+Regards,
+>
+-Zhuang Yanying
+--
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+