author Christian Krinitsin <mail@krinitsin.com> 2025-07-03 19:39:53 +0200
committer Christian Krinitsin <mail@krinitsin.com> 2025-07-03 19:39:53 +0200
commit dee4dcba78baf712cab403d47d9db319ab7f95d6
tree 418478faf06786701a56268672f73d6b0b4eb239 /results/classifier/008/other
parent 4d9e26c0333abd39bdbd039dcdb30ed429c475ba
restructure results
Diffstat (limited to 'results/classifier/008/other')
-rw-r--r--  results/classifier/008/other/02364653   373
-rw-r--r--  results/classifier/008/other/02572177   431
-rw-r--r--  results/classifier/008/other/04472277   586
-rw-r--r--  results/classifier/008/other/11933524  1135
-rw-r--r--  results/classifier/008/other/12869209    98
-rw-r--r--  results/classifier/008/other/13442371   379
-rw-r--r--  results/classifier/008/other/16201167   110
-rw-r--r--  results/classifier/008/other/21247035  1331
-rw-r--r--  results/classifier/008/other/22219210    53
-rw-r--r--  results/classifier/008/other/23270873   702
-rw-r--r--  results/classifier/008/other/24190340  2066
-rw-r--r--  results/classifier/008/other/24930826    43
-rw-r--r--  results/classifier/008/other/25842545   212
-rw-r--r--  results/classifier/008/other/25892827  1087
-rw-r--r--  results/classifier/008/other/28596630   123
-rw-r--r--  results/classifier/008/other/31349848   164
-rw-r--r--  results/classifier/008/other/32484936   233
-rw-r--r--  results/classifier/008/other/33802194  4949
-rw-r--r--  results/classifier/008/other/35170175   531
-rw-r--r--  results/classifier/008/other/42974450   439
-rw-r--r--  results/classifier/008/other/43643137   548
-rw-r--r--  results/classifier/008/other/50773216   120
-rw-r--r--  results/classifier/008/other/55367348   542
-rw-r--r--  results/classifier/008/other/55753058   303
-rw-r--r--  results/classifier/008/other/55961334    49
-rw-r--r--  results/classifier/008/other/56309929   190
-rw-r--r--  results/classifier/008/other/56937788   354
-rw-r--r--  results/classifier/008/other/57195159   325
-rw-r--r--  results/classifier/008/other/57231878   252
-rw-r--r--  results/classifier/008/other/57756589  1431
-rw-r--r--  results/classifier/008/other/60339453    71
-rw-r--r--  results/classifier/008/other/63565653    59
-rw-r--r--  results/classifier/008/other/65781993  2803
-rw-r--r--  results/classifier/008/other/66743673   374
-rw-r--r--  results/classifier/008/other/68897003   726
-rw-r--r--  results/classifier/008/other/70021271  7458
-rw-r--r--  results/classifier/008/other/70294255  1071
-rw-r--r--  results/classifier/008/other/70416488  1189
-rw-r--r--  results/classifier/008/other/70868267    50
-rw-r--r--  results/classifier/008/other/71456293  1496
-rw-r--r--  results/classifier/008/other/73660729    41
-rw-r--r--  results/classifier/008/other/74466963  1888
-rw-r--r--  results/classifier/008/other/74545755   354
-rw-r--r--  results/classifier/008/other/80604314  1490
-rw-r--r--  results/classifier/008/other/80615920   358
-rw-r--r--  results/classifier/008/other/81775929   245
-rw-r--r--  results/classifier/008/other/99674399   158
47 files changed, 0 insertions, 38990 deletions
diff --git a/results/classifier/008/other/02364653 b/results/classifier/008/other/02364653
deleted file mode 100644
index e6bddb4a4..000000000
--- a/results/classifier/008/other/02364653
+++ /dev/null
@@ -1,373 +0,0 @@
-other: 0.956
-graphic: 0.948
-permissions: 0.944
-semantic: 0.942
-debug: 0.940
-PID: 0.938
-performance: 0.934
-device: 0.928
-boot: 0.925
-socket: 0.924
-vnc: 0.922
-KVM: 0.911
-files: 0.908
-network: 0.881
-
-[Qemu-devel] [BUG] Inappropriate size of target_sigset_t
-
-Hello, Peter, Laurent,
-
-While working on another problem yesterday, I think I discovered a 
-long-standing bug in QEMU Linux user mode: our target_sigset_t structure is 
-eight times smaller as it should be!
-
-In this code segment from syscalls_def.h:
-
-#ifdef TARGET_MIPS
-#define TARGET_NSIG        128
-#else
-#define TARGET_NSIG        64
-#endif
-#define TARGET_NSIG_BPW    TARGET_ABI_BITS
-#define TARGET_NSIG_WORDS  (TARGET_NSIG / TARGET_NSIG_BPW)
-
-typedef struct {
-    abi_ulong sig[TARGET_NSIG_WORDS];
-} target_sigset_t;
-
-... TARGET_ABI_BITS should be replaced by eight times smaller constant (in 
-fact, semantically, we need TARGET_ABI_BYTES, but it is not defined) (what is 
-needed is actually "a byte per signal" in target_sigset_t, and we allow "a bit 
-per signal").
-
-All this probably sounds to you like something impossible, since this code is 
-in QEMU "since forever", but I checked everything, and the bug seems real. I 
-wish you can prove me wrong.
-
-I just wanted to let you know about this, given the sensitive timing of current 
-softfreeze, and the fact that I won't be able to do more investigation on this 
-in coming weeks, since I am busy with other tasks, but perhaps you can analyze 
-and do something which you consider appropriate.
-
-Yours,
-Aleksandar
-
-On 03/07/2019 at 21:46, Aleksandar Markovic wrote:
->
-Hello, Peter, Laurent,
->
->
-While working on another problem yesterday, I think I discovered a
->
-long-standing bug in QEMU Linux user mode: our target_sigset_t structure is
->
-eight times smaller as it should be!
->
->
-In this code segment from syscalls_def.h:
->
->
-#ifdef TARGET_MIPS
->
-#define TARGET_NSIG      128
->
-#else
->
-#define TARGET_NSIG      64
->
-#endif
->
-#define TARGET_NSIG_BPW          TARGET_ABI_BITS
->
-#define TARGET_NSIG_WORDS  (TARGET_NSIG / TARGET_NSIG_BPW)
->
->
-typedef struct {
->
-abi_ulong sig[TARGET_NSIG_WORDS];
->
-} target_sigset_t;
->
->
-... TARGET_ABI_BITS should be replaced by eight times smaller constant (in
->
-fact, semantically, we need TARGET_ABI_BYTES, but it is not defined) (what is
->
-needed is actually "a byte per signal" in target_sigset_t, and we allow "a
->
-bit per signal").
-TARGET_NSIG is divided by TARGET_ABI_BITS which gives you the number of
-abi_ulong words we need in target_sigset_t.
-
->
-All this probably sounds to you like something impossible, since this code is
->
-in QEMU "since forever", but I checked everything, and the bug seems real. I
->
-wish you can prove me wrong.
->
->
-I just wanted to let you know about this, given the sensitive timing of
->
-current softfreeze, and the fact that I won't be able to do more
->
-investigation on this in coming weeks, since I am busy with other tasks, but
->
-perhaps you can analyze and do something which you consider appropriate.
-If I compare with kernel, it looks good:
-
-In Linux:
-
-  arch/mips/include/uapi/asm/signal.h
-
-  #define _NSIG           128
-  #define _NSIG_BPW       (sizeof(unsigned long) * 8)
-  #define _NSIG_WORDS     (_NSIG / _NSIG_BPW)
-
-  typedef struct {
-          unsigned long sig[_NSIG_WORDS];
-  } sigset_t;
-
-_NSIG_BPW is 8 * 8 = 64 on MIPS64 or 4 * 8 = 32 on MIPS
-
-In QEMU:
-
-TARGET_NSIG_BPW is TARGET_ABI_BITS which is  TARGET_LONG_BITS which is
-64 on MIPS64 and 32 on MIPS.
-
-I think there is no problem.
-
-Thanks,
-Laurent
-
-From: Laurent Vivier <address@hidden>
->
-If I compare with kernel, it looks good:
->
-...
->
-I think there is no problem.
-Sure, thanks for such fast response - again, I am glad if you are right. 
-However, for some reason, glibc (and musl too) define sigset_t differently than 
-kernel. Please take a look. I am not sure if this is covered fine in our code.
-
-Yours,
-Aleksandar
-
->
-Thanks,
->
-Laurent
-
-On Wed, 3 Jul 2019 at 21:20, Aleksandar Markovic <address@hidden> wrote:
->
->
-From: Laurent Vivier <address@hidden>
->
-> If I compare with kernel, it looks good:
->
-> ...
->
-> I think there is no problem.
->
->
-Sure, thanks for such fast response - again, I am glad if you are right.
->
-However, for some reason, glibc (and musl too) define sigset_t differently
->
-than kernel. Please take a look. I am not sure if this is covered fine in our
->
-code.
-Yeah, the libc definitions of sigset_t don't match the
-kernel ones (this is for obscure historical reasons IIRC).
-We're providing implementations of the target
-syscall interface, so our target_sigset_t should be the
-target kernel's version (and the target libc's version doesn't
-matter to us). On the other hand we will be using the
-host libc version, I think, so a little caution is required
-and it's possible we have some bugs in our code.
-
-thanks
--- PMM
-
->
-From: Peter Maydell <address@hidden>
->
->
-On Wed, 3 Jul 2019 at 21:20, Aleksandar Markovic <address@hidden> wrote:
->
->
->
-> From: Laurent Vivier <address@hidden>
->
-> > If I compare with kernel, it looks good:
->
-> > ...
->
-> > I think there is no problem.
->
->
->
-> Sure, thanks for such fast response - again, I am glad if you are right.
->
-> However, for some reason, glibc (and musl too) define sigset_t differently
->
-> than kernel. Please take a look. I am not sure if this is covered fine in
->
-> our code.
->
->
-Yeah, the libc definitions of sigset_t don't match the
->
-kernel ones (this is for obscure historical reasons IIRC).
->
-We're providing implementations of the target
->
-syscall interface, so our target_sigset_t should be the
->
-target kernel's version (and the target libc's version doesn't
->
-matter to us). On the other hand we will be using the
->
-host libc version, I think, so a little caution is required
->
-and it's possible we have some bugs in our code.
-OK, I gather than this is not something that requires our immediate attention 
-(for 4.1), but we can analyze it later on.
-
-Thanks for response!!
-
-Sincerely,
-Aleksandar
-
->
-thanks
->
--- PMM
-
-On 03/07/2019 at 22:28, Peter Maydell wrote:
->
-On Wed, 3 Jul 2019 at 21:20, Aleksandar Markovic <address@hidden> wrote:
->
->
->
-> From: Laurent Vivier <address@hidden>
->
->> If I compare with kernel, it looks good:
->
->> ...
->
->> I think there is no problem.
->
->
->
-> Sure, thanks for such fast response - again, I am glad if you are right.
->
-> However, for some reason, glibc (and musl too) define sigset_t differently
->
-> than kernel. Please take a look. I am not sure if this is covered fine in
->
-> our code.
->
->
-Yeah, the libc definitions of sigset_t don't match the
->
-kernel ones (this is for obscure historical reasons IIRC).
->
-We're providing implementations of the target
->
-syscall interface, so our target_sigset_t should be the
->
-target kernel's version (and the target libc's version doesn't
->
-matter to us). On the other hand we will be using the
->
-host libc version, I think, so a little caution is required
->
-and it's possible we have some bugs in our code.
-It's why we need host_to_target_sigset_internal() and
-target_to_host_sigset_internal() that translates bits and bytes between
-guest kernel interface and host libc interface.
-
-void host_to_target_sigset_internal(target_sigset_t *d,
-                                    const sigset_t *s)
-{
-    int i;
-    target_sigemptyset(d);
-    for (i = 1; i <= TARGET_NSIG; i++) {
-        if (sigismember(s, i)) {
-            target_sigaddset(d, host_to_target_signal(i));
-        }
-    }
-}
-
-void target_to_host_sigset_internal(sigset_t *d,
-                                    const target_sigset_t *s)
-{
-    int i;
-    sigemptyset(d);
-    for (i = 1; i <= TARGET_NSIG; i++) {
-        if (target_sigismember(s, i)) {
-            sigaddset(d, target_to_host_signal(i));
-        }
-    }
-}
-
-Thanks,
-Laurent
-
-Hi Aleksandar,
-
-On Wed, Jul 3, 2019 at 12:48 PM Aleksandar Markovic
-<address@hidden> wrote:
->
-#define TARGET_NSIG_BPW    TARGET_ABI_BITS
->
-#define TARGET_NSIG_WORDS  (TARGET_NSIG / TARGET_NSIG_BPW)
->
->
-typedef struct {
->
-abi_ulong sig[TARGET_NSIG_WORDS];
->
-} target_sigset_t;
->
->
-... TARGET_ABI_BITS should be replaced by eight times smaller constant (in
->
-fact,
->
-semantically, we need TARGET_ABI_BYTES, but it is not defined) (what is needed
->
-is actually "a byte per signal" in target_sigset_t, and we allow "a bit per
->
-signal").
-Why do we need a byte per target signal, if the functions in linux-user/signal.c
-operate with bits?
-
--- 
-Thanks.
--- Max
-
->
-Why do we need a byte per target signal, if the functions in
->
-linux-user/signal.c
->
-operate with bits?
-Max,
-
-I did not base my findings on code analysis, but on dumping size/offsets of 
-elements of some structures, as they are emulated in QEMU, and in real systems. 
-So, I can't really answer your question.
-
-Yours,
-Aleksandar
-
->
---
->
-Thanks.
->
--- Max
-
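The word-count arithmetic that Laurent walks through in the thread above can be checked with a small stand-alone sketch. It assumes a 32-bit MIPS-like target (128 signals, 32-bit words). All `DEMO_` names are hypothetical stand-ins chosen for illustration and are not QEMU's definitions. The point it illustrates is that a set storing one bit per signal needs NSIG / BPW words, so 128 signals fit in four 32-bit words (16 bytes).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for a 32-bit MIPS-like target; not QEMU's macros. */
#define DEMO_NSIG        128                          /* signals on the target  */
#define DEMO_ABI_BITS    32                           /* bits in a target long  */
#define DEMO_NSIG_BPW    DEMO_ABI_BITS                /* signals held per word  */
#define DEMO_NSIG_WORDS  (DEMO_NSIG / DEMO_NSIG_BPW)  /* words in the whole set */

typedef struct {
    uint32_t sig[DEMO_NSIG_WORDS];                    /* one bit per signal */
} demo_sigset_t;

/* Clear every signal in the set. */
static void demo_sigemptyset(demo_sigset_t *set)
{
    memset(set, 0, sizeof(*set));
}

/* Signals are numbered from 1, so signal n lives at bit n - 1. */
static void demo_sigaddset(demo_sigset_t *set, int signum)
{
    int bit = signum - 1;
    set->sig[bit / DEMO_NSIG_BPW] |= (uint32_t)1 << (bit % DEMO_NSIG_BPW);
}

/* Return 1 if signum is in the set, 0 otherwise. */
static int demo_sigismember(const demo_sigset_t *set, int signum)
{
    int bit = signum - 1;
    return (int)((set->sig[bit / DEMO_NSIG_BPW] >> (bit % DEMO_NSIG_BPW)) & 1u);
}

#ifdef DEMO_SIGSET_MAIN
int main(void)
{
    demo_sigset_t set;

    demo_sigemptyset(&set);
    demo_sigaddset(&set, DEMO_NSIG);                  /* highest signal number */
    printf("words=%d bytes=%zu member=%d\n",
           DEMO_NSIG_WORDS, sizeof(set), demo_sigismember(&set, DEMO_NSIG));
    return 0;
}
#endif
```

Building with `-DDEMO_SIGSET_MAIN` gives a runnable program. Switching `DEMO_ABI_BITS` to 64 would also require `uint64_t` words, which mirrors the MIPS64 case mentioned in the thread.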
diff --git a/results/classifier/008/other/02572177 b/results/classifier/008/other/02572177
deleted file mode 100644
index 96f1989e9..000000000
--- a/results/classifier/008/other/02572177
+++ /dev/null
@@ -1,431 +0,0 @@
-other: 0.869
-permissions: 0.812
-device: 0.791
-performance: 0.781
-semantic: 0.770
-debug: 0.756
-graphic: 0.747
-socket: 0.742
-PID: 0.731
-network: 0.708
-vnc: 0.706
-KVM: 0.669
-boot: 0.658
-files: 0.640
-
-[Qemu-devel] Reply: Re:  [BUG]COLO failover hang
-
-hi.
-
-
-I test the git qemu master have the same problem.
-
-
-(gdb) bt
-
-
-#0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, niov=1, 
-fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
-
-
-#1  0x00007f658e4aa0c2 in qio_channel_read (address@hidden, address@hidden "", 
-address@hidden, address@hidden) at io/channel.c:114
-
-
-#2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>, 
-buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at 
-migration/qemu-file-channel.c:78
-
-
-#3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at 
-migration/qemu-file.c:295
-
-
-#4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, address@hidden) at 
-migration/qemu-file.c:555
-
-
-#5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at 
-migration/qemu-file.c:568
-
-
-#6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at 
-migration/qemu-file.c:648
-
-
-#7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, 
-address@hidden) at migration/colo.c:244
-
-
-#8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized out>, 
-address@hidden, address@hidden)
-
-
-    at migration/colo.c:264
-
-
-#9  0x00007f658e3e740e in colo_process_incoming_thread (opaque=0x7f658eb30360 
-<mis_current.31286>) at migration/colo.c:577
-
-
-#10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
-
-
-#11 0x00007f65881983ed in clone () from /lib64/libc.so.6
-
-
-(gdb) p ioc->name
-
-
-$2 = 0x7f658ff7d5c0 "migration-socket-incoming"
-
-
-(gdb) p ioc->features          Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
-
-
-$3 = 0
-
-
-
-
-
-(gdb) bt
-
-
-#0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90, condition=G_IO_IN, 
-opaque=0x7fdcceeafa90) at migration/socket.c:137
-
-
-#1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at 
-gmain.c:3054
-
-
-#2  g_main_context_dispatch (context=<optimized out>, address@hidden) at 
-gmain.c:3630
-
-
-#3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
-
-
-#4  os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:258
-
-
-#5  main_loop_wait (address@hidden) at util/main-loop.c:506
-
-
-#6  0x00007fdccb526187 in main_loop () at vl.c:1898
-
-
-#7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at 
-vl.c:4709
-
-
-(gdb) p ioc->features
-
-
-$1 = 6
-
-
-(gdb) p ioc->name
-
-
-$2 = 0x7fdcce1b1ab0 "migration-socket-listener"
-
-
-
-
-
-May be socket_accept_incoming_migration should call 
-qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
-
-
-
-
-
-thank you.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Original Message
-
-
-
-From: address@hidden
-To: 王广10165992 address@hidden
-Cc: address@hidden address@hidden
-Date: 2017-03-16 14:46
-Subject: Re: [Qemu-devel] COLO failover hang
-
-
-
-
-
-
-
-On 03/15/2017 05:06 PM, wangguang wrote:
->   am testing QEMU COLO feature described here [QEMU
-> Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
->
-> When the Primary Node panic,the Secondary Node qemu hang.
-> hang at recvmsg in qio_channel_socket_readv.
-> And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
-> "x-colo-lost-heartbeat" } in Secondary VM's
-> monitor,the  Secondary Node qemu still hang at recvmsg .
->
-> I found that the colo in qemu is not complete yet.
-> Do the colo have any plan for development?
-
-Yes, We are developing. You can see some of patch we pushing.
-
-> Has anyone ever run it successfully? Any help is appreciated!
-
-In our internal version can run it successfully,
-The failover detail you can ask Zhanghailiang for help.
-Next time if you have some question about COLO,
-please cc me and zhanghailiang address@hidden
-
-
-Thanks
-Zhang Chen
-
-
->
->
->
-> centos7.2+qemu2.7.50
-> (gdb) bt
-> #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-> #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
-> iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at
-> io/channel-socket.c:497
-> #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-> address@hidden "", address@hidden,
-> address@hidden) at io/channel.c:97
-> #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
-> buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
-> migration/qemu-file-channel.c:78
-> #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-> migration/qemu-file.c:257
-> #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-> address@hidden) at migration/qemu-file.c:510
-> #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-> migration/qemu-file.c:523
-> #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-> migration/qemu-file.c:603
-> #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-> address@hidden) at migration/colo.c:215
-> #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
-> checkpoint_request=<synthetic pointer>, f=<optimized out>) at
-> migration/colo.c:546
-> #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-> migration/colo.c:649
-> #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-> #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
->
->
->
->
->
-> --
-> View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-> Sent from the Developer mailing list archive at Nabble.com.
->
->
->
->
-
--- 
-Thanks
-Zhang Chen
-
-Hi,Wang.
-
-You can test this branch:
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
-and please follow wiki ensure your own configuration correctly.
-http://wiki.qemu-project.org/Features/COLO
-Thanks
-
-Zhang Chen
-
-
-On 03/21/2017 03:27 PM, address@hidden wrote:
-hi.
-
-I test the git qemu master have the same problem.
-
-(gdb) bt
-#0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
-niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
-#1  0x00007f658e4aa0c2 in qio_channel_read
-(address@hidden, address@hidden "",
-address@hidden, address@hidden) at io/channel.c:114
-#2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
-buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
-migration/qemu-file-channel.c:78
-#3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
-migration/qemu-file.c:295
-#4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
-address@hidden) at migration/qemu-file.c:555
-#5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
-migration/qemu-file.c:568
-#6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
-migration/qemu-file.c:648
-#7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
-address@hidden) at migration/colo.c:244
-#8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
-out>, address@hidden,
-address@hidden)
-at migration/colo.c:264
-#9  0x00007f658e3e740e in colo_process_incoming_thread
-(opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
-#10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
-
-#11 0x00007f65881983ed in clone () from /lib64/libc.so.6
-
-(gdb) p ioc->name
-
-$2 = 0x7f658ff7d5c0 "migration-socket-incoming"
-
-(gdb) p ioc->features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
-
-$3 = 0
-
-
-(gdb) bt
-#0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
-condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
-#1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
-gmain.c:3054
-#2  g_main_context_dispatch (context=<optimized out>,
-address@hidden) at gmain.c:3630
-#3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
-#4  os_host_main_loop_wait (timeout=<optimized out>) at
-util/main-loop.c:258
-#5  main_loop_wait (address@hidden) at
-util/main-loop.c:506
-#6  0x00007fdccb526187 in main_loop () at vl.c:1898
-#7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
-out>) at vl.c:4709
-(gdb) p ioc->features
-
-$1 = 6
-
-(gdb) p ioc->name
-
-$2 = 0x7fdcce1b1ab0 "migration-socket-listener"
-May be socket_accept_incoming_migration should
-call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
-thank you.
-
-
-
-
-
-Original Message
-address@hidden;
-*To:* 王广10165992;address@hidden;
-address@hidden;address@hidden;
-*Date:* 2017-03-16 14:46
-*Subject:* *Re: [Qemu-devel] COLO failover hang*
-
-
-
-
-On 03/15/2017 05:06 PM, wangguang wrote:
->   am testing QEMU COLO feature described here [QEMU
-> Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
->
-> When the Primary Node panic,the Secondary Node qemu hang.
-> hang at recvmsg in qio_channel_socket_readv.
-> And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
-> "x-colo-lost-heartbeat" } in Secondary VM's
-> monitor,the  Secondary Node qemu still hang at recvmsg .
->
-> I found that the colo in qemu is not complete yet.
-> Do the colo have any plan for development?
-
-Yes, We are developing. You can see some of patch we pushing.
-
-> Has anyone ever run it successfully? Any help is appreciated!
-
-In our internal version can run it successfully,
-The failover detail you can ask Zhanghailiang for help.
-Next time if you have some question about COLO,
-please cc me and zhanghailiang address@hidden
-
-
-Thanks
-Zhang Chen
-
-
->
->
->
-> centos7.2+qemu2.7.50
-> (gdb) bt
-> #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-> #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
-> iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at
-> io/channel-socket.c:497
-> #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-> address@hidden "", address@hidden,
-> address@hidden) at io/channel.c:97
-> #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
-> buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
-> migration/qemu-file-channel.c:78
-> #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-> migration/qemu-file.c:257
-> #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-> address@hidden) at migration/qemu-file.c:510
-> #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-> migration/qemu-file.c:523
-> #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-> migration/qemu-file.c:603
-> #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-> address@hidden) at migration/colo.c:215
-> #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
-> checkpoint_request=<synthetic pointer>, f=<optimized out>) at
-> migration/colo.c:546
-> #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-> migration/colo.c:649
-> #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-> #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
->
->
->
->
->
-> --
-> View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-> Sent from the Developer mailing list archive at Nabble.com.
->
->
->
->
-
---
-Thanks
-Zhang Chen
---
-Thanks
-Zhang Chen
-
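The two gdb sessions in the thread above print `ioc->features` as 0 for the accepted "migration-socket-incoming" channel and 6 for the "migration-socket-listener" channel. The sketch below decodes those values under two assumptions that are made for illustration only and are not copied from the QEMU headers: the field is a bitmask with one bit per feature, and the features are numbered FD_PASS = 0, SHUTDOWN = 1, LISTEN = 2. The `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed feature numbering (illustrative, not taken from QEMU's headers). */
enum demo_feature {
    DEMO_FEATURE_FD_PASS  = 0,
    DEMO_FEATURE_SHUTDOWN = 1,
    DEMO_FEATURE_LISTEN   = 2
};

/* Return 1 if bit number f is set in the features mask, 0 otherwise. */
static int demo_has_feature(unsigned int features, enum demo_feature f)
{
    return (int)((features >> f) & 1u);
}

#ifdef DEMO_FEATURE_MAIN
int main(void)
{
    unsigned int listener = 6;   /* value printed for the listener channel */
    unsigned int incoming = 0;   /* value printed for the accepted channel */

    printf("listener shutdown=%d listen=%d\n",
           demo_has_feature(listener, DEMO_FEATURE_SHUTDOWN),
           demo_has_feature(listener, DEMO_FEATURE_LISTEN));
    printf("incoming shutdown=%d\n",
           demo_has_feature(incoming, DEMO_FEATURE_SHUTDOWN));
    return 0;
}
#endif
```

If these assumptions hold, the listener advertises shutdown support while the accepted channel does not. That would be consistent with the reporter's suggestion that the accepted channel should have the shutdown feature set, so that a failover can interrupt the blocked read.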
diff --git a/results/classifier/008/other/04472277 b/results/classifier/008/other/04472277
deleted file mode 100644
index 9cc865b2f..000000000
--- a/results/classifier/008/other/04472277
+++ /dev/null
@@ -1,586 +0,0 @@
-KVM: 0.890
-permissions: 0.851
-device: 0.849
-debug: 0.849
-network: 0.847
-graphic: 0.846
-other: 0.846
-performance: 0.841
-boot: 0.831
-vnc: 0.828
-PID: 0.826
-socket: 0.824
-semantic: 0.815
-files: 0.790
-
-[BUG][KVM_SET_USER_MEMORY_REGION] KVM_SET_USER_MEMORY_REGION failed
-
-Hi all,
-I start a VM in openstack, and openstack use libvirt to start qemu VM, but now log show this ERROR.
-Is there any one know this?
-The ERROR log from /var/log/libvirt/qemu/instance-0000000e.log
-```
-2023-03-14T10:09:17.674114Z qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4, start=0xfffffffffe000000, size=0x2000: Invalid argument
-kvm_set_phys_mem: error registering slot: Invalid argument
-2023-03-14 10:09:18.198+0000: shutting down, reason=crashed
-```
-The xml file
-```
-root@c1c2:~# cat /etc/libvirt/qemu/instance-0000000e.xml
-<!--
-WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
-OVERWRITTEN AND LOST. Changes to this xml configuration should be made using:
-  virsh edit instance-0000000e
-or other application using the libvirt API.
--->
-<domain type='kvm'>
-  <name>instance-0000000e</name>
-  <uuid>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</uuid>
-  <metadata>
-    <nova:instance xmlns:nova="
-http://openstack.org/xmlns/libvirt/nova/1.1
-">
-      <nova:package version="25.1.0"/>
-      <nova:name>provider-instance</nova:name>
-      <nova:creationTime>2023-03-14 10:09:13</nova:creationTime>
-      <nova:flavor name="cirros-os-dpu-test-1">
-        <nova:memory>64</nova:memory>
-        <nova:disk>1</nova:disk>
-        <nova:swap>0</nova:swap>
-        <nova:ephemeral>0</nova:ephemeral>
-        <nova:vcpus>1</nova:vcpus>
-      </nova:flavor>
-      <nova:owner>
-        <nova:user uuid="ff627ad39ed94479b9c5033bc462cf78">admin</nova:user>
-        <nova:project uuid="512866f9994f4ad8916d8539a7cdeec9">admin</nova:project>
-      </nova:owner>
-      <nova:root type="image" uuid="9e58cb69-316a-4093-9f23-c1d1bd8edffe"/>
-      <nova:ports>
-        <nova:port uuid="77c1dc00-af39-4463-bea0-12808f4bc340">
-          <nova:ip type="fixed" address="172.1.1.43" ipVersion="4"/>
-        </nova:port>
-      </nova:ports>
-    </nova:instance>
-  </metadata>
-  <memory unit='KiB'>65536</memory>
-  <currentMemory unit='KiB'>65536</currentMemory>
-  <vcpu placement='static'>1</vcpu>
-  <sysinfo type='smbios'>
-    <system>
-      <entry name='manufacturer'>OpenStack Foundation</entry>
-      <entry name='product'>OpenStack Nova</entry>
-      <entry name='version'>25.1.0</entry>
-      <entry name='serial'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry>
-      <entry name='uuid'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry>
-      <entry name='family'>Virtual Machine</entry>
-    </system>
-  </sysinfo>
-  <os>
-    <type arch='x86_64' machine='pc-i440fx-6.2'>hvm</type>
-    <boot dev='hd'/>
-    <smbios mode='sysinfo'/>
-  </os>
-  <features>
-    <acpi/>
-    <apic/>
-    <vmcoreinfo state='on'/>
-  </features>
-  <cpu mode='host-model' check='partial'>
-    <topology sockets='1' dies='1' cores='1' threads='1'/>
-  </cpu>
-  <clock offset='utc'>
-    <timer name='pit' tickpolicy='delay'/>
-    <timer name='rtc' tickpolicy='catchup'/>
-    <timer name='hpet' present='no'/>
-  </clock>
-  <on_poweroff>destroy</on_poweroff>
-  <on_reboot>restart</on_reboot>
-  <on_crash>destroy</on_crash>
-  <devices>
-    <emulator>/usr/bin/qemu-system-x86_64</emulator>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='none'/>
-      <source file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk'/>
-      <target dev='vda' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
-    </disk>
-    <controller type='usb' index='0' model='piix3-uhci'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
-    </controller>
-    <controller type='pci' index='0' model='pci-root'/>
-    <interface type='hostdev' managed='yes'>
-      <mac address='fa:16:3e:aa:d9:23'/>
-      <source>
-        <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x5'/>
-      </source>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
-    </interface>
-    <serial type='pty'>
-      <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/>
-      <target type='isa-serial' port='0'>
-        <model name='isa-serial'/>
-      </target>
-    </serial>
-    <console type='pty'>
-      <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/>
-      <target type='serial' port='0'/>
-    </console>
-    <input type='tablet' bus='usb'>
-      <address type='usb' bus='0' port='1'/>
-    </input>
-    <input type='mouse' bus='ps2'/>
-    <input type='keyboard' bus='ps2'/>
-    <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'>
-      <listen type='address' address='0.0.0.0'/>
-    </graphics>
-    <audio id='1' type='none'/>
-    <video>
-      <model type='virtio' heads='1' primary='yes'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
-    </video>
-    <hostdev mode='subsystem' type='pci' managed='yes'>
-      <source>
-        <address domain='0x0000' bus='0x01' slot='0x00' function='0x6'/>
-      </source>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
-    </hostdev>
-    <memballoon model='virtio'>
-      <stats period='10'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
-    </memballoon>
-    <rng model='virtio'>
-      <backend model='random'>/dev/urandom</backend>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
-    </rng>
-  </devices>
-</domain>
-```
-----
-Simon Jones
-
-This is happened in ubuntu22.04.
-QEMU is install by apt like this:
-apt install -y qemu qemu-kvm qemu-system
-and QEMU version is 6.2.0
-----
-Simon Jones
-Simon Jones <
-batmanustc@gmail.com
-> wrote on Tue, Mar 21, 2023 at 08:40:
-Hi all,
-I start a VM in openstack, and openstack use libvirt to start qemu VM, but now log show this ERROR.
-Is there any one know this?
-The ERROR log from /var/log/libvirt/qemu/instance-0000000e.log
-```
-2023-03-14T10:09:17.674114Z qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4, start=0xfffffffffe000000, size=0x2000: Invalid argument
-kvm_set_phys_mem: error registering slot: Invalid argument
-2023-03-14 10:09:18.198+0000: shutting down, reason=crashed
-```
-The xml file
-```
-root@c1c2:~# cat /etc/libvirt/qemu/instance-0000000e.xml
-<!--
-WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
-OVERWRITTEN AND LOST. Changes to this xml configuration should be made using:
-  virsh edit instance-0000000e
-or other application using the libvirt API.
--->
-<domain type='kvm'>
-  <name>instance-0000000e</name>
-  <uuid>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</uuid>
-  <metadata>
-    <nova:instance xmlns:nova="
-http://openstack.org/xmlns/libvirt/nova/1.1
-">
-      <nova:package version="25.1.0"/>
-      <nova:name>provider-instance</nova:name>
-      <nova:creationTime>2023-03-14 10:09:13</nova:creationTime>
-      <nova:flavor name="cirros-os-dpu-test-1">
-        <nova:memory>64</nova:memory>
-        <nova:disk>1</nova:disk>
-        <nova:swap>0</nova:swap>
-        <nova:ephemeral>0</nova:ephemeral>
-        <nova:vcpus>1</nova:vcpus>
-      </nova:flavor>
-      <nova:owner>
-        <nova:user uuid="ff627ad39ed94479b9c5033bc462cf78">admin</nova:user>
-        <nova:project uuid="512866f9994f4ad8916d8539a7cdeec9">admin</nova:project>
-      </nova:owner>
-      <nova:root type="image" uuid="9e58cb69-316a-4093-9f23-c1d1bd8edffe"/>
-      <nova:ports>
-        <nova:port uuid="77c1dc00-af39-4463-bea0-12808f4bc340">
-          <nova:ip type="fixed" address="172.1.1.43" ipVersion="4"/>
-        </nova:port>
-      </nova:ports>
-    </nova:instance>
-  </metadata>
-  <memory unit='KiB'>65536</memory>
-  <currentMemory unit='KiB'>65536</currentMemory>
-  <vcpu placement='static'>1</vcpu>
-  <sysinfo type='smbios'>
-    <system>
-      <entry name='manufacturer'>OpenStack Foundation</entry>
-      <entry name='product'>OpenStack Nova</entry>
-      <entry name='version'>25.1.0</entry>
-      <entry name='serial'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry>
-      <entry name='uuid'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry>
-      <entry name='family'>Virtual Machine</entry>
-    </system>
-  </sysinfo>
-  <os>
-    <type arch='x86_64' machine='pc-i440fx-6.2'>hvm</type>
-    <boot dev='hd'/>
-    <smbios mode='sysinfo'/>
-  </os>
-  <features>
-    <acpi/>
-    <apic/>
-    <vmcoreinfo state='on'/>
-  </features>
-  <cpu mode='host-model' check='partial'>
-    <topology sockets='1' dies='1' cores='1' threads='1'/>
-  </cpu>
-  <clock offset='utc'>
-    <timer name='pit' tickpolicy='delay'/>
-    <timer name='rtc' tickpolicy='catchup'/>
-    <timer name='hpet' present='no'/>
-  </clock>
-  <on_poweroff>destroy</on_poweroff>
-  <on_reboot>restart</on_reboot>
-  <on_crash>destroy</on_crash>
-  <devices>
-    <emulator>/usr/bin/qemu-system-x86_64</emulator>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='none'/>
-      <source file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk'/>
-      <target dev='vda' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
-    </disk>
-    <controller type='usb' index='0' model='piix3-uhci'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
-    </controller>
-    <controller type='pci' index='0' model='pci-root'/>
-    <interface type='hostdev' managed='yes'>
-      <mac address='fa:16:3e:aa:d9:23'/>
-      <source>
-        <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x5'/>
-      </source>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
-    </interface>
-    <serial type='pty'>
-      <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/>
-      <target type='isa-serial' port='0'>
-        <model name='isa-serial'/>
-      </target>
-    </serial>
-    <console type='pty'>
-      <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/>
-      <target type='serial' port='0'/>
-    </console>
-    <input type='tablet' bus='usb'>
-      <address type='usb' bus='0' port='1'/>
-    </input>
-    <input type='mouse' bus='ps2'/>
-    <input type='keyboard' bus='ps2'/>
-    <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'>
-      <listen type='address' address='0.0.0.0'/>
-    </graphics>
-    <audio id='1' type='none'/>
-    <video>
-      <model type='virtio' heads='1' primary='yes'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
-    </video>
-    <hostdev mode='subsystem' type='pci' managed='yes'>
-      <source>
-        <address domain='0x0000' bus='0x01' slot='0x00' function='0x6'/>
-      </source>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
-    </hostdev>
-    <memballoon model='virtio'>
-      <stats period='10'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
-    </memballoon>
-    <rng model='virtio'>
-      <backend model='random'>/dev/urandom</backend>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
-    </rng>
-  </devices>
-</domain>
-```
-----
-Simon Jones
-
-This is the full error log:
-2023-03-23 08:00:52.362+0000: starting up libvirt version: 8.0.0, package: 1ubuntu7.4 (Christian Ehrhardt <
-christian.ehrhardt@canonical.com
-> Tue, 22 Nov 2022 15:59:28 +0100), qemu version: 6.2.0Debian 1:6.2+dfsg-2ubuntu6.6, kernel: 5.19.0-35-generic, hostname: c1c2
-LC_ALL=C \
-PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \
-HOME=/var/lib/libvirt/qemu/domain-4-instance-0000000e \
-XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-4-instance-0000000e/.local/share \
-XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-4-instance-0000000e/.cache \
-XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-4-instance-0000000e/.config \
-/usr/bin/qemu-system-x86_64 \
--name guest=instance-0000000e,debug-threads=on \
--S \
--object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-4-instance-0000000e/master-key.aes"}' \
--machine pc-i440fx-6.2,usb=off,dump-guest-core=off,memory-backend=pc.ram \
--accel kvm \
--cpu Cooperlake,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,sha-ni=on,umip=on,waitpkg=on,gfni=on,vaes=on,vpclmulqdq=on,rdpid=on,movdiri=on,movdir64b=on,fsrm=on,md-clear=on,avx-vnni=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,hle=off,rtm=off,avx512f=off,avx512dq=off,avx512cd=off,avx512bw=off,avx512vl=off,avx512vnni=off,avx512-bf16=off,taa-no=off \
--m 64 \
--object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":67108864}' \
--overcommit mem-lock=off \
--smp 1,sockets=1,dies=1,cores=1,threads=1 \
--uuid ff91d2dc-69a1-43ef-abde-c9e4e9a0305b \
--smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=25.1.0,serial=ff91d2dc-69a1-43ef-abde-c9e4e9a0305b,uuid=ff91d2dc-69a1-43ef-abde-c9e4e9a0305b,family=Virtual Machine' \
--no-user-config \
--nodefaults \
--chardev socket,id=charmonitor,fd=33,server=on,wait=off \
--mon chardev=charmonitor,id=monitor,mode=control \
--rtc base=utc,driftfix=slew \
--global kvm-pit.lost_tick_policy=delay \
--no-hpet \
--no-shutdown \
--boot strict=on \
--device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
--blockdev '{"driver":"file","filename":"/var/lib/nova/instances/_base/8b58db82a488248e7c5e769599954adaa47a5314","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
--blockdev '{"node-name":"libvirt-2-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \
--blockdev '{"driver":"file","filename":"/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
--blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}' \
--device virtio-blk-pci,bus=pci.0,addr=0x3,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
--add-fd set=1,fd=34 \
--chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on \
--device isa-serial,chardev=charserial0,id=serial0 \
--device usb-tablet,id=input0,bus=usb.0,port=1 \
--audiodev '{"id":"audio1","driver":"none"}' \
--vnc
-0.0.0.0:0
-,audiodev=audio1 \
--device virtio-vga,id=video0,max_outputs=1,bus=pci.0,addr=0x2 \
--device vfio-pci,host=0000:01:00.5,id=hostdev0,bus=pci.0,addr=0x4 \
--device vfio-pci,host=0000:01:00.6,id=hostdev1,bus=pci.0,addr=0x5 \
--device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \
--object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \
--device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 \
--device vmcoreinfo \
--sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
--msg timestamp=on
-char device redirected to /dev/pts/3 (label charserial0)
-2023-03-23T08:00:53.728550Z qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4, start=0xfffffffffe000000, size=0x2000: Invalid argument
-kvm_set_phys_mem: error registering slot: Invalid argument
-2023-03-23 08:00:54.201+0000: shutting down, reason=crashed
-2023-03-23 08:54:43.468+0000: starting up libvirt version: 8.0.0, package: 1ubuntu7.4 (Christian Ehrhardt <
-christian.ehrhardt@canonical.com
-> Tue, 22 Nov 2022 15:59:28 +0100), qemu version: 6.2.0Debian 1:6.2+dfsg-2ubuntu6.6, kernel: 5.19.0-35-generic, hostname: c1c2
-LC_ALL=C \
-PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \
-HOME=/var/lib/libvirt/qemu/domain-5-instance-0000000e \
-XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-5-instance-0000000e/.local/share \
-XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-5-instance-0000000e/.cache \
-XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-5-instance-0000000e/.config \
-/usr/bin/qemu-system-x86_64 \
--name guest=instance-0000000e,debug-threads=on \
--S \
--object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-5-instance-0000000e/master-key.aes"}' \
--machine pc-i440fx-6.2,usb=off,dump-guest-core=off,memory-backend=pc.ram \
--accel kvm \
--cpu Cooperlake,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,sha-ni=on,umip=on,waitpkg=on,gfni=on,vaes=on,vpclmulqdq=on,rdpid=on,movdiri=on,movdir64b=on,fsrm=on,md-clear=on,avx-vnni=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,hle=off,rtm=off,avx512f=off,avx512dq=off,avx512cd=off,avx512bw=off,avx512vl=off,avx512vnni=off,avx512-bf16=off,taa-no=off \
--m 64 \
--object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":67108864}' \
--overcommit mem-lock=off \
--smp 1,sockets=1,dies=1,cores=1,threads=1 \
--uuid ff91d2dc-69a1-43ef-abde-c9e4e9a0305b \
--smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=25.1.0,serial=ff91d2dc-69a1-43ef-abde-c9e4e9a0305b,uuid=ff91d2dc-69a1-43ef-abde-c9e4e9a0305b,family=Virtual Machine' \
--no-user-config \
--nodefaults \
--chardev socket,id=charmonitor,fd=33,server=on,wait=off \
--mon chardev=charmonitor,id=monitor,mode=control \
--rtc base=utc,driftfix=slew \
--global kvm-pit.lost_tick_policy=delay \
--no-hpet \
--no-shutdown \
--boot strict=on \
--device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
--blockdev '{"driver":"file","filename":"/var/lib/nova/instances/_base/8b58db82a488248e7c5e769599954adaa47a5314","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
--blockdev '{"node-name":"libvirt-2-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \
--blockdev '{"driver":"file","filename":"/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
--blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}' \
--device virtio-blk-pci,bus=pci.0,addr=0x3,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
--add-fd set=1,fd=34 \
--chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on \
--device isa-serial,chardev=charserial0,id=serial0 \
--device usb-tablet,id=input0,bus=usb.0,port=1 \
--audiodev '{"id":"audio1","driver":"none"}' \
--vnc
-0.0.0.0:0
-,audiodev=audio1 \
--device virtio-vga,id=video0,max_outputs=1,bus=pci.0,addr=0x2 \
--device vfio-pci,host=0000:01:00.5,id=hostdev0,bus=pci.0,addr=0x4 \
--device vfio-pci,host=0000:01:00.6,id=hostdev1,bus=pci.0,addr=0x5 \
--device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \
--object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \
--device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 \
--device vmcoreinfo \
--sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
--msg timestamp=on
-char device redirected to /dev/pts/3 (label charserial0)
-2023-03-23T08:54:44.755039Z qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4, start=0xfffffffffe000000, size=0x2000: Invalid argument
-kvm_set_phys_mem: error registering slot: Invalid argument
-2023-03-23 08:54:45.230+0000: shutting down, reason=crashed
-----
-Simon Jones
-Simon Jones <
-batmanustc@gmail.com
-> wrote on Thu, Mar 23, 2023 at 05:49:
-This happened on Ubuntu 22.04.
-QEMU was installed with apt like this:
-apt install -y qemu qemu-kvm qemu-system
-and QEMU version is 6.2.0
-----
-Simon Jones
-Simon Jones <
-batmanustc@gmail.com
-> wrote on Tue, Mar 21, 2023 at 08:40:
-Hi all,
-I started a VM in OpenStack, which uses libvirt to launch the QEMU VM, but the log now shows this error.
-Does anyone know what causes it?
-The error log is from /var/log/libvirt/qemu/instance-0000000e.log
-```
-2023-03-14T10:09:17.674114Z qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4, start=0xfffffffffe000000, size=0x2000: Invalid argument
-kvm_set_phys_mem: error registering slot: Invalid argument
-2023-03-14 10:09:18.198+0000: shutting down, reason=crashed
-```
-The xml file
-```
-root@c1c2:~# cat /etc/libvirt/qemu/instance-0000000e.xml
-<!--
-WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
-OVERWRITTEN AND LOST. Changes to this xml configuration should be made using:
-  virsh edit instance-0000000e
-or other application using the libvirt API.
--->
-<domain type='kvm'>
-  <name>instance-0000000e</name>
-  <uuid>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</uuid>
-  <metadata>
-    <nova:instance xmlns:nova="
-http://openstack.org/xmlns/libvirt/nova/1.1
-">
-      <nova:package version="25.1.0"/>
-      <nova:name>provider-instance</nova:name>
-      <nova:creationTime>2023-03-14 10:09:13</nova:creationTime>
-      <nova:flavor name="cirros-os-dpu-test-1">
-        <nova:memory>64</nova:memory>
-        <nova:disk>1</nova:disk>
-        <nova:swap>0</nova:swap>
-        <nova:ephemeral>0</nova:ephemeral>
-        <nova:vcpus>1</nova:vcpus>
-      </nova:flavor>
-      <nova:owner>
-        <nova:user uuid="ff627ad39ed94479b9c5033bc462cf78">admin</nova:user>
-        <nova:project uuid="512866f9994f4ad8916d8539a7cdeec9">admin</nova:project>
-      </nova:owner>
-      <nova:root type="image" uuid="9e58cb69-316a-4093-9f23-c1d1bd8edffe"/>
-      <nova:ports>
-        <nova:port uuid="77c1dc00-af39-4463-bea0-12808f4bc340">
-          <nova:ip type="fixed" address="172.1.1.43" ipVersion="4"/>
-        </nova:port>
-      </nova:ports>
-    </nova:instance>
-  </metadata>
-  <memory unit='KiB'>65536</memory>
-  <currentMemory unit='KiB'>65536</currentMemory>
-  <vcpu placement='static'>1</vcpu>
-  <sysinfo type='smbios'>
-    <system>
-      <entry name='manufacturer'>OpenStack Foundation</entry>
-      <entry name='product'>OpenStack Nova</entry>
-      <entry name='version'>25.1.0</entry>
-      <entry name='serial'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry>
-      <entry name='uuid'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry>
-      <entry name='family'>Virtual Machine</entry>
-    </system>
-  </sysinfo>
-  <os>
-    <type arch='x86_64' machine='pc-i440fx-6.2'>hvm</type>
-    <boot dev='hd'/>
-    <smbios mode='sysinfo'/>
-  </os>
-  <features>
-    <acpi/>
-    <apic/>
-    <vmcoreinfo state='on'/>
-  </features>
-  <cpu mode='host-model' check='partial'>
-    <topology sockets='1' dies='1' cores='1' threads='1'/>
-  </cpu>
-  <clock offset='utc'>
-    <timer name='pit' tickpolicy='delay'/>
-    <timer name='rtc' tickpolicy='catchup'/>
-    <timer name='hpet' present='no'/>
-  </clock>
-  <on_poweroff>destroy</on_poweroff>
-  <on_reboot>restart</on_reboot>
-  <on_crash>destroy</on_crash>
-  <devices>
-    <emulator>/usr/bin/qemu-system-x86_64</emulator>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='none'/>
-      <source file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk'/>
-      <target dev='vda' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
-    </disk>
-    <controller type='usb' index='0' model='piix3-uhci'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
-    </controller>
-    <controller type='pci' index='0' model='pci-root'/>
-    <interface type='hostdev' managed='yes'>
-      <mac address='fa:16:3e:aa:d9:23'/>
-      <source>
-        <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x5'/>
-      </source>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
-    </interface>
-    <serial type='pty'>
-      <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/>
-      <target type='isa-serial' port='0'>
-        <model name='isa-serial'/>
-      </target>
-    </serial>
-    <console type='pty'>
-      <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/>
-      <target type='serial' port='0'/>
-    </console>
-    <input type='tablet' bus='usb'>
-      <address type='usb' bus='0' port='1'/>
-    </input>
-    <input type='mouse' bus='ps2'/>
-    <input type='keyboard' bus='ps2'/>
-    <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'>
-      <listen type='address' address='0.0.0.0'/>
-    </graphics>
-    <audio id='1' type='none'/>
-    <video>
-      <model type='virtio' heads='1' primary='yes'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
-    </video>
-    <hostdev mode='subsystem' type='pci' managed='yes'>
-      <source>
-        <address domain='0x0000' bus='0x01' slot='0x00' function='0x6'/>
-      </source>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
-    </hostdev>
-    <memballoon model='virtio'>
-      <stats period='10'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
-    </memballoon>
-    <rng model='virtio'>
-      <backend model='random'>/dev/urandom</backend>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
-    </rng>
-  </devices>
-</domain>
-```
-----
-Simon Jones
-
diff --git a/results/classifier/008/other/11933524 b/results/classifier/008/other/11933524
deleted file mode 100644
index c8313c023..000000000
--- a/results/classifier/008/other/11933524
+++ /dev/null
@@ -1,1135 +0,0 @@
-PID: 0.791
-other: 0.771
-device: 0.762
-permissions: 0.752
-debug: 0.752
-socket: 0.751
-boot: 0.743
-graphic: 0.737
-performance: 0.736
-vnc: 0.695
-KVM: 0.689
-semantic: 0.673
-network: 0.662
-files: 0.660
-
-[BUG] hw/i386/pc.c: CXL Fixed Memory Window should not reserve e820 in bios
-
-Early-boot e820 records will be inserted by the bios/efi/early boot
-software and be reported to the kernel via insert_resource.  Later, when
-CXL drivers iterate through the regions again, they will insert another
-resource and make the RESERVED memory area a child.
-
-This RESERVED memory area causes the memory region to become unusable,
-and as a result attempting to create memory regions with
-
-    `cxl create-region ...`
-
-Will fail due to the RESERVED area intersecting with the CXL window.
-
-
-During boot the following traceback is observed:
-
-0xffffffff81101650 in insert_resource_expand_to_fit ()
-0xffffffff83d964c5 in e820__reserve_resources_late ()
-0xffffffff83e03210 in pcibios_resource_survey ()
-0xffffffff83e04f4a in pcibios_init ()
-
-Which produces a call to reserve the CFMWS area:
-
-(gdb) p *new
-$54 = {start = 0x290000000, end = 0x2cfffffff, name = "Reserved",
-       flags = 0x200, desc = 0x7, parent = 0x0, sibling = 0x0,
-       child = 0x0}
-
-Later the Kernel parses ACPI tables and reserves the exact same area as
-the CXL Fixed Memory Window.  The use of `insert_resource_conflict`
-retains the RESERVED region and makes it a child of the new region.
-
-0xffffffff811016a4 in insert_resource_conflict ()
-                      insert_resource ()
-0xffffffff81a81389 in cxl_parse_cfmws ()
-0xffffffff818c4a81 in call_handler ()
-                      acpi_parse_entries_array ()
-
-(gdb) p/x *new
-$59 = {start = 0x290000000, end = 0x2cfffffff, name = "CXL Window 0",
-       flags = 0x200, desc = 0x0, parent = 0x0, sibling = 0x0,
-       child = 0x0}
-
-This produces the following output in /proc/iomem:
-
-590000000-68fffffff : CXL Window 0
-  590000000-68fffffff : Reserved
-
-This reserved area causes `get_free_mem_region()` to fail due to a check
-against `__region_intersects()`.  Due to this reserved area, the
-intersect check will only ever return REGION_INTERSECTS, which causes
-`cxl create-region` to always fail.
-
-Signed-off-by: Gregory Price <gregory.price@memverge.com>
----
- hw/i386/pc.c | 2 --
- 1 file changed, 2 deletions(-)
-
-diff --git a/hw/i386/pc.c b/hw/i386/pc.c
-index 566accf7e6..5bf5465a21 100644
---- a/hw/i386/pc.c
-+++ b/hw/i386/pc.c
-@@ -1061,7 +1061,6 @@ void pc_memory_init(PCMachineState *pcms,
-         hwaddr cxl_size = MiB;
- 
-         cxl_base = pc_get_cxl_range_start(pcms);
--        e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
-         memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size);
-         memory_region_add_subregion(system_memory, cxl_base, mr);
-         cxl_resv_end = cxl_base + cxl_size;
-@@ -1077,7 +1076,6 @@ void pc_memory_init(PCMachineState *pcms,
-                 memory_region_init_io(&fw->mr, OBJECT(machine), &cfmws_ops, fw,
-                                       "cxl-fixed-memory-region", fw->size);
-                 memory_region_add_subregion(system_memory, fw->base, &fw->mr);
--                e820_add_entry(fw->base, fw->size, E820_RESERVED);
-                 cxl_fmw_base += fw->size;
-                 cxl_resv_end = cxl_fmw_base;
-             }
--- 
-2.37.3
-
-Early-boot e820 records will be inserted by the bios/efi/early boot
-software and be reported to the kernel via insert_resource.  Later, when
-CXL drivers iterate through the regions again, they will insert another
-resource and make the RESERVED memory area a child.
-
-This RESERVED memory area causes the memory region to become unusable,
-and as a result attempting to create memory regions with
-
-     `cxl create-region ...`
-
-Will fail due to the RESERVED area intersecting with the CXL window.
-
-
-During boot the following traceback is observed:
-
-0xffffffff81101650 in insert_resource_expand_to_fit ()
-0xffffffff83d964c5 in e820__reserve_resources_late ()
-0xffffffff83e03210 in pcibios_resource_survey ()
-0xffffffff83e04f4a in pcibios_init ()
-
-Which produces a call to reserve the CFMWS area:
-
-(gdb) p *new
-$54 = {start = 0x290000000, end = 0x2cfffffff, name = "Reserved",
-        flags = 0x200, desc = 0x7, parent = 0x0, sibling = 0x0,
-        child = 0x0}
-
-Later the Kernel parses ACPI tables and reserves the exact same area as
-the CXL Fixed Memory Window.  The use of `insert_resource_conflict`
-retains the RESERVED region and makes it a child of the new region.
-
-0xffffffff811016a4 in insert_resource_conflict ()
-                       insert_resource ()
-0xffffffff81a81389 in cxl_parse_cfmws ()
-0xffffffff818c4a81 in call_handler ()
-                       acpi_parse_entries_array ()
-
-(gdb) p/x *new
-$59 = {start = 0x290000000, end = 0x2cfffffff, name = "CXL Window 0",
-        flags = 0x200, desc = 0x0, parent = 0x0, sibling = 0x0,
-        child = 0x0}
-
-This produces the following output in /proc/iomem:
-
-590000000-68fffffff : CXL Window 0
-   590000000-68fffffff : Reserved
-
-This reserved area causes `get_free_mem_region()` to fail due to a check
-against `__region_intersects()`.  Due to this reserved area, the
-intersect check will only ever return REGION_INTERSECTS, which causes
-`cxl create-region` to always fail.
-
-Signed-off-by: Gregory Price <gregory.price@memverge.com>
----
-  hw/i386/pc.c | 2 --
-  1 file changed, 2 deletions(-)
-
-diff --git a/hw/i386/pc.c b/hw/i386/pc.c
-index 566accf7e6..5bf5465a21 100644
---- a/hw/i386/pc.c
-+++ b/hw/i386/pc.c
-@@ -1061,7 +1061,6 @@ void pc_memory_init(PCMachineState *pcms,
-          hwaddr cxl_size = MiB;
-cxl_base = pc_get_cxl_range_start(pcms);
--        e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
-          memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size);
-          memory_region_add_subregion(system_memory, cxl_base, mr);
-          cxl_resv_end = cxl_base + cxl_size;
-@@ -1077,7 +1076,6 @@ void pc_memory_init(PCMachineState *pcms,
-                  memory_region_init_io(&fw->mr, OBJECT(machine), &cfmws_ops, 
-fw,
-                                        "cxl-fixed-memory-region", fw->size);
-                  memory_region_add_subregion(system_memory, fw->base, &fw->mr);
-Or will this be subregion of cxl_base?
-
-Thanks,
-Pankaj
--                e820_add_entry(fw->base, fw->size, E820_RESERVED);
-                  cxl_fmw_base += fw->size;
-                  cxl_resv_end = cxl_fmw_base;
-              }
-
->
-> -        e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
->
->           memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size);
->
->           memory_region_add_subregion(system_memory, cxl_base, mr);
->
->           cxl_resv_end = cxl_base + cxl_size;
->
-> @@ -1077,7 +1076,6 @@ void pc_memory_init(PCMachineState *pcms,
->
->                   memory_region_init_io(&fw->mr, OBJECT(machine),
->
-> &cfmws_ops, fw,
->
->                                         "cxl-fixed-memory-region",
->
-> fw->size);
->
->                   memory_region_add_subregion(system_memory, fw->base,
->
-> &fw->mr);
->
->
-Or will this be subregion of cxl_base?
->
->
-Thanks,
->
-Pankaj
-The memory region backing this memory area still has to be initialized
-and added in the QEMU system, but it will now be initialized for use by
-Linux after PCI/ACPI setup occurs and the CXL driver discovers it via
-CDAT.
-
-It's also still possible to assign this area a static memory region at
-boot by setting up the SRATs in the ACPI tables, but that patch is not
-upstream yet.
-
-On Tue, Oct 18, 2022 at 5:14 AM Gregory Price <gourry.memverge@gmail.com> wrote:
->
->
-Early-boot e820 records will be inserted by the bios/efi/early boot
->
-software and be reported to the kernel via insert_resource.  Later, when
->
-CXL drivers iterate through the regions again, they will insert another
->
-resource and make the RESERVED memory area a child.
-I have already sent a patch
-https://www.mail-archive.com/qemu-devel@nongnu.org/msg882012.html
-.
-When the patch is applied, there would not be any reserved entries
-even when E820_RESERVED is passed.
-So this patch needs to be evaluated in the light of the above patch I
-sent. Once you apply my patch, does the issue still exist?
-
->
->
-This RESERVED memory area causes the memory region to become unusable,
->
-and as a result attempting to create memory regions with
->
->
-`cxl create-region ...`
->
->
-Will fail due to the RESERVED area intersecting with the CXL window.
->
->
->
-During boot the following traceback is observed:
->
->
-0xffffffff81101650 in insert_resource_expand_to_fit ()
->
-0xffffffff83d964c5 in e820__reserve_resources_late ()
->
-0xffffffff83e03210 in pcibios_resource_survey ()
->
-0xffffffff83e04f4a in pcibios_init ()
->
->
-Which produces a call to reserve the CFMWS area:
->
->
-(gdb) p *new
->
-$54 = {start = 0x290000000, end = 0x2cfffffff, name = "Reserved",
->
-flags = 0x200, desc = 0x7, parent = 0x0, sibling = 0x0,
->
-child = 0x0}
->
->
-Later the Kernel parses ACPI tables and reserves the exact same area as
->
-the CXL Fixed Memory Window.  The use of `insert_resource_conflict`
->
-retains the RESERVED region and makes it a child of the new region.
->
->
-0xffffffff811016a4 in insert_resource_conflict ()
->
-insert_resource ()
->
-0xffffffff81a81389 in cxl_parse_cfmws ()
->
-0xffffffff818c4a81 in call_handler ()
->
-acpi_parse_entries_array ()
->
->
-(gdb) p/x *new
->
-$59 = {start = 0x290000000, end = 0x2cfffffff, name = "CXL Window 0",
->
-flags = 0x200, desc = 0x0, parent = 0x0, sibling = 0x0,
->
-child = 0x0}
->
->
-This produces the following output in /proc/iomem:
->
->
-590000000-68fffffff : CXL Window 0
->
-590000000-68fffffff : Reserved
->
->
-This reserved area causes `get_free_mem_region()` to fail due to a check
->
-against `__region_intersects()`.  Due to this reserved area, the
->
-intersect check will only ever return REGION_INTERSECTS, which causes
->
-`cxl create-region` to always fail.
->
->
-Signed-off-by: Gregory Price <gregory.price@memverge.com>
->
----
->
-hw/i386/pc.c | 2 --
->
-1 file changed, 2 deletions(-)
->
->
-diff --git a/hw/i386/pc.c b/hw/i386/pc.c
->
-index 566accf7e6..5bf5465a21 100644
->
---- a/hw/i386/pc.c
->
-+++ b/hw/i386/pc.c
->
-@@ -1061,7 +1061,6 @@ void pc_memory_init(PCMachineState *pcms,
->
-hwaddr cxl_size = MiB;
->
->
-cxl_base = pc_get_cxl_range_start(pcms);
->
--        e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
->
-memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size);
->
-memory_region_add_subregion(system_memory, cxl_base, mr);
->
-cxl_resv_end = cxl_base + cxl_size;
->
-@@ -1077,7 +1076,6 @@ void pc_memory_init(PCMachineState *pcms,
->
-memory_region_init_io(&fw->mr, OBJECT(machine), &cfmws_ops,
->
-fw,
->
-"cxl-fixed-memory-region", fw->size);
->
-memory_region_add_subregion(system_memory, fw->base,
->
-&fw->mr);
->
--                e820_add_entry(fw->base, fw->size, E820_RESERVED);
->
-cxl_fmw_base += fw->size;
->
-cxl_resv_end = cxl_fmw_base;
->
-}
->
---
->
-2.37.3
->
-
-This patch does not resolve the issue; reserved entries are still created.
-[    0.000000] BIOS-e820: [mem 0x0000000280000000-0x00000002800fffff] reserved
-[    0.000000] BIOS-e820: [mem 0x0000000290000000-0x000000029fffffff] reserved
-# cat /proc/iomem
-290000000-29fffffff : CXL Window 0
-  290000000-29fffffff : Reserved
-# cxl create-region -m -d decoder0.0 -w 1 -g 256 mem0
-cxl region: create_region: region0: set_size failed: Numerical result out of range
-cxl region: cmd_create_region: created 0 regions
-On Tue, Oct 18, 2022 at 2:05 AM Ani Sinha <
-ani@anisinha.ca
-> wrote:
-On Tue, Oct 18, 2022 at 5:14 AM Gregory Price <
-gourry.memverge@gmail.com
-> wrote:
->
-> Early-boot e820 records will be inserted by the bios/efi/early boot
-> software and be reported to the kernel via insert_resource.  Later, when
-> CXL drivers iterate through the regions again, they will insert another
-> resource and make the RESERVED memory area a child.
-I have already sent a patch
-https://www.mail-archive.com/qemu-devel@nongnu.org/msg882012.html
-.
-When the patch is applied, there would not be any reserved entries
-even when E820_RESERVED is passed.
-So this patch needs to be evaluated in the light of the above patch I
-sent. Once you apply my patch, does the issue still exist?
->
-> This RESERVED memory area causes the memory region to become unusable,
-> and as a result attempting to create memory regions with
->
->     `cxl create-region ...`
->
-> Will fail due to the RESERVED area intersecting with the CXL window.
->
->
-> During boot the following traceback is observed:
->
-> 0xffffffff81101650 in insert_resource_expand_to_fit ()
-> 0xffffffff83d964c5 in e820__reserve_resources_late ()
-> 0xffffffff83e03210 in pcibios_resource_survey ()
-> 0xffffffff83e04f4a in pcibios_init ()
->
-> Which produces a call to reserve the CFMWS area:
->
-> (gdb) p *new
-> $54 = {start = 0x290000000, end = 0x2cfffffff, name = "Reserved",
->        flags = 0x200, desc = 0x7, parent = 0x0, sibling = 0x0,
->        child = 0x0}
->
-> Later the Kernel parses ACPI tables and reserves the exact same area as
-> the CXL Fixed Memory Window.  The use of `insert_resource_conflict`
-> retains the RESERVED region and makes it a child of the new region.
->
-> 0xffffffff811016a4 in insert_resource_conflict ()
->                       insert_resource ()
-> 0xffffffff81a81389 in cxl_parse_cfmws ()
-> 0xffffffff818c4a81 in call_handler ()
->                       acpi_parse_entries_array ()
->
-> (gdb) p/x *new
-> $59 = {start = 0x290000000, end = 0x2cfffffff, name = "CXL Window 0",
->        flags = 0x200, desc = 0x0, parent = 0x0, sibling = 0x0,
->        child = 0x0}
->
-> This produces the following output in /proc/iomem:
->
-> 590000000-68fffffff : CXL Window 0
->   590000000-68fffffff : Reserved
->
-> This reserved area causes `get_free_mem_region()` to fail due to a check
-> against `__region_intersects()`.  Due to this reserved area, the
-> intersect check will only ever return REGION_INTERSECTS, which causes
-> `cxl create-region` to always fail.
->
-> Signed-off-by: Gregory Price <
-gregory.price@memverge.com
->
-> ---
->  hw/i386/pc.c | 2 --
->  1 file changed, 2 deletions(-)
->
-> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
-> index 566accf7e6..5bf5465a21 100644
-> --- a/hw/i386/pc.c
-> +++ b/hw/i386/pc.c
-> @@ -1061,7 +1061,6 @@ void pc_memory_init(PCMachineState *pcms,
->          hwaddr cxl_size = MiB;
->
->          cxl_base = pc_get_cxl_range_start(pcms);
-> -        e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
->          memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size);
->          memory_region_add_subregion(system_memory, cxl_base, mr);
->          cxl_resv_end = cxl_base + cxl_size;
-> @@ -1077,7 +1076,6 @@ void pc_memory_init(PCMachineState *pcms,
->                  memory_region_init_io(&fw->mr, OBJECT(machine), &cfmws_ops, fw,
->                                        "cxl-fixed-memory-region", fw->size);
->                  memory_region_add_subregion(system_memory, fw->base, &fw->mr);
-> -                e820_add_entry(fw->base, fw->size, E820_RESERVED);
->                  cxl_fmw_base += fw->size;
->                  cxl_resv_end = cxl_fmw_base;
->              }
-> --
-> 2.37.3
->
-
-+Gerd Hoffmann
-
-On Tue, Oct 18, 2022 at 8:16 PM Gregory Price <gourry.memverge@gmail.com> wrote:
->
->
-This patch does not resolve the issue, reserved entries are still created.
->
->
-[    0.000000] BIOS-e820: [mem 0x0000000280000000-0x00000002800fffff] reserved
->
-[    0.000000] BIOS-e820: [mem 0x0000000290000000-0x000000029fffffff] reserved
->
->
-# cat /proc/iomem
->
-290000000-29fffffff : CXL Window 0
->
-290000000-29fffffff : Reserved
->
->
-# cxl create-region -m -d decoder0.0 -w 1 -g 256 mem0
->
-cxl region: create_region: region0: set_size failed: Numerical result out of
->
-range
->
-cxl region: cmd_create_region: created 0 regions
->
->
-On Tue, Oct 18, 2022 at 2:05 AM Ani Sinha <ani@anisinha.ca> wrote:
->
->
->
-> On Tue, Oct 18, 2022 at 5:14 AM Gregory Price <gourry.memverge@gmail.com>
->
-> wrote:
->
-> >
->
-> > Early-boot e820 records will be inserted by the bios/efi/early boot
->
-> > software and be reported to the kernel via insert_resource.  Later, when
->
-> > CXL drivers iterate through the regions again, they will insert another
->
-> > resource and make the RESERVED memory area a child.
->
->
->
-> I have already sent a patch
->
->
-https://www.mail-archive.com/qemu-devel@nongnu.org/msg882012.html
-.
->
-> When the patch is applied, there would not be any reserved entries
->
-> even with passing E820_RESERVED .
->
-> So this patch needs to be evaluated in the light of the above patch I
->
-> sent. Once you apply my patch, does the issue still exist?
->
->
->
-> >
->
-> > This RESERVED memory area causes the memory region to become unusable,
->
-> > and as a result attempting to create memory regions with
->
-> >
->
-> >     `cxl create-region ...`
->
-> >
->
-> > Will fail due to the RESERVED area intersecting with the CXL window.
->
-> >
->
-> >
->
-> > During boot the following traceback is observed:
->
-> >
->
-> > 0xffffffff81101650 in insert_resource_expand_to_fit ()
->
-> > 0xffffffff83d964c5 in e820__reserve_resources_late ()
->
-> > 0xffffffff83e03210 in pcibios_resource_survey ()
->
-> > 0xffffffff83e04f4a in pcibios_init ()
->
-> >
->
-> > Which produces a call to reserve the CFMWS area:
->
-> >
->
-> > (gdb) p *new
->
-> > $54 = {start = 0x290000000, end = 0x2cfffffff, name = "Reserved",
->
-> >        flags = 0x200, desc = 0x7, parent = 0x0, sibling = 0x0,
->
-> >        child = 0x0}
->
-> >
->
-> > Later the Kernel parses ACPI tables and reserves the exact same area as
->
-> > the CXL Fixed Memory Window.  The use of `insert_resource_conflict`
->
-> > retains the RESERVED region and makes it a child of the new region.
->
-> >
->
-> > 0xffffffff811016a4 in insert_resource_conflict ()
->
-> >                       insert_resource ()
->
-> > 0xffffffff81a81389 in cxl_parse_cfmws ()
->
-> > 0xffffffff818c4a81 in call_handler ()
->
-> >                       acpi_parse_entries_array ()
->
-> >
->
-> > (gdb) p/x *new
->
-> > $59 = {start = 0x290000000, end = 0x2cfffffff, name = "CXL Window 0",
->
-> >        flags = 0x200, desc = 0x0, parent = 0x0, sibling = 0x0,
->
-> >        child = 0x0}
->
-> >
->
-> > This produces the following output in /proc/iomem:
->
-> >
->
-> > 590000000-68fffffff : CXL Window 0
->
-> >   590000000-68fffffff : Reserved
->
-> >
->
-> > This reserved area causes `get_free_mem_region()` to fail due to a check
->
-> > against `__region_intersects()`.  Due to this reserved area, the
->
-> > intersect check will only ever return REGION_INTERSECTS, which causes
->
-> > `cxl create-region` to always fail.
->
-> >
->
-> > Signed-off-by: Gregory Price <gregory.price@memverge.com>
->
-> > ---
->
-> >  hw/i386/pc.c | 2 --
->
-> >  1 file changed, 2 deletions(-)
->
-> >
->
-> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
->
-> > index 566accf7e6..5bf5465a21 100644
->
-> > --- a/hw/i386/pc.c
->
-> > +++ b/hw/i386/pc.c
->
-> > @@ -1061,7 +1061,6 @@ void pc_memory_init(PCMachineState *pcms,
->
-> >          hwaddr cxl_size = MiB;
->
-> >
->
-> >          cxl_base = pc_get_cxl_range_start(pcms);
->
-> > -        e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
->
-> >          memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size);
->
-> >          memory_region_add_subregion(system_memory, cxl_base, mr);
->
-> >          cxl_resv_end = cxl_base + cxl_size;
->
-> > @@ -1077,7 +1076,6 @@ void pc_memory_init(PCMachineState *pcms,
->
-> >                  memory_region_init_io(&fw->mr, OBJECT(machine),
->
-> > &cfmws_ops, fw,
->
-> >                                        "cxl-fixed-memory-region",
->
-> > fw->size);
->
-> >                  memory_region_add_subregion(system_memory, fw->base,
->
-> > &fw->mr);
->
-> > -                e820_add_entry(fw->base, fw->size, E820_RESERVED);
->
-> >                  cxl_fmw_base += fw->size;
->
-> >                  cxl_resv_end = cxl_fmw_base;
->
-> >              }
->
-> > --
->
-> > 2.37.3
->
-> >
-
->
->> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
->
->> > index 566accf7e6..5bf5465a21 100644
->
->> > --- a/hw/i386/pc.c
->
->> > +++ b/hw/i386/pc.c
->
->> > @@ -1061,7 +1061,6 @@ void pc_memory_init(PCMachineState *pcms,
->
->> >          hwaddr cxl_size = MiB;
->
->> >
->
->> >          cxl_base = pc_get_cxl_range_start(pcms);
->
->> > -        e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
-Just dropping it doesn't look like a good plan to me.
-
-You can try set etc/reserved-memory-end fw_cfg file instead.  Firmware
-(both seabios and ovmf) read it and will make sure the 64bit pci mmio
-window is placed above that address, i.e. this effectively reserves
-address space.  Right now used by memory hotplug code, but should work
-for cxl too I think (disclaimer: don't know much about cxl ...).
-
-take care & HTH,
-  Gerd
-
-On Tue, 8 Nov 2022 12:21:11 +0100
-Gerd Hoffmann <kraxel@redhat.com> wrote:
-
->
-> >> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
->
-> >> > index 566accf7e6..5bf5465a21 100644
->
-> >> > --- a/hw/i386/pc.c
->
-> >> > +++ b/hw/i386/pc.c
->
-> >> > @@ -1061,7 +1061,6 @@ void pc_memory_init(PCMachineState *pcms,
->
-> >> >          hwaddr cxl_size = MiB;
->
-> >> >
->
-> >> >          cxl_base = pc_get_cxl_range_start(pcms);
->
-> >> > -        e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
->
->
-Just dropping it doesn't look like a good plan to me.
->
->
-You can try set etc/reserved-memory-end fw_cfg file instead.  Firmware
->
-(both seabios and ovmf) read it and will make sure the 64bit pci mmio
->
-window is placed above that address, i.e. this effectively reserves
->
-address space.  Right now used by memory hotplug code, but should work
->
-for cxl too I think (disclaimer: don't know much about cxl ...).
-As far as I know CXL impl. in QEMU isn't using etc/reserved-memory-end
-at all, it has its own mapping.
-
-Regardless of that, reserved E820 entries look wrong, and looking at
-the commit message, the OS is right to bail out on them (expected according
-to ACPI spec).
-Also, the spec says:
-
-"
-E820 Assumptions and Limitations
- [...]
- The platform boot firmware does not return a range description for the memory 
-mapping of
- PCI devices, ISA Option ROMs, and ISA Plug and Play cards because the OS has 
-mechanisms
- available to detect them.
-"
-
-so dropping reserved entries looks reasonable from ACPI spec point of view.
-(disclaimer: don't know much about cxl ... either)
->
->
-take care & HTH,
->
-Gerd
->
-
-On Fri, Nov 11, 2022 at 11:51:23AM +0100, Igor Mammedov wrote:
->
-On Tue, 8 Nov 2022 12:21:11 +0100
->
-Gerd Hoffmann <kraxel@redhat.com> wrote:
->
->
-> > >> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
->
-> > >> > index 566accf7e6..5bf5465a21 100644
->
-> > >> > --- a/hw/i386/pc.c
->
-> > >> > +++ b/hw/i386/pc.c
->
-> > >> > @@ -1061,7 +1061,6 @@ void pc_memory_init(PCMachineState *pcms,
->
-> > >> >          hwaddr cxl_size = MiB;
->
-> > >> >
->
-> > >> >          cxl_base = pc_get_cxl_range_start(pcms);
->
-> > >> > -        e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
->
->
->
-> Just dropping it doesn't look like a good plan to me.
->
->
->
-> You can try set etc/reserved-memory-end fw_cfg file instead.  Firmware
->
-> (both seabios and ovmf) read it and will make sure the 64bit pci mmio
->
-> window is placed above that address, i.e. this effectively reserves
->
-> address space.  Right now used by memory hotplug code, but should work
->
-> for cxl too I think (disclaimer: don't know much about cxl ...).
->
->
-As far as I know CXL impl. in QEMU isn't using etc/reserved-memory-end
->
-at all, it has its own mapping.
-This should be changed.  cxl should make sure the highest address used
-is stored in etc/reserved-memory-end to avoid the firmware mapping pci
-resources there.
-
->
-so dropping reserved entries looks reasonable from ACPI spec point of view.
-Yep, I don't want to dispute that.
-
-I suspect the reason for these entries to exist in the first place is to
-inform the firmware that it should not place stuff there, and if we
-remove that to conform with the spec we need some alternative way for
-that ...
-
-take care,
-  Gerd
-
-On Fri, 11 Nov 2022 12:40:59 +0100
-Gerd Hoffmann <kraxel@redhat.com> wrote:
-
->
-On Fri, Nov 11, 2022 at 11:51:23AM +0100, Igor Mammedov wrote:
->
-> On Tue, 8 Nov 2022 12:21:11 +0100
->
-> Gerd Hoffmann <kraxel@redhat.com> wrote:
->
->
->
-> > > >> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
->
-> > > >> > index 566accf7e6..5bf5465a21 100644
->
-> > > >> > --- a/hw/i386/pc.c
->
-> > > >> > +++ b/hw/i386/pc.c
->
-> > > >> > @@ -1061,7 +1061,6 @@ void pc_memory_init(PCMachineState *pcms,
->
-> > > >> >          hwaddr cxl_size = MiB;
->
-> > > >> >
->
-> > > >> >          cxl_base = pc_get_cxl_range_start(pcms);
->
-> > > >> > -        e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
->
-> >
->
-> > Just dropping it doesn't look like a good plan to me.
->
-> >
->
-> > You can try set etc/reserved-memory-end fw_cfg file instead.  Firmware
->
-> > (both seabios and ovmf) read it and will make sure the 64bit pci mmio
->
-> > window is placed above that address, i.e. this effectively reserves
->
-> > address space.  Right now used by memory hotplug code, but should work
->
-> > for cxl too I think (disclaimer: don't know much about cxl ...).
->
->
->
-> As far as I know CXL impl. in QEMU isn't using etc/reserved-memory-end
->
-> at all, it has its own mapping.
->
->
-This should be changed.  cxl should make sure the highest address used
->
-is stored in etc/reserved-memory-end to avoid the firmware mapping pci
->
-resources there.
-if (pcmc->has_reserved_memory && machine->device_memory->base) {            
- 
-[...]
-                                                             
-        if (pcms->cxl_devices_state.is_enabled) {                               
- 
-            res_mem_end = cxl_resv_end;
-
-that should be handled by this line
-
-        }                                   
-                                     
-        *val = cpu_to_le64(ROUND_UP(res_mem_end, 1 * GiB));                     
- 
-        fw_cfg_add_file(fw_cfg, "etc/reserved-memory-end", val, sizeof(*val));  
- 
-    }  
-
-so SeaBIOS shouldn't intrude into CXL address space
-(I assume EDK2 behaves similarly here)
- 
->
-> so dropping reserved entries looks reasonable from ACPI spec point of view.
->
->
->
->
-Yep, I don't want to dispute that.
->
->
-I suspect the reason for these entries to exist in the first place is to
->
-inform the firmware that it should not place stuff there, and if we
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-just to educate me, can you point out what SeaBIOS code does with reservations.
-
->
-remove that to conform with the spec we need some alternative way for
->
-that ...
-with etc/reserved-memory-end set as above,
-is E820_RESERVED really needed here?
-
-(my understanding was that E820_RESERVED entries weren't accounted for when
-initializing PCI devices)
-
->
->
-take care,
->
-Gerd
->
-
->
-if (pcmc->has_reserved_memory && machine->device_memory->base) {
->
->
-[...]
->
->
-if (pcms->cxl_devices_state.is_enabled) {
->
->
-res_mem_end = cxl_resv_end;
->
->
-that should be handled by this line
->
->
-}
->
->
-*val = cpu_to_le64(ROUND_UP(res_mem_end, 1 * GiB));
->
->
-fw_cfg_add_file(fw_cfg, "etc/reserved-memory-end", val,
->
-sizeof(*val));
->
-}
->
->
-so SeaBIOS shouldn't intrude into CXL address space
-Yes, looks good, so with this in place already everything should be fine.
-
->
-(I assume EDK2 behaves similarly here)
-Correct, ovmf reads that fw_cfg file too.
-
->
-> I suspect the reason for these entries to exist in the first place is to
->
-> inform the firmware that it should not place stuff there, and if we
->
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
->
-just to educate me, can you point out what SeaBIOS code does with
->
-reservations.
-They are added to the e820 map which gets passed on to the OS.  seabios
-uses (and updates) the e820 map too, when allocating memory for
-example.  While thinking about it I'm not fully sure it actually looks
-at reservations, maybe it only uses (and updates) ram entries when
-allocating memory.
-
->
-> remove that to conform with the spec we need some alternative way for
->
-> that ...
->
->
-with etc/reserved-memory-end set as above,
->
-is E820_RESERVED really needed here?
-No.  Setting etc/reserved-memory-end is enough.
-
-So for the original patch:
-Acked-by: Gerd Hoffmann <kraxel@redhat.com>
-
-take care,
-  Gerd
-
-On Fri, Nov 11, 2022 at 02:36:02PM +0100, Gerd Hoffmann wrote:
->
->     if (pcmc->has_reserved_memory && machine->device_memory->base) {
->
->
->
-> [...]
->
->
->
->         if (pcms->cxl_devices_state.is_enabled) {
->
->
->
->             res_mem_end = cxl_resv_end;
->
->
->
-> that should be handled by this line
->
->
->
->         }
->
->
->
->         *val = cpu_to_le64(ROUND_UP(res_mem_end, 1 * GiB));
->
->
->
->         fw_cfg_add_file(fw_cfg, "etc/reserved-memory-end", val,
->
-> sizeof(*val));
->
->     }
->
->
->
-> so SeaBIOS shouldn't intrude into CXL address space
->
->
-Yes, looks good, so with this in place already everything should be fine.
->
->
-> (I assume EDK2 behaves similarly here)
->
->
-Correct, ovmf reads that fw_cfg file too.
->
->
-> > I suspect the reason for these entries to exist in the first place is to
->
-> > inform the firmware that it should not place stuff there, and if we
->
->        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
->
-> just to educate me, can you point out what SeaBIOS code does with
->
-> reservations.
->
->
-They are added to the e820 map which gets passed on to the OS.  seabios
->
-uses (and updates) the e820 map too, when allocating memory for
->
-example.  While thinking about it I'm not fully sure it actually looks
->
-at reservations, maybe it only uses (and updates) ram entries when
->
-allocating memory.
->
->
-> > remove that to conform with the spec we need some alternative way for
->
-> > that ...
->
->
->
-> with etc/reserved-memory-end set as above,
->
-> is E820_RESERVED really needed here?
->
->
-No.  Setting etc/reserved-memory-end is enough.
->
->
-So for the original patch:
->
-Acked-by: Gerd Hoffmann <kraxel@redhat.com>
->
->
-take care,
->
-Gerd
-It's upstream already, sorry I can't add your tag.
-
--- 
-MST
-
diff --git a/results/classifier/008/other/12869209 b/results/classifier/008/other/12869209
deleted file mode 100644
index 7eb2a9f04..000000000
--- a/results/classifier/008/other/12869209
+++ /dev/null
@@ -1,98 +0,0 @@
-other: 0.964
-device: 0.951
-files: 0.945
-permissions: 0.922
-performance: 0.906
-socket: 0.906
-PID: 0.894
-semantic: 0.891
-vnc: 0.885
-graphic: 0.879
-network: 0.858
-KVM: 0.857
-boot: 0.837
-debug: 0.813
-
-[BUG FIX][PATCH v3 0/3] vhost-user-blk: fix bug on device disconnection during initialization
-
-This is a series fixing a bug in
-          vhost-user-blk.
-Is there any chance for it to be considered for the next rc?
-Thanks!
-Denis
-On 29.03.2021 16:44, Denis Plotnikov
-      wrote:
-ping!
-On 25.03.2021 18:12, Denis Plotnikov
-        wrote:
-v3:
-  * 0003: a new patch added fixing the problem on vm shutdown
-    I stumbled on this bug after sending v2.
-  * 0001: grammar fixes (Raphael)
-  * 0002: commit message fixing (Raphael)
-
-v2:
-  * split the initial patch into two (Raphael)
-  * rename init to realized (Raphael)
-  * remove unrelated comment (Raphael)
-
-When the vhost-user-blk device loses the connection to the daemon during
-the initialization phase it kills qemu because of the assert in the code.
-The series fixes the bug.
-
-0001 is preparation for the fix
-0002 fixes the bug, patch description has the full motivation for the series
-0003 (added in v3) fixes a bug on vm shutdown
-
-Denis Plotnikov (3):
-  vhost-user-blk: use different event handlers on initialization
-  vhost-user-blk: perform immediate cleanup if disconnect on
-    initialization
-  vhost-user-blk: add immediate cleanup on shutdown
-
- hw/block/vhost-user-blk.c | 79 ++++++++++++++++++++++++---------------
- 1 file changed, 48 insertions(+), 31 deletions(-)
-
-On 01.04.2021 14:21, Denis Plotnikov wrote:
-This is a series fixing a bug in vhost-user-blk.
-More specifically, it's not just a bug but a crasher.
-
-Valentine
-Is there any chance for it to be considered for the next rc?
-
-Thanks!
-
-Denis
-
-On 29.03.2021 16:44, Denis Plotnikov wrote:
-ping!
-
-On 25.03.2021 18:12, Denis Plotnikov wrote:
-v3:
-   * 0003: a new patch added fixing the problem on vm shutdown
-     I stumbled on this bug after sending v2.
-   * 0001: grammar fixes (Raphael)
-   * 0002: commit message fixing (Raphael)
-
-v2:
-   * split the initial patch into two (Raphael)
-   * rename init to realized (Raphael)
-   * remove unrelated comment (Raphael)
-
-When the vhost-user-blk device loses the connection to the daemon during
-the initialization phase it kills qemu because of the assert in the code.
-The series fixes the bug.
-
-0001 is preparation for the fix
-0002 fixes the bug, patch description has the full motivation for the series
-0003 (added in v3) fixes a bug on vm shutdown
-
-Denis Plotnikov (3):
-   vhost-user-blk: use different event handlers on initialization
-   vhost-user-blk: perform immediate cleanup if disconnect on
-     initialization
-   vhost-user-blk: add immediate cleanup on shutdown
-
-  hw/block/vhost-user-blk.c | 79 ++++++++++++++++++++++++---------------
-  1 file changed, 48 insertions(+), 31 deletions(-)
-
diff --git a/results/classifier/008/other/13442371 b/results/classifier/008/other/13442371
deleted file mode 100644
index 6ea769a1c..000000000
--- a/results/classifier/008/other/13442371
+++ /dev/null
@@ -1,379 +0,0 @@
-other: 0.886
-device: 0.883
-KVM: 0.872
-debug: 0.866
-vnc: 0.858
-permissions: 0.854
-graphic: 0.850
-semantic: 0.850
-PID: 0.849
-performance: 0.841
-files: 0.837
-socket: 0.831
-boot: 0.815
-network: 0.811
-
-[Qemu-devel] [BUG] nanoMIPS support problem related to extract2 support for i386 TCG target
-
-Hello, Richard, Peter, and others.
-
-As a part of activities before 4.1 release, I tested nanoMIPS support
-in QEMU (which was officially fully integrated in 4.0, is currently
-limited to system mode only, and was tested in a similar fashion right
-prior to 4.0).
-
-This support appears to be broken now. Following command line works in
-4.0, but results in kernel panic for the current tip of the tree:
-
-~/Build/qemu-test-revert-c6fb8c0cf704/mipsel-softmmu/qemu-system-mipsel
--cpu I7200 -kernel generic_nano32r6el_page4k -M malta -serial stdio -m
-1G -hda nanomips32r6_le_sf_2017.05-03-59-gf5595d6.ext4 -append
-"mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda"
-
-(kernel and rootfs image files used in this command line can be
-downloaded from the locations mentioned in our user guide)
-
-The quick bisect points to the commit:
-
-commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab
-Author: Richard Henderson <address@hidden>
-Date:   Mon Feb 25 11:42:35 2019 -0800
-
-    tcg/i386: Support INDEX_op_extract2_{i32,i64}
-
-    Signed-off-by: Richard Henderson <address@hidden>
-
-Please advise on further actions.
-
-Yours,
-Aleksandar
-
-On Fri, Jul 12, 2019 at 8:09 PM Aleksandar Markovic
-<address@hidden> wrote:
->
->
-Hello, Richard, Peter, and others.
->
->
-As a part of activities before 4.1 release, I tested nanoMIPS support
->
-in QEMU (which was officially fully integrated in 4.0, is currently
->
-limited to system mode only, and was tested in a similar fashion right
->
-prior to 4.0).
->
->
-This support appears to be broken now. Following command line works in
->
-4.0, but results in kernel panic for the current tip of the tree:
->
->
-~/Build/qemu-test-revert-c6fb8c0cf704/mipsel-softmmu/qemu-system-mipsel
->
--cpu I7200 -kernel generic_nano32r6el_page4k -M malta -serial stdio -m
->
-1G -hda nanomips32r6_le_sf_2017.05-03-59-gf5595d6.ext4 -append
->
-"mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda"
->
->
-(kernel and rootfs image files used in this command line can be
->
-downloaded from the locations mentioned in our user guide)
->
->
-The quick bisect points to the commit:
->
->
-commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab
->
-Author: Richard Henderson <address@hidden>
->
-Date:   Mon Feb 25 11:42:35 2019 -0800
->
->
-tcg/i386: Support INDEX_op_extract2_{i32,i64}
->
->
-Signed-off-by: Richard Henderson <address@hidden>
->
->
-Please advise on further actions.
->
-Just to add a data point:
-
-If the following change is applied:
-
-diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
-index 928e8b8..b6a4cf2 100644
---- a/tcg/i386/tcg-target.h
-+++ b/tcg/i386/tcg-target.h
-@@ -124,7 +124,7 @@ extern bool have_avx2;
- #define TCG_TARGET_HAS_deposit_i32      1
- #define TCG_TARGET_HAS_extract_i32      1
- #define TCG_TARGET_HAS_sextract_i32     1
--#define TCG_TARGET_HAS_extract2_i32     1
-+#define TCG_TARGET_HAS_extract2_i32     0
- #define TCG_TARGET_HAS_movcond_i32      1
- #define TCG_TARGET_HAS_add2_i32         1
- #define TCG_TARGET_HAS_sub2_i32         1
-@@ -163,7 +163,7 @@ extern bool have_avx2;
- #define TCG_TARGET_HAS_deposit_i64      1
- #define TCG_TARGET_HAS_extract_i64      1
- #define TCG_TARGET_HAS_sextract_i64     0
--#define TCG_TARGET_HAS_extract2_i64     1
-+#define TCG_TARGET_HAS_extract2_i64     0
- #define TCG_TARGET_HAS_movcond_i64      1
- #define TCG_TARGET_HAS_add2_i64         1
- #define TCG_TARGET_HAS_sub2_i64         1
-
-... the problem disappears.
-
-
->
-Yours,
->
-Aleksandar
-
-On Fri, Jul 12, 2019 at 8:19 PM Aleksandar Markovic
-<address@hidden> wrote:
->
->
-On Fri, Jul 12, 2019 at 8:09 PM Aleksandar Markovic
->
-<address@hidden> wrote:
->
->
->
-> Hello, Richard, Peter, and others.
->
->
->
-> As a part of activities before 4.1 release, I tested nanoMIPS support
->
-> in QEMU (which was officially fully integrated in 4.0, is currently
->
-> limited to system mode only, and was tested in a similar fashion right
->
-> prior to 4.0).
->
->
->
-> This support appears to be broken now. Following command line works in
->
-> 4.0, but results in kernel panic for the current tip of the tree:
->
->
->
-> ~/Build/qemu-test-revert-c6fb8c0cf704/mipsel-softmmu/qemu-system-mipsel
->
-> -cpu I7200 -kernel generic_nano32r6el_page4k -M malta -serial stdio -m
->
-> 1G -hda nanomips32r6_le_sf_2017.05-03-59-gf5595d6.ext4 -append
->
-> "mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda"
->
->
->
-> (kernel and rootfs image files used in this command line can be
->
-> downloaded from the locations mentioned in our user guide)
->
->
->
-> The quick bisect points to the commit:
->
->
->
-> commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab
->
-> Author: Richard Henderson <address@hidden>
->
-> Date:   Mon Feb 25 11:42:35 2019 -0800
->
->
->
->     tcg/i386: Support INDEX_op_extract2_{i32,i64}
->
->
->
->     Signed-off-by: Richard Henderson <address@hidden>
->
->
->
-> Please advise on further actions.
->
->
->
->
-Just to add a data point:
->
->
-If the following change is applied:
->
->
-diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
->
-index 928e8b8..b6a4cf2 100644
->
---- a/tcg/i386/tcg-target.h
->
-+++ b/tcg/i386/tcg-target.h
->
-@@ -124,7 +124,7 @@ extern bool have_avx2;
->
-#define TCG_TARGET_HAS_deposit_i32      1
->
-#define TCG_TARGET_HAS_extract_i32      1
->
-#define TCG_TARGET_HAS_sextract_i32     1
->
--#define TCG_TARGET_HAS_extract2_i32     1
->
-+#define TCG_TARGET_HAS_extract2_i32     0
->
-#define TCG_TARGET_HAS_movcond_i32      1
->
-#define TCG_TARGET_HAS_add2_i32         1
->
-#define TCG_TARGET_HAS_sub2_i32         1
->
-@@ -163,7 +163,7 @@ extern bool have_avx2;
->
-#define TCG_TARGET_HAS_deposit_i64      1
->
-#define TCG_TARGET_HAS_extract_i64      1
->
-#define TCG_TARGET_HAS_sextract_i64     0
->
--#define TCG_TARGET_HAS_extract2_i64     1
->
-+#define TCG_TARGET_HAS_extract2_i64     0
->
-#define TCG_TARGET_HAS_movcond_i64      1
->
-#define TCG_TARGET_HAS_add2_i64         1
->
-#define TCG_TARGET_HAS_sub2_i64         1
->
->
-... the problem disappears.
->
-It looks like the problem is in this code segment of tcg_gen_deposit_i32():
-
-        if (ofs == 0) {
-            tcg_gen_extract2_i32(ret, arg1, arg2, len);
-            tcg_gen_rotli_i32(ret, ret, len);
-            goto done;
-        }
-
-
-
-If that code segment is deleted altogether (which effectively forces
-usage of "fallback" part of tcg_gen_deposit_i32()), the problem also
-vanishes (without changes from my previous mail).
-
->
->
-> Yours,
->
-> Aleksandar
-
-Aleksandar Markovic <address@hidden> writes:
-
->
-Hello, Richard, Peter, and others.
->
->
-As a part of activities before 4.1 release, I tested nanoMIPS support
->
-in QEMU (which was officially fully integrated in 4.0, is currently
->
-limited to system mode only, and was tested in a similar fashion right
->
-prior to 4.0).
->
->
-This support appears to be broken now. Following command line works in
->
-4.0, but results in kernel panic for the current tip of the tree:
->
->
-~/Build/qemu-test-revert-c6fb8c0cf704/mipsel-softmmu/qemu-system-mipsel
->
--cpu I7200 -kernel generic_nano32r6el_page4k -M malta -serial stdio -m
->
-1G -hda nanomips32r6_le_sf_2017.05-03-59-gf5595d6.ext4 -append
->
-"mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda"
->
->
-(kernel and rootfs image files used in this command line can be
->
-downloaded from the locations mentioned in our user guide)
->
->
-The quick bisect points to the commit:
->
->
-commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab
->
-Author: Richard Henderson <address@hidden>
->
-Date:   Mon Feb 25 11:42:35 2019 -0800
->
->
-tcg/i386: Support INDEX_op_extract2_{i32,i64}
->
->
-Signed-off-by: Richard Henderson <address@hidden>
->
->
-Please advise on further actions.
-Please see the fix:
-
-  Subject: [PATCH for-4.1] tcg: Fix constant folding of INDEX_op_extract2_i32
-  Date: Tue,  9 Jul 2019 14:19:00 +0200
-  Message-Id: <address@hidden>
-
->
->
-Yours,
->
-Aleksandar
---
-Alex Bennée
-
-On Sat, Jul 13, 2019 at 9:21 AM Alex Bennée <address@hidden> wrote:
->
->
-Please see the fix:
->
->
-Subject: [PATCH for-4.1] tcg: Fix constant folding of INDEX_op_extract2_i32
->
-Date: Tue,  9 Jul 2019 14:19:00 +0200
->
-Message-Id: <address@hidden>
->
-Thanks, this fixed the behavior.
-
-Sincerely,
-Aleksandar
-
->
->
->
->
-> Yours,
->
-> Aleksandar
->
->
->
---
->
-Alex Bennée
->
-
diff --git a/results/classifier/008/other/16201167 b/results/classifier/008/other/16201167
deleted file mode 100644
index 5766e79dd..000000000
--- a/results/classifier/008/other/16201167
+++ /dev/null
@@ -1,110 +0,0 @@
-other: 0.954
-vnc: 0.946
-permissions: 0.939
-graphic: 0.937
-debug: 0.935
-semantic: 0.933
-KVM: 0.928
-performance: 0.927
-device: 0.911
-files: 0.900
-PID: 0.899
-socket: 0.892
-network: 0.864
-boot: 0.845
-
-[BUG] Qemu abort with error "kvm_mem_ioeventfd_add: error adding ioeventfd: File exists (17)"
-
-Hi list,
-
-When I did some tests in my virtual domain with live-attached virtio devices, I 
-got a coredump file of Qemu.
-
-The error printed by qemu is "kvm_mem_ioeventfd_add: error adding ioeventfd: 
-File exists (17)".
-And the call trace in the coredump file displays as below:
-#0  0x0000ffff89acecc8 in ?? () from /usr/lib64/libc.so.6
-#1  0x0000ffff89a8acbc in raise () from /usr/lib64/libc.so.6
-#2  0x0000ffff89a78d2c in abort () from /usr/lib64/libc.so.6
-#3  0x0000aaaabd7ccf1c in kvm_mem_ioeventfd_add (listener=<optimized out>, 
-section=<optimized out>, match_data=<optimized out>, data=<optimized out>, 
-e=<optimized out>) at ../accel/kvm/kvm-all.c:1607
-#4  0x0000aaaabd6e0304 in address_space_add_del_ioeventfds (fds_old_nb=164, 
-fds_old=0xffff5c80a1d0, fds_new_nb=160, fds_new=0xffff5c565080, 
-as=0xaaaabdfa8810 <address_space_memory>)
-    at ../softmmu/memory.c:795
-#5  address_space_update_ioeventfds (as=0xaaaabdfa8810 <address_space_memory>) 
-at ../softmmu/memory.c:856
-#6  0x0000aaaabd6e24d8 in memory_region_commit () at ../softmmu/memory.c:1113
-#7  0x0000aaaabd6e25c4 in memory_region_transaction_commit () at 
-../softmmu/memory.c:1144
-#8  0x0000aaaabd394eb4 in pci_bridge_update_mappings 
-(br=br@entry=0xaaaae755f7c0) at ../hw/pci/pci_bridge.c:248
-#9  0x0000aaaabd394f4c in pci_bridge_write_config (d=0xaaaae755f7c0, 
-address=44, val=<optimized out>, len=4) at ../hw/pci/pci_bridge.c:272
-#10 0x0000aaaabd39a928 in rp_write_config (d=0xaaaae755f7c0, address=44, 
-val=128, len=4) at ../hw/pci-bridge/pcie_root_port.c:39
-#11 0x0000aaaabd6df328 in memory_region_write_accessor (mr=0xaaaae63898d0, 
-addr=65580, value=<optimized out>, size=4, shift=<optimized out>, 
-mask=<optimized out>, attrs=...) at ../softmmu/memory.c:494
-#12 0x0000aaaabd6dcb6c in access_with_adjusted_size (addr=addr@entry=65580, 
-value=value@entry=0xffff817adc78, size=size@entry=4, access_size_min=<optimized 
-out>, access_size_max=<optimized out>,
-    access_fn=access_fn@entry=0xaaaabd6df284 <memory_region_write_accessor>, 
-mr=mr@entry=0xaaaae63898d0, attrs=attrs@entry=...) at ../softmmu/memory.c:556
-#13 0x0000aaaabd6e0dc8 in memory_region_dispatch_write 
-(mr=mr@entry=0xaaaae63898d0, addr=65580, data=<optimized out>, op=MO_32, 
-attrs=attrs@entry=...) at ../softmmu/memory.c:1534
-#14 0x0000aaaabd6d0574 in flatview_write_continue (fv=fv@entry=0xffff5c02da00, 
-addr=addr@entry=275146407980, attrs=attrs@entry=..., 
-ptr=ptr@entry=0xffff8aa8c028, len=len@entry=4,
-    addr1=<optimized out>, l=<optimized out>, mr=mr@entry=0xaaaae63898d0) at 
-/usr/src/debug/qemu-6.2.0-226.aarch64/include/qemu/host-utils.h:165
-#15 0x0000aaaabd6d4584 in flatview_write (len=4, buf=0xffff8aa8c028, attrs=..., 
-addr=275146407980, fv=0xffff5c02da00) at ../softmmu/physmem.c:3375
-#16 address_space_write (as=<optimized out>, addr=275146407980, attrs=..., 
-buf=buf@entry=0xffff8aa8c028, len=4) at ../softmmu/physmem.c:3467
-#17 0x0000aaaabd6d462c in address_space_rw (as=<optimized out>, addr=<optimized 
-out>, attrs=..., attrs@entry=..., buf=buf@entry=0xffff8aa8c028, len=<optimized 
-out>, is_write=<optimized out>)
-    at ../softmmu/physmem.c:3477
-#18 0x0000aaaabd7cf6e8 in kvm_cpu_exec (cpu=cpu@entry=0xaaaae625dfd0) at 
-../accel/kvm/kvm-all.c:2970
-#19 0x0000aaaabd7d09bc in kvm_vcpu_thread_fn (arg=arg@entry=0xaaaae625dfd0) at 
-../accel/kvm/kvm-accel-ops.c:49
-#20 0x0000aaaabd94ccd8 in qemu_thread_start (args=<optimized out>) at 
-../util/qemu-thread-posix.c:559
-
-
-By printing more info from the coredump file, I found that the addr of 
-fds_old[146] and fds_new[146] are the same, but fds_old[146] belonged to a 
-live-attached virtio-scsi device while fds_new[146] was owned by another 
-live-attached virtio-net device.
-The reason for the addr conflict was then found in the vm's console log. Just 
-before qemu aborted, the guest kernel crashed and kdump.service booted the 
-dump-capture kernel, which re-allocated addresses for the devices.
-Because those virtio devices were live-attached after vm creation, different 
-addrs may have been assigned to them in the dump-capture kernel:
-
-the initial kernel booting log:
-[    1.663297] pci 0000:00:02.1: BAR 14: assigned [mem 0x11900000-0x11afffff]
-[    1.664560] pci 0000:00:02.1: BAR 15: assigned [mem 
-0x8001800000-0x80019fffff 64bit pref]
-
-the dump-capture kernel booting log:
-[    1.845211] pci 0000:00:02.0: BAR 14: assigned [mem 0x11900000-0x11bfffff]
-[    1.846542] pci 0000:00:02.0: BAR 15: assigned [mem 
-0x8001800000-0x8001afffff 64bit pref]
-
-
-I think directly aborting the qemu process may not be the best choice in this 
-case because it interrupts kdump.service, which then fails to generate a 
-memory dump of the crashed guest kernel.
-Perhaps, IMO, the error could simply be ignored in this case, letting kdump 
-reboot the system after the memory dump finishes, but I failed to find a 
-suitable condition to check for in the code.
-
-Any solution for this problem? I hope I can get some help here.
-
-Hao
-
diff --git a/results/classifier/008/other/21247035 b/results/classifier/008/other/21247035
deleted file mode 100644
index ae43e9314..000000000
--- a/results/classifier/008/other/21247035
+++ /dev/null
@@ -1,1331 +0,0 @@
-other: 0.640
-permissions: 0.541
-device: 0.525
-KVM: 0.514
-debug: 0.468
-performance: 0.427
-graphic: 0.426
-files: 0.375
-semantic: 0.374
-PID: 0.370
-vnc: 0.367
-boot: 0.345
-network: 0.322
-socket: 0.322
-
-[Qemu-devel] [BUG] I/O thread segfault for QEMU on s390x
-
-Hi,
-I have been noticing some segfaults for QEMU on s390x, and I have been
-hitting this issue quite reliably (at least once in 10 runs of a test
-case). The qemu version is 2.11.50, and I have systemd created coredumps
-when this happens.
-
-Here is a back trace of the segfaulting thread:
-
-
-#0  0x000003ffafed202c in swapcontext () from /lib64/libc.so.6
-#1  0x000002aa355c02ee in qemu_coroutine_new () at
-util/coroutine-ucontext.c:164
-#2  0x000002aa355bec34 in qemu_coroutine_create
-(address@hidden <blk_aio_read_entry>,
-address@hidden) at util/qemu-coroutine.c:76
-#3  0x000002aa35510262 in blk_aio_prwv (blk=0x2aa65fbefa0,
-offset=<optimized out>, bytes=<optimized out>, qiov=0x3ffa002a9c0,
-address@hidden <blk_aio_read_entry>, flags=0,
-cb=0x2aa35340a50 <virtio_blk_rw_complete>, opaque=0x3ffa002a960) at
-block/block-backend.c:1299
-#4  0x000002aa35510376 in blk_aio_preadv (blk=<optimized out>,
-offset=<optimized out>, qiov=<optimized out>, flags=<optimized out>,
-cb=<optimized out>, opaque=0x3ffa002a960) at block/block-backend.c:1392
-#5  0x000002aa3534114e in submit_requests (niov=<optimized out>,
-num_reqs=<optimized out>, start=<optimized out>, mrb=<optimized out>,
-blk=<optimized out>) at
-/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:372
-#6  virtio_blk_submit_multireq (blk=<optimized out>,
-address@hidden) at
-/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:402
-#7  0x000002aa353422e0 in virtio_blk_handle_vq (s=0x2aa6611e7d8,
-vq=0x3ffb0f5f010) at /usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:620
-#8  0x000002aa3536655a in virtio_queue_notify_aio_vq
-(address@hidden) at
-/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1515
-#9  0x000002aa35366cd6 in virtio_queue_notify_aio_vq (vq=0x3ffb0f5f010)
-at /usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1511
-#10 virtio_queue_host_notifier_aio_poll (opaque=0x3ffb0f5f078) at
-/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:2409
-#11 0x000002aa355a8ba4 in run_poll_handlers_once
-(address@hidden) at util/aio-posix.c:497
-#12 0x000002aa355a9b74 in run_poll_handlers (max_ns=<optimized out>,
-ctx=0x2aa65f99310) at util/aio-posix.c:534
-#13 try_poll_mode (blocking=true, ctx=0x2aa65f99310) at util/aio-posix.c:562
-#14 aio_poll (ctx=0x2aa65f99310, address@hidden) at
-util/aio-posix.c:602
-#15 0x000002aa353d2d0a in iothread_run (opaque=0x2aa65f990f0) at
-iothread.c:60
-#16 0x000003ffb0f07e82 in start_thread () from /lib64/libpthread.so.0
-#17 0x000003ffaff91596 in thread_start () from /lib64/libc.so.6
-I don't have much knowledge about i/o threads and the block layer code
-in QEMU, so I would like to report to the community about this issue.
-I believe this is very similar to the bug that I reported upstream a couple
-of days ago
-(
-https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04452.html
-).
-Any help would be greatly appreciated.
-
-Thanks
-Farhan
-
-On Thu, Mar 1, 2018 at 10:33 PM, Farhan Ali <address@hidden> wrote:
->
-Hi,
->
->
-I have been noticing some segfaults for QEMU on s390x, and I have been
->
-hitting this issue quite reliably (at least once in 10 runs of a test case).
->
-The qemu version is 2.11.50, and I have systemd created coredumps
->
-when this happens.
-Can you describe the test case or suggest how to reproduce it for us?
-
-Fam
-
-On 03/02/2018 01:13 AM, Fam Zheng wrote:
-On Thu, Mar 1, 2018 at 10:33 PM, Farhan Ali <address@hidden> wrote:
-Hi,
-
-I have been noticing some segfaults for QEMU on s390x, and I have been
-hitting this issue quite reliably (at least once in 10 runs of a test case).
-The qemu version is 2.11.50, and I have systemd created coredumps
-when this happens.
-Can you describe the test case or suggest how to reproduce it for us?
-
-Fam
-The test case is with a single guest, running a memory intensive
-workload. The guest has 8 vcpus and 4G of memory.
-Here is the qemu command line, if that helps:
-
-/usr/bin/qemu-kvm -name guest=sles,debug-threads=on \
--S -object
-secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-sles/master-key.aes
-\
--machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off \
--m 4096 -realtime mlock=off -smp 8,sockets=8,cores=1,threads=1 \
--object iothread,id=iothread1 -object iothread,id=iothread2 -uuid
-b83a596b-3a1a-4ac9-9f3e-d9a4032ee52c \
--display none -no-user-config -nodefaults -chardev
-socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-2-sles/monitor.sock,server,nowait
--mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
--no-shutdown \
--boot strict=on -drive
-file=/dev/mapper/360050763998b0883980000002400002b,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native
--device
-virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
--drive
-file=/dev/mapper/360050763998b0883980000002800002f,format=raw,if=none,id=drive-virtio-disk1,cache=none,aio=native
--device
-virtio-blk-ccw,iothread=iothread2,scsi=off,devno=fe.0.0002,drive=drive-virtio-disk1,id=virtio-disk1
--netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device
-virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:38:a6:36:e8:1f,devno=fe.0.0000
--chardev pty,id=charconsole0 -device
-sclpconsole,chardev=charconsole0,id=console0 -device
-virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -msg timestamp=on
-Please let me know if I need to provide any other information.
-
-Thanks
-Farhan
-
-On Thu, Mar 01, 2018 at 09:33:35AM -0500, Farhan Ali wrote:
->
-Hi,
->
->
-I have been noticing some segfaults for QEMU on s390x, and I have been
->
-hitting this issue quite reliably (at least once in 10 runs of a test case).
->
-The qemu version is 2.11.50, and I have systemd created coredumps
->
-when this happens.
->
->
-Here is a back trace of the segfaulting thread:
-The backtrace looks normal.
-
-Please post the QEMU command-line and the details of the segfault (which
-memory access faulted?).
-
->
-#0  0x000003ffafed202c in swapcontext () from /lib64/libc.so.6
->
-#1  0x000002aa355c02ee in qemu_coroutine_new () at
->
-util/coroutine-ucontext.c:164
->
-#2  0x000002aa355bec34 in qemu_coroutine_create
->
-(address@hidden <blk_aio_read_entry>,
->
-address@hidden) at util/qemu-coroutine.c:76
->
-#3  0x000002aa35510262 in blk_aio_prwv (blk=0x2aa65fbefa0, offset=<optimized
->
-out>, bytes=<optimized out>, qiov=0x3ffa002a9c0,
->
-address@hidden <blk_aio_read_entry>, flags=0,
->
-cb=0x2aa35340a50 <virtio_blk_rw_complete>, opaque=0x3ffa002a960) at
->
-block/block-backend.c:1299
->
-#4  0x000002aa35510376 in blk_aio_preadv (blk=<optimized out>,
->
-offset=<optimized out>, qiov=<optimized out>, flags=<optimized out>,
->
-cb=<optimized out>, opaque=0x3ffa002a960) at block/block-backend.c:1392
->
-#5  0x000002aa3534114e in submit_requests (niov=<optimized out>,
->
-num_reqs=<optimized out>, start=<optimized out>, mrb=<optimized out>,
->
-blk=<optimized out>) at
->
-/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:372
->
-#6  virtio_blk_submit_multireq (blk=<optimized out>,
->
-address@hidden) at
->
-/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:402
->
-#7  0x000002aa353422e0 in virtio_blk_handle_vq (s=0x2aa6611e7d8,
->
-vq=0x3ffb0f5f010) at /usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:620
->
-#8  0x000002aa3536655a in virtio_queue_notify_aio_vq
->
-(address@hidden) at
->
-/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1515
->
-#9  0x000002aa35366cd6 in virtio_queue_notify_aio_vq (vq=0x3ffb0f5f010) at
->
-/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1511
->
-#10 virtio_queue_host_notifier_aio_poll (opaque=0x3ffb0f5f078) at
->
-/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:2409
->
-#11 0x000002aa355a8ba4 in run_poll_handlers_once
->
-(address@hidden) at util/aio-posix.c:497
->
-#12 0x000002aa355a9b74 in run_poll_handlers (max_ns=<optimized out>,
->
-ctx=0x2aa65f99310) at util/aio-posix.c:534
->
-#13 try_poll_mode (blocking=true, ctx=0x2aa65f99310) at util/aio-posix.c:562
->
-#14 aio_poll (ctx=0x2aa65f99310, address@hidden) at
->
-util/aio-posix.c:602
->
-#15 0x000002aa353d2d0a in iothread_run (opaque=0x2aa65f990f0) at
->
-iothread.c:60
->
-#16 0x000003ffb0f07e82 in start_thread () from /lib64/libpthread.so.0
->
-#17 0x000003ffaff91596 in thread_start () from /lib64/libc.so.6
->
->
->
-I don't have much knowledge about i/o threads and the block layer code in
->
-QEMU, so I would like to report to the community about this issue.
->
-I believe this is very similar to the bug that I reported upstream a couple of
->
-days ago
->
-(
-https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04452.html
-).
->
->
-Any help would be greatly appreciated.
->
->
-Thanks
->
-Farhan
->
-signature.asc
-Description:
-PGP signature
-
-On 03/02/2018 04:23 AM, Stefan Hajnoczi wrote:
-On Thu, Mar 01, 2018 at 09:33:35AM -0500, Farhan Ali wrote:
-Hi,
-
-I have been noticing some segfaults for QEMU on s390x, and I have been
-hitting this issue quite reliably (at least once in 10 runs of a test case).
-The qemu version is 2.11.50, and I have systemd created coredumps
-when this happens.
-
-Here is a back trace of the segfaulting thread:
-The backtrace looks normal.
-
-Please post the QEMU command-line and the details of the segfault (which
-memory access faulted?).
-I was able to create another crash today and here is the qemu command line
-
-/usr/bin/qemu-kvm -name guest=sles,debug-threads=on \
--S -object
-secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-sles/master-key.aes
-\
--machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off \
--m 4096 -realtime mlock=off -smp 8,sockets=8,cores=1,threads=1 \
--object iothread,id=iothread1 -object iothread,id=iothread2 -uuid
-b83a596b-3a1a-4ac9-9f3e-d9a4032ee52c \
--display none -no-user-config -nodefaults -chardev
-socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-2-sles/monitor.sock,server,nowait
--mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
--no-shutdown \
--boot strict=on -drive
-file=/dev/mapper/360050763998b0883980000002400002b,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native
--device
-virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
--drive
-file=/dev/mapper/360050763998b0883980000002800002f,format=raw,if=none,id=drive-virtio-disk1,cache=none,aio=native
--device
-virtio-blk-ccw,iothread=iothread2,scsi=off,devno=fe.0.0002,drive=drive-virtio-disk1,id=virtio-disk1
--netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device
-virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:38:a6:36:e8:1f,devno=fe.0.0000
--chardev pty,id=charconsole0 -device
-sclpconsole,chardev=charconsole0,id=console0 -device
-virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -msg timestamp=on
-This is the latest back trace on the segfaulting thread, and it seems to
-segfault in swapcontext.
-Program terminated with signal SIGSEGV, Segmentation fault.
-#0  0x000003ff8595202c in swapcontext () from /lib64/libc.so.6
-
-
-This is the remaining back trace:
-
-#0  0x000003ff8595202c in swapcontext () from /lib64/libc.so.6
-#1  0x000002aa33b45566 in qemu_coroutine_new () at
-util/coroutine-ucontext.c:164
-#2  0x000002aa33b43eac in qemu_coroutine_create
-(address@hidden <blk_aio_write_entry>,
-address@hidden) at util/qemu-coroutine.c:76
-#3  0x000002aa33a954da in blk_aio_prwv (blk=0x2aa4f0efda0,
-offset=<optimized out>, bytes=<optimized out>, qiov=0x3ff74019080,
-address@hidden <blk_aio_write_entry>, flags=0,
-cb=0x2aa338c62e8 <virtio_blk_rw_complete>, opaque=0x3ff74019020) at
-block/block-backend.c:1299
-#4  0x000002aa33a9563e in blk_aio_pwritev (blk=<optimized out>,
-offset=<optimized out>, qiov=<optimized out>, flags=<optimized out>,
-cb=<optimized out>, opaque=0x3ff74019020) at block/block-backend.c:1400
-#5  0x000002aa338c6a38 in submit_requests (niov=<optimized out>,
-num_reqs=1, start=<optimized out>, mrb=0x3ff831fe6e0, blk=<optimized
-out>) at /usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:369
-#6  virtio_blk_submit_multireq (blk=<optimized out>,
-address@hidden) at
-/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:426
-#7  0x000002aa338c7b78 in virtio_blk_handle_vq (s=0x2aa4f2507c8,
-vq=0x3ff869df010) at /usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:620
-#8  0x000002aa338ebdf2 in virtio_queue_notify_aio_vq (vq=0x3ff869df010)
-at /usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1515
-#9  0x000002aa33b2df46 in aio_dispatch_handlers
-(address@hidden) at util/aio-posix.c:406
-#10 0x000002aa33b2eb50 in aio_poll (ctx=0x2aa4f0ca050,
-address@hidden) at util/aio-posix.c:692
-#11 0x000002aa33957f6a in iothread_run (opaque=0x2aa4f0c9630) at
-iothread.c:60
-#12 0x000003ff86987e82 in start_thread () from /lib64/libpthread.so.0
-#13 0x000003ff85a11596 in thread_start () from /lib64/libc.so.6
-Backtrace stopped: previous frame identical to this frame (corrupt stack?)
-
-On Fri, Mar 02, 2018 at 10:30:57AM -0500, Farhan Ali wrote:
->
->
->
-On 03/02/2018 04:23 AM, Stefan Hajnoczi wrote:
->
-> On Thu, Mar 01, 2018 at 09:33:35AM -0500, Farhan Ali wrote:
->
-> > Hi,
->
-> >
->
-> > I have been noticing some segfaults for QEMU on s390x, and I have been
->
-> > hitting this issue quite reliably (at least once in 10 runs of a test
->
-> > case).
->
-> > The qemu version is 2.11.50, and I have systemd created coredumps
->
-> > when this happens.
->
-> >
->
-> > Here is a back trace of the segfaulting thread:
->
-> The backtrace looks normal.
->
->
->
-> Please post the QEMU command-line and the details of the segfault (which
->
-> memory access faulted?).
->
->
->
->
->
-I was able to create another crash today and here is the qemu command line
->
->
-/usr/bin/qemu-kvm -name guest=sles,debug-threads=on \
->
--S -object
->
-secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-sles/master-key.aes
->
-\
->
--machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off \
->
--m 4096 -realtime mlock=off -smp 8,sockets=8,cores=1,threads=1 \
->
--object iothread,id=iothread1 -object iothread,id=iothread2 -uuid
->
-b83a596b-3a1a-4ac9-9f3e-d9a4032ee52c \
->
--display none -no-user-config -nodefaults -chardev
->
-socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-2-sles/monitor.sock,server,nowait
->
->
--mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
->
-\
->
--boot strict=on -drive
->
-file=/dev/mapper/360050763998b0883980000002400002b,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native
->
--device
->
-virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
->
--drive
->
-file=/dev/mapper/360050763998b0883980000002800002f,format=raw,if=none,id=drive-virtio-disk1,cache=none,aio=native
->
--device
->
-virtio-blk-ccw,iothread=iothread2,scsi=off,devno=fe.0.0002,drive=drive-virtio-disk1,id=virtio-disk1
->
--netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device
->
-virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:38:a6:36:e8:1f,devno=fe.0.0000
->
--chardev pty,id=charconsole0 -device
->
-sclpconsole,chardev=charconsole0,id=console0 -device
->
-virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -msg timestamp=on
->
->
->
-This is the latest back trace on the segfaulting thread, and it seems to
->
-segfault in swapcontext.
->
->
-Program terminated with signal SIGSEGV, Segmentation fault.
->
-#0  0x000003ff8595202c in swapcontext () from /lib64/libc.so.6
-Please include the following gdb output:
-
-  (gdb) disas swapcontext
-  (gdb) i r
-
-That way it's possible to see which instruction faulted and which
-registers were being accessed.
-
->
-This is the remaining back trace:
->
->
-#0  0x000003ff8595202c in swapcontext () from /lib64/libc.so.6
->
-#1  0x000002aa33b45566 in qemu_coroutine_new () at
->
-util/coroutine-ucontext.c:164
->
-#2  0x000002aa33b43eac in qemu_coroutine_create
->
-(address@hidden <blk_aio_write_entry>,
->
-address@hidden) at util/qemu-coroutine.c:76
->
-#3  0x000002aa33a954da in blk_aio_prwv (blk=0x2aa4f0efda0, offset=<optimized
->
-out>, bytes=<optimized out>, qiov=0x3ff74019080,
->
-address@hidden <blk_aio_write_entry>, flags=0,
->
-cb=0x2aa338c62e8 <virtio_blk_rw_complete>, opaque=0x3ff74019020) at
->
-block/block-backend.c:1299
->
-#4  0x000002aa33a9563e in blk_aio_pwritev (blk=<optimized out>,
->
-offset=<optimized out>, qiov=<optimized out>, flags=<optimized out>,
->
-cb=<optimized out>, opaque=0x3ff74019020) at block/block-backend.c:1400
->
-#5  0x000002aa338c6a38 in submit_requests (niov=<optimized out>, num_reqs=1,
->
-start=<optimized out>, mrb=0x3ff831fe6e0, blk=<optimized out>) at
->
-/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:369
->
-#6  virtio_blk_submit_multireq (blk=<optimized out>,
->
-address@hidden) at
->
-/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:426
->
-#7  0x000002aa338c7b78 in virtio_blk_handle_vq (s=0x2aa4f2507c8,
->
-vq=0x3ff869df010) at /usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:620
->
-#8  0x000002aa338ebdf2 in virtio_queue_notify_aio_vq (vq=0x3ff869df010) at
->
-/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1515
->
-#9  0x000002aa33b2df46 in aio_dispatch_handlers
->
-(address@hidden) at util/aio-posix.c:406
->
-#10 0x000002aa33b2eb50 in aio_poll (ctx=0x2aa4f0ca050,
->
-address@hidden) at util/aio-posix.c:692
->
-#11 0x000002aa33957f6a in iothread_run (opaque=0x2aa4f0c9630) at
->
-iothread.c:60
->
-#12 0x000003ff86987e82 in start_thread () from /lib64/libpthread.so.0
->
-#13 0x000003ff85a11596 in thread_start () from /lib64/libc.so.6
->
-Backtrace stopped: previous frame identical to this frame (corrupt stack?)
->
-signature.asc
-Description:
-PGP signature
-
-On 03/05/2018 06:03 AM, Stefan Hajnoczi wrote:
-Please include the following gdb output:
-
-   (gdb) disas swapcontext
-   (gdb) i r
-
-That way it's possible to see which instruction faulted and which
-registers were being accessed.
-Here is the disas output for swapcontext; this is on a coredump with
-debugging symbols enabled for qemu, so the addresses differ a little
-from the previous dump.
-(gdb) disas swapcontext
-Dump of assembler code for function swapcontext:
-   0x000003ff90751fb8 <+0>:       lgr     %r1,%r2
-   0x000003ff90751fbc <+4>:       lgr     %r0,%r3
-   0x000003ff90751fc0 <+8>:       stfpc   248(%r1)
-   0x000003ff90751fc4 <+12>:      std     %f0,256(%r1)
-   0x000003ff90751fc8 <+16>:      std     %f1,264(%r1)
-   0x000003ff90751fcc <+20>:      std     %f2,272(%r1)
-   0x000003ff90751fd0 <+24>:      std     %f3,280(%r1)
-   0x000003ff90751fd4 <+28>:      std     %f4,288(%r1)
-   0x000003ff90751fd8 <+32>:      std     %f5,296(%r1)
-   0x000003ff90751fdc <+36>:      std     %f6,304(%r1)
-   0x000003ff90751fe0 <+40>:      std     %f7,312(%r1)
-   0x000003ff90751fe4 <+44>:      std     %f8,320(%r1)
-   0x000003ff90751fe8 <+48>:      std     %f9,328(%r1)
-   0x000003ff90751fec <+52>:      std     %f10,336(%r1)
-   0x000003ff90751ff0 <+56>:      std     %f11,344(%r1)
-   0x000003ff90751ff4 <+60>:      std     %f12,352(%r1)
-   0x000003ff90751ff8 <+64>:      std     %f13,360(%r1)
-   0x000003ff90751ffc <+68>:      std     %f14,368(%r1)
-   0x000003ff90752000 <+72>:      std     %f15,376(%r1)
-   0x000003ff90752004 <+76>:      slgr    %r2,%r2
-   0x000003ff90752008 <+80>:      stam    %a0,%a15,184(%r1)
-   0x000003ff9075200c <+84>:      stmg    %r0,%r15,56(%r1)
-   0x000003ff90752012 <+90>:      la      %r2,2
-   0x000003ff90752016 <+94>:      lgr     %r5,%r0
-   0x000003ff9075201a <+98>:      la      %r3,384(%r5)
-   0x000003ff9075201e <+102>:     la      %r4,384(%r1)
-   0x000003ff90752022 <+106>:     lghi    %r5,8
-   0x000003ff90752026 <+110>:     svc     175
-   0x000003ff90752028 <+112>:     lgr     %r5,%r0
-=> 0x000003ff9075202c <+116>:  lfpc    248(%r5)
-   0x000003ff90752030 <+120>:     ld      %f0,256(%r5)
-   0x000003ff90752034 <+124>:     ld      %f1,264(%r5)
-   0x000003ff90752038 <+128>:     ld      %f2,272(%r5)
-   0x000003ff9075203c <+132>:     ld      %f3,280(%r5)
-   0x000003ff90752040 <+136>:     ld      %f4,288(%r5)
-   0x000003ff90752044 <+140>:     ld      %f5,296(%r5)
-   0x000003ff90752048 <+144>:     ld      %f6,304(%r5)
-   0x000003ff9075204c <+148>:     ld      %f7,312(%r5)
-   0x000003ff90752050 <+152>:     ld      %f8,320(%r5)
-   0x000003ff90752054 <+156>:     ld      %f9,328(%r5)
-   0x000003ff90752058 <+160>:     ld      %f10,336(%r5)
-   0x000003ff9075205c <+164>:     ld      %f11,344(%r5)
-   0x000003ff90752060 <+168>:     ld      %f12,352(%r5)
-   0x000003ff90752064 <+172>:     ld      %f13,360(%r5)
-   0x000003ff90752068 <+176>:     ld      %f14,368(%r5)
-   0x000003ff9075206c <+180>:     ld      %f15,376(%r5)
-   0x000003ff90752070 <+184>:     lam     %a2,%a15,192(%r5)
-   0x000003ff90752074 <+188>:     lmg     %r0,%r15,56(%r5)
-   0x000003ff9075207a <+194>:     br      %r14
-End of assembler dump.
-
-(gdb) i r
-r0             0x0      0
-r1             0x3ff8fe7de40    4396165881408
-r2             0x0      0
-r3             0x3ff8fe7e1c0    4396165882304
-r4             0x3ff8fe7dfc0    4396165881792
-r5             0x0      0
-r6             0xffffffff88004880       18446744071696304256
-r7             0x3ff880009e0    4396033247712
-r8             0x27ff89000      10736930816
-r9             0x3ff88001460    4396033250400
-r10            0x1000   4096
-r11            0x1261be0        19274720
-r12            0x3ff88001e00    4396033252864
-r13            0x14d0bc0        21826496
-r14            0x1312ac8        19999432
-r15            0x3ff8fe7dc80    4396165880960
-pc             0x3ff9075202c    0x3ff9075202c <swapcontext+116>
-cc             0x2      2
-
-On 03/05/2018 07:45 PM, Farhan Ali wrote:
->
->
->
-On 03/05/2018 06:03 AM, Stefan Hajnoczi wrote:
->
-> Please include the following gdb output:
->
->
->
->    (gdb) disas swapcontext
->
->    (gdb) i r
->
->
->
-> That way it's possible to see which instruction faulted and which
->
-> registers were being accessed.
->
->
->
-Here is the disas output for swapcontext; this is on a coredump with debugging
->
-symbols enabled for qemu, so the addresses differ a little from the previous
->
-dump.
->
->
->
-(gdb) disas swapcontext
->
-Dump of assembler code for function swapcontext:
->
-   0x000003ff90751fb8 <+0>:    lgr    %r1,%r2
->
-   0x000003ff90751fbc <+4>:    lgr    %r0,%r3
->
-   0x000003ff90751fc0 <+8>:    stfpc    248(%r1)
->
-   0x000003ff90751fc4 <+12>:    std    %f0,256(%r1)
->
-   0x000003ff90751fc8 <+16>:    std    %f1,264(%r1)
->
-   0x000003ff90751fcc <+20>:    std    %f2,272(%r1)
->
-   0x000003ff90751fd0 <+24>:    std    %f3,280(%r1)
->
-   0x000003ff90751fd4 <+28>:    std    %f4,288(%r1)
->
-   0x000003ff90751fd8 <+32>:    std    %f5,296(%r1)
->
-   0x000003ff90751fdc <+36>:    std    %f6,304(%r1)
->
-   0x000003ff90751fe0 <+40>:    std    %f7,312(%r1)
->
-   0x000003ff90751fe4 <+44>:    std    %f8,320(%r1)
->
-   0x000003ff90751fe8 <+48>:    std    %f9,328(%r1)
->
-   0x000003ff90751fec <+52>:    std    %f10,336(%r1)
->
-   0x000003ff90751ff0 <+56>:    std    %f11,344(%r1)
->
-   0x000003ff90751ff4 <+60>:    std    %f12,352(%r1)
->
-   0x000003ff90751ff8 <+64>:    std    %f13,360(%r1)
->
-   0x000003ff90751ffc <+68>:    std    %f14,368(%r1)
->
-   0x000003ff90752000 <+72>:    std    %f15,376(%r1)
->
-   0x000003ff90752004 <+76>:    slgr    %r2,%r2
->
-   0x000003ff90752008 <+80>:    stam    %a0,%a15,184(%r1)
->
-   0x000003ff9075200c <+84>:    stmg    %r0,%r15,56(%r1)
->
-   0x000003ff90752012 <+90>:    la    %r2,2
->
-   0x000003ff90752016 <+94>:    lgr    %r5,%r0
->
-   0x000003ff9075201a <+98>:    la    %r3,384(%r5)
->
-   0x000003ff9075201e <+102>:    la    %r4,384(%r1)
->
-   0x000003ff90752022 <+106>:    lghi    %r5,8
->
-   0x000003ff90752026 <+110>:    svc    175
-sys_rt_sigprocmask. r0 should not be changed by the system call.
-
->
-   0x000003ff90752028 <+112>:    lgr    %r5,%r0
->
-=> 0x000003ff9075202c <+116>:    lfpc    248(%r5)
-so r5 is zero and it was loaded from r0. r0 was loaded from r3 (which is the 
-2nd parameter to this
-function). Now this is odd.
-
->
-   0x000003ff90752030 <+120>:    ld    %f0,256(%r5)
->
-   0x000003ff90752034 <+124>:    ld    %f1,264(%r5)
->
-   0x000003ff90752038 <+128>:    ld    %f2,272(%r5)
->
-   0x000003ff9075203c <+132>:    ld    %f3,280(%r5)
->
-   0x000003ff90752040 <+136>:    ld    %f4,288(%r5)
->
-   0x000003ff90752044 <+140>:    ld    %f5,296(%r5)
->
-   0x000003ff90752048 <+144>:    ld    %f6,304(%r5)
->
-   0x000003ff9075204c <+148>:    ld    %f7,312(%r5)
->
-   0x000003ff90752050 <+152>:    ld    %f8,320(%r5)
->
-   0x000003ff90752054 <+156>:    ld    %f9,328(%r5)
->
-   0x000003ff90752058 <+160>:    ld    %f10,336(%r5)
->
-   0x000003ff9075205c <+164>:    ld    %f11,344(%r5)
->
-   0x000003ff90752060 <+168>:    ld    %f12,352(%r5)
->
-   0x000003ff90752064 <+172>:    ld    %f13,360(%r5)
->
-   0x000003ff90752068 <+176>:    ld    %f14,368(%r5)
->
-   0x000003ff9075206c <+180>:    ld    %f15,376(%r5)
->
-   0x000003ff90752070 <+184>:    lam    %a2,%a15,192(%r5)
->
-   0x000003ff90752074 <+188>:    lmg    %r0,%r15,56(%r5)
->
-   0x000003ff9075207a <+194>:    br    %r14
->
-End of assembler dump.
->
->
-(gdb) i r
->
-r0             0x0    0
->
-r1             0x3ff8fe7de40    4396165881408
->
-r2             0x0    0
->
-r3             0x3ff8fe7e1c0    4396165882304
->
-r4             0x3ff8fe7dfc0    4396165881792
->
-r5             0x0    0
->
-r6             0xffffffff88004880    18446744071696304256
->
-r7             0x3ff880009e0    4396033247712
->
-r8             0x27ff89000    10736930816
->
-r9             0x3ff88001460    4396033250400
->
-r10            0x1000    4096
->
-r11            0x1261be0    19274720
->
-r12            0x3ff88001e00    4396033252864
->
-r13            0x14d0bc0    21826496
->
-r14            0x1312ac8    19999432
->
-r15            0x3ff8fe7dc80    4396165880960
->
-pc             0x3ff9075202c    0x3ff9075202c <swapcontext+116>
->
-cc             0x2    2
-
-On 5 March 2018 at 18:54, Christian Borntraeger <address@hidden> wrote:
->
->
->
-On 03/05/2018 07:45 PM, Farhan Ali wrote:
->
->    0x000003ff90752026 <+110>:    svc    175
->
->
-sys_rt_sigprocmask. r0 should not be changed by the system call.
->
->
->    0x000003ff90752028 <+112>:    lgr    %r5,%r0
->
-> => 0x000003ff9075202c <+116>:    lfpc    248(%r5)
->
->
-so r5 is zero and it was loaded from r0. r0 was loaded from r3 (which is the
->
-2nd parameter to this
->
-function). Now this is odd.
-...particularly given that in the only place we call swapcontext()
-the second parameter is always the address of a local variable
-and can't be 0...
-
-thanks
--- PMM
-
-Do you happen to run with a recent host kernel that has 
-
-commit 7041d28115e91f2144f811ffe8a195c696b1e1d0
-    s390: scrub registers on kernel entry and KVM exit
-
-
-
-
-
-Can you run with this patch on top?
-diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
-index 13a133a6015c..d6dc0e5e8f74 100644
---- a/arch/s390/kernel/entry.S
-+++ b/arch/s390/kernel/entry.S
-@@ -426,13 +426,13 @@ ENTRY(system_call)
-        UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER
-        BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP
-        stmg    %r0,%r7,__PT_R0(%r11)
--       # clear user controlled register to prevent speculative use
--       xgr     %r0,%r0
-        mvc     __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
-        mvc     __PT_PSW(16,%r11),__LC_SVC_OLD_PSW
-        mvc     __PT_INT_CODE(4,%r11),__LC_SVC_ILC
-        stg     %r14,__PT_FLAGS(%r11)
- .Lsysc_do_svc:
-+       # clear user controlled register to prevent speculative use
-+       xgr     %r0,%r0
-        # load address of system call table
-        lg      %r10,__THREAD_sysc_table(%r13,%r12)
-        llgh    %r8,__PT_INT_CODE+2(%r11)
-
-
-To me it looks like the critical section cleanup (an
-interrupt during system call entry) might save the
-registers into ptregs again, even though we have
-already zeroed out r0. This patch moves the clearing
-of r0 to after sysc_do_svc, which should fix the
-critical section cleanup.
-
-Adding Martin and Heiko. Will spin a patch.
-
-
-On 03/05/2018 07:54 PM, Christian Borntraeger wrote:
->
->
->
-On 03/05/2018 07:45 PM, Farhan Ali wrote:
->
->
->
->
->
-> On 03/05/2018 06:03 AM, Stefan Hajnoczi wrote:
->
->> Please include the following gdb output:
->
->>
->
->>    (gdb) disas swapcontext
->
->>    (gdb) i r
->
->>
->
->> That way it's possible to see which instruction faulted and which
->
->> registers were being accessed.
->
->
->
->
->
-> Here is the disas output for swapcontext; this is on a coredump with debugging
->
-> symbols enabled for qemu, so the addresses differ a little from the
->
-> previous dump.
->
->
->
->
->
-> (gdb) disas swapcontext
->
-> Dump of assembler code for function swapcontext:
->
->    0x000003ff90751fb8 <+0>:    lgr    %r1,%r2
->
->    0x000003ff90751fbc <+4>:    lgr    %r0,%r3
->
->    0x000003ff90751fc0 <+8>:    stfpc    248(%r1)
->
->    0x000003ff90751fc4 <+12>:    std    %f0,256(%r1)
->
->    0x000003ff90751fc8 <+16>:    std    %f1,264(%r1)
->
->    0x000003ff90751fcc <+20>:    std    %f2,272(%r1)
->
->    0x000003ff90751fd0 <+24>:    std    %f3,280(%r1)
->
->    0x000003ff90751fd4 <+28>:    std    %f4,288(%r1)
->
->    0x000003ff90751fd8 <+32>:    std    %f5,296(%r1)
->
->    0x000003ff90751fdc <+36>:    std    %f6,304(%r1)
->
->    0x000003ff90751fe0 <+40>:    std    %f7,312(%r1)
->
->    0x000003ff90751fe4 <+44>:    std    %f8,320(%r1)
->
->    0x000003ff90751fe8 <+48>:    std    %f9,328(%r1)
->
->    0x000003ff90751fec <+52>:    std    %f10,336(%r1)
->
->    0x000003ff90751ff0 <+56>:    std    %f11,344(%r1)
->
->    0x000003ff90751ff4 <+60>:    std    %f12,352(%r1)
->
->    0x000003ff90751ff8 <+64>:    std    %f13,360(%r1)
->
->    0x000003ff90751ffc <+68>:    std    %f14,368(%r1)
->
->    0x000003ff90752000 <+72>:    std    %f15,376(%r1)
->
->    0x000003ff90752004 <+76>:    slgr    %r2,%r2
->
->    0x000003ff90752008 <+80>:    stam    %a0,%a15,184(%r1)
->
->    0x000003ff9075200c <+84>:    stmg    %r0,%r15,56(%r1)
->
->    0x000003ff90752012 <+90>:    la    %r2,2
->
->    0x000003ff90752016 <+94>:    lgr    %r5,%r0
->
->    0x000003ff9075201a <+98>:    la    %r3,384(%r5)
->
->    0x000003ff9075201e <+102>:    la    %r4,384(%r1)
->
->    0x000003ff90752022 <+106>:    lghi    %r5,8
->
->    0x000003ff90752026 <+110>:    svc    175
->
->
-sys_rt_sigprocmask. r0 should not be changed by the system call.
->
->
->    0x000003ff90752028 <+112>:    lgr    %r5,%r0
->
-> => 0x000003ff9075202c <+116>:    lfpc    248(%r5)
->
->
-so r5 is zero and it was loaded from r0. r0 was loaded from r3 (which is the
->
-2nd parameter to this
->
-function). Now this is odd.
->
->
->    0x000003ff90752030 <+120>:    ld    %f0,256(%r5)
->
->    0x000003ff90752034 <+124>:    ld    %f1,264(%r5)
->
->    0x000003ff90752038 <+128>:    ld    %f2,272(%r5)
->
->    0x000003ff9075203c <+132>:    ld    %f3,280(%r5)
->
->    0x000003ff90752040 <+136>:    ld    %f4,288(%r5)
->
->    0x000003ff90752044 <+140>:    ld    %f5,296(%r5)
->
->    0x000003ff90752048 <+144>:    ld    %f6,304(%r5)
->
->    0x000003ff9075204c <+148>:    ld    %f7,312(%r5)
->
->    0x000003ff90752050 <+152>:    ld    %f8,320(%r5)
->
->    0x000003ff90752054 <+156>:    ld    %f9,328(%r5)
->
->    0x000003ff90752058 <+160>:    ld    %f10,336(%r5)
->
->    0x000003ff9075205c <+164>:    ld    %f11,344(%r5)
->
->    0x000003ff90752060 <+168>:    ld    %f12,352(%r5)
->
->    0x000003ff90752064 <+172>:    ld    %f13,360(%r5)
->
->    0x000003ff90752068 <+176>:    ld    %f14,368(%r5)
->
->    0x000003ff9075206c <+180>:    ld    %f15,376(%r5)
->
->    0x000003ff90752070 <+184>:    lam    %a2,%a15,192(%r5)
->
->    0x000003ff90752074 <+188>:    lmg    %r0,%r15,56(%r5)
->
->    0x000003ff9075207a <+194>:    br    %r14
->
-> End of assembler dump.
->
->
->
-> (gdb) i r
->
-> r0             0x0    0
->
-> r1             0x3ff8fe7de40    4396165881408
->
-> r2             0x0    0
->
-> r3             0x3ff8fe7e1c0    4396165882304
->
-> r4             0x3ff8fe7dfc0    4396165881792
->
-> r5             0x0    0
->
-> r6             0xffffffff88004880    18446744071696304256
->
-> r7             0x3ff880009e0    4396033247712
->
-> r8             0x27ff89000    10736930816
->
-> r9             0x3ff88001460    4396033250400
->
-> r10            0x1000    4096
->
-> r11            0x1261be0    19274720
->
-> r12            0x3ff88001e00    4396033252864
->
-> r13            0x14d0bc0    21826496
->
-> r14            0x1312ac8    19999432
->
-> r15            0x3ff8fe7dc80    4396165880960
->
-> pc             0x3ff9075202c    0x3ff9075202c <swapcontext+116>
->
-> cc             0x2    2
-
-On 03/05/2018 02:08 PM, Christian Borntraeger wrote:
-Do you happen to run with a recent host kernel that has
-
-commit 7041d28115e91f2144f811ffe8a195c696b1e1d0
-     s390: scrub registers on kernel entry and KVM exit
-Yes.
-Can you run with this on top
-diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
-index 13a133a6015c..d6dc0e5e8f74 100644
---- a/arch/s390/kernel/entry.S
-+++ b/arch/s390/kernel/entry.S
-@@ -426,13 +426,13 @@ ENTRY(system_call)
-         UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER
-         BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP
-         stmg    %r0,%r7,__PT_R0(%r11)
--       # clear user controlled register to prevent speculative use
--       xgr     %r0,%r0
-         mvc     __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
-         mvc     __PT_PSW(16,%r11),__LC_SVC_OLD_PSW
-         mvc     __PT_INT_CODE(4,%r11),__LC_SVC_ILC
-         stg     %r14,__PT_FLAGS(%r11)
-  .Lsysc_do_svc:
-+       # clear user controlled register to prevent speculative use
-+       xgr     %r0,%r0
-         # load address of system call table
-         lg      %r10,__THREAD_sysc_table(%r13,%r12)
-         llgh    %r8,__PT_INT_CODE+2(%r11)
-
-
-To me it looks like the critical section cleanup (interrupt during system 
-call entry) might
-save the registers again into ptregs but we have already zeroed out r0.
-This patch moves the clearing of r0 after sysc_do_svc, which should fix the 
-critical
-section cleanup.
-Okay I will run with this.
-Adding Martin and Heiko. Will spin a patch.
-
-
-On 03/05/2018 07:54 PM, Christian Borntraeger wrote:
-On 03/05/2018 07:45 PM, Farhan Ali wrote:
-On 03/05/2018 06:03 AM, Stefan Hajnoczi wrote:
-Please include the following gdb output:
-
-    (gdb) disas swapcontext
-    (gdb) i r
-
-That way it's possible to see which instruction faulted and which
-registers were being accessed.
-Here is the disas output for swapcontext. This is from a coredump with debugging
-symbols enabled for qemu, so the addresses differ slightly from the previous
-dump.
-
-
-(gdb) disas swapcontext
-Dump of assembler code for function swapcontext:
-    0x000003ff90751fb8 <+0>:    lgr    %r1,%r2
-    0x000003ff90751fbc <+4>:    lgr    %r0,%r3
-    0x000003ff90751fc0 <+8>:    stfpc    248(%r1)
-    0x000003ff90751fc4 <+12>:    std    %f0,256(%r1)
-    0x000003ff90751fc8 <+16>:    std    %f1,264(%r1)
-    0x000003ff90751fcc <+20>:    std    %f2,272(%r1)
-    0x000003ff90751fd0 <+24>:    std    %f3,280(%r1)
-    0x000003ff90751fd4 <+28>:    std    %f4,288(%r1)
-    0x000003ff90751fd8 <+32>:    std    %f5,296(%r1)
-    0x000003ff90751fdc <+36>:    std    %f6,304(%r1)
-    0x000003ff90751fe0 <+40>:    std    %f7,312(%r1)
-    0x000003ff90751fe4 <+44>:    std    %f8,320(%r1)
-    0x000003ff90751fe8 <+48>:    std    %f9,328(%r1)
-    0x000003ff90751fec <+52>:    std    %f10,336(%r1)
-    0x000003ff90751ff0 <+56>:    std    %f11,344(%r1)
-    0x000003ff90751ff4 <+60>:    std    %f12,352(%r1)
-    0x000003ff90751ff8 <+64>:    std    %f13,360(%r1)
-    0x000003ff90751ffc <+68>:    std    %f14,368(%r1)
-    0x000003ff90752000 <+72>:    std    %f15,376(%r1)
-    0x000003ff90752004 <+76>:    slgr    %r2,%r2
-    0x000003ff90752008 <+80>:    stam    %a0,%a15,184(%r1)
-    0x000003ff9075200c <+84>:    stmg    %r0,%r15,56(%r1)
-    0x000003ff90752012 <+90>:    la    %r2,2
-    0x000003ff90752016 <+94>:    lgr    %r5,%r0
-    0x000003ff9075201a <+98>:    la    %r3,384(%r5)
-    0x000003ff9075201e <+102>:    la    %r4,384(%r1)
-    0x000003ff90752022 <+106>:    lghi    %r5,8
-    0x000003ff90752026 <+110>:    svc    175
-sys_rt_sigprocmask. r0 should not be changed by the system call.
-   0x000003ff90752028 <+112>:    lgr    %r5,%r0
-=> 0x000003ff9075202c <+116>:    lfpc    248(%r5)
-so r5 is zero and it was loaded from r0. r0 was loaded from r3 (which is the 
-2nd parameter to this
-function). Now this is odd.
-   0x000003ff90752030 <+120>:    ld    %f0,256(%r5)
-    0x000003ff90752034 <+124>:    ld    %f1,264(%r5)
-    0x000003ff90752038 <+128>:    ld    %f2,272(%r5)
-    0x000003ff9075203c <+132>:    ld    %f3,280(%r5)
-    0x000003ff90752040 <+136>:    ld    %f4,288(%r5)
-    0x000003ff90752044 <+140>:    ld    %f5,296(%r5)
-    0x000003ff90752048 <+144>:    ld    %f6,304(%r5)
-    0x000003ff9075204c <+148>:    ld    %f7,312(%r5)
-    0x000003ff90752050 <+152>:    ld    %f8,320(%r5)
-    0x000003ff90752054 <+156>:    ld    %f9,328(%r5)
-    0x000003ff90752058 <+160>:    ld    %f10,336(%r5)
-    0x000003ff9075205c <+164>:    ld    %f11,344(%r5)
-    0x000003ff90752060 <+168>:    ld    %f12,352(%r5)
-    0x000003ff90752064 <+172>:    ld    %f13,360(%r5)
-    0x000003ff90752068 <+176>:    ld    %f14,368(%r5)
-    0x000003ff9075206c <+180>:    ld    %f15,376(%r5)
-    0x000003ff90752070 <+184>:    lam    %a2,%a15,192(%r5)
-    0x000003ff90752074 <+188>:    lmg    %r0,%r15,56(%r5)
-    0x000003ff9075207a <+194>:    br    %r14
-End of assembler dump.
-
-(gdb) i r
-r0             0x0    0
-r1             0x3ff8fe7de40    4396165881408
-r2             0x0    0
-r3             0x3ff8fe7e1c0    4396165882304
-r4             0x3ff8fe7dfc0    4396165881792
-r5             0x0    0
-r6             0xffffffff88004880    18446744071696304256
-r7             0x3ff880009e0    4396033247712
-r8             0x27ff89000    10736930816
-r9             0x3ff88001460    4396033250400
-r10            0x1000    4096
-r11            0x1261be0    19274720
-r12            0x3ff88001e00    4396033252864
-r13            0x14d0bc0    21826496
-r14            0x1312ac8    19999432
-r15            0x3ff8fe7dc80    4396165880960
-pc             0x3ff9075202c    0x3ff9075202c <swapcontext+116>
-cc             0x2    2
-
-On Mon, 5 Mar 2018 20:08:45 +0100
-Christian Borntraeger <address@hidden> wrote:
-
->
-Do you happen to run with a recent host kernel that has
->
->
-commit 7041d28115e91f2144f811ffe8a195c696b1e1d0
->
-s390: scrub registers on kernel entry and KVM exit
->
->
-Can you run with this on top
->
-diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
->
-index 13a133a6015c..d6dc0e5e8f74 100644
->
---- a/arch/s390/kernel/entry.S
->
-+++ b/arch/s390/kernel/entry.S
->
-@@ -426,13 +426,13 @@ ENTRY(system_call)
->
-UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER
->
-BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP
->
-stmg    %r0,%r7,__PT_R0(%r11)
->
--       # clear user controlled register to prevent speculative use
->
--       xgr     %r0,%r0
->
-mvc     __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
->
-mvc     __PT_PSW(16,%r11),__LC_SVC_OLD_PSW
->
-mvc     __PT_INT_CODE(4,%r11),__LC_SVC_ILC
->
-stg     %r14,__PT_FLAGS(%r11)
->
-.Lsysc_do_svc:
->
-+       # clear user controlled register to prevent speculative use
->
-+       xgr     %r0,%r0
->
-# load address of system call table
->
-lg      %r10,__THREAD_sysc_table(%r13,%r12)
->
-llgh    %r8,__PT_INT_CODE+2(%r11)
->
->
->
-To me it looks like the critical section cleanup (interrupt during
->
-system call entry) might
->
-save the registers again into ptregs but we have already zeroed out r0.
->
-This patch moves the clearing of r0 after sysc_do_svc, which should fix the
->
-critical
->
-section cleanup.
->
->
-Adding Martin and Heiko. Will spin a patch.
-Argh, yes. Thanks Christian, this is it. I have been searching for the bug
-for days now. The point is that if the system call handler is interrupted
-after the xgr but before .Lsysc_do_svc the code at .Lcleanup_system_call 
-repeats the stmg for %r0-%r7 but now %r0 is already zero.
-
-Please commit a patch for this and I will queue it up immediately.
-
--- 
-blue skies,
-   Martin.
-
-"Reality continues to ruin my life." - Calvin.
-
-On 03/06/2018 01:34 AM, Martin Schwidefsky wrote:
-On Mon, 5 Mar 2018 20:08:45 +0100
-Christian Borntraeger <address@hidden> wrote:
-Do you happen to run with a recent host kernel that has
-
-commit 7041d28115e91f2144f811ffe8a195c696b1e1d0
-     s390: scrub registers on kernel entry and KVM exit
-
-Can you run with this on top
-diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
-index 13a133a6015c..d6dc0e5e8f74 100644
---- a/arch/s390/kernel/entry.S
-+++ b/arch/s390/kernel/entry.S
-@@ -426,13 +426,13 @@ ENTRY(system_call)
-         UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER
-         BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP
-         stmg    %r0,%r7,__PT_R0(%r11)
--       # clear user controlled register to prevent speculative use
--       xgr     %r0,%r0
-         mvc     __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC
-         mvc     __PT_PSW(16,%r11),__LC_SVC_OLD_PSW
-         mvc     __PT_INT_CODE(4,%r11),__LC_SVC_ILC
-         stg     %r14,__PT_FLAGS(%r11)
-  .Lsysc_do_svc:
-+       # clear user controlled register to prevent speculative use
-+       xgr     %r0,%r0
-         # load address of system call table
-         lg      %r10,__THREAD_sysc_table(%r13,%r12)
-         llgh    %r8,__PT_INT_CODE+2(%r11)
-
-
-To me it looks like the critical section cleanup (interrupt during system 
-call entry) might
-save the registers again into ptregs but we have already zeroed out r0.
-This patch moves the clearing of r0 after sysc_do_svc, which should fix the 
-critical
-section cleanup.
-
-Adding Martin and Heiko. Will spin a patch.
-Argh, yes. Thanks Christian, this is it. I have been searching for the bug
-for days now. The point is that if the system call handler is interrupted
-after the xgr but before .Lsysc_do_svc the code at .Lcleanup_system_call
-repeats the stmg for %r0-%r7 but now %r0 is already zero.
-
-Please commit a patch for this and I will queue it up immediately.
-This patch does fix the QEMU crash. I haven't seen the crash after
-running the test case for more than a day. Thanks to everyone for taking
-a look at this problem :)
-Thanks
-Farhan
-
diff --git a/results/classifier/008/other/22219210 b/results/classifier/008/other/22219210
deleted file mode 100644
index 62e24e9f0..000000000
--- a/results/classifier/008/other/22219210
+++ /dev/null
@@ -1,53 +0,0 @@
-graphic: 0.701
-performance: 0.498
-device: 0.489
-semantic: 0.387
-other: 0.345
-network: 0.323
-socket: 0.244
-debug: 0.225
-PID: 0.214
-vnc: 0.204
-files: 0.202
-permissions: 0.141
-KVM: 0.099
-boot: 0.070
-
-[BUG][CPU hot-plug]CPU hot-plugs cause the qemu process to coredump
-
-Hello. Recently, while developing CPU hot-plug support for the LoongArch
-architecture,
-I found a problem with QEMU CPU hot-plug on the x86
-architecture:
-the QEMU process dumps core when a CPU is repeatedly plugged and
-unplugged
-with the TCG accelerator.
-
-
-The specific operation process is as follows:
-
-1. Use the following command to start the virtual machine:
-
-qemu-system-x86_64 \
--machine q35  \
--cpu Broadwell-IBRS \
--smp 1,maxcpus=4,sockets=4,cores=1,threads=1 \
--m 4G \
--drive file=~/anolis-8.8.qcow2  \
--serial stdio   \
--monitor telnet:localhost:4498,server,nowait
-
-
-2. Enter the QEMU monitor via telnet and repeatedly plug and unplug a CPU:
-
-telnet 127.0.0.1 4498
-(qemu) device_add
-Broadwell-IBRS-x86_64-cpu,socket-id=1,core-id=0,thread-id=0,id=cpu1
-(qemu) device_del cpu1
-(qemu) device_add
-Broadwell-IBRS-x86_64-cpu,socket-id=1,core-id=0,thread-id=0,id=cpu1
-3. The QEMU process then aborts with a coredump:
-
-# malloc(): unsorted double linked list corrupted
-Aborted (core dumped)
-
diff --git a/results/classifier/008/other/23270873 b/results/classifier/008/other/23270873
deleted file mode 100644
index 0140ac0a4..000000000
--- a/results/classifier/008/other/23270873
+++ /dev/null
@@ -1,702 +0,0 @@
-other: 0.839
-boot: 0.830
-vnc: 0.820
-device: 0.810
-KVM: 0.803
-permissions: 0.802
-debug: 0.788
-network: 0.768
-graphic: 0.764
-socket: 0.758
-semantic: 0.752
-performance: 0.744
-PID: 0.731
-files: 0.730
-
-[Qemu-devel] [BUG?] aio_get_linux_aio: Assertion `ctx->linux_aio' failed
-
-Hi,
-
-I am seeing some strange QEMU assertion failures for qemu on s390x,
-which prevents a guest from starting.
-
-Git bisecting points to the following commit as the source of the error.
-
-commit ed6e2161715c527330f936d44af4c547f25f687e
-Author: Nishanth Aravamudan <address@hidden>
-Date:   Fri Jun 22 12:37:00 2018 -0700
-
-    linux-aio: properly bubble up errors from initialization
-
-    laio_init() can fail for a couple of reasons, which will lead to a NULL
-    pointer dereference in laio_attach_aio_context().
-
-    To solve this, add a aio_setup_linux_aio() function which is called
-    early in raw_open_common. If this fails, propagate the error up. The
-    signature of aio_get_linux_aio() was not modified, because it seems
-    preferable to return the actual errno from the possible failing
-    initialization calls.
-
-    Additionally, when the AioContext changes, we need to associate a
-    LinuxAioState with the new AioContext. Use the bdrv_attach_aio_context
-    callback and call the new aio_setup_linux_aio(), which will allocate a
-    new AioContext if needed, and return errors on failures. If it
-    fails for
-    any reason, fallback to threaded AIO with an error message, as the
-    device is already in-use by the guest.
-
-    Add an assert that aio_get_linux_aio() cannot return NULL.
-
-    Signed-off-by: Nishanth Aravamudan <address@hidden>
-    Message-id: address@hidden
-    Signed-off-by: Stefan Hajnoczi <address@hidden>
-Not sure what is causing this assertion to fail. Here is the qemu
-command line of the guest, from qemu log, which throws this error:
-LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
-QEMU_AUDIO_DRV=none /usr/local/bin/qemu-system-s390x -name
-guest=rt_vm1,debug-threads=on -S -object
-secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-21-rt_vm1/master-key.aes
--machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off -m
-1024 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -object
-iothread,id=iothread1 -uuid 0cde16cd-091d-41bd-9ac2-5243df5c9a0d
--display none -no-user-config -nodefaults -chardev
-socket,id=charmonitor,fd=28,server,nowait -mon
-chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
--boot strict=on -drive
-file=/dev/mapper/360050763998b0883980000002a000031,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native
--device
-virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on
--netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device
-virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:3a:c8:67:95:84,devno=fe.0.0000
--netdev tap,fd=32,id=hostnet1,vhost=on,vhostfd=33 -device
-virtio-net-ccw,netdev=hostnet1,id=net1,mac=52:54:00:2a:e5:08,devno=fe.0.0002
--chardev pty,id=charconsole0 -device
-sclpconsole,chardev=charconsole0,id=console0 -device
-virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -sandbox
-on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
--msg timestamp=on
-2018-07-17 15:48:42.252+0000: Domain id=21 is tainted: high-privileges
-2018-07-17T15:48:42.279380Z qemu-system-s390x: -chardev
-pty,id=charconsole0: char device redirected to /dev/pts/3 (label
-charconsole0)
-qemu-system-s390x: util/async.c:339: aio_get_linux_aio: Assertion
-`ctx->linux_aio' failed.
-2018-07-17 15:48:43.309+0000: shutting down, reason=failed
-
-
-Any help debugging this would be greatly appreciated.
-
-Thank you
-Farhan
-
-On 17.07.2018 [13:25:53 -0400], Farhan Ali wrote:
->
-Hi,
->
->
-I am seeing some strange QEMU assertion failures for qemu on s390x,
->
-which prevents a guest from starting.
->
->
-Git bisecting points to the following commit as the source of the error.
->
->
-commit ed6e2161715c527330f936d44af4c547f25f687e
->
-Author: Nishanth Aravamudan <address@hidden>
->
-Date:   Fri Jun 22 12:37:00 2018 -0700
->
->
-linux-aio: properly bubble up errors from initialization
->
->
-laio_init() can fail for a couple of reasons, which will lead to a NULL
->
-pointer dereference in laio_attach_aio_context().
->
->
-To solve this, add a aio_setup_linux_aio() function which is called
->
-early in raw_open_common. If this fails, propagate the error up. The
->
-signature of aio_get_linux_aio() was not modified, because it seems
->
-preferable to return the actual errno from the possible failing
->
-initialization calls.
->
->
-Additionally, when the AioContext changes, we need to associate a
->
-LinuxAioState with the new AioContext. Use the bdrv_attach_aio_context
->
-callback and call the new aio_setup_linux_aio(), which will allocate a
->
-new AioContext if needed, and return errors on failures. If it fails for
->
-any reason, fallback to threaded AIO with an error message, as the
->
-device is already in-use by the guest.
->
->
-Add an assert that aio_get_linux_aio() cannot return NULL.
->
->
-Signed-off-by: Nishanth Aravamudan <address@hidden>
->
-Message-id: address@hidden
->
-Signed-off-by: Stefan Hajnoczi <address@hidden>
->
->
->
-Not sure what is causing this assertion to fail. Here is the qemu command
->
-line of the guest, from qemu log, which throws this error:
->
->
->
-LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
->
-QEMU_AUDIO_DRV=none /usr/local/bin/qemu-system-s390x -name
->
-guest=rt_vm1,debug-threads=on -S -object
->
-secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-21-rt_vm1/master-key.aes
->
--machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off -m 1024
->
--realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -object
->
-iothread,id=iothread1 -uuid 0cde16cd-091d-41bd-9ac2-5243df5c9a0d -display
->
-none -no-user-config -nodefaults -chardev
->
-socket,id=charmonitor,fd=28,server,nowait -mon
->
-chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot
->
-strict=on -drive
->
-file=/dev/mapper/360050763998b0883980000002a000031,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native
->
--device
->
-virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on
->
--netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device
->
-virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:3a:c8:67:95:84,devno=fe.0.0000
->
--netdev tap,fd=32,id=hostnet1,vhost=on,vhostfd=33 -device
->
-virtio-net-ccw,netdev=hostnet1,id=net1,mac=52:54:00:2a:e5:08,devno=fe.0.0002
->
--chardev pty,id=charconsole0 -device
->
-sclpconsole,chardev=charconsole0,id=console0 -device
->
-virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -sandbox
->
-on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg
->
-timestamp=on
->
->
->
->
-2018-07-17 15:48:42.252+0000: Domain id=21 is tainted: high-privileges
->
-2018-07-17T15:48:42.279380Z qemu-system-s390x: -chardev pty,id=charconsole0:
->
-char device redirected to /dev/pts/3 (label charconsole0)
->
-qemu-system-s390x: util/async.c:339: aio_get_linux_aio: Assertion
->
-`ctx->linux_aio' failed.
->
-2018-07-17 15:48:43.309+0000: shutting down, reason=failed
->
->
->
-Any help debugging this would be greatly appreciated.
-iiuc, this possibly implies AIO was not actually used previously on this
-guest (it might have silently been falling back to threaded IO?). I
-don't have access to s390x, but would it be possible to run qemu under
-gdb and see if aio_setup_linux_aio is being called at all (I think it
-might not be, but I'm not sure why), and if so, if it's for the context
-in question?
-
-If it's not being called first, could you see what callpath is calling
-aio_get_linux_aio when this assertion trips?
-
-Thanks!
--Nish
-
-On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote:
-iiuc, this possibly implies AIO was not actually used previously on this
-guest (it might have silently been falling back to threaded IO?). I
-don't have access to s390x, but would it be possible to run qemu under
-gdb and see if aio_setup_linux_aio is being called at all (I think it
-might not be, but I'm not sure why), and if so, if it's for the context
-in question?
-
-If it's not being called first, could you see what callpath is calling
-aio_get_linux_aio when this assertion trips?
-
-Thanks!
--Nish
-Hi Nishanth,
-From the coredump of the guest this is the call trace that calls
-aio_get_linux_aio:
-Stack trace of thread 145158:
-#0  0x000003ff94dbe274 raise (libc.so.6)
-#1  0x000003ff94da39a8 abort (libc.so.6)
-#2  0x000003ff94db62ce __assert_fail_base (libc.so.6)
-#3  0x000003ff94db634c __assert_fail (libc.so.6)
-#4  0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x)
-#5  0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x)
-#6  0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x)
-#7  0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x)
-#8  0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x)
-#9  0x000002aa20db3c34 aio_poll (qemu-system-s390x)
-#10 0x000002aa20be32a2 iothread_run (qemu-system-s390x)
-#11 0x000003ff94f879a8 start_thread (libpthread.so.0)
-#12 0x000003ff94e797ee thread_start (libc.so.6)
-
-
-Thanks for taking a look and responding.
-
-Thanks
-Farhan
-
-On 07/18/2018 09:42 AM, Farhan Ali wrote:
-On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote:
-iiuc, this possibly implies AIO was not actually used previously on this
-guest (it might have silently been falling back to threaded IO?). I
-don't have access to s390x, but would it be possible to run qemu under
-gdb and see if aio_setup_linux_aio is being called at all (I think it
-might not be, but I'm not sure why), and if so, if it's for the context
-in question?
-
-If it's not being called first, could you see what callpath is calling
-aio_get_linux_aio when this assertion trips?
-
-Thanks!
--Nish
-Hi Nishanth,
-From the coredump of the guest this is the call trace that calls
-aio_get_linux_aio:
-Stack trace of thread 145158:
-#0  0x000003ff94dbe274 raise (libc.so.6)
-#1  0x000003ff94da39a8 abort (libc.so.6)
-#2  0x000003ff94db62ce __assert_fail_base (libc.so.6)
-#3  0x000003ff94db634c __assert_fail (libc.so.6)
-#4  0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x)
-#5  0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x)
-#6  0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x)
-#7  0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x)
-#8  0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x)
-#9  0x000002aa20db3c34 aio_poll (qemu-system-s390x)
-#10 0x000002aa20be32a2 iothread_run (qemu-system-s390x)
-#11 0x000003ff94f879a8 start_thread (libpthread.so.0)
-#12 0x000003ff94e797ee thread_start (libc.so.6)
-
-
-Thanks for taking a look and responding.
-
-Thanks
-Farhan
-Trying to debug a little further, the block device in this case is a
-"host device". And looking at your commit carefully you use the
-bdrv_attach_aio_context callback to set up a Linux AioContext.
-For some reason the "host device" struct (BlockDriver bdrv_host_device
-in block/file-posix.c) does not have a bdrv_attach_aio_context defined.
-So a simple change of adding the callback to the struct solves the issue
-and the guest starts fine.
-diff --git a/block/file-posix.c b/block/file-posix.c
-index 28824aa..b8d59fb 100644
---- a/block/file-posix.c
-+++ b/block/file-posix.c
-@@ -3135,6 +3135,7 @@ static BlockDriver bdrv_host_device = {
-     .bdrv_refresh_limits = raw_refresh_limits,
-     .bdrv_io_plug = raw_aio_plug,
-     .bdrv_io_unplug = raw_aio_unplug,
-+    .bdrv_attach_aio_context = raw_aio_attach_aio_context,
-
-     .bdrv_co_truncate       = raw_co_truncate,
-     .bdrv_getlength    = raw_getlength,
-I am not too familiar with block device code in QEMU, so not sure if
-this is the right fix or if there are some underlying problems.
-Thanks
-Farhan
-
-On 18.07.2018 [11:10:27 -0400], Farhan Ali wrote:
->
->
->
-On 07/18/2018 09:42 AM, Farhan Ali wrote:
->
->
->
->
->
-> On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote:
->
-> > iiuc, this possibly implies AIO was not actually used previously on this
->
-> > guest (it might have silently been falling back to threaded IO?). I
->
-> > don't have access to s390x, but would it be possible to run qemu under
->
-> > gdb and see if aio_setup_linux_aio is being called at all (I think it
->
-> > might not be, but I'm not sure why), and if so, if it's for the context
->
-> > in question?
->
-> >
->
-> > If it's not being called first, could you see what callpath is calling
->
-> > aio_get_linux_aio when this assertion trips?
->
-> >
->
-> > Thanks!
->
-> > -Nish
->
->
->
->
->
-> Hi Nishanth,
->
->
->
->  From the coredump of the guest this is the call trace that calls
->
-> aio_get_linux_aio:
->
->
->
->
->
-> Stack trace of thread 145158:
->
-> #0  0x000003ff94dbe274 raise (libc.so.6)
->
-> #1  0x000003ff94da39a8 abort (libc.so.6)
->
-> #2  0x000003ff94db62ce __assert_fail_base (libc.so.6)
->
-> #3  0x000003ff94db634c __assert_fail (libc.so.6)
->
-> #4  0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x)
->
-> #5  0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x)
->
-> #6  0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x)
->
-> #7  0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x)
->
-> #8  0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x)
->
-> #9  0x000002aa20db3c34 aio_poll (qemu-system-s390x)
->
-> #10 0x000002aa20be32a2 iothread_run (qemu-system-s390x)
->
-> #11 0x000003ff94f879a8 start_thread (libpthread.so.0)
->
-> #12 0x000003ff94e797ee thread_start (libc.so.6)
->
->
->
->
->
-> Thanks for taking a look and responding.
->
->
->
-> Thanks
->
-> Farhan
->
->
->
->
->
->
->
->
-Trying to debug a little further, the block device in this case is a "host
->
-device". And looking at your commit carefully you use the
->
-bdrv_attach_aio_context callback to set up a Linux AioContext.
->
->
-For some reason the "host device" struct (BlockDriver bdrv_host_device in
->
-block/file-posix.c) does not have a bdrv_attach_aio_context defined.
->
-So a simple change of adding the callback to the struct solves the issue and
->
-the guest starts fine.
->
->
->
-diff --git a/block/file-posix.c b/block/file-posix.c
->
-index 28824aa..b8d59fb 100644
->
---- a/block/file-posix.c
->
-+++ b/block/file-posix.c
->
-@@ -3135,6 +3135,7 @@ static BlockDriver bdrv_host_device = {
->
-.bdrv_refresh_limits = raw_refresh_limits,
->
-.bdrv_io_plug = raw_aio_plug,
->
-.bdrv_io_unplug = raw_aio_unplug,
->
-+    .bdrv_attach_aio_context = raw_aio_attach_aio_context,
->
->
-.bdrv_co_truncate       = raw_co_truncate,
->
-.bdrv_getlength    = raw_getlength,
->
->
->
->
-I am not too familiar with block device code in QEMU, so not sure if
->
-this is the right fix or if there are some underlying problems.
-Oh this is quite embarrassing! I only added the bdrv_attach_aio_context
-callback for the file-backed device. Your fix is definitely correct for
-host device. Let me make sure there weren't any others missed and I will
-send out a properly formatted patch. Thank you for the quick testing and
-turnaround!
-
--Nish
-
-On 07/18/2018 08:52 PM, Nishanth Aravamudan wrote:
->
-On 18.07.2018 [11:10:27 -0400], Farhan Ali wrote:
->
->
->
->
->
-> On 07/18/2018 09:42 AM, Farhan Ali wrote:
->
->>
->
->>
->
->> On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote:
->
->>> iiuc, this possibly implies AIO was not actually used previously on this
->
->>> guest (it might have silently been falling back to threaded IO?). I
->
->>> don't have access to s390x, but would it be possible to run qemu under
->
->>> gdb and see if aio_setup_linux_aio is being called at all (I think it
->
->>> might not be, but I'm not sure why), and if so, if it's for the context
->
->>> in question?
->
->>>
->
->>> If it's not being called first, could you see what callpath is calling
->
->>> aio_get_linux_aio when this assertion trips?
->
->>>
->
->>> Thanks!
->
->>> -Nish
->
->>
->
->>
->
->> Hi Nishanth,
->
->>
->
->>  From the coredump of the guest this is the call trace that calls
->
->> aio_get_linux_aio:
->
->>
->
->>
->
->> Stack trace of thread 145158:
->
->> #0  0x000003ff94dbe274 raise (libc.so.6)
->
->> #1  0x000003ff94da39a8 abort (libc.so.6)
->
->> #2  0x000003ff94db62ce __assert_fail_base (libc.so.6)
->
->> #3  0x000003ff94db634c __assert_fail (libc.so.6)
->
->> #4  0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x)
->
->> #5  0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x)
->
->> #6  0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x)
->
->> #7  0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x)
->
->> #8  0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x)
->
->> #9  0x000002aa20db3c34 aio_poll (qemu-system-s390x)
->
->> #10 0x000002aa20be32a2 iothread_run (qemu-system-s390x)
->
->> #11 0x000003ff94f879a8 start_thread (libpthread.so.0)
->
->> #12 0x000003ff94e797ee thread_start (libc.so.6)
->
->>
->
->>
->
->> Thanks for taking a look and responding.
->
->>
->
->> Thanks
->
->> Farhan
->
->>
->
->>
->
->>
->
->
->
-> Trying to debug a little further, the block device in this case is a "host
->
-> device". And looking at your commit carefully you use the
->
-> bdrv_attach_aio_context callback to setup a Linux AioContext.
->
->
->
-> For some reason the "host device" struct (BlockDriver bdrv_host_device in
->
-> block/file-posix.c) does not have a bdrv_attach_aio_context defined.
->
-> So a simple change of adding the callback to the struct solves the issue and
->
-> the guest starts fine.
->
->
->
->
->
-> diff --git a/block/file-posix.c b/block/file-posix.c
->
-> index 28824aa..b8d59fb 100644
->
-> --- a/block/file-posix.c
->
-> +++ b/block/file-posix.c
->
-> @@ -3135,6 +3135,7 @@ static BlockDriver bdrv_host_device = {
->
->      .bdrv_refresh_limits = raw_refresh_limits,
->
->      .bdrv_io_plug = raw_aio_plug,
->
->      .bdrv_io_unplug = raw_aio_unplug,
->
-> +    .bdrv_attach_aio_context = raw_aio_attach_aio_context,
->
->
->
->      .bdrv_co_truncate       = raw_co_truncate,
->
->      .bdrv_getlength    = raw_getlength,
->
->
->
->
->
->
->
-> I am not too familiar with block device code in QEMU, so not sure if
->
-> this is the right fix or if there are some underlying problems.
->
->
-Oh this is quite embarrassing! I only added the bdrv_attach_aio_context
->
-callback for the file-backed device. Your fix is definitely correct for
->
-host device. Let me make sure there weren't any others missed and I will
->
-send out a properly formatted patch. Thank you for the quick testing and
->
-turnaround!
-Farhan, can you respin your patch with proper sign-off and patch description?
-Adding qemu-block.
-
-Hi Christian,
-
-On 19.07.2018 [08:55:20 +0200], Christian Borntraeger wrote:
->
->
->
-On 07/18/2018 08:52 PM, Nishanth Aravamudan wrote:
->
-> On 18.07.2018 [11:10:27 -0400], Farhan Ali wrote:
->
->>
->
->>
->
->> On 07/18/2018 09:42 AM, Farhan Ali wrote:
-<snip>
-
->
->> I am not too familiar with block device code in QEMU, so not sure if
->
->> this is the right fix or if there are some underlying problems.
->
->
->
-> Oh this is quite embarrassing! I only added the bdrv_attach_aio_context
->
-> callback for the file-backed device. Your fix is definitely correct for
->
-> host device. Let me make sure there weren't any others missed and I will
->
-> send out a properly formatted patch. Thank you for the quick testing and
->
-> turnaround!
->
->
-Farhan, can you respin your patch with proper sign-off and patch description?
->
-Adding qemu-block.
-I sent it yesterday, sorry I didn't cc everyone from this e-mail:
-http://lists.nongnu.org/archive/html/qemu-block/2018-07/msg00516.html
-Thanks,
-Nish
-
diff --git a/results/classifier/008/other/24190340 b/results/classifier/008/other/24190340
deleted file mode 100644
index 712d8fc51..000000000
--- a/results/classifier/008/other/24190340
+++ /dev/null
@@ -1,2066 +0,0 @@
-debug: 0.876
-permissions: 0.851
-device: 0.832
-performance: 0.832
-other: 0.811
-vnc: 0.808
-boot: 0.803
-semantic: 0.793
-KVM: 0.776
-graphic: 0.775
-files: 0.770
-PID: 0.762
-network: 0.723
-socket: 0.715
-
-[BUG, RFC] Block graph deadlock on job-dismiss
-
-Hi all,
-
-There's a bug in block layer which leads to block graph deadlock.
-Notably, it takes place when blockdev IO is processed within a separate
-iothread.
-
-This was initially caught by our tests, and I was able to reduce it to a
-relatively simple reproducer.  Such deadlocks are probably supposed to
-be covered in iotests/graph-changes-while-io, but this deadlock isn't.
-
-Basically what the reproducer does is launches QEMU with a drive having
-'iothread' option set, creates a chain of 2 snapshots, launches
-block-commit job for a snapshot and then dismisses the job, starting
-from the lower snapshot.  If the guest is issuing IO at the same time,
-there's a race in acquiring block graph lock and a potential deadlock.
-
-Here's how it can be reproduced:
-
-1. Run QEMU:
->
-SRCDIR=/path/to/srcdir
->
->
->
->
->
-$SRCDIR/build/qemu-system-x86_64 -enable-kvm \
->
->
--machine q35 -cpu Nehalem \
->
->
--name guest=alma8-vm,debug-threads=on \
->
->
--m 2g -smp 2 \
->
->
--nographic -nodefaults \
->
->
--qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \
->
->
--serial unix:/var/run/alma8-serial.sock,server=on,wait=off \
->
->
--object iothread,id=iothread0 \
->
->
--blockdev
->
-node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2
->
-\
->
--device virtio-blk-pci,drive=disk,iothread=iothread0
-2. Launch IO (random reads) from within the guest:
->
-nc -U /var/run/alma8-serial.sock
->
-...
->
-[root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1 --bs=4k
->
---size=1G --numjobs=1 --time_based=1 --runtime=300 --group_reporting
->
---rw=randread --iodepth=1 --filename=/testfile
-3. Run snapshots creation & removal of lower snapshot operation in a
-loop (script attached):
->
-while /bin/true ; do ./remove_lower_snap.sh ; done
-And then it occasionally hangs.
-
-Note: I've tried bisecting this, and looks like deadlock occurs starting
-from the following commit:
-
-(BAD)  5bdbaebcce virtio: Re-enable notifications after drain
-(GOOD) c42c3833e0 virtio-scsi: Attach event vq notifier with no_poll
-
-On the latest v10.0.0 it does hang as well.
-
-
-Here's backtrace of the main thread:
-
->
-#0  0x00007fc547d427ce in __ppoll (fds=0x557eb79657b0, nfds=1,
->
-timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:43
->
-#1  0x0000557eb47d955c in qemu_poll_ns (fds=0x557eb79657b0, nfds=1,
->
-timeout=-1) at ../util/qemu-timer.c:329
->
-#2  0x0000557eb47b2204 in fdmon_poll_wait (ctx=0x557eb76c5f20,
->
-ready_list=0x7ffd94b4edd8, timeout=-1) at ../util/fdmon-poll.c:79
->
-#3  0x0000557eb47b1c45 in aio_poll (ctx=0x557eb76c5f20, blocking=true) at
->
-../util/aio-posix.c:730
->
-#4  0x0000557eb4621edd in bdrv_do_drained_begin (bs=0x557eb795e950,
->
-parent=0x0, poll=true) at ../block/io.c:378
->
-#5  0x0000557eb4621f7b in bdrv_drained_begin (bs=0x557eb795e950) at
->
-../block/io.c:391
->
-#6  0x0000557eb45ec125 in bdrv_change_aio_context (bs=0x557eb795e950,
->
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-errp=0x0)
->
-at ../block.c:7682
->
-#7  0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7964250,
->
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-errp=0x0)
->
-at ../block.c:7608
->
-#8  0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb79575e0,
->
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-errp=0x0)
->
-at ../block.c:7668
->
-#9  0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7e59110,
->
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-errp=0x0)
->
-at ../block.c:7608
->
-#10 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb7e51960,
->
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-errp=0x0)
->
-at ../block.c:7668
->
-#11 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb814ed80,
->
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-errp=0x0)
->
-at ../block.c:7608
->
-#12 0x0000557eb45ee8e4 in child_job_change_aio_ctx (c=0x557eb7c9d3f0,
->
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-errp=0x0)
->
-at ../blockjob.c:157
->
-#13 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb7c9d3f0,
->
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-errp=0x0)
->
-at ../block.c:7592
->
-#14 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb7d74310,
->
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-errp=0x0)
->
-at ../block.c:7661
->
-#15 0x0000557eb45dcd7e in bdrv_child_cb_change_aio_ctx
->
-(child=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 =
->
-{...}, tran=0x557eb7a87160, errp=0x0) at ../block.c:1234
->
-#16 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb8565af0,
->
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-errp=0x0)
->
-at ../block.c:7592
->
-#17 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb79575e0,
->
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-errp=0x0)
->
-at ../block.c:7661
->
-#18 0x0000557eb45ec1f3 in bdrv_try_change_aio_context (bs=0x557eb79575e0,
->
-ctx=0x557eb76c5f20, ignore_child=0x0, errp=0x0) at ../block.c:7715
->
-#19 0x0000557eb45e1b15 in bdrv_root_unref_child (child=0x557eb7966f30) at
->
-../block.c:3317
->
-#20 0x0000557eb45eeaa8 in block_job_remove_all_bdrv (job=0x557eb7952800) at
->
-../blockjob.c:209
->
-#21 0x0000557eb45ee641 in block_job_free (job=0x557eb7952800) at
->
-../blockjob.c:82
->
-#22 0x0000557eb45f17af in job_unref_locked (job=0x557eb7952800) at
->
-../job.c:474
->
-#23 0x0000557eb45f257d in job_do_dismiss_locked (job=0x557eb7952800) at
->
-../job.c:771
->
-#24 0x0000557eb45f25fe in job_dismiss_locked (jobptr=0x7ffd94b4f400,
->
-errp=0x7ffd94b4f488) at ../job.c:783
->
---Type <RET> for more, q to quit, c to continue without paging--
->
-#25 0x0000557eb45d8e84 in qmp_job_dismiss (id=0x557eb7aa42b0 "commit-snap1",
->
-errp=0x7ffd94b4f488) at ../job-qmp.c:138
->
-#26 0x0000557eb472f6a3 in qmp_marshal_job_dismiss (args=0x7fc52c00a3b0,
->
-ret=0x7fc53c880da8, errp=0x7fc53c880da0) at qapi/qapi-commands-job.c:221
->
-#27 0x0000557eb47a35f3 in do_qmp_dispatch_bh (opaque=0x7fc53c880e40) at
->
-../qapi/qmp-dispatch.c:128
->
-#28 0x0000557eb47d1cd2 in aio_bh_call (bh=0x557eb79568f0) at
->
-../util/async.c:172
->
-#29 0x0000557eb47d1df5 in aio_bh_poll (ctx=0x557eb76c0200) at
->
-../util/async.c:219
->
-#30 0x0000557eb47b12f3 in aio_dispatch (ctx=0x557eb76c0200) at
->
-../util/aio-posix.c:436
->
-#31 0x0000557eb47d2266 in aio_ctx_dispatch (source=0x557eb76c0200,
->
-callback=0x0, user_data=0x0) at ../util/async.c:361
->
-#32 0x00007fc549232f4f in g_main_dispatch (context=0x557eb76c6430) at
->
-../glib/gmain.c:3364
->
-#33 g_main_context_dispatch (context=0x557eb76c6430) at ../glib/gmain.c:4079
->
-#34 0x0000557eb47d3ab1 in glib_pollfds_poll () at ../util/main-loop.c:287
->
-#35 0x0000557eb47d3b38 in os_host_main_loop_wait (timeout=0) at
->
-../util/main-loop.c:310
->
-#36 0x0000557eb47d3c58 in main_loop_wait (nonblocking=0) at
->
-../util/main-loop.c:589
->
-#37 0x0000557eb4218b01 in qemu_main_loop () at ../system/runstate.c:835
->
-#38 0x0000557eb46df166 in qemu_default_main (opaque=0x0) at
->
-../system/main.c:50
->
-#39 0x0000557eb46df215 in main (argc=24, argv=0x7ffd94b4f8d8) at
->
-../system/main.c:80
-And here's coroutine trying to acquire read lock:
-
->
-(gdb) qemu coroutine reader_queue->entries.sqh_first
->
-#0  0x0000557eb47d7068 in qemu_coroutine_switch (from_=0x557eb7aa48b0,
->
-to_=0x7fc537fff508, action=COROUTINE_YIELD) at
->
-../util/coroutine-ucontext.c:321
->
-#1  0x0000557eb47d4d4a in qemu_coroutine_yield () at
->
-../util/qemu-coroutine.c:339
->
-#2  0x0000557eb47d56c8 in qemu_co_queue_wait_impl (queue=0x557eb59954c0
->
-<reader_queue>, lock=0x7fc53c57de50, flags=0) at
->
-../util/qemu-coroutine-lock.c:60
->
-#3  0x0000557eb461fea7 in bdrv_graph_co_rdlock () at ../block/graph-lock.c:231
->
-#4  0x0000557eb460c81a in graph_lockable_auto_lock (x=0x7fc53c57dee3) at
->
-/home/root/src/qemu/master/include/block/graph-lock.h:213
->
-#5  0x0000557eb460fa41 in blk_co_do_preadv_part
->
-(blk=0x557eb84c0810, offset=6890553344, bytes=4096, qiov=0x7fc530006988,
->
-qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at ../block/block-backend.c:1339
->
-#6  0x0000557eb46104d7 in blk_aio_read_entry (opaque=0x7fc530003240) at
->
-../block/block-backend.c:1619
->
-#7  0x0000557eb47d6c40 in coroutine_trampoline (i0=-1213577040, i1=21886) at
->
-../util/coroutine-ucontext.c:175
->
-#8  0x00007fc547c2a360 in __start_context () at
->
-../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
->
-#9  0x00007ffd94b4ea40 in  ()
->
-#10 0x0000000000000000 in  ()
-So it looks like main thread is processing job-dismiss request and is
-holding write lock taken in block_job_remove_all_bdrv() (frame #20
-above).  At the same time iothread spawns a coroutine which performs IO
-request.  Before the coroutine is spawned, blk_aio_prwv() increases
-'in_flight' counter for Blk.  Then blk_co_do_preadv_part() (frame #5) is
-trying to acquire the read lock.  But main thread isn't releasing the
-lock as blk_root_drained_poll() returns true since blk->in_flight > 0.
-Here's the deadlock.
-
-Any comments and suggestions on the subject are welcomed.  Thanks!
-
-Andrey
-remove_lower_snap.sh
-Description:
-application/shellscript
-
-On 4/24/25 8:32 PM, Andrey Drobyshev wrote:
->
-Hi all,
->
->
-There's a bug in block layer which leads to block graph deadlock.
->
-Notably, it takes place when blockdev IO is processed within a separate
->
-iothread.
->
->
-This was initially caught by our tests, and I was able to reduce it to a
->
-relatively simple reproducer.  Such deadlocks are probably supposed to
->
-be covered in iotests/graph-changes-while-io, but this deadlock isn't.
->
->
-Basically what the reproducer does is launches QEMU with a drive having
->
-'iothread' option set, creates a chain of 2 snapshots, launches
->
-block-commit job for a snapshot and then dismisses the job, starting
->
-from the lower snapshot.  If the guest is issuing IO at the same time,
->
-there's a race in acquiring block graph lock and a potential deadlock.
->
->
-Here's how it can be reproduced:
->
->
-[...]
->
-I took a closer look at iotests/graph-changes-while-io, and have managed
-to reproduce the same deadlock in a much simpler setup, without a guest.
-
-1. Run QSD:> ./build/storage-daemon/qemu-storage-daemon --object
-iothread,id=iothread0 \
->
---blockdev null-co,node-name=node0,read-zeroes=true \
->
->
---nbd-server addr.type=unix,addr.path=/var/run/qsd_nbd.sock \
->
->
---export
->
-nbd,id=exp0,node-name=node0,iothread=iothread0,fixed-iothread=true,writable=true
->
-\
->
---chardev
->
-socket,id=qmp-sock,path=/var/run/qsd_qmp.sock,server=on,wait=off \
->
---monitor chardev=qmp-sock
-2. Launch IO:
->
-qemu-img bench -f raw -c 2000000
->
-'nbd+unix:///node0?socket=/var/run/qsd_nbd.sock'
-3. Add 2 snapshots and remove lower one (script attached):> while
-/bin/true ; do ./rls_qsd.sh ; done
-
-And then it hangs.
-
-I'll also send a patch with corresponding test case added directly to
-iotests.
-
-This reproducer seems to be hanging starting from Fiona's commit
-67446e605dc ("blockjob: drop AioContext lock before calling
-bdrv_graph_wrlock()").  AioContext locks were dropped entirely later on
-in Stefan's commit b49f4755c7 ("block: remove AioContext locking"), but
-the problem remains.
-
-Andrey
-rls_qsd.sh
-Description:
-application/shellscript
-
-From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
-
-This case is catching potential deadlock which takes place when job-dismiss
-is issued when I/O requests are processed in a separate iothread.
-
-See
-https://mail.gnu.org/archive/html/qemu-devel/2025-04/msg04421.html
-Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
----
- .../qemu-iotests/tests/graph-changes-while-io | 101 ++++++++++++++++--
- .../tests/graph-changes-while-io.out          |   4 +-
- 2 files changed, 96 insertions(+), 9 deletions(-)
-
-diff --git a/tests/qemu-iotests/tests/graph-changes-while-io 
-b/tests/qemu-iotests/tests/graph-changes-while-io
-index 194fda500e..e30f823da4 100755
---- a/tests/qemu-iotests/tests/graph-changes-while-io
-+++ b/tests/qemu-iotests/tests/graph-changes-while-io
-@@ -27,6 +27,8 @@ from iotests import imgfmt, qemu_img, qemu_img_create, 
-qemu_io, \
- 
- 
- top = os.path.join(iotests.test_dir, 'top.img')
-+snap1 = os.path.join(iotests.test_dir, 'snap1.img')
-+snap2 = os.path.join(iotests.test_dir, 'snap2.img')
- nbd_sock = os.path.join(iotests.sock_dir, 'nbd.sock')
- 
- 
-@@ -58,6 +60,15 @@ class TestGraphChangesWhileIO(QMPTestCase):
-     def tearDown(self) -> None:
-         self.qsd.stop()
- 
-+    def _wait_for_blockjob(self, status) -> None:
-+        done = False
-+        while not done:
-+            for event in self.qsd.get_qmp().get_events(wait=10.0):
-+                if event['event'] != 'JOB_STATUS_CHANGE':
-+                    continue
-+                if event['data']['status'] == status:
-+                    done = True
-+
-     def test_blockdev_add_while_io(self) -> None:
-         # Run qemu-img bench in the background
-         bench_thr = Thread(target=do_qemu_img_bench)
-@@ -116,13 +127,89 @@ class TestGraphChangesWhileIO(QMPTestCase):
-                 'device': 'job0',
-             })
- 
--            cancelled = False
--            while not cancelled:
--                for event in self.qsd.get_qmp().get_events(wait=10.0):
--                    if event['event'] != 'JOB_STATUS_CHANGE':
--                        continue
--                    if event['data']['status'] == 'null':
--                        cancelled = True
-+            self._wait_for_blockjob('null')
-+
-+        bench_thr.join()
-+
-+    def test_remove_lower_snapshot_while_io(self) -> None:
-+        # Run qemu-img bench in the background
-+        bench_thr = Thread(target=do_qemu_img_bench, args=(100000, ))
-+        bench_thr.start()
-+
-+        # While I/O is performed on 'node0' node, consecutively add 2 snapshots
-+        # on top of it, then remove (commit) them starting from lower one.
-+        while bench_thr.is_alive():
-+            # Recreate snapshot images on every iteration
-+            qemu_img_create('-f', imgfmt, snap1, '1G')
-+            qemu_img_create('-f', imgfmt, snap2, '1G')
-+
-+            self.qsd.cmd('blockdev-add', {
-+                'driver': imgfmt,
-+                'node-name': 'snap1',
-+                'file': {
-+                    'driver': 'file',
-+                    'filename': snap1
-+                }
-+            })
-+
-+            self.qsd.cmd('blockdev-snapshot', {
-+                'node': 'node0',
-+                'overlay': 'snap1',
-+            })
-+
-+            self.qsd.cmd('blockdev-add', {
-+                'driver': imgfmt,
-+                'node-name': 'snap2',
-+                'file': {
-+                    'driver': 'file',
-+                    'filename': snap2
-+                }
-+            })
-+
-+            self.qsd.cmd('blockdev-snapshot', {
-+                'node': 'snap1',
-+                'overlay': 'snap2',
-+            })
-+
-+            self.qsd.cmd('block-commit', {
-+                'job-id': 'commit-snap1',
-+                'device': 'snap2',
-+                'top-node': 'snap1',
-+                'base-node': 'node0',
-+                'auto-finalize': True,
-+                'auto-dismiss': False,
-+            })
-+
-+            self._wait_for_blockjob('concluded')
-+            self.qsd.cmd('job-dismiss', {
-+                'id': 'commit-snap1',
-+            })
-+
-+            self.qsd.cmd('block-commit', {
-+                'job-id': 'commit-snap2',
-+                'device': 'snap2',
-+                'top-node': 'snap2',
-+                'base-node': 'node0',
-+                'auto-finalize': True,
-+                'auto-dismiss': False,
-+            })
-+
-+            self._wait_for_blockjob('ready')
-+            self.qsd.cmd('job-complete', {
-+                'id': 'commit-snap2',
-+            })
-+
-+            self._wait_for_blockjob('concluded')
-+            self.qsd.cmd('job-dismiss', {
-+                'id': 'commit-snap2',
-+            })
-+
-+            self.qsd.cmd('blockdev-del', {
-+                'node-name': 'snap1'
-+            })
-+            self.qsd.cmd('blockdev-del', {
-+                'node-name': 'snap2'
-+            })
- 
-         bench_thr.join()
- 
-diff --git a/tests/qemu-iotests/tests/graph-changes-while-io.out 
-b/tests/qemu-iotests/tests/graph-changes-while-io.out
-index fbc63e62f8..8d7e996700 100644
---- a/tests/qemu-iotests/tests/graph-changes-while-io.out
-+++ b/tests/qemu-iotests/tests/graph-changes-while-io.out
-@@ -1,5 +1,5 @@
--..
-+...
- ----------------------------------------------------------------------
--Ran 2 tests
-+Ran 3 tests
- 
- OK
--- 
-2.43.5
-
-Am 24.04.25 um 19:32 schrieb Andrey Drobyshev:
->
-So it looks like main thread is processing job-dismiss request and is
->
-holding write lock taken in block_job_remove_all_bdrv() (frame #20
->
-above).  At the same time iothread spawns a coroutine which performs IO
->
-request.  Before the coroutine is spawned, blk_aio_prwv() increases
->
-'in_flight' counter for Blk.  Then blk_co_do_preadv_part() (frame #5) is
->
-trying to acquire the read lock.  But main thread isn't releasing the
->
-lock as blk_root_drained_poll() returns true since blk->in_flight > 0.
->
-Here's the deadlock.
-And for the IO test you provided, it's client->nb_requests that behaves
-similarly to blk->in_flight here.
-
-The issue also reproduces easily when issuing the following QMP command
-in a loop while doing IO on a device:
-
->
-void qmp_block_locked_drain(const char *node_name, Error **errp)
->
-{
->
-BlockDriverState *bs;
->
->
-bs = bdrv_find_node(node_name);
->
-if (!bs) {
->
-error_setg(errp, "node not found");
->
-return;
->
-}
->
->
-bdrv_graph_wrlock();
->
-bdrv_drained_begin(bs);
->
-bdrv_drained_end(bs);
->
-bdrv_graph_wrunlock();
->
-}
-It seems like either it would be necessary to require:
-1. not draining inside an exclusively locked section
-or
-2. making sure that variables used by drained_poll routines are only set
-while holding the reader lock
-?
-
-Those seem to require rather involved changes, so a third option might
-be to make draining inside an exclusively locked section possible, by
-embedding such locked sections in a drained section:
-
->
-diff --git a/blockjob.c b/blockjob.c
->
-index 32007f31a9..9b2f3b3ea9 100644
->
---- a/blockjob.c
->
-+++ b/blockjob.c
->
-@@ -198,6 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
->
-* one to make sure that such a concurrent access does not attempt
->
-* to process an already freed BdrvChild.
->
-*/
->
-+    bdrv_drain_all_begin();
->
-bdrv_graph_wrlock();
->
-while (job->nodes) {
->
-GSList *l = job->nodes;
->
-@@ -211,6 +212,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
->
-g_slist_free_1(l);
->
-}
->
-bdrv_graph_wrunlock();
->
-+    bdrv_drain_all_end();
->
-}
->
->
-bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs)
-This seems to fix the issue at hand. I can send a patch if this is
-considered an acceptable approach.
-
-Best Regards,
-Fiona
-
-On 4/30/25 11:47 AM, Fiona Ebner wrote:
->
-Am 24.04.25 um 19:32 schrieb Andrey Drobyshev:
->
-> So it looks like main thread is processing job-dismiss request and is
->
-> holding write lock taken in block_job_remove_all_bdrv() (frame #20
->
-> above).  At the same time iothread spawns a coroutine which performs IO
->
-> request.  Before the coroutine is spawned, blk_aio_prwv() increases
->
-> 'in_flight' counter for Blk.  Then blk_co_do_preadv_part() (frame #5) is
->
-> trying to acquire the read lock.  But main thread isn't releasing the
->
-> lock as blk_root_drained_poll() returns true since blk->in_flight > 0.
->
-> Here's the deadlock.
->
->
-And for the IO test you provided, it's client->nb_requests that behaves
->
-similarly to blk->in_flight here.
->
->
-The issue also reproduces easily when issuing the following QMP command
->
-in a loop while doing IO on a device:
->
->
-> void qmp_block_locked_drain(const char *node_name, Error **errp)
->
-> {
->
->     BlockDriverState *bs;
->
->
->
->     bs = bdrv_find_node(node_name);
->
->     if (!bs) {
->
->         error_setg(errp, "node not found");
->
->         return;
->
->     }
->
->
->
->     bdrv_graph_wrlock();
->
->     bdrv_drained_begin(bs);
->
->     bdrv_drained_end(bs);
->
->     bdrv_graph_wrunlock();
->
-> }
->
->
-It seems like either it would be necessary to require:
->
-1. not draining inside an exclusively locked section
->
-or
->
-2. making sure that variables used by drained_poll routines are only set
->
-while holding the reader lock
->
-?
->
->
-Those seem to require rather involved changes, so a third option might
->
-be to make draining inside an exclusively locked section possible, by
->
-embedding such locked sections in a drained section:
->
->
-> diff --git a/blockjob.c b/blockjob.c
->
-> index 32007f31a9..9b2f3b3ea9 100644
->
-> --- a/blockjob.c
->
-> +++ b/blockjob.c
->
-> @@ -198,6 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
->
->       * one to make sure that such a concurrent access does not attempt
->
->       * to process an already freed BdrvChild.
->
->       */
->
-> +    bdrv_drain_all_begin();
->
->      bdrv_graph_wrlock();
->
->      while (job->nodes) {
->
->          GSList *l = job->nodes;
->
-> @@ -211,6 +212,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
->
->          g_slist_free_1(l);
->
->      }
->
->      bdrv_graph_wrunlock();
->
-> +    bdrv_drain_all_end();
->
->  }
->
->
->
->  bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs)
->
->
-This seems to fix the issue at hand. I can send a patch if this is
->
-considered an acceptable approach.
->
->
-Best Regards,
->
-Fiona
->
-Hello Fiona,
-
-Thanks for looking into it.  I've tried your 3rd option above and can
-confirm it does fix the deadlock, at least I can't reproduce it.  Other
-iotests also don't seem to be breaking.  So I personally am fine with
-that patch.  Would be nice to hear a word from the maintainers though on
-whether there're any caveats with such approach.
-
-Andrey
-
-On Wed, Apr 30, 2025 at 10:11 AM Andrey Drobyshev
-<andrey.drobyshev@virtuozzo.com> wrote:
->
->
-On 4/30/25 11:47 AM, Fiona Ebner wrote:
->
-> Am 24.04.25 um 19:32 schrieb Andrey Drobyshev:
->
->> So it looks like main thread is processing job-dismiss request and is
->
->> holding write lock taken in block_job_remove_all_bdrv() (frame #20
->
->> above).  At the same time iothread spawns a coroutine which performs IO
->
->> request.  Before the coroutine is spawned, blk_aio_prwv() increases
->
->> 'in_flight' counter for Blk.  Then blk_co_do_preadv_part() (frame #5) is
->
->> trying to acquire the read lock.  But main thread isn't releasing the
->
->> lock as blk_root_drained_poll() returns true since blk->in_flight > 0.
->
->> Here's the deadlock.
->
->
->
-> And for the IO test you provided, it's client->nb_requests that behaves
->
-> similarly to blk->in_flight here.
->
->
->
-> The issue also reproduces easily when issuing the following QMP command
->
-> in a loop while doing IO on a device:
->
->
->
->> void qmp_block_locked_drain(const char *node_name, Error **errp)
->
->> {
->
->>     BlockDriverState *bs;
->
->>
->
->>     bs = bdrv_find_node(node_name);
->
->>     if (!bs) {
->
->>         error_setg(errp, "node not found");
->
->>         return;
->
->>     }
->
->>
->
->>     bdrv_graph_wrlock();
->
->>     bdrv_drained_begin(bs);
->
->>     bdrv_drained_end(bs);
->
->>     bdrv_graph_wrunlock();
->
->> }
->
->
->
-> It seems like either it would be necessary to require:
->
-> 1. not draining inside an exclusively locked section
->
-> or
->
-> 2. making sure that variables used by drained_poll routines are only set
->
-> while holding the reader lock
->
-> ?
->
->
->
-> Those seem to require rather involved changes, so a third option might
->
-> be to make draining inside an exclusively locked section possible, by
->
-> embedding such locked sections in a drained section:
->
->
->
->> diff --git a/blockjob.c b/blockjob.c
->
->> index 32007f31a9..9b2f3b3ea9 100644
->
->> --- a/blockjob.c
->
->> +++ b/blockjob.c
->
->> @@ -198,6 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
->
->>       * one to make sure that such a concurrent access does not attempt
->
->>       * to process an already freed BdrvChild.
->
->>       */
->
->> +    bdrv_drain_all_begin();
->
->>      bdrv_graph_wrlock();
->
->>      while (job->nodes) {
->
->>          GSList *l = job->nodes;
->
->> @@ -211,6 +212,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
->
->>          g_slist_free_1(l);
->
->>      }
->
->>      bdrv_graph_wrunlock();
->
->> +    bdrv_drain_all_end();
->
->>  }
->
->>
->
->>  bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs)
->
->
->
-> This seems to fix the issue at hand. I can send a patch if this is
->
-> considered an acceptable approach.
-Kevin is aware of this thread but it's a public holiday tomorrow so it
-may be a little longer.
-
-Stefan
-
-Am 24.04.2025 um 19:32 hat Andrey Drobyshev geschrieben:
->
-Hi all,
->
->
-There's a bug in block layer which leads to block graph deadlock.
->
-Notably, it takes place when blockdev IO is processed within a separate
->
-iothread.
->
->
-This was initially caught by our tests, and I was able to reduce it to a
->
-relatively simple reproducer.  Such deadlocks are probably supposed to
->
-be covered in iotests/graph-changes-while-io, but this deadlock isn't.
->
->
-Basically what the reproducer does is launches QEMU with a drive having
->
-'iothread' option set, creates a chain of 2 snapshots, launches
->
-block-commit job for a snapshot and then dismisses the job, starting
->
-from the lower snapshot.  If the guest is issuing IO at the same time,
->
-there's a race in acquiring block graph lock and a potential deadlock.
->
->
-Here's how it can be reproduced:
->
->
-1. Run QEMU:
->
-> SRCDIR=/path/to/srcdir
->
->
->
->
->
->
->
->
->
-> $SRCDIR/build/qemu-system-x86_64 -enable-kvm \
->
->
->
->   -machine q35 -cpu Nehalem \
->
->
->
->   -name guest=alma8-vm,debug-threads=on \
->
->
->
->   -m 2g -smp 2 \
->
->
->
->   -nographic -nodefaults \
->
->
->
->   -qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \
->
->
->
->   -serial unix:/var/run/alma8-serial.sock,server=on,wait=off \
->
->
->
->   -object iothread,id=iothread0 \
->
->
->
->   -blockdev
->
-> node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2
->
->  \
->
->   -device virtio-blk-pci,drive=disk,iothread=iothread0
->
->
-2. Launch IO (random reads) from within the guest:
->
-> nc -U /var/run/alma8-serial.sock
->
-> ...
->
-> [root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1 --bs=4k
->
-> --size=1G --numjobs=1 --time_based=1 --runtime=300 --group_reporting
->
-> --rw=randread --iodepth=1 --filename=/testfile
->
->
-3. Run snapshots creation & removal of lower snapshot operation in a
->
-loop (script attached):
->
-> while /bin/true ; do ./remove_lower_snap.sh ; done
->
->
-And then it occasionally hangs.
->
->
-Note: I've tried bisecting this, and looks like deadlock occurs starting
->
-from the following commit:
->
->
-(BAD)  5bdbaebcce virtio: Re-enable notifications after drain
->
-(GOOD) c42c3833e0 virtio-scsi: Attach event vq notifier with no_poll
->
->
-On the latest v10.0.0 it does hang as well.
->
->
->
-Here's backtrace of the main thread:
->
->
-> #0  0x00007fc547d427ce in __ppoll (fds=0x557eb79657b0, nfds=1,
->
-> timeout=<optimized out>, sigmask=0x0) at
->
-> ../sysdeps/unix/sysv/linux/ppoll.c:43
->
-> #1  0x0000557eb47d955c in qemu_poll_ns (fds=0x557eb79657b0, nfds=1,
->
-> timeout=-1) at ../util/qemu-timer.c:329
->
-> #2  0x0000557eb47b2204 in fdmon_poll_wait (ctx=0x557eb76c5f20,
->
-> ready_list=0x7ffd94b4edd8, timeout=-1) at ../util/fdmon-poll.c:79
->
-> #3  0x0000557eb47b1c45 in aio_poll (ctx=0x557eb76c5f20, blocking=true) at
->
-> ../util/aio-posix.c:730
->
-> #4  0x0000557eb4621edd in bdrv_do_drained_begin (bs=0x557eb795e950,
->
-> parent=0x0, poll=true) at ../block/io.c:378
->
-> #5  0x0000557eb4621f7b in bdrv_drained_begin (bs=0x557eb795e950) at
->
-> ../block/io.c:391
->
-> #6  0x0000557eb45ec125 in bdrv_change_aio_context (bs=0x557eb795e950,
->
-> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-> errp=0x0)
->
->     at ../block.c:7682
->
-> #7  0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7964250,
->
-> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-> errp=0x0)
->
->     at ../block.c:7608
->
-> #8  0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb79575e0,
->
-> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-> errp=0x0)
->
->     at ../block.c:7668
->
-> #9  0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7e59110,
->
-> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-> errp=0x0)
->
->     at ../block.c:7608
->
-> #10 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb7e51960,
->
-> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-> errp=0x0)
->
->     at ../block.c:7668
->
-> #11 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb814ed80,
->
-> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-> errp=0x0)
->
->     at ../block.c:7608
->
-> #12 0x0000557eb45ee8e4 in child_job_change_aio_ctx (c=0x557eb7c9d3f0,
->
-> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-> errp=0x0)
->
->     at ../blockjob.c:157
->
-> #13 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb7c9d3f0,
->
-> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-> errp=0x0)
->
->     at ../block.c:7592
->
-> #14 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb7d74310,
->
-> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-> errp=0x0)
->
->     at ../block.c:7661
->
-> #15 0x0000557eb45dcd7e in bdrv_child_cb_change_aio_ctx
->
->     (child=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 =
->
-> {...}, tran=0x557eb7a87160, errp=0x0) at ../block.c:1234
->
-> #16 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb8565af0,
->
-> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-> errp=0x0)
->
->     at ../block.c:7592
->
-> #17 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb79575e0,
->
-> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
->
-> errp=0x0)
->
->     at ../block.c:7661
->
-> #18 0x0000557eb45ec1f3 in bdrv_try_change_aio_context (bs=0x557eb79575e0,
->
-> ctx=0x557eb76c5f20, ignore_child=0x0, errp=0x0) at ../block.c:7715
->
-> #19 0x0000557eb45e1b15 in bdrv_root_unref_child (child=0x557eb7966f30) at
->
-> ../block.c:3317
->
-> #20 0x0000557eb45eeaa8 in block_job_remove_all_bdrv (job=0x557eb7952800) at
->
-> ../blockjob.c:209
->
-> #21 0x0000557eb45ee641 in block_job_free (job=0x557eb7952800) at
->
-> ../blockjob.c:82
->
-> #22 0x0000557eb45f17af in job_unref_locked (job=0x557eb7952800) at
->
-> ../job.c:474
->
-> #23 0x0000557eb45f257d in job_do_dismiss_locked (job=0x557eb7952800) at
->
-> ../job.c:771
->
-> #24 0x0000557eb45f25fe in job_dismiss_locked (jobptr=0x7ffd94b4f400,
->
-> errp=0x7ffd94b4f488) at ../job.c:783
->
-> --Type <RET> for more, q to quit, c to continue without paging--
->
-> #25 0x0000557eb45d8e84 in qmp_job_dismiss (id=0x557eb7aa42b0
->
-> "commit-snap1", errp=0x7ffd94b4f488) at ../job-qmp.c:138
->
-> #26 0x0000557eb472f6a3 in qmp_marshal_job_dismiss (args=0x7fc52c00a3b0,
->
-> ret=0x7fc53c880da8, errp=0x7fc53c880da0) at qapi/qapi-commands-job.c:221
->
-> #27 0x0000557eb47a35f3 in do_qmp_dispatch_bh (opaque=0x7fc53c880e40) at
->
-> ../qapi/qmp-dispatch.c:128
->
-> #28 0x0000557eb47d1cd2 in aio_bh_call (bh=0x557eb79568f0) at
->
-> ../util/async.c:172
->
-> #29 0x0000557eb47d1df5 in aio_bh_poll (ctx=0x557eb76c0200) at
->
-> ../util/async.c:219
->
-> #30 0x0000557eb47b12f3 in aio_dispatch (ctx=0x557eb76c0200) at
->
-> ../util/aio-posix.c:436
->
-> #31 0x0000557eb47d2266 in aio_ctx_dispatch (source=0x557eb76c0200,
->
-> callback=0x0, user_data=0x0) at ../util/async.c:361
->
-> #32 0x00007fc549232f4f in g_main_dispatch (context=0x557eb76c6430) at
->
-> ../glib/gmain.c:3364
->
-> #33 g_main_context_dispatch (context=0x557eb76c6430) at ../glib/gmain.c:4079
->
-> #34 0x0000557eb47d3ab1 in glib_pollfds_poll () at ../util/main-loop.c:287
->
-> #35 0x0000557eb47d3b38 in os_host_main_loop_wait (timeout=0) at
->
-> ../util/main-loop.c:310
->
-> #36 0x0000557eb47d3c58 in main_loop_wait (nonblocking=0) at
->
-> ../util/main-loop.c:589
->
-> #37 0x0000557eb4218b01 in qemu_main_loop () at ../system/runstate.c:835
->
-> #38 0x0000557eb46df166 in qemu_default_main (opaque=0x0) at
->
-> ../system/main.c:50
->
-> #39 0x0000557eb46df215 in main (argc=24, argv=0x7ffd94b4f8d8) at
->
-> ../system/main.c:80
->
->
->
-And here's coroutine trying to acquire read lock:
->
->
-> (gdb) qemu coroutine reader_queue->entries.sqh_first
->
-> #0  0x0000557eb47d7068 in qemu_coroutine_switch (from_=0x557eb7aa48b0,
->
-> to_=0x7fc537fff508, action=COROUTINE_YIELD) at
->
-> ../util/coroutine-ucontext.c:321
->
-> #1  0x0000557eb47d4d4a in qemu_coroutine_yield () at
->
-> ../util/qemu-coroutine.c:339
->
-> #2  0x0000557eb47d56c8 in qemu_co_queue_wait_impl (queue=0x557eb59954c0
->
-> <reader_queue>, lock=0x7fc53c57de50, flags=0) at
->
-> ../util/qemu-coroutine-lock.c:60
->
-> #3  0x0000557eb461fea7 in bdrv_graph_co_rdlock () at
->
-> ../block/graph-lock.c:231
->
-> #4  0x0000557eb460c81a in graph_lockable_auto_lock (x=0x7fc53c57dee3) at
->
-> /home/root/src/qemu/master/include/block/graph-lock.h:213
->
-> #5  0x0000557eb460fa41 in blk_co_do_preadv_part
->
->     (blk=0x557eb84c0810, offset=6890553344, bytes=4096,
->
-> qiov=0x7fc530006988, qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at
->
-> ../block/block-backend.c:1339
->
-> #6  0x0000557eb46104d7 in blk_aio_read_entry (opaque=0x7fc530003240) at
->
-> ../block/block-backend.c:1619
->
-> #7  0x0000557eb47d6c40 in coroutine_trampoline (i0=-1213577040, i1=21886)
->
-> at ../util/coroutine-ucontext.c:175
->
-> #8  0x00007fc547c2a360 in __start_context () at
->
-> ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
->
-> #9  0x00007ffd94b4ea40 in  ()
->
-> #10 0x0000000000000000 in  ()
->
->
->
-So it looks like main thread is processing job-dismiss request and is
->
-holding write lock taken in block_job_remove_all_bdrv() (frame #20
->
-above).  At the same time iothread spawns a coroutine which performs IO
->
-request.  Before the coroutine is spawned, blk_aio_prwv() increases
->
-'in_flight' counter for Blk.  Then blk_co_do_preadv_part() (frame #5) is
->
-trying to acquire the read lock.  But main thread isn't releasing the
->
-lock as blk_root_drained_poll() returns true since blk->in_flight > 0.
->
-Here's the deadlock.
->
->
-Any comments and suggestions on the subject are welcomed.  Thanks!
-I think this is what the blk_wait_while_drained() call was supposed to
-address in blk_co_do_preadv_part(). However, with the use of multiple
-I/O threads, this is racy.
-
-Do you think that in your case we hit the small race window between the
-checks in blk_wait_while_drained() and GRAPH_RDLOCK_GUARD()? Or is there
-another reason why blk_wait_while_drained() didn't do its job?
-
-Kevin
-
-On 5/2/25 19:34, Kevin Wolf wrote:
-Am 24.04.2025 um 19:32 hat Andrey Drobyshev geschrieben:
-Hi all,
-
-There's a bug in block layer which leads to block graph deadlock.
-Notably, it takes place when blockdev IO is processed within a separate
-iothread.
-
-This was initially caught by our tests, and I was able to reduce it to a
-relatively simple reproducer.  Such deadlocks are probably supposed to
-be covered in iotests/graph-changes-while-io, but this deadlock isn't.
-
-Basically what the reproducer does is launches QEMU with a drive having
-'iothread' option set, creates a chain of 2 snapshots, launches
-block-commit job for a snapshot and then dismisses the job, starting
-from the lower snapshot.  If the guest is issuing IO at the same time,
-there's a race in acquiring block graph lock and a potential deadlock.
-
-Here's how it can be reproduced:
-
-1. Run QEMU:
-SRCDIR=/path/to/srcdir
-$SRCDIR/build/qemu-system-x86_64 -enable-kvm \
-   -machine q35 -cpu Nehalem \
-   -name guest=alma8-vm,debug-threads=on \
-   -m 2g -smp 2 \
-   -nographic -nodefaults \
-   -qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \
-   -serial unix:/var/run/alma8-serial.sock,server=on,wait=off \
-   -object iothread,id=iothread0 \
-   -blockdev 
-node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2
- \
-   -device virtio-blk-pci,drive=disk,iothread=iothread0
-2. Launch IO (random reads) from within the guest:
-nc -U /var/run/alma8-serial.sock
-...
-[root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1 --bs=4k 
---size=1G --numjobs=1 --time_based=1 --runtime=300 --group_reporting 
---rw=randread --iodepth=1 --filename=/testfile
-3. Run snapshots creation & removal of lower snapshot operation in a
-loop (script attached):
-while /bin/true ; do ./remove_lower_snap.sh ; done
-And then it occasionally hangs.
-
-Note: I've tried bisecting this, and looks like deadlock occurs starting
-from the following commit:
-
-(BAD)  5bdbaebcce virtio: Re-enable notifications after drain
-(GOOD) c42c3833e0 virtio-scsi: Attach event vq notifier with no_poll
-
-On the latest v10.0.0 it does hang as well.
-
-
-Here's backtrace of the main thread:
-#0  0x00007fc547d427ce in __ppoll (fds=0x557eb79657b0, nfds=1, timeout=<optimized 
-out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:43
-#1  0x0000557eb47d955c in qemu_poll_ns (fds=0x557eb79657b0, nfds=1, timeout=-1) 
-at ../util/qemu-timer.c:329
-#2  0x0000557eb47b2204 in fdmon_poll_wait (ctx=0x557eb76c5f20, 
-ready_list=0x7ffd94b4edd8, timeout=-1) at ../util/fdmon-poll.c:79
-#3  0x0000557eb47b1c45 in aio_poll (ctx=0x557eb76c5f20, blocking=true) at 
-../util/aio-posix.c:730
-#4  0x0000557eb4621edd in bdrv_do_drained_begin (bs=0x557eb795e950, parent=0x0, 
-poll=true) at ../block/io.c:378
-#5  0x0000557eb4621f7b in bdrv_drained_begin (bs=0x557eb795e950) at 
-../block/io.c:391
-#6  0x0000557eb45ec125 in bdrv_change_aio_context (bs=0x557eb795e950, 
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
-errp=0x0)
-     at ../block.c:7682
-#7  0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7964250, 
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
-errp=0x0)
-     at ../block.c:7608
-#8  0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb79575e0, 
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
-errp=0x0)
-     at ../block.c:7668
-#9  0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7e59110, 
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
-errp=0x0)
-     at ../block.c:7608
-#10 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb7e51960, 
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
-errp=0x0)
-     at ../block.c:7668
-#11 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb814ed80, 
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
-errp=0x0)
-     at ../block.c:7608
-#12 0x0000557eb45ee8e4 in child_job_change_aio_ctx (c=0x557eb7c9d3f0, 
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
-errp=0x0)
-     at ../blockjob.c:157
-#13 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb7c9d3f0, 
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
-errp=0x0)
-     at ../block.c:7592
-#14 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb7d74310, 
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
-errp=0x0)
-     at ../block.c:7661
-#15 0x0000557eb45dcd7e in bdrv_child_cb_change_aio_ctx
-     (child=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, 
-tran=0x557eb7a87160, errp=0x0) at ../block.c:1234
-#16 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb8565af0, 
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
-errp=0x0)
-     at ../block.c:7592
-#17 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb79575e0, 
-ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
-errp=0x0)
-     at ../block.c:7661
-#18 0x0000557eb45ec1f3 in bdrv_try_change_aio_context (bs=0x557eb79575e0, 
-ctx=0x557eb76c5f20, ignore_child=0x0, errp=0x0) at ../block.c:7715
-#19 0x0000557eb45e1b15 in bdrv_root_unref_child (child=0x557eb7966f30) at 
-../block.c:3317
-#20 0x0000557eb45eeaa8 in block_job_remove_all_bdrv (job=0x557eb7952800) at 
-../blockjob.c:209
-#21 0x0000557eb45ee641 in block_job_free (job=0x557eb7952800) at 
-../blockjob.c:82
-#22 0x0000557eb45f17af in job_unref_locked (job=0x557eb7952800) at ../job.c:474
-#23 0x0000557eb45f257d in job_do_dismiss_locked (job=0x557eb7952800) at 
-../job.c:771
-#24 0x0000557eb45f25fe in job_dismiss_locked (jobptr=0x7ffd94b4f400, 
-errp=0x7ffd94b4f488) at ../job.c:783
---Type <RET> for more, q to quit, c to continue without paging--
-#25 0x0000557eb45d8e84 in qmp_job_dismiss (id=0x557eb7aa42b0 "commit-snap1", 
-errp=0x7ffd94b4f488) at ../job-qmp.c:138
-#26 0x0000557eb472f6a3 in qmp_marshal_job_dismiss (args=0x7fc52c00a3b0, 
-ret=0x7fc53c880da8, errp=0x7fc53c880da0) at qapi/qapi-commands-job.c:221
-#27 0x0000557eb47a35f3 in do_qmp_dispatch_bh (opaque=0x7fc53c880e40) at 
-../qapi/qmp-dispatch.c:128
-#28 0x0000557eb47d1cd2 in aio_bh_call (bh=0x557eb79568f0) at ../util/async.c:172
-#29 0x0000557eb47d1df5 in aio_bh_poll (ctx=0x557eb76c0200) at 
-../util/async.c:219
-#30 0x0000557eb47b12f3 in aio_dispatch (ctx=0x557eb76c0200) at 
-../util/aio-posix.c:436
-#31 0x0000557eb47d2266 in aio_ctx_dispatch (source=0x557eb76c0200, 
-callback=0x0, user_data=0x0) at ../util/async.c:361
-#32 0x00007fc549232f4f in g_main_dispatch (context=0x557eb76c6430) at 
-../glib/gmain.c:3364
-#33 g_main_context_dispatch (context=0x557eb76c6430) at ../glib/gmain.c:4079
-#34 0x0000557eb47d3ab1 in glib_pollfds_poll () at ../util/main-loop.c:287
-#35 0x0000557eb47d3b38 in os_host_main_loop_wait (timeout=0) at 
-../util/main-loop.c:310
-#36 0x0000557eb47d3c58 in main_loop_wait (nonblocking=0) at 
-../util/main-loop.c:589
-#37 0x0000557eb4218b01 in qemu_main_loop () at ../system/runstate.c:835
-#38 0x0000557eb46df166 in qemu_default_main (opaque=0x0) at ../system/main.c:50
-#39 0x0000557eb46df215 in main (argc=24, argv=0x7ffd94b4f8d8) at 
-../system/main.c:80
-And here's coroutine trying to acquire read lock:
-(gdb) qemu coroutine reader_queue->entries.sqh_first
-#0  0x0000557eb47d7068 in qemu_coroutine_switch (from_=0x557eb7aa48b0, 
-to_=0x7fc537fff508, action=COROUTINE_YIELD) at ../util/coroutine-ucontext.c:321
-#1  0x0000557eb47d4d4a in qemu_coroutine_yield () at 
-../util/qemu-coroutine.c:339
-#2  0x0000557eb47d56c8 in qemu_co_queue_wait_impl (queue=0x557eb59954c0 
-<reader_queue>, lock=0x7fc53c57de50, flags=0) at 
-../util/qemu-coroutine-lock.c:60
-#3  0x0000557eb461fea7 in bdrv_graph_co_rdlock () at ../block/graph-lock.c:231
-#4  0x0000557eb460c81a in graph_lockable_auto_lock (x=0x7fc53c57dee3) at 
-/home/root/src/qemu/master/include/block/graph-lock.h:213
-#5  0x0000557eb460fa41 in blk_co_do_preadv_part
-     (blk=0x557eb84c0810, offset=6890553344, bytes=4096, qiov=0x7fc530006988, 
-qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at ../block/block-backend.c:1339
-#6  0x0000557eb46104d7 in blk_aio_read_entry (opaque=0x7fc530003240) at 
-../block/block-backend.c:1619
-#7  0x0000557eb47d6c40 in coroutine_trampoline (i0=-1213577040, i1=21886) at 
-../util/coroutine-ucontext.c:175
-#8  0x00007fc547c2a360 in __start_context () at 
-../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
-#9  0x00007ffd94b4ea40 in  ()
-#10 0x0000000000000000 in  ()
-So it looks like main thread is processing job-dismiss request and is
-holding write lock taken in block_job_remove_all_bdrv() (frame #20
-above).  At the same time iothread spawns a coroutine which performs IO
-request.  Before the coroutine is spawned, blk_aio_prwv() increases
-'in_flight' counter for Blk.  Then blk_co_do_preadv_part() (frame #5) is
-trying to acquire the read lock.  But main thread isn't releasing the
-lock as blk_root_drained_poll() returns true since blk->in_flight > 0.
-Here's the deadlock.
-
-Any comments and suggestions on the subject are welcomed.  Thanks!
-I think this is what the blk_wait_while_drained() call was supposed to
-address in blk_co_do_preadv_part(). However, with the use of multiple
-I/O threads, this is racy.
-
-Do you think that in your case we hit the small race window between the
-checks in blk_wait_while_drained() and GRAPH_RDLOCK_GUARD()? Or is there
-another reason why blk_wait_while_drained() didn't do its job?
-
-Kevin
-In my opinion there is a very big race window. The main thread has
-taken the graph write lock. After that, another coroutine stalls
-within GRAPH_RDLOCK_GUARD() because there is no drain at that moment, and
-only after that does the main thread start the drain. That is why Fiona's
-idea looks like it works. Though this would mean that normally we should
-always do that at the moment when we acquire the write lock. Maybe even
-inside this function. Den
-
-Am 02.05.2025 um 19:52 hat Denis V. Lunev geschrieben:
->
-On 5/2/25 19:34, Kevin Wolf wrote:
->
-> Am 24.04.2025 um 19:32 hat Andrey Drobyshev geschrieben:
->
-> > Hi all,
->
-> >
->
-> > There's a bug in block layer which leads to block graph deadlock.
->
-> > Notably, it takes place when blockdev IO is processed within a separate
->
-> > iothread.
->
-> >
->
-> > This was initially caught by our tests, and I was able to reduce it to a
->
-> > relatively simple reproducer.  Such deadlocks are probably supposed to
->
-> > be covered in iotests/graph-changes-while-io, but this deadlock isn't.
->
-> >
->
-> > Basically what the reproducer does is launches QEMU with a drive having
->
-> > 'iothread' option set, creates a chain of 2 snapshots, launches
->
-> > block-commit job for a snapshot and then dismisses the job, starting
->
-> > from the lower snapshot.  If the guest is issuing IO at the same time,
->
-> > there's a race in acquiring block graph lock and a potential deadlock.
->
-> >
->
-> > Here's how it can be reproduced:
->
-> >
->
-> > 1. Run QEMU:
->
-> > > SRCDIR=/path/to/srcdir
->
-> > > $SRCDIR/build/qemu-system-x86_64 -enable-kvm \
->
-> > >    -machine q35 -cpu Nehalem \
->
-> > >    -name guest=alma8-vm,debug-threads=on \
->
-> > >    -m 2g -smp 2 \
->
-> > >    -nographic -nodefaults \
->
-> > >    -qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \
->
-> > >    -serial unix:/var/run/alma8-serial.sock,server=on,wait=off \
->
-> > >    -object iothread,id=iothread0 \
->
-> > >    -blockdev
->
-> > > node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2
->
-> > >  \
->
-> > >    -device virtio-blk-pci,drive=disk,iothread=iothread0
->
-> > 2. Launch IO (random reads) from within the guest:
->
-> > > nc -U /var/run/alma8-serial.sock
->
-> > > ...
->
-> > > [root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1
->
-> > > --bs=4k --size=1G --numjobs=1 --time_based=1 --runtime=300
->
-> > > --group_reporting --rw=randread --iodepth=1 --filename=/testfile
->
-> > 3. Run snapshots creation & removal of lower snapshot operation in a
->
-> > loop (script attached):
->
-> > > while /bin/true ; do ./remove_lower_snap.sh ; done
->
-> > And then it occasionally hangs.
->
-> >
->
-> > Note: I've tried bisecting this, and looks like deadlock occurs starting
->
-> > from the following commit:
->
-> >
->
-> > (BAD)  5bdbaebcce virtio: Re-enable notifications after drain
->
-> > (GOOD) c42c3833e0 virtio-scsi: Attach event vq notifier with no_poll
->
-> >
->
-> > On the latest v10.0.0 it does hang as well.
->
-> >
->
-> >
->
-> > Here's backtrace of the main thread:
->
-> >
->
-> > > #0  0x00007fc547d427ce in __ppoll (fds=0x557eb79657b0, nfds=1,
->
-> > > timeout=<optimized out>, sigmask=0x0) at
->
-> > > ../sysdeps/unix/sysv/linux/ppoll.c:43
->
-> > > #1  0x0000557eb47d955c in qemu_poll_ns (fds=0x557eb79657b0, nfds=1,
->
-> > > timeout=-1) at ../util/qemu-timer.c:329
->
-> > > #2  0x0000557eb47b2204 in fdmon_poll_wait (ctx=0x557eb76c5f20,
->
-> > > ready_list=0x7ffd94b4edd8, timeout=-1) at ../util/fdmon-poll.c:79
->
-> > > #3  0x0000557eb47b1c45 in aio_poll (ctx=0x557eb76c5f20, blocking=true)
->
-> > > at ../util/aio-posix.c:730
->
-> > > #4  0x0000557eb4621edd in bdrv_do_drained_begin (bs=0x557eb795e950,
->
-> > > parent=0x0, poll=true) at ../block/io.c:378
->
-> > > #5  0x0000557eb4621f7b in bdrv_drained_begin (bs=0x557eb795e950) at
->
-> > > ../block/io.c:391
->
-> > > #6  0x0000557eb45ec125 in bdrv_change_aio_context (bs=0x557eb795e950,
->
-> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
->
-> > > tran=0x557eb7a87160, errp=0x0)
->
-> > >      at ../block.c:7682
->
-> > > #7  0x0000557eb45ebf2b in bdrv_child_change_aio_context
->
-> > > (c=0x557eb7964250, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
->
-> > > tran=0x557eb7a87160, errp=0x0)
->
-> > >      at ../block.c:7608
->
-> > > #8  0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb79575e0,
->
-> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
->
-> > > tran=0x557eb7a87160, errp=0x0)
->
-> > >      at ../block.c:7668
->
-> > > #9  0x0000557eb45ebf2b in bdrv_child_change_aio_context
->
-> > > (c=0x557eb7e59110, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
->
-> > > tran=0x557eb7a87160, errp=0x0)
->
-> > >      at ../block.c:7608
->
-> > > #10 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb7e51960,
->
-> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
->
-> > > tran=0x557eb7a87160, errp=0x0)
->
-> > >      at ../block.c:7668
->
-> > > #11 0x0000557eb45ebf2b in bdrv_child_change_aio_context
->
-> > > (c=0x557eb814ed80, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
->
-> > > tran=0x557eb7a87160, errp=0x0)
->
-> > >      at ../block.c:7608
->
-> > > #12 0x0000557eb45ee8e4 in child_job_change_aio_ctx (c=0x557eb7c9d3f0,
->
-> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
->
-> > > tran=0x557eb7a87160, errp=0x0)
->
-> > >      at ../blockjob.c:157
->
-> > > #13 0x0000557eb45ebe2d in bdrv_parent_change_aio_context
->
-> > > (c=0x557eb7c9d3f0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
->
-> > > tran=0x557eb7a87160, errp=0x0)
->
-> > >      at ../block.c:7592
->
-> > > #14 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb7d74310,
->
-> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
->
-> > > tran=0x557eb7a87160, errp=0x0)
->
-> > >      at ../block.c:7661
->
-> > > #15 0x0000557eb45dcd7e in bdrv_child_cb_change_aio_ctx
->
-> > >      (child=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60
->
-> > > = {...}, tran=0x557eb7a87160, errp=0x0) at ../block.c:1234
->
-> > > #16 0x0000557eb45ebe2d in bdrv_parent_change_aio_context
->
-> > > (c=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
->
-> > > tran=0x557eb7a87160, errp=0x0)
->
-> > >      at ../block.c:7592
->
-> > > #17 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb79575e0,
->
-> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
->
-> > > tran=0x557eb7a87160, errp=0x0)
->
-> > >      at ../block.c:7661
->
-> > > #18 0x0000557eb45ec1f3 in bdrv_try_change_aio_context
->
-> > > (bs=0x557eb79575e0, ctx=0x557eb76c5f20, ignore_child=0x0, errp=0x0) at
->
-> > > ../block.c:7715
->
-> > > #19 0x0000557eb45e1b15 in bdrv_root_unref_child (child=0x557eb7966f30)
->
-> > > at ../block.c:3317
->
-> > > #20 0x0000557eb45eeaa8 in block_job_remove_all_bdrv
->
-> > > (job=0x557eb7952800) at ../blockjob.c:209
->
-> > > #21 0x0000557eb45ee641 in block_job_free (job=0x557eb7952800) at
->
-> > > ../blockjob.c:82
->
-> > > #22 0x0000557eb45f17af in job_unref_locked (job=0x557eb7952800) at
->
-> > > ../job.c:474
->
-> > > #23 0x0000557eb45f257d in job_do_dismiss_locked (job=0x557eb7952800) at
->
-> > > ../job.c:771
->
-> > > #24 0x0000557eb45f25fe in job_dismiss_locked (jobptr=0x7ffd94b4f400,
->
-> > > errp=0x7ffd94b4f488) at ../job.c:783
->
-> > > --Type <RET> for more, q to quit, c to continue without paging--
->
-> > > #25 0x0000557eb45d8e84 in qmp_job_dismiss (id=0x557eb7aa42b0
->
-> > > "commit-snap1", errp=0x7ffd94b4f488) at ../job-qmp.c:138
->
-> > > #26 0x0000557eb472f6a3 in qmp_marshal_job_dismiss (args=0x7fc52c00a3b0,
->
-> > > ret=0x7fc53c880da8, errp=0x7fc53c880da0) at qapi/qapi-commands-job.c:221
->
-> > > #27 0x0000557eb47a35f3 in do_qmp_dispatch_bh (opaque=0x7fc53c880e40) at
->
-> > > ../qapi/qmp-dispatch.c:128
->
-> > > #28 0x0000557eb47d1cd2 in aio_bh_call (bh=0x557eb79568f0) at
->
-> > > ../util/async.c:172
->
-> > > #29 0x0000557eb47d1df5 in aio_bh_poll (ctx=0x557eb76c0200) at
->
-> > > ../util/async.c:219
->
-> > > #30 0x0000557eb47b12f3 in aio_dispatch (ctx=0x557eb76c0200) at
->
-> > > ../util/aio-posix.c:436
->
-> > > #31 0x0000557eb47d2266 in aio_ctx_dispatch (source=0x557eb76c0200,
->
-> > > callback=0x0, user_data=0x0) at ../util/async.c:361
->
-> > > #32 0x00007fc549232f4f in g_main_dispatch (context=0x557eb76c6430) at
->
-> > > ../glib/gmain.c:3364
->
-> > > #33 g_main_context_dispatch (context=0x557eb76c6430) at
->
-> > > ../glib/gmain.c:4079
->
-> > > #34 0x0000557eb47d3ab1 in glib_pollfds_poll () at
->
-> > > ../util/main-loop.c:287
->
-> > > #35 0x0000557eb47d3b38 in os_host_main_loop_wait (timeout=0) at
->
-> > > ../util/main-loop.c:310
->
-> > > #36 0x0000557eb47d3c58 in main_loop_wait (nonblocking=0) at
->
-> > > ../util/main-loop.c:589
->
-> > > #37 0x0000557eb4218b01 in qemu_main_loop () at ../system/runstate.c:835
->
-> > > #38 0x0000557eb46df166 in qemu_default_main (opaque=0x0) at
->
-> > > ../system/main.c:50
->
-> > > #39 0x0000557eb46df215 in main (argc=24, argv=0x7ffd94b4f8d8) at
->
-> > > ../system/main.c:80
->
-> >
->
-> > And here's coroutine trying to acquire read lock:
->
-> >
->
-> > > (gdb) qemu coroutine reader_queue->entries.sqh_first
->
-> > > #0  0x0000557eb47d7068 in qemu_coroutine_switch (from_=0x557eb7aa48b0,
->
-> > > to_=0x7fc537fff508, action=COROUTINE_YIELD) at
->
-> > > ../util/coroutine-ucontext.c:321
->
-> > > #1  0x0000557eb47d4d4a in qemu_coroutine_yield () at
->
-> > > ../util/qemu-coroutine.c:339
->
-> > > #2  0x0000557eb47d56c8 in qemu_co_queue_wait_impl (queue=0x557eb59954c0
->
-> > > <reader_queue>, lock=0x7fc53c57de50, flags=0) at
->
-> > > ../util/qemu-coroutine-lock.c:60
->
-> > > #3  0x0000557eb461fea7 in bdrv_graph_co_rdlock () at
->
-> > > ../block/graph-lock.c:231
->
-> > > #4  0x0000557eb460c81a in graph_lockable_auto_lock (x=0x7fc53c57dee3)
->
-> > > at /home/root/src/qemu/master/include/block/graph-lock.h:213
->
-> > > #5  0x0000557eb460fa41 in blk_co_do_preadv_part
->
-> > >      (blk=0x557eb84c0810, offset=6890553344, bytes=4096,
->
-> > > qiov=0x7fc530006988, qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at
->
-> > > ../block/block-backend.c:1339
->
-> > > #6  0x0000557eb46104d7 in blk_aio_read_entry (opaque=0x7fc530003240) at
->
-> > > ../block/block-backend.c:1619
->
-> > > #7  0x0000557eb47d6c40 in coroutine_trampoline (i0=-1213577040,
->
-> > > i1=21886) at ../util/coroutine-ucontext.c:175
->
-> > > #8  0x00007fc547c2a360 in __start_context () at
->
-> > > ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
->
-> > > #9  0x00007ffd94b4ea40 in  ()
->
-> > > #10 0x0000000000000000 in  ()
->
-> >
->
-> > So it looks like main thread is processing job-dismiss request and is
->
-> > holding write lock taken in block_job_remove_all_bdrv() (frame #20
->
-> > above).  At the same time iothread spawns a coroutine which performs IO
->
-> > request.  Before the coroutine is spawned, blk_aio_prwv() increases
->
-> > 'in_flight' counter for Blk.  Then blk_co_do_preadv_part() (frame #5) is
->
-> > trying to acquire the read lock.  But main thread isn't releasing the
->
-> > lock as blk_root_drained_poll() returns true since blk->in_flight > 0.
->
-> > Here's the deadlock.
->
-> >
->
-> > Any comments and suggestions on the subject are welcomed.  Thanks!
->
-> I think this is what the blk_wait_while_drained() call was supposed to
->
-> address in blk_co_do_preadv_part(). However, with the use of multiple
->
-> I/O threads, this is racy.
->
->
->
-> Do you think that in your case we hit the small race window between the
->
-> checks in blk_wait_while_drained() and GRAPH_RDLOCK_GUARD()? Or is there
->
-> another reason why blk_wait_while_drained() didn't do its job?
->
->
->
-In my opinion there is a very big race window. The main thread has
->
-taken the graph write lock. After that, another coroutine stalls
->
-within GRAPH_RDLOCK_GUARD() because there is no drain at that moment, and
->
-only after that does the main thread start the drain.
-You're right, I confused taking the write lock with draining there.
-
->
-That is why Fiona's idea looks like it works. Though this would mean
->
-that normally we should always do that at the moment when we acquire
->
-the write lock. Maybe even inside this function.
-I actually see now that not all of my graph locking patches were merged.
-At least I did have the thought that bdrv_drained_begin() must be marked
-GRAPH_UNLOCKED because it polls. That means that calling it from inside
-bdrv_try_change_aio_context() is actually forbidden (and that's the part
-I didn't see back then because it doesn't have TSA annotations).
-
-If you refactor the code to move the drain out to before the lock is
-taken, I think you end up with Fiona's patch, except you'll remove the
-forbidden inner drain and add more annotations for some functions and
-clarify the rules around them. I don't know, but I wouldn't be surprised
-if along the process we find other bugs, too.
-
-So Fiona's drain looks right to me, but we should probably approach it
-more systematically.
-
-Kevin
-
diff --git a/results/classifier/008/other/24930826 b/results/classifier/008/other/24930826
deleted file mode 100644
index 44d423eec..000000000
--- a/results/classifier/008/other/24930826
+++ /dev/null
@@ -1,43 +0,0 @@
-device: 0.709
-graphic: 0.667
-performance: 0.624
-other: 0.535
-PID: 0.532
-debug: 0.525
-network: 0.513
-semantic: 0.487
-vnc: 0.473
-socket: 0.447
-permissions: 0.398
-files: 0.338
-boot: 0.218
-KVM: 0.172
-
-[Qemu-devel] [BUG] vhost-user: hot-unplug vhost-user nic for windows guest OS will fail with 100% reproduce rate
-
-Hi, guys
-
-I hit a problem when hot-unplugging a vhost-user NIC from a Windows 2008 rc2 sp1
-64-bit guest OS.
-
-The XML of the NIC is as follows:
-<interface type='vhostuser'>
-  <mac address='52:54:00:3b:83:aa'/>
-  <source type='unix' path='/var/run/vhost-user/port1' mode='client'/>
-  <target dev='port1'/>
-  <model type='virtio'/>
-  <driver queues='4'/>
-  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
-</interface>
-
-First, I use virsh attach-device win2008 vif.xml to hot-plug a NIC into the guest
-OS. This operation succeeds.
-After the guest OS discovers the NIC successfully, I use virsh detach-device win2008
-vif.xml to hot-unplug it. This operation fails every time (100% reproducible).
-
-However, if I hot-plug and hot-unplug a virtio-net NIC, it does not fail.
-
-I analyzed the qmp_device_del path and found that QEMU does inject an
-interrupt via ACPI so that the guest OS is notified to remove the NIC.
-I guess something goes wrong in Windows when it handles the interrupt.
-
diff --git a/results/classifier/008/other/25842545 b/results/classifier/008/other/25842545
deleted file mode 100644
index 103fe3722..000000000
--- a/results/classifier/008/other/25842545
+++ /dev/null
@@ -1,212 +0,0 @@
-other: 0.912
-KVM: 0.867
-vnc: 0.862
-device: 0.847
-debug: 0.836
-performance: 0.831
-semantic: 0.829
-PID: 0.829
-boot: 0.824
-graphic: 0.822
-permissions: 0.817
-socket: 0.808
-files: 0.806
-network: 0.796
-
-[Qemu-devel] [Bug?] Guest pause because VMPTRLD failed in KVM
-
-Hello,
-
-  We encountered a problem where a guest paused because the KMOD reported that VMPTRLD
-failed.
-
-The related information is as follows:
-
-1) Qemu command:
-   /usr/bin/qemu-kvm -name omu1 -S -machine pc-i440fx-2.3,accel=kvm,usb=off -cpu
-host -m 15625 -realtime mlock=off -smp 8,sockets=1,cores=8,threads=1 -uuid
-a2aacfff-6583-48b4-b6a4-e6830e519931 -no-user-config -nodefaults -chardev
-socket,id=charmonitor,path=/var/lib/libvirt/qemu/omu1.monitor,server,nowait
--mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
--boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
-virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive
-file=/home/env/guest1.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,aio=native
-  -device
-virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0
-  -drive
-file=/home/env/guest_300G.img,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native
-  -device
-virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1
-  -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=26 -device
-virtio-net-pci,netdev=hostnet0,id=net0,mac=00:00:80:05:00:00,bus=pci.0,addr=0x3
--netdev tap,fd=27,id=hostnet1,vhost=on,vhostfd=28 -device
-virtio-net-pci,netdev=hostnet1,id=net1,mac=00:00:80:05:00:01,bus=pci.0,addr=0x4
--chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
--device usb-tablet,id=input0 -vnc 0.0.0.0:0 -device
-cirrus-vga,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device
-virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
-
-   2) Qemu log:
-   KVM: entry failed, hardware error 0x4
-   RAX=00000000ffffffed RBX=ffff8803fa2d7fd8 RCX=0100000000000000
-RDX=0000000000000000
-   RSI=0000000000000000 RDI=0000000000000046 RBP=ffff8803fa2d7e90
-RSP=ffff8803fa2efe90
-   R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000
-R11=000000000000b69a
-   R12=0000000000000001 R13=ffffffff81a25b40 R14=0000000000000000
-R15=ffff8803fa2d7fd8
-   RIP=ffffffff81053e16 RFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
-   ES =0000 0000000000000000 ffffffff 00c00000
-   CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
-   SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
-   DS =0000 0000000000000000 ffffffff 00c00000
-   FS =0000 0000000000000000 ffffffff 00c00000
-   GS =0000 ffff88040f540000 ffffffff 00c00000
-   LDT=0000 0000000000000000 ffffffff 00c00000
-   TR =0040 ffff88040f550a40 00002087 00008b00 DPL=0 TSS64-busy
-   GDT=     ffff88040f549000 0000007f
-   IDT=     ffffffffff529000 00000fff
-   CR0=80050033 CR2=00007f81ca0c5000 CR3=00000003f5081000 CR4=000407e0
-   DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
-DR3=0000000000000000
-   DR6=00000000ffff0ff0 DR7=0000000000000400
-   EFER=0000000000000d01
-   Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ??
-?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
-
-   3) Dmesg
-   [347315.028339] kvm: vmptrld ffff8817ec5f0000/17ec5f0000 failed
-   klogd 1.4.1, ---------- state change ----------
-   [347315.039506] kvm: vmptrld ffff8817ec5f0000/17ec5f0000 failed
-   [347315.051728] kvm: vmptrld ffff8817ec5f0000/17ec5f0000 failed
-   [347315.057472] vmwrite error: reg 6c0a value ffff88307e66e480 (err
-2120672384)
-   [347315.064567] Pid: 69523, comm: qemu-kvm Tainted: GF           X
-3.0.93-0.8-default #1
-   [347315.064569] Call Trace:
-   [347315.064587]  [<ffffffff810049d5>] dump_trace+0x75/0x300
-   [347315.064595]  [<ffffffff8145e3e3>] dump_stack+0x69/0x6f
-   [347315.064617]  [<ffffffffa03738de>] vmx_vcpu_load+0x11e/0x1d0 [kvm_intel]
-   [347315.064647]  [<ffffffffa029a204>] kvm_arch_vcpu_load+0x44/0x1d0 [kvm]
-   [347315.064669]  [<ffffffff81054ee1>] finish_task_switch+0x81/0xe0
-   [347315.064676]  [<ffffffff8145f0b4>] thread_return+0x3b/0x2a7
-   [347315.064687]  [<ffffffffa028d9b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
-   [347315.064703]  [<ffffffffa02a16d1>] __vcpu_run+0xd1/0x260 [kvm]
-   [347315.064732]  [<ffffffffa02a2418>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 
-[kvm]
-   [347315.064759]  [<ffffffffa028ecee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
-   [347315.064771]  [<ffffffff8116bdfb>] do_vfs_ioctl+0x8b/0x3b0
-   [347315.064776]  [<ffffffff8116c1c1>] sys_ioctl+0xa1/0xb0
-   [347315.064783]  [<ffffffff81469272>] system_call_fastpath+0x16/0x1b
-   [347315.064797]  [<00007fee51969ce7>] 0x7fee51969ce6
-   [347315.064799] vmwrite error: reg 6c0c value ffff88307e664000 (err
-2120630272)
-   [347315.064802] Pid: 69523, comm: qemu-kvm Tainted: GF           X
-3.0.93-0.8-default #1
-   [347315.064803] Call Trace:
-   [347315.064807]  [<ffffffff810049d5>] dump_trace+0x75/0x300
-   [347315.064811]  [<ffffffff8145e3e3>] dump_stack+0x69/0x6f
-   [347315.064817]  [<ffffffffa03738ec>] vmx_vcpu_load+0x12c/0x1d0 [kvm_intel]
-   [347315.064832]  [<ffffffffa029a204>] kvm_arch_vcpu_load+0x44/0x1d0 [kvm]
-   [347315.064851]  [<ffffffff81054ee1>] finish_task_switch+0x81/0xe0
-   [347315.064855]  [<ffffffff8145f0b4>] thread_return+0x3b/0x2a7
-   [347315.064865]  [<ffffffffa028d9b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
-   [347315.064880]  [<ffffffffa02a16d1>] __vcpu_run+0xd1/0x260 [kvm]
-   [347315.064907]  [<ffffffffa02a2418>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 
-[kvm]
-   [347315.064933]  [<ffffffffa028ecee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
-   [347315.064943]  [<ffffffff8116bdfb>] do_vfs_ioctl+0x8b/0x3b0
-   [347315.064947]  [<ffffffff8116c1c1>] sys_ioctl+0xa1/0xb0
-   [347315.064951]  [<ffffffff81469272>] system_call_fastpath+0x16/0x1b
-   [347315.064957]  [<00007fee51969ce7>] 0x7fee51969ce6
-   [347315.064959] vmwrite error: reg 6c10 value 0 (err 0)
-
-   4) The issue can't be reproduced. I searched the Intel VMX spec for reasons
-for a vmptrld failure:
-   The instruction fails if its operand is not properly aligned, sets
-unsupported physical-address bits, or is equal to the VMXON
-   pointer. In addition, the instruction fails if the 32 bits in memory
-referenced by the operand do not match the VMCS
-   revision identifier supported by this processor.
-
-   But I can't find any clues in the KVM source code. It seems each of these
-   error conditions is impossible in theory. :(
-
-Any suggestions will be appreciated! Paolo?
-
--- 
-Regards,
--Gonglei
-
-On 10/11/2016 15:10, gong lei wrote:
->
-4) The isssue can't be reporduced. I search the Intel VMX sepc about
->
-reaseons
->
-of vmptrld failure:
->
-The instruction fails if its operand is not properly aligned, sets
->
-unsupported physical-address bits, or is equal to the VMXON
->
-pointer. In addition, the instruction fails if the 32 bits in memory
->
-referenced by the operand do not match the VMCS
->
-revision identifier supported by this processor.
->
->
-But I can't find any cues from the KVM source code. It seems each
->
-error conditions is impossible in theory. :(
-Yes, it should not happen. :(
-
-If it's not reproducible, it's really hard to say what it was, except a
-random memory corruption elsewhere or even a bit flip (!).
-
-Paolo
-
-On 2016/11/17 20:39, Paolo Bonzini wrote:
->
->
-On 10/11/2016 15:10, gong lei wrote:
->
->     4) The isssue can't be reporduced. I search the Intel VMX sepc about
->
-> reaseons
->
-> of vmptrld failure:
->
->     The instruction fails if its operand is not properly aligned, sets
->
-> unsupported physical-address bits, or is equal to the VMXON
->
->     pointer. In addition, the instruction fails if the 32 bits in memory
->
-> referenced by the operand do not match the VMCS
->
->     revision identifier supported by this processor.
->
->
->
->     But I can't find any cues from the KVM source code. It seems each
->
->     error conditions is impossible in theory. :(
->
-Yes, it should not happen. :(
->
->
-If it's not reproducible, it's really hard to say what it was, except a
->
-random memory corruption elsewhere or even a bit flip (!).
->
->
-Paolo
-Thanks for your reply, Paolo :)
-
--- 
-Regards,
--Gonglei
-
diff --git a/results/classifier/008/other/25892827 b/results/classifier/008/other/25892827
deleted file mode 100644
index bccf4d818..000000000
--- a/results/classifier/008/other/25892827
+++ /dev/null
@@ -1,1087 +0,0 @@
-other: 0.892
-permissions: 0.881
-KVM: 0.872
-debug: 0.868
-vnc: 0.846
-boot: 0.839
-network: 0.839
-device: 0.839
-graphic: 0.832
-semantic: 0.825
-socket: 0.822
-performance: 0.819
-files: 0.804
-PID: 0.792
-
-[Qemu-devel] [BUG/RFC] Two cpus are not brought up normally in SLES11 sp3 VM after reboot
-
-Hi,
-
-Recently we encountered a problem in our project: 2 CPUs in VM are not brought 
-up normally after reboot.
-
-Our host is using KVM kmod 3.6 and QEMU 2.1.
-The guest is a SLES 11 SP3 VM configured with 8 vCPUs;
-the CPU model is configured as 'host-passthrough'.
-
-After the VM first started up, everything seemed to be OK.
-Then the VM panicked and rebooted.
-After the reboot, only 6 CPUs were brought up in the VM; cpu1 and cpu7 were not online.
-
-This is the only message we can get from the VM.
-The VM's dmesg shows:
-[    0.069867] Booting Node   0, Processors  #1
-[    5.060042] CPU1: Stuck ??
-[    5.060499]  #2
-[    5.088322] kvm-clock: cpu 2, msr 6:3fc90901, secondary cpu clock
-[    5.088335] KVM setup async PF for cpu 2
-[    5.092967] NMI watchdog enabled, takes one hw-pmu counter.
-[    5.094405]  #3
-[    5.108324] kvm-clock: cpu 3, msr 6:3fcd0901, secondary cpu clock
-[    5.108333] KVM setup async PF for cpu 3
-[    5.113553] NMI watchdog enabled, takes one hw-pmu counter.
-[    5.114970]  #4
-[    5.128325] kvm-clock: cpu 4, msr 6:3fd10901, secondary cpu clock
-[    5.128336] KVM setup async PF for cpu 4
-[    5.134576] NMI watchdog enabled, takes one hw-pmu counter.
-[    5.135998]  #5
-[    5.152324] kvm-clock: cpu 5, msr 6:3fd50901, secondary cpu clock
-[    5.152334] KVM setup async PF for cpu 5
-[    5.154764] NMI watchdog enabled, takes one hw-pmu counter.
-[    5.156467]  #6
-[    5.172327] kvm-clock: cpu 6, msr 6:3fd90901, secondary cpu clock
-[    5.172341] KVM setup async PF for cpu 6
-[    5.180738] NMI watchdog enabled, takes one hw-pmu counter.
-[    5.182173]  #7 Ok.
-[   10.170815] CPU7: Stuck ??
-[   10.171648] Brought up 6 CPUs
-[   10.172394] Total of 6 processors activated (28799.97 BogoMIPS).
-
-From the host, we found that the QEMU vcpu1 and vcpu7 threads were not consuming
-any CPU (they should be in the idle state).
-All of the vCPUs' stacks on the host look like the one below:
-
-[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
-[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
-[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
-[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
-[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
-[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
-[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
-[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
-[<ffffffffffffffff>] 0xffffffffffffffff
-
-We looked into the kernel code that could lead to the above 'Stuck' warning,
-and found that the only possibility is that the emulation of the 'cpuid'
-instruction in kvm/qemu has something wrong.
-But since we can’t reproduce this problem, we are not quite sure.
-Is it possible that the cpuid emulation in kvm/qemu has a bug?
-
-Has anyone come across this problem before? Or any ideas?
-
-Thanks,
-zhanghailiang
-
-On 06/07/2015 09:54, zhanghailiang wrote:
->
->
-From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
->
-consuming any cpu (Should be in idle state),
->
-All of VCPUs' stacks in host is like bellow:
->
->
-[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
->
-[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
->
-[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
->
-[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
->
-[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
->
-[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
->
-[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
->
-[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
->
-[<ffffffffffffffff>] 0xffffffffffffffff
->
->
-We looked into the kernel codes that could leading to the above 'Stuck'
->
-warning,
->
-and found that the only possible is the emulation of 'cpuid' instruct in
->
-kvm/qemu has something wrong.
->
-But since we can’t reproduce this problem, we are not quite sure.
->
-Is there any possible that the cupid emulation in kvm/qemu has some bug ?
-Can you explain the relationship to the cpuid emulation?  What do the
-traces say about vcpus 1 and 7?
-
-Paolo
-
-On 2015/7/6 16:45, Paolo Bonzini wrote:
-On 06/07/2015 09:54, zhanghailiang wrote:
-From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
-consuming any cpu (Should be in idle state),
-All of VCPUs' stacks in host is like bellow:
-
-[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
-[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
-[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
-[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
-[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
-[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
-[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
-[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
-[<ffffffffffffffff>] 0xffffffffffffffff
-
-We looked into the kernel codes that could leading to the above 'Stuck'
-warning,
-and found that the only possible is the emulation of 'cpuid' instruct in
-kvm/qemu has something wrong.
-But since we can’t reproduce this problem, we are not quite sure.
-Is there any possible that the cupid emulation in kvm/qemu has some bug ?
-Can you explain the relationship to the cpuid emulation?  What do the
-traces say about vcpus 1 and 7?
-OK, we searched the VM's kernel code for the 'Stuck' message, and it is
-located in
-do_boot_cpu(). It runs in BSP context; the call path is:
-BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu()
--> wakeup_secondary_cpu_via_init() to trigger the APs.
-It waits 5s for the APs to start up; if an AP does not start up normally, it
-prints 'CPU%d Stuck' or 'CPU%d: Not responding'.
-
-If it prints 'Stuck', it means the AP has received the SIPI interrupt and
-has begun to execute the code at
-'ENTRY(trampoline_data)' (trampoline_64.S), but got stuck somewhere before
-smp_callin() (smpboot.c).
-The following is the startup process of the BSP and APs.
-BSP:
-start_kernel()
-  ->smp_init()
-     ->smp_boot_cpus()
-       ->do_boot_cpu()
-           ->start_ip = trampoline_address(); //set the address that AP will go 
-to execute
-           ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
-           ->for (timeout = 0; timeout < 50000; timeout++)
-               if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if AP 
-startup or not
-
-APs:
-ENTRY(trampoline_data) (trampoline_64.S)
-      ->ENTRY(secondary_startup_64) (head_64.S)
-         ->start_secondary() (smpboot.c)
-            ->cpu_init();
-            ->smp_callin();
-                ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if the AP gets
-here, the BSP will not print the error message.
-
-From the above call path, we can be sure that the AP got stuck between
-trampoline_data and the cpumask_set_cpu() in
-smp_callin(). We looked through this code path carefully and only found a
-'hlt' instruction that could block the process.
-It is located in trampoline_data():
-
-ENTRY(trampoline_data)
-        ...
-
-        call    verify_cpu              # Verify the cpu supports long mode
-        testl   %eax, %eax              # Check for return code
-        jnz     no_longmode
-
-        ...
-
-no_longmode:
-        hlt
-        jmp no_longmode
-
-In verify_cpu(),
-the only sensitive instruction we can find is 'cpuid', which could cause a VM
-exit from non-root mode.
-This is why we suspect that wrong cpuid emulation in KVM/QEMU led to
-the failure in verify_cpu.
-
-From the messages in the VM, we know something is wrong with vcpu1 and vcpu7.
-[    5.060042] CPU1: Stuck ??
-[   10.170815] CPU7: Stuck ??
-[   10.171648] Brought up 6 CPUs
-
-Besides, the following is the CPU information obtained from the host.
-80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command 
-instance-0000000
-* CPU #0: pc=0x00007f64160c683d thread_id=68570
-  CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
-  CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
-  CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
-  CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
-  CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
-  CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
-  CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
-
-Oh, I also forgot to mention above that we have bound each vCPU
-to a different physical CPU on the
-host.
-
-Thanks,
-zhanghailiang
-
-On 06/07/2015 11:59, zhanghailiang wrote:
->
->
->
-Besides, the follow is the cpus message got from host.
->
-80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh
->
-qemu-monitor-command instance-0000000
->
-* CPU #0: pc=0x00007f64160c683d thread_id=68570
->
-CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
->
-CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
->
-CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
->
-CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
->
-CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
->
-CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
->
-CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
->
->
-Oh, i also forgot to mention in the above message that, we have bond
->
-each vCPU to different physical CPU in
->
-host.
-Can you capture a trace on the host (trace-cmd record -e kvm) and send
-it privately?  Please note which CPUs get stuck, since I guess it's not
-always 1 and 7.
-
-Paolo
-
-On Mon, 6 Jul 2015 17:59:10 +0800
-zhanghailiang <address@hidden> wrote:
-
->
-On 2015/7/6 16:45, Paolo Bonzini wrote:
->
->
->
->
->
-> On 06/07/2015 09:54, zhanghailiang wrote:
->
->>
->
->>  From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
->
->> consuming any cpu (Should be in idle state),
->
->> All of VCPUs' stacks in host is like bellow:
->
->>
->
->> [<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
->
->> [<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
->
->> [<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
->
->> [<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
->
->> [<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
->
->> [<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
->
->> [<ffffffff81468092>] system_call_fastpath+0x16/0x1b
->
->> [<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
->
->> [<ffffffffffffffff>] 0xffffffffffffffff
->
->>
->
->> We looked into the kernel codes that could leading to the above 'Stuck'
->
->> warning,
-In current upstream there isn't any printk(...Stuck...) left since that code
-path
-has been reworked.
-I've often seen this on an over-committed host during guest CPU up/down torture
-tests.
-Could you update the guest kernel to upstream and see if the issue reproduces?
-
->
->> and found that the only possible is the emulation of 'cpuid' instruct in
->
->> kvm/qemu has something wrong.
->
->> But since we can’t reproduce this problem, we are not quite sure.
->
->> Is there any possible that the cupid emulation in kvm/qemu has some bug ?
->
->
->
-> Can you explain the relationship to the cpuid emulation?  What do the
->
-> traces say about vcpus 1 and 7?
->
->
-OK, we searched the VM's kernel codes with the 'Stuck' message, and  it is
->
-located in
->
-do_boot_cpu(). It's in BSP context, the call process is:
->
-BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu()
->
--> wakeup_secondary_via_INIT() to trigger APs.
->
-It will wait 5s for APs to startup, if some AP not startup normally, it will
->
-print 'CPU%d Stuck' or 'CPU%d: Not responding'.
->
->
-If it prints 'Stuck', it means the AP has received the SIPI interrupt and
->
-begins to execute the code
->
-'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places
->
-before smp_callin()(smpboot.c).
->
-The follow is the starup process of BSP and AP.
->
-BSP:
->
-start_kernel()
->
-->smp_init()
->
-->smp_boot_cpus()
->
-->do_boot_cpu()
->
-->start_ip = trampoline_address(); //set the address that AP will
->
-go to execute
->
-->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
->
-->for (timeout = 0; timeout < 50000; timeout++)
->
-if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if
->
-AP startup or not
->
->
-APs:
->
-ENTRY(trampoline_data) (trampoline_64.S)
->
-->ENTRY(secondary_startup_64) (head_64.S)
->
-->start_secondary() (smpboot.c)
->
-->cpu_init();
->
-->smp_callin();
->
-->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP
->
-comes here, the BSP will not prints the error message.
->
->
-From above call process, we can be sure that, the AP has been stuck between
->
-trampoline_data and the cpumask_set_cpu() in
->
-smp_callin(), we look through these codes path carefully, and only found a
->
-'hlt' instruct that could block the process.
->
-It is located in trampoline_data():
->
->
-ENTRY(trampoline_data)
->
-...
->
->
-call    verify_cpu              # Verify the cpu supports long mode
->
-testl   %eax, %eax              # Check for return code
->
-jnz     no_longmode
->
->
-...
->
->
-no_longmode:
->
-hlt
->
-jmp no_longmode
->
->
-For the process verify_cpu(),
->
-we can only find the 'cpuid' sensitive instruct that could lead VM exit from
->
-No-root mode.
->
-This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to
->
-the fail in verify_cpu.
->
->
-From the message in VM, we know vcpu1 and vcpu7 is something wrong.
->
-[    5.060042] CPU1: Stuck ??
->
-[   10.170815] CPU7: Stuck ??
->
-[   10.171648] Brought up 6 CPUs
->
->
-Besides, the follow is the cpus message got from host.
->
-80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh
->
-qemu-monitor-command instance-0000000
->
-* CPU #0: pc=0x00007f64160c683d thread_id=68570
->
-CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
->
-CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
->
-CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
->
-CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
->
-CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
->
-CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
->
-CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
->
->
-Oh, i also forgot to mention in the above message that, we have bond each
->
-vCPU to different physical CPU in
->
-host.
->
->
-Thanks,
->
-zhanghailiang
->
->
->
->
->
---
->
-To unsubscribe from this list: send the line "unsubscribe kvm" in
->
-the body of a message to address@hidden
->
-More majordomo info at
-http://vger.kernel.org/majordomo-info.html
-
-On 2015/7/7 19:23, Igor Mammedov wrote:
-On Mon, 6 Jul 2015 17:59:10 +0800
-zhanghailiang <address@hidden> wrote:
-On 2015/7/6 16:45, Paolo Bonzini wrote:
-On 06/07/2015 09:54, zhanghailiang wrote:
-From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
-consuming any cpu (Should be in idle state),
-All of VCPUs' stacks in host is like bellow:
-
-[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
-[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
-[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
-[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
-[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
-[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
-[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
-[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
-[<ffffffffffffffff>] 0xffffffffffffffff
-
-We looked into the kernel codes that could leading to the above 'Stuck'
-warning,
-in current upstream there isn't any printk(...Stuck...) left since that code 
-path
-has been reworked.
-I've often seen this on over-committed host during guest CPUs up/down torture 
-test.
-Could you update guest kernel to upstream and see if issue reproduces?
-Hmm, unfortunately, it is very hard to reproduce, and we are still trying to
-reproduce it.
-
-For your test case, is it a kernel bug?
-Or has any related patch that could solve your test problem been merged
-upstream?
-
-Thanks,
-zhanghailiang
-and found that the only possible is the emulation of 'cpuid' instruct in
-kvm/qemu has something wrong.
-But since we can’t reproduce this problem, we are not quite sure.
-Is there any possible that the cupid emulation in kvm/qemu has some bug ?
-Can you explain the relationship to the cpuid emulation?  What do the
-traces say about vcpus 1 and 7?
-OK, we searched the VM's kernel codes with the 'Stuck' message, and  it is 
-located in
-do_boot_cpu(). It's in BSP context, the call process is:
-BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu() 
--> wakeup_secondary_via_INIT() to trigger APs.
-It will wait 5s for APs to startup, if some AP not startup normally, it will 
-print 'CPU%d Stuck' or 'CPU%d: Not responding'.
-
-If it prints 'Stuck', it means the AP has received the SIPI interrupt and 
-begins to execute the code
-'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places before 
-smp_callin()(smpboot.c).
-The follow is the starup process of BSP and AP.
-BSP:
-start_kernel()
-    ->smp_init()
-       ->smp_boot_cpus()
-         ->do_boot_cpu()
-             ->start_ip = trampoline_address(); //set the address that AP will 
-go to execute
-             ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
-             ->for (timeout = 0; timeout < 50000; timeout++)
-                 if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if 
-AP startup or not
-
-APs:
-ENTRY(trampoline_data) (trampoline_64.S)
-        ->ENTRY(secondary_startup_64) (head_64.S)
-           ->start_secondary() (smpboot.c)
-              ->cpu_init();
-              ->smp_callin();
-                  ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP 
-comes here, the BSP will not prints the error message.
-
-  From above call process, we can be sure that, the AP has been stuck between 
-trampoline_data and the cpumask_set_cpu() in
-smp_callin(), we look through these codes path carefully, and only found a 
-'hlt' instruct that could block the process.
-It is located in trampoline_data():
-
-ENTRY(trampoline_data)
-          ...
-
-        call    verify_cpu              # Verify the cpu supports long mode
-        testl   %eax, %eax              # Check for return code
-        jnz     no_longmode
-
-          ...
-
-no_longmode:
-        hlt
-        jmp no_longmode
-
-For the process verify_cpu(),
-we can only find the 'cpuid' sensitive instruct that could lead VM exit from 
-No-root mode.
-This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to 
-the fail in verify_cpu.
-
-  From the message in VM, we know vcpu1 and vcpu7 is something wrong.
-[    5.060042] CPU1: Stuck ??
-[   10.170815] CPU7: Stuck ??
-[   10.171648] Brought up 6 CPUs
-
-Besides, the follow is the cpus message got from host.
-80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command 
-instance-0000000
-* CPU #0: pc=0x00007f64160c683d thread_id=68570
-    CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
-    CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
-    CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
-    CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
-    CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
-    CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
-    CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
-
-Oh, i also forgot to mention in the above message that, we have bond each vCPU 
-to different physical CPU in
-host.
-
-Thanks,
-zhanghailiang
-
-
-
-
---
-To unsubscribe from this list: send the line "unsubscribe kvm" in
-the body of a message to address@hidden
-More majordomo info at
-http://vger.kernel.org/majordomo-info.html
-.
-
-On Tue, 7 Jul 2015 19:43:35 +0800
-zhanghailiang <address@hidden> wrote:
-
->
-On 2015/7/7 19:23, Igor Mammedov wrote:
->
-> On Mon, 6 Jul 2015 17:59:10 +0800
->
-> zhanghailiang <address@hidden> wrote:
->
->
->
->> On 2015/7/6 16:45, Paolo Bonzini wrote:
->
->>>
->
->>>
->
->>> On 06/07/2015 09:54, zhanghailiang wrote:
->
->>>>
->
->>>>   From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
->
->>>> consuming any cpu (Should be in idle state),
->
->>>> All of VCPUs' stacks in host is like bellow:
->
->>>>
->
->>>> [<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
->
->>>> [<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
->
->>>> [<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
->
->>>> [<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
->
->>>> [<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
->
->>>> [<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
->
->>>> [<ffffffff81468092>] system_call_fastpath+0x16/0x1b
->
->>>> [<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
->
->>>> [<ffffffffffffffff>] 0xffffffffffffffff
->
->>>>
->
->>>> We looked into the kernel codes that could leading to the above 'Stuck'
->
->>>> warning,
->
-> in current upstream there isn't any printk(...Stuck...) left since that
->
-> code path
->
-> has been reworked.
->
-> I've often seen this on over-committed host during guest CPUs up/down
->
-> torture test.
->
-> Could you update guest kernel to upstream and see if issue reproduces?
->
->
->
->
-Hmm, Unfortunately, it is very hard to reproduce, and we are still trying to
->
-reproduce it.
->
->
-For your test case, is it a kernel bug?
->
-Or is there any related patch could solve your test problem been merged into
->
-upstream ?
-I don't remember all prerequisite patches but you should be able to find
-http://marc.info/?l=linux-kernel&m=140326703108009&w=2
-"x86/smpboot: Initialize secondary CPU only if master CPU will wait for it"
-and then look for dependencies.
-
-
->
->
-Thanks,
->
-zhanghailiang
->
->
->>>> and found that the only possible is the emulation of 'cpuid' instruct in
->
->>>> kvm/qemu has something wrong.
->
->>>> But since we can’t reproduce this problem, we are not quite sure.
->
->>>> Is there any possible that the cupid emulation in kvm/qemu has some bug ?
->
->>>
->
->>> Can you explain the relationship to the cpuid emulation?  What do the
->
->>> traces say about vcpus 1 and 7?
->
->>
->
->> OK, we searched the VM's kernel codes with the 'Stuck' message, and  it is
->
->> located in
->
->> do_boot_cpu(). It's in BSP context, the call process is:
->
->> BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() ->
->
->> do_boot_cpu() -> wakeup_secondary_via_INIT() to trigger APs.
->
->> It will wait 5s for APs to startup, if some AP not startup normally, it
->
->> will print 'CPU%d Stuck' or 'CPU%d: Not responding'.
->
->>
->
->> If it prints 'Stuck', it means the AP has received the SIPI interrupt and
->
->> begins to execute the code
->
->> 'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places
->
->> before smp_callin()(smpboot.c).
->
->> The follow is the starup process of BSP and AP.
->
->> BSP:
->
->> start_kernel()
->
->>     ->smp_init()
->
->>        ->smp_boot_cpus()
->
->>          ->do_boot_cpu()
->
->>              ->start_ip = trampoline_address(); //set the address that AP
->
->> will go to execute
->
->>              ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
->
->>              ->for (timeout = 0; timeout < 50000; timeout++)
->
->>                  if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;//
->
->> check if AP startup or not
->
->>
->
->> APs:
->
->> ENTRY(trampoline_data) (trampoline_64.S)
->
->>         ->ENTRY(secondary_startup_64) (head_64.S)
->
->>            ->start_secondary() (smpboot.c)
->
->>               ->cpu_init();
->
->>               ->smp_callin();
->
->>                   ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP
->
->> comes here, the BSP will not prints the error message.
->
->>
->
->>   From above call process, we can be sure that, the AP has been stuck
->
->> between trampoline_data and the cpumask_set_cpu() in
->
->> smp_callin(), we look through these codes path carefully, and only found a
->
->> 'hlt' instruct that could block the process.
->
->> It is located in trampoline_data():
->
->>
->
->> ENTRY(trampoline_data)
->
->>           ...
->
->>
->
->>    call    verify_cpu              # Verify the cpu supports long mode
->
->>    testl   %eax, %eax              # Check for return code
->
->>    jnz     no_longmode
->
->>
->
->>           ...
->
->>
->
->> no_longmode:
->
->>    hlt
->
->>    jmp no_longmode
->
->>
->
->> In verify_cpu(),
->
->> the only sensitive instruction we can find that could cause a VM exit
->
->> from non-root mode is 'cpuid'.
->
->> This is why we suspect that the cpuid emulation in KVM/QEMU is wrong, leading
->
->> to the failure in verify_cpu.
->
->>
->
->>   From the messages in the VM, we know something is wrong with vcpu1 and vcpu7.
->
->> [    5.060042] CPU1: Stuck ??
->
->> [   10.170815] CPU7: Stuck ??
->
->> [   10.171648] Brought up 6 CPUs
->
->>
->
->> Besides, the following is the CPU information obtained from the host.
->
->> 80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh
->
->> qemu-monitor-command instance-0000000
->
->> * CPU #0: pc=0x00007f64160c683d thread_id=68570
->
->>     CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
->
->>     CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
->
->>     CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
->
->>     CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
->
->>     CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
->
->>     CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
->
->>     CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
->
->>
->
->> Oh, I also forgot to mention in the above message that we have bound each
->
->> vCPU to a different physical CPU on the
->
->> host.
->
->>
->
->> Thanks,
->
->> zhanghailiang
->
->>
->
->>
->
->>
->
->>
->
->> --
->
->> To unsubscribe from this list: send the line "unsubscribe kvm" in
->
->> the body of a message to address@hidden
->
->> More majordomo info at
-http://vger.kernel.org/majordomo-info.html
->
->
->
->
->
-> .
->
->
->
->
->
-
-On 2015/7/7 20:21, Igor Mammedov wrote:
-On Tue, 7 Jul 2015 19:43:35 +0800
-zhanghailiang <address@hidden> wrote:
-On 2015/7/7 19:23, Igor Mammedov wrote:
-On Mon, 6 Jul 2015 17:59:10 +0800
-zhanghailiang <address@hidden> wrote:
-On 2015/7/6 16:45, Paolo Bonzini wrote:
-On 06/07/2015 09:54, zhanghailiang wrote:
-From the host, we found that the QEMU vcpu1 and vcpu7 threads were not
-consuming any CPU (they should be in the idle state).
-All of the vCPUs' stacks on the host look like the one below:
-
-[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
-[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
-[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
-[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
-[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
-[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
-[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
-[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
-[<ffffffffffffffff>] 0xffffffffffffffff
-
-We looked into the kernel code that could lead to the above 'Stuck'
-warning,
-in current upstream there isn't any printk(...Stuck...) left since that code 
-path
-has been reworked.
-I've often seen this on over-committed host during guest CPUs up/down torture 
-test.
-Could you update guest kernel to upstream and see if issue reproduces?
-Hmm, unfortunately, it is very hard to reproduce, and we are still trying to
-reproduce it.
-
-For your test case, is it a kernel bug?
-Or has any related patch that could solve your test problem been merged
-upstream?
-I don't remember all prerequisite patches but you should be able to find
-http://marc.info/?l=linux-kernel&m=140326703108009&w=2
-"x86/smpboot: Initialize secondary CPU only if master CPU will wait for it"
-and then look for dependencies.
-Er, we have investigated this patch, and it is not related to our problem, :)
-
-Thanks.
-Thanks,
-zhanghailiang
-and found that the only possibility is that the emulation of the 'cpuid' instruction in
-kvm/qemu has something wrong.
-But since we can’t reproduce this problem, we are not quite sure.
-Is there any possibility that the cpuid emulation in kvm/qemu has some bug?
-Can you explain the relationship to the cpuid emulation?  What do the
-traces say about vcpus 1 and 7?
-OK, we searched the VM's kernel code for the 'Stuck' message, and it is
-located in
-do_boot_cpu(). It's in BSP context, the call process is:
-BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu()
--> wakeup_secondary_cpu_via_init() to trigger APs.
-It will wait 5s for the APs to start up; if an AP does not start up normally, it will
-print 'CPU%d Stuck' or 'CPU%d: Not responding'.
-
-If it prints 'Stuck', it means the AP has received the SIPI interrupt and
-begun to execute the code at
-'ENTRY(trampoline_data)' (trampoline_64.S), but got stuck somewhere before
-smp_callin() (smpboot.c).
-The following is the startup process of the BSP and the APs.
-BSP:
-start_kernel()
-     ->smp_init()
-        ->smp_boot_cpus()
-          ->do_boot_cpu()
-              ->start_ip = trampoline_address(); //set the address that AP will 
-go to execute
-              ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
-              ->for (timeout = 0; timeout < 50000; timeout++)
-                  if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if 
-AP startup or not
-
-APs:
-ENTRY(trampoline_data) (trampoline_64.S)
-         ->ENTRY(secondary_startup_64) (head_64.S)
-            ->start_secondary() (smpboot.c)
-               ->cpu_init();
-               ->smp_callin();
-                   ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP 
-comes here, the BSP will not print the error message.
-
-   From the above call process, we can be sure that the AP got stuck between
-trampoline_data and the cpumask_set_cpu() call in
-smp_callin(). We looked through this code path carefully, and only found a
-'hlt' instruction that could block the process.
-It is located in trampoline_data():
-
-ENTRY(trampoline_data)
-           ...
-
-        call    verify_cpu              # Verify the cpu supports long mode
-        testl   %eax, %eax              # Check for return code
-        jnz     no_longmode
-
-           ...
-
-no_longmode:
-        hlt
-        jmp no_longmode
-
-In verify_cpu(),
-the only sensitive instruction we can find that could cause a VM exit from
-non-root mode is 'cpuid'.
-This is why we suspect that the cpuid emulation in KVM/QEMU is wrong, leading to
-the failure in verify_cpu.
-
-   From the messages in the VM, we know something is wrong with vcpu1 and vcpu7.
-[    5.060042] CPU1: Stuck ??
-[   10.170815] CPU7: Stuck ??
-[   10.171648] Brought up 6 CPUs
-
-Besides, the following is the CPU information obtained from the host.
-80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command 
-instance-0000000
-* CPU #0: pc=0x00007f64160c683d thread_id=68570
-     CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
-     CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
-     CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
-     CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
-     CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
-     CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
-     CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
-
-Oh, I also forgot to mention in the above message that we have bound each vCPU
-to a different physical CPU on the
-host.
-
-Thanks,
-zhanghailiang
-
-
-
-
---
-To unsubscribe from this list: send the line "unsubscribe kvm" in
-the body of a message to address@hidden
-More majordomo info at
-http://vger.kernel.org/majordomo-info.html
-.
-.
-
diff --git a/results/classifier/008/other/28596630 b/results/classifier/008/other/28596630
deleted file mode 100644
index 852b323cd..000000000
--- a/results/classifier/008/other/28596630
+++ /dev/null
@@ -1,123 +0,0 @@
-device: 0.835
-semantic: 0.814
-performance: 0.797
-permissions: 0.791
-graphic: 0.785
-network: 0.780
-PID: 0.750
-other: 0.707
-debug: 0.704
-socket: 0.697
-vnc: 0.674
-KVM: 0.649
-files: 0.630
-boot: 0.609
-
-[Qemu-devel] [BUG] [low severity] a strange appearance of message involving slirp while doing "empty" make
-
-Folks,
-
-If qemu tree is already fully built, and "make" is attempted, for 3.1, the 
-outcome is:
-
-$ make
-        CHK version_gen.h
-$
-
-For 4.0-rc0, the outcome seems to be different:
-
-$ make
-make[1]: Entering directory '/home/build/malta-mips64r6/qemu-4.0/slirp'
-make[1]: Nothing to be done for 'all'.
-make[1]: Leaving directory '/home/build/malta-mips64r6/qemu-4.0/slirp'
-        CHK version_gen.h
-$
-
-Not sure how significant that is, but I am reporting it just in case.
-
-Yours,
-Aleksandar
-
-On 20/03/2019 22.08, Aleksandar Markovic wrote:
->
-Folks,
->
->
-If qemu tree is already fully built, and "make" is attempted, for 3.1, the
->
-outcome is:
->
->
-$ make
->
-CHK version_gen.h
->
-$
->
->
-For 4.0-rc0, the outcome seems to be different:
->
->
-$ make
->
-make[1]: Entering directory '/home/build/malta-mips64r6/qemu-4.0/slirp'
->
-make[1]: Nothing to be done for 'all'.
->
-make[1]: Leaving directory '/home/build/malta-mips64r6/qemu-4.0/slirp'
->
-CHK version_gen.h
->
-$
->
->
-Not sure how significant is that, but I report it just in case.
-It's likely because slirp is currently being reworked to become a
-separate project, so the makefiles have been changed a little bit. I
-guess the message will go away again once slirp has become a stand-alone
-library.
-
- Thomas
-
-On Fri, 22 Mar 2019 at 04:59, Thomas Huth <address@hidden> wrote:
->
-On 20/03/2019 22.08, Aleksandar Markovic wrote:
->
-> $ make
->
-> make[1]: Entering directory '/home/build/malta-mips64r6/qemu-4.0/slirp'
->
-> make[1]: Nothing to be done for 'all'.
->
-> make[1]: Leaving directory '/home/build/malta-mips64r6/qemu-4.0/slirp'
->
->       CHK version_gen.h
->
-> $
->
->
->
-> Not sure how significant is that, but I report it just in case.
->
->
-It's likely because slirp is currently being reworked to become a
->
-separate project, so the makefiles have been changed a little bit. I
->
-guess the message will go away again once slirp has become a stand-alone
->
-library.
-Well, we'll still need to ship slirp for the foreseeable future...
-
-I think the cause of this is that the rule in Makefile for
-calling the slirp Makefile is not passing it $(SUBDIR_MAKEFLAGS)
-like all the other recursive make invocations. If we do that
-then we'll suppress the entering/leaving messages for
-non-verbose builds. (Some tweaking will be needed as
-it looks like the slirp makefile has picked an incompatible
-meaning for $BUILD_DIR, which the SUBDIR_MAKEFLAGS will
-also be passing to it.)
-
-thanks
--- PMM
-
diff --git a/results/classifier/008/other/31349848 b/results/classifier/008/other/31349848
deleted file mode 100644
index 8dc0eb6b0..000000000
--- a/results/classifier/008/other/31349848
+++ /dev/null
@@ -1,164 +0,0 @@
-permissions: 0.908
-PID: 0.903
-other: 0.901
-device: 0.881
-graphic: 0.876
-performance: 0.864
-vnc: 0.854
-semantic: 0.846
-socket: 0.846
-KVM: 0.827
-debug: 0.826
-files: 0.820
-boot: 0.815
-network: 0.769
-
-[Qemu-devel]  [BUG] qemu stuck when detach host-usb device
-
-Description of problem:
-The guest has a host-usb device (Kingston Technology DataTraveler 100 G3/G4/SE9
-G2), which is attached
-to an xhci controller (on the host). Qemu gets stuck if I detach it from the guest.
-
-How reproducible:
-100%
-
-Steps to Reproduce:
-1.            Use the usb stick to copy files in the guest, to keep it busy.
-2.            virsh detach-device vm_name usb.xml
-
-Then qemu gets stuck for 20s; I found this is because libusb_release_interface
-blocks for 20s.
-Dmesg prints:
-
-[35442.034861] usb 4-2.1: Disable of device-initiated U1 failed.
-[35447.034993] usb 4-2.1: Disable of device-initiated U2 failed.
-[35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed.
-[35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed.
-
-Is this a hardware error or a software bug?
-
-On Tue, Nov 27, 2018 at 01:26:24AM +0000, linzhecheng wrote:
->
-Description of problem:
->
-The guest has a host-usb device (Kingston Technology DataTraveler 100
->
-G3/G4/SE9 G2), which is attached
->
-to an xhci controller (on the host). Qemu gets stuck if I detach it from the guest.
->
->
-How reproducible:
->
-100%
->
->
-Steps to Reproduce:
->
-1.            Use the usb stick to copy files in the guest, to keep it busy.
->
-2.            virsh detach-device vm_name usb.xml
->
->
-Then qemu gets stuck for 20s; I found this is because
->
-libusb_release_interface blocks for 20s.
->
-Dmesg prints:
->
->
-[35442.034861] usb 4-2.1: Disable of device-initiated U1 failed.
->
-[35447.034993] usb 4-2.1: Disable of device-initiated U2 failed.
->
-[35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed.
->
-[35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed.
->
->
-Is this a hardware error or a software bug?
-I'd guess software error, could be libusb or the (host) linux kernel.
-Cc'ing libusb-devel.
-
-cheers,
-  Gerd
-
->
------Original Message-----
->
-From: Gerd Hoffmann [
-mailto:address@hidden
->
-Sent: Tuesday, November 27, 2018 2:09 PM
->
-To: linzhecheng <address@hidden>
->
-Cc: address@hidden; wangxin (U) <address@hidden>;
->
-Zhoujian (jay) <address@hidden>; address@hidden
->
-Subject: Re: [Qemu-devel] [BUG] qemu stuck when detach host-usb device
->
->
-On Tue, Nov 27, 2018 at 01:26:24AM +0000, linzhecheng wrote:
->
-> Description of problem:
->
-> The guest has a host-usb device(Kingston Technology DataTraveler 100
->
-> G3/G4/SE9 G2), which is attached to xhci controller(on host). Qemu will
->
-> stuck
->
-if I detach it from guest.
->
->
->
-> How reproducible:
->
-> 100%
->
->
->
-> Steps to Reproduce:
->
-> 1.            Use usb stick to copy files in guest , make it busy working.
->
-> 2.            virsh detach-device vm_name usb.xml
->
->
->
-> Then qemu will stuck for 20s, I found this is because
->
-> libusb_release_interface
->
-block for 20s.
->
-> Dmesg prints:
->
->
->
-> [35442.034861] usb 4-2.1: Disable of device-initiated U1 failed.
->
-> [35447.034993] usb 4-2.1: Disable of device-initiated U2 failed.
->
-> [35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed.
->
-> [35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed.
->
->
->
-> Is this a hardware error or software's bug?
->
->
-I'd guess software error, could be libusb or the (host) linux kernel.
->
-Cc'ing libusb-devel.
-Perhaps it's a usb driver bug. Could you also reproduce it?
->
->
-cheers,
->
-Gerd
-
diff --git a/results/classifier/008/other/32484936 b/results/classifier/008/other/32484936
deleted file mode 100644
index f849e1e5e..000000000
--- a/results/classifier/008/other/32484936
+++ /dev/null
@@ -1,233 +0,0 @@
-other: 0.856
-PID: 0.839
-semantic: 0.832
-vnc: 0.830
-device: 0.830
-socket: 0.829
-permissions: 0.826
-debug: 0.825
-files: 0.816
-performance: 0.815
-graphic: 0.813
-network: 0.811
-boot: 0.810
-KVM: 0.793
-
-[Qemu-devel] [Snapshot Bug?]Qcow2 meta data corruption
-
-Hi all,
-A problem with qcow2 image files has happened in several of my VMs and I could not figure it out,
-so I have to ask for some help.
-Here is the thing:
-At first, I found some data corruption in a VM, so I ran qemu-img check on all my VMs.
-Parts of the check report:
-3-Leaked cluster 2926229 refcount=1 reference=0
-4-Leaked cluster 3021181 refcount=1 reference=0
-5-Leaked cluster 3021182 refcount=1 reference=0
-6-Leaked cluster 3021183 refcount=1 reference=0
-7-Leaked cluster 3021184 refcount=1 reference=0
-8-ERROR cluster 3102547 refcount=3 reference=4
-9-ERROR cluster 3111536 refcount=3 reference=4
-10-ERROR cluster 3113369 refcount=3 reference=4
-11-ERROR cluster 3235590 refcount=10 reference=11
-12-ERROR cluster 3235591 refcount=10 reference=11
-423-Warning: cluster offset=0xc000c00020000 is after the end of the image file, can't properly check refcounts.
-424-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
-425-Warning: cluster offset=0xc0001000c0000 is after the end of the image file, can't properly check refcounts.
-426-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
-427-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
-428-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
-429-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
-430-Warning: cluster offset=0xc000c00010000 is after the end of the image file, can't properly check refcounts.
-After a further look, I found two L2 entries pointing to the same cluster, and this was found in several qcow2 image files of different VMs.
-Like this:
-table entry conflict (with our qcow2 check 
-tool):
-a table offset : 0x00000093f7080000 level : 2, l1 table entry 100, l2 table entry 7
-b table offset : 0x00000093f7080000 level : 2, l1 table entry 5, l2 table entry 7
-table entry conflict :
-a table offset : 0x00000000a01e0000 level : 2, l1 table entry 100, l2 table entry 19
-b table offset : 0x00000000a01e0000 level : 2, l1 table entry 5, l2 table entry 19
-table entry conflict :
-a table offset : 0x00000000a01d0000 level : 2, l1 table entry 100, l2 table entry 18
-b table offset : 0x00000000a01d0000 level : 2, l1 table entry 5, l2 table entry 18
-table entry conflict :
-a table offset : 0x00000000a01c0000 level : 2, l1 table entry 100, l2 table entry 17
-b table offset : 0x00000000a01c0000 level : 2, l1 table entry 5, l2 table entry 17
-table entry conflict :
-a table offset : 0x00000000a01b0000 level : 2, l1 table entry 100, l2 table entry 16
-b table offset : 0x00000000a01b0000 level : 2, l1 table entry 5, l2 table entry 16
-I think the problem is related to snapshot creation and deletion, but I can't reproduce it.
-Can anyone give a hint about how this happens?
-Qemu version 2.0.1; I downloaded the source code and installed it with make install.
-Qemu parameters:
-/usr/bin/kvm -chardev socket,id=qmp,path=/var/run/qemu-server/5855899639838.qmp,server,nowait -mon chardev=qmp,mode=control -vnc :0,websocket,to=200 -enable-kvm -pidfile /var/run/qemu-server/5855899639838.pid -daemonize -name yfMailSvr-200.200.0.14 -smp sockets=1,cores=4 -cpu core2duo,hv_spinlocks=0xffff,hv_relaxed,hv_time,hv_vapic,+sse4.1,+sse4.2,+x2apic,+erms,+smep,+fsgsbase,+f16c,+dca,+pcid,+pdcm,+xtpr,+ht,+ss,+acpi,+ds -nodefaults -vga cirrus -k en-us -boot menu=on,splash-time=8000 -m 8192 -usb -drive if=none,id=drive-ide0,media=cdrom,aio=native -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0 -drive file=/sf/data/36b82a720d3a278001ba904e80c20c13e_ecf4bbbf3e94/images/host-ecf4bbbf3e94/784f3f08532a/yfMailSvr-200.200.0.14.vm/vm-disk-1.qcow2,if=none,id=drive-virtio1,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb -drive file=/sf/data/36b82a720d3a278001ba904e80c20c13e_ecf4bbbf3e94/images/host-ecf4bbbf3e94/784f3f08532a/yfMailSvr-200.200.0.14.vm/vm-disk-2.qcow2,if=none,id=drive-virtio2,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio2,id=virtio2,bus=pci.0,addr=0xc,bootindex=101 -netdev type=tap,id=net0,ifname=585589963983800,script=/sf/etc/kvm/vtp-bridge,vhost=on,vhostforce=on -device virtio-net-pci,romfile=,mac=FE:FC:FE:F0:AB:BA,netdev=net0,bus=pci.0,addr=0x12,id=net0 -rtc driftfix=slew,clock=rt,base=localtime -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1
-Thanks
-Sangfor VT.
-leijian
-
-Hi all,
-A problem with qcow2 image files has happened in several of my VMs and I could not figure it out,
-so I have to ask for some help.
-Here is the thing:
-At first, I found some data corruption in a VM, so I ran qemu-img check on all my VMs.
-Parts of the check report:
-3-Leaked cluster 2926229 refcount=1 reference=0
-4-Leaked cluster 3021181 refcount=1 reference=0
-5-Leaked cluster 3021182 refcount=1 reference=0
-6-Leaked cluster 3021183 refcount=1 reference=0
-7-Leaked cluster 3021184 refcount=1 reference=0
-8-ERROR cluster 3102547 refcount=3 reference=4
-9-ERROR cluster 3111536 refcount=3 reference=4
-10-ERROR cluster 3113369 refcount=3 reference=4
-11-ERROR cluster 3235590 refcount=10 reference=11
-12-ERROR cluster 3235591 refcount=10 reference=11
-423-Warning: cluster offset=0xc000c00020000 is after the end of the image file, can't properly check refcounts.
-424-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
-425-Warning: cluster offset=0xc0001000c0000 is after the end of the image file, can't properly check refcounts.
-426-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
-427-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
-428-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
-429-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
-430-Warning: cluster offset=0xc000c00010000 is after the end of the image file, can't properly check refcounts.
-After a further look, I found two L2 entries pointing to the same cluster, and this was found in several qcow2 image files of different VMs.
-Like this:
-table entry conflict (with our qcow2 check 
-tool):
-a table offset : 0x00000093f7080000 level : 2, l1 table entry 100, l2 table entry 7
-b table offset : 0x00000093f7080000 level : 2, l1 table entry 5, l2 table entry 7
-table entry conflict :
-a table offset : 0x00000000a01e0000 level : 2, l1 table entry 100, l2 table entry 19
-b table offset : 0x00000000a01e0000 level : 2, l1 table entry 5, l2 table entry 19
-table entry conflict :
-a table offset : 0x00000000a01d0000 level : 2, l1 table entry 100, l2 table entry 18
-b table offset : 0x00000000a01d0000 level : 2, l1 table entry 5, l2 table entry 18
-table entry conflict :
-a table offset : 0x00000000a01c0000 level : 2, l1 table entry 100, l2 table entry 17
-b table offset : 0x00000000a01c0000 level : 2, l1 table entry 5, l2 table entry 17
-table entry conflict :
-a table offset : 0x00000000a01b0000 level : 2, l1 table entry 100, l2 table entry 16
-b table offset : 0x00000000a01b0000 level : 2, l1 table entry 5, l2 table entry 16
-I think the problem is related to snapshot creation and deletion, but I can't reproduce it.
-Can anyone give a hint about how this happens?
-Qemu version 2.0.1; I downloaded the source code and installed it with make install.
-Qemu parameters:
-/usr/bin/kvm -chardev socket,id=qmp,path=/var/run/qemu-server/5855899639838.qmp,server,nowait -mon chardev=qmp,mode=control -vnc :0,websocket,to=200 -enable-kvm -pidfile /var/run/qemu-server/5855899639838.pid -daemonize -name yfMailSvr-200.200.0.14 -smp sockets=1,cores=4 -cpu core2duo,hv_spinlocks=0xffff,hv_relaxed,hv_time,hv_vapic,+sse4.1,+sse4.2,+x2apic,+erms,+smep,+fsgsbase,+f16c,+dca,+pcid,+pdcm,+xtpr,+ht,+ss,+acpi,+ds -nodefaults -vga cirrus -k en-us -boot menu=on,splash-time=8000 -m 8192 -usb -drive if=none,id=drive-ide0,media=cdrom,aio=native -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0 -drive file=/sf/data/36b82a720d3a278001ba904e80c20c13e_ecf4bbbf3e94/images/host-ecf4bbbf3e94/784f3f08532a/yfMailSvr-200.200.0.14.vm/vm-disk-1.qcow2,if=none,id=drive-virtio1,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb -drive file=/sf/data/36b82a720d3a278001ba904e80c20c13e_ecf4bbbf3e94/images/host-ecf4bbbf3e94/784f3f08532a/yfMailSvr-200.200.0.14.vm/vm-disk-2.qcow2,if=none,id=drive-virtio2,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio2,id=virtio2,bus=pci.0,addr=0xc,bootindex=101 -netdev type=tap,id=net0,ifname=585589963983800,script=/sf/etc/kvm/vtp-bridge,vhost=on,vhostforce=on -device virtio-net-pci,romfile=,mac=FE:FC:FE:F0:AB:BA,netdev=net0,bus=pci.0,addr=0x12,id=net0 -rtc driftfix=slew,clock=rt,base=localtime -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1
-Thanks
-Sangfor VT.
-leijian
-
-Am 03.04.2015 um 12:04 hat leijian geschrieben:
->
-Hi all,
->
->
-A problem with qcow2 image files has happened in several of my VMs and I
->
-could not figure it out,
->
-so I have to ask for some help.
->
-[...]
->
-I think the problem is related to snapshot creation and deletion, but I can't
->
-reproduce it.
->
-Can anyone give a hint about how this happens?
-How did you create/delete your snapshots?
-
-More specifically, did you take care to never access your image from
-more than one process (except if both are read-only)? It happens
-occasionally that people use 'qemu-img snapshot' while the VM is
-running. This is wrong and can corrupt the image.
-
-Kevin
-
-On 04/07/2015 03:33 AM, Kevin Wolf wrote:
->
-More specifically, did you take care to never access your image from
->
-more than one process (except if both are read-only)? It happens
->
-occasionally that people use 'qemu-img snapshot' while the VM is
->
-running. This is wrong and can corrupt the image.
-Since this has been done by more than one person, I'm wondering if there
-is something we can do in the qcow2 format itself to make it harder for
-the casual user to cause corruption.  Maybe if we declare some bit or
-extension header for an image open for writing, which other readers can
-use as a warning ("this image is being actively modified; reading it may
-fail"), and other writers can use to deny access ("another process is
-already modifying this image"), where a writer should set that bit
-before writing anything else in the file, then clear it on exit.  Of
-course, you'd need a way to override the bit to actively clear it to
-recover from the case of a writer dying unexpectedly without resetting
-it normally.  And it won't help the case of a reader opening the file
-first, followed by a writer, where the reader could still get thrown off
-track.
-
-Or maybe we could document in the qcow2 format that all readers and
-writers should attempt to obtain the appropriate flock() permissions [or
-other appropriate advisory locking scheme] over the file header, so that
-cooperating processes that both use advisory locking will know when the
-file is in use by another process.
-
--- 
-Eric Blake   eblake redhat com    +1-919-301-3266
-Libvirt virtualization library
-http://libvirt.org
-signature.asc
-Description:
-OpenPGP digital signature
-
-
-I created/deleted the snapshots using the qmp commands "snapshot_blkdev_internal"/"snapshot_delete_blkdev_internal", and to avoid the case you mentioned above, I have added flock() locking in qemu_open().
-Here is the result of running qemu-img snapshot against a running VM:
-Diskfile:/sf/data/36c81f660e38b3b001b183da50b477d89_f8bc123b3e74/images/host-f8bc123b3e74/4a8d8728fcdc/Devried30030.vm/vm-disk-1.qcow2 is used! errno=Resource temporarily unavailable
-Could the two cluster entries be the same because the refcount of an in-use cluster unexpectedly decreased to 0 and the cluster was allocated again?
-If the image was not accessed from more than one process, are there any other cases I can test for?
-Thanks
-leijian
-From:
-Eric Blake
-Date:
-2015-04-07 23:27
-To:
-Kevin Wolf
-;
-leijian
-CC:
-qemu-devel
-;
-stefanha
-Subject:
-Re: [Qemu-devel] [Snapshot Bug?]Qcow2 meta data 
-corruption
-On 04/07/2015 03:33 AM, Kevin Wolf wrote:
-> More specifically, did you take care to never access your image from
-> more than one process (except if both are read-only)? It happens
-> occasionally that people use 'qemu-img snapshot' while the VM is
-> running. This is wrong and can corrupt the image.
-Since this has been done by more than one person, I'm wondering if there
-is something we can do in the qcow2 format itself to make it harder for
-the casual user to cause corruption.  Maybe if we declare some bit or
-extension header for an image open for writing, which other readers can
-use as a warning ("this image is being actively modified; reading it may
-fail"), and other writers can use to deny access ("another process is
-already modifying this image"), where a writer should set that bit
-before writing anything else in the file, then clear it on exit.  Of
-course, you'd need a way to override the bit to actively clear it to
-recover from the case of a writer dying unexpectedly without resetting
-it normally.  And it won't help the case of a reader opening the file
-first, followed by a writer, where the reader could still get thrown off
-track.
-Or maybe we could document in the qcow2 format that all readers and
-writers should attempt to obtain the appropriate flock() permissions [or
-other appropriate advisory locking scheme] over the file header, so that
-cooperating processes that both use advisory locking will know when the
-file is in use by another process.
---
-Eric Blake   eblake redhat com    +1-919-301-3266
-Libvirt virtualization library http://libvirt.org
-
diff --git a/results/classifier/008/other/33802194 b/results/classifier/008/other/33802194
deleted file mode 100644
index 496a41215..000000000
--- a/results/classifier/008/other/33802194
+++ /dev/null
@@ -1,4949 +0,0 @@
-vnc: 0.728
-KVM: 0.725
-permissions: 0.705
-device: 0.691
-debug: 0.681
-performance: 0.659
-semantic: 0.656
-socket: 0.655
-network: 0.644
-graphic: 0.640
-other: 0.637
-PID: 0.636
-boot: 0.631
-files: 0.598
-
-[BUG] cxl can not create region
-
-Hi list
-
-I want to test cxl functions in arm64, and found some problems I can't
-figure out.
-
-My test environment:
-
-1. build latest bios from
-https://github.com/tianocore/edk2.git
-master
-branch(cc2db6ebfb6d9d85ba4c7b35fba1fa37fffc0bc2)
-2. build latest qemu-system-aarch64 from git://git.qemu.org/qemu.git
-master branch(846dcf0ba4eff824c295f06550b8673ff3f31314). With cxl arm
-support patch:
-https://patchwork.kernel.org/project/cxl/cover/20220616141950.23374-1-Jonathan.Cameron@huawei.com/
-3. build Linux kernel from
-https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git
-preview
-branch(65fc1c3d26b96002a5aa1f4012fae4dc98fd5683)
-4. build latest ndctl tools from
-https://github.com/pmem/ndctl
-create_region branch(8558b394e449779e3a4f3ae90fae77ede0bca159)
-
-And my qemu test commands:
-sudo $QEMU_BIN -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 \
-        -cpu max -smp 8 -nographic -no-reboot \
-        -kernel $KERNEL -bios $BIOS_BIN \
-        -drive if=none,file=$ROOTFS,format=qcow2,id=hd \
-        -device virtio-blk-pci,drive=hd \
-        -append 'root=/dev/vda1 nokaslr dyndbg="module cxl* +p"' \
-        -object memory-backend-ram,size=4G,id=mem0 \
-        -numa node,nodeid=0,cpus=0-7,memdev=mem0 \
-        -net nic -net user,hostfwd=tcp::2222-:22 -enable-kvm \
-        -object memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxltest.raw,size=256M \
-        -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest1.raw,size=256M \
-        -object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M \
-        -object memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M \
-        -object memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M \
-        -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa1.raw,size=256M \
-        -object memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M \
-        -object memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M \
-        -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
-        -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
-        -device cxl-upstream,bus=root_port0,id=us0 \
-        -device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
-        -device cxl-type3,bus=swport0,memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \
-        -device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \
-        -device cxl-type3,bus=swport1,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1 \
-        -device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \
-        -device cxl-type3,bus=swport2,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2 \
-        -device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \
-        -device cxl-type3,bus=swport3,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3 \
-        -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k
-
-And I have got two problems.
-1. When I try to create an x1 region with the command "cxl create-region -d
-decoder0.0 -w 1 -g 4096 mem0", the kernel crashes with a NULL pointer
-dereference. Crash log:
-
-[  534.697324] cxl_region region0: config state: 0
-[  534.697346] cxl_region region0: probe: -6
-[  534.697368] cxl_acpi ACPI0017:00: decoder0.0: created region0
-[  534.699115] cxl region0: mem0:endpoint3 decoder3.0 add:
-mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1
-[  534.699149] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
-mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1
-[  534.699167] cxl region0: ACPI0016:00:port1 decoder1.0 add:
-mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1
-[  534.699176] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256
-[  534.699182] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0
-for mem0:decoder3.0 @ 0
-[  534.699189] cxl region0: 0000:0d:00.0:port2 iw: 1 ig: 256
-[  534.699193] cxl region0: 0000:0d:00.0:port2 target[0] =
-0000:0e:00.0 for mem0:decoder3.0 @ 0
-[  534.699405] Unable to handle kernel NULL pointer dereference at
-virtual address 0000000000000000
-[  534.701474] Mem abort info:
-[  534.701994]   ESR = 0x0000000086000004
-[  534.702653]   EC = 0x21: IABT (current EL), IL = 32 bits
-[  534.703616]   SET = 0, FnV = 0
-[  534.704174]   EA = 0, S1PTW = 0
-[  534.704803]   FSC = 0x04: level 0 translation fault
-[  534.705694] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010144a000
-[  534.706875] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
-[  534.709855] Internal error: Oops: 86000004 [#1] PREEMPT SMP
-[  534.710301] Modules linked in:
-[  534.710546] CPU: 7 PID: 331 Comm: cxl Not tainted
-5.19.0-rc3-00064-g65fc1c3d26b9-dirty #11
-[  534.715393] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
-[  534.717179] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
-[  534.719190] pc : 0x0
-[  534.719928] lr : commit_store+0x118/0x2cc
-[  534.721007] sp : ffff80000aec3c30
-[  534.721793] x29: ffff80000aec3c30 x28: ffff0000da62e740 x27: ffff0000c0c06b30
-[  534.723875] x26: 0000000000000000 x25: ffff0000c0a2a400 x24: ffff0000c0a29400
-[  534.725440] x23: 0000000000000003 x22: 0000000000000000 x21: ffff0000c0c06800
-[  534.727312] x20: 0000000000000000 x19: ffff0000c1559800 x18: 0000000000000000
-[  534.729138] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffd41fe838
-[  534.731046] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
-[  534.732402] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
-[  534.734432] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff0000c0906e80
-[  534.735921] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff80000aec3bf0
-[  534.737437] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c155a000
-[  534.738878] Call trace:
-[  534.739368]  0x0
-[  534.739713]  dev_attr_store+0x1c/0x30
-[  534.740186]  sysfs_kf_write+0x48/0x58
-[  534.740961]  kernfs_fop_write_iter+0x128/0x184
-[  534.741872]  new_sync_write+0xdc/0x158
-[  534.742706]  vfs_write+0x1ac/0x2a8
-[  534.743440]  ksys_write+0x68/0xf0
-[  534.744328]  __arm64_sys_write+0x1c/0x28
-[  534.745180]  invoke_syscall+0x44/0xf0
-[  534.745989]  el0_svc_common+0x4c/0xfc
-[  534.746661]  do_el0_svc+0x60/0xa8
-[  534.747378]  el0_svc+0x2c/0x78
-[  534.748066]  el0t_64_sync_handler+0xb8/0x12c
-[  534.748919]  el0t_64_sync+0x18c/0x190
-[  534.749629] Code: bad PC value
-[  534.750169] ---[ end trace 0000000000000000 ]---
-
-2. When I try to create an x4 region with the command "cxl create-region -d
-decoder0.0 -w 4 -g 4096 -m mem0 mem1 mem2 mem3", I get the errors below:
-
-cxl region: create_region: region0: failed to set target3 to mem3
-cxl region: cmd_create_region: created 0 regions
-
-And kernel log as below:
-[   60.536663] cxl_region region0: config state: 0
-[   60.536675] cxl_region region0: probe: -6
-[   60.536696] cxl_acpi ACPI0017:00: decoder0.0: created region0
-[   60.538251] cxl region0: mem0:endpoint3 decoder3.0 add:
-mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1
-[   60.538278] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
-mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1
-[   60.538295] cxl region0: ACPI0016:00:port1 decoder1.0 add:
-mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1
-[   60.538647] cxl region0: mem1:endpoint4 decoder4.0 add:
-mem1:decoder4.0 @ 1 next: none nr_eps: 1 nr_targets: 1
-[   60.538663] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
-mem1:decoder4.0 @ 1 next: mem1 nr_eps: 2 nr_targets: 2
-[   60.538675] cxl region0: ACPI0016:00:port1 decoder1.0 add:
-mem1:decoder4.0 @ 1 next: 0000:0d:00.0 nr_eps: 2 nr_targets: 1
-[   60.539311] cxl region0: mem2:endpoint5 decoder5.0 add:
-mem2:decoder5.0 @ 2 next: none nr_eps: 1 nr_targets: 1
-[   60.539332] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
-mem2:decoder5.0 @ 2 next: mem2 nr_eps: 3 nr_targets: 3
-[   60.539343] cxl region0: ACPI0016:00:port1 decoder1.0 add:
-mem2:decoder5.0 @ 2 next: 0000:0d:00.0 nr_eps: 3 nr_targets: 1
-[   60.539711] cxl region0: mem3:endpoint6 decoder6.0 add:
-mem3:decoder6.0 @ 3 next: none nr_eps: 1 nr_targets: 1
-[   60.539723] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
-mem3:decoder6.0 @ 3 next: mem3 nr_eps: 4 nr_targets: 4
-[   60.539735] cxl region0: ACPI0016:00:port1 decoder1.0 add:
-mem3:decoder6.0 @ 3 next: 0000:0d:00.0 nr_eps: 4 nr_targets: 1
-[   60.539742] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256
-[   60.539747] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0
-for mem0:decoder3.0 @ 0
-[   60.539754] cxl region0: 0000:0d:00.0:port2 iw: 4 ig: 512
-[   60.539758] cxl region0: 0000:0d:00.0:port2 target[0] =
-0000:0e:00.0 for mem0:decoder3.0 @ 0
-[   60.539764] cxl region0: ACPI0016:00:port1: cannot host mem1:decoder4.0 at 1
-
-I have tried writing the sysfs node manually and got the same errors.
-
-I hope I can get some help here.
-
-Bob
-
-On Fri, 5 Aug 2022 10:20:23 +0800
-Bobo WL <lmw.bobo@gmail.com> wrote:
-
->
-Hi list
->
->
-I want to test cxl functions in arm64, and found some problems I can't
->
-figure out.
-Hi Bob,
-
-Glad to see people testing this code.
-
->
->
-My test environment:
->
->
-1. build latest bios from
-https://github.com/tianocore/edk2.git
-master
->
-branch(cc2db6ebfb6d9d85ba4c7b35fba1fa37fffc0bc2)
->
-2. build latest qemu-system-aarch64 from git://git.qemu.org/qemu.git
->
-master branch(846dcf0ba4eff824c295f06550b8673ff3f31314). With cxl arm
->
-support patch:
->
-https://patchwork.kernel.org/project/cxl/cover/20220616141950.23374-1-Jonathan.Cameron@huawei.com/
->
-3. build Linux kernel from
->
-https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git
-preview
->
-branch(65fc1c3d26b96002a5aa1f4012fae4dc98fd5683)
->
-4. build latest ndctl tools from
-https://github.com/pmem/ndctl
->
-create_region branch(8558b394e449779e3a4f3ae90fae77ede0bca159)
->
->
-And my qemu test commands:
->
-sudo $QEMU_BIN -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 \
->
--cpu max -smp 8 -nographic -no-reboot \
->
--kernel $KERNEL -bios $BIOS_BIN \
->
--drive if=none,file=$ROOTFS,format=qcow2,id=hd \
->
--device virtio-blk-pci,drive=hd -append 'root=/dev/vda1
->
-nokaslr dyndbg="module cxl* +p"' \
->
--object memory-backend-ram,size=4G,id=mem0 \
->
--numa node,nodeid=0,cpus=0-7,memdev=mem0 \
->
--net nic -net user,hostfwd=tcp::2222-:22 -enable-kvm \
->
--object
->
-memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxltest.raw,size=256M
->
-\
->
--object
->
-memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest1.raw,size=256M
->
-\
->
--object
->
-memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M
->
-\
->
--object
->
-memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M
->
-\
->
--object
->
-memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M
->
-\
->
--object
->
-memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa1.raw,size=256M
->
-\
->
--object
->
-memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M
->
-\
->
--object
->
-memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M
->
-\
->
--device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
->
--device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
-Probably not related to your problem, but there is a disconnect in QEMU /
-kernel assumptions around the presence of an HDM decoder when a HB only
-has a single root port. The spec allows it to be provided or not as an
-implementation choice.
-The kernel assumes it isn't provided. QEMU assumes it is.
-
-The temporary solution is to throw in a second root port on the HB and not
-connect anything to it.  Longer term I may special-case this so that the
-particular decoder defaults to pass-through settings in QEMU if there is
-only one root port.
-
->
--device cxl-upstream,bus=root_port0,id=us0 \
->
--device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
->
--device
->
-cxl-type3,bus=swport0,memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \
->
--device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \
->
--device
->
-cxl-type3,bus=swport1,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1 \
->
--device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \
->
--device
->
-cxl-type3,bus=swport2,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2 \
->
--device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \
->
--device
->
-cxl-type3,bus=swport3,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3 \
->
--M
->
-cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k
->
->
-And I have got two problems.
->
-1. When I want to create x1 region with command: "cxl create-region -d
->
-decoder0.0 -w 1 -g 4096 mem0", kernel crashed with null pointer
->
-reference. Crash log:
->
->
-[  534.697324] cxl_region region0: config state: 0
->
-[  534.697346] cxl_region region0: probe: -6
-Seems odd this is up here.  But maybe fine.
-
->
-[  534.697368] cxl_acpi ACPI0017:00: decoder0.0: created region0
->
-[  534.699115] cxl region0: mem0:endpoint3 decoder3.0 add:
->
-mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1
->
-[  534.699149] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
->
-mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1
->
-[  534.699167] cxl region0: ACPI0016:00:port1 decoder1.0 add:
->
-mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1
->
-[  534.699176] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256
->
-[  534.699182] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0
->
-for mem0:decoder3.0 @ 0
->
-[  534.699189] cxl region0: 0000:0d:00.0:port2 iw: 1 ig: 256
->
-[  534.699193] cxl region0: 0000:0d:00.0:port2 target[0] =
->
-0000:0e:00.0 for mem0:decoder3.0 @ 0
->
-[  534.699405] Unable to handle kernel NULL pointer dereference at
->
-virtual address 0000000000000000
->
-[  534.701474] Mem abort info:
->
-[  534.701994]   ESR = 0x0000000086000004
->
-[  534.702653]   EC = 0x21: IABT (current EL), IL = 32 bits
->
-[  534.703616]   SET = 0, FnV = 0
->
-[  534.704174]   EA = 0, S1PTW = 0
->
-[  534.704803]   FSC = 0x04: level 0 translation fault
->
-[  534.705694] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010144a000
->
-[  534.706875] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
->
-[  534.709855] Internal error: Oops: 86000004 [#1] PREEMPT SMP
->
-[  534.710301] Modules linked in:
->
-[  534.710546] CPU: 7 PID: 331 Comm: cxl Not tainted
->
-5.19.0-rc3-00064-g65fc1c3d26b9-dirty #11
->
-[  534.715393] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
->
-[  534.717179] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
->
-[  534.719190] pc : 0x0
->
-[  534.719928] lr : commit_store+0x118/0x2cc
->
-[  534.721007] sp : ffff80000aec3c30
->
-[  534.721793] x29: ffff80000aec3c30 x28: ffff0000da62e740 x27:
->
-ffff0000c0c06b30
->
-[  534.723875] x26: 0000000000000000 x25: ffff0000c0a2a400 x24:
->
-ffff0000c0a29400
->
-[  534.725440] x23: 0000000000000003 x22: 0000000000000000 x21:
->
-ffff0000c0c06800
->
-[  534.727312] x20: 0000000000000000 x19: ffff0000c1559800 x18:
->
-0000000000000000
->
-[  534.729138] x17: 0000000000000000 x16: 0000000000000000 x15:
->
-0000ffffd41fe838
->
-[  534.731046] x14: 0000000000000000 x13: 0000000000000000 x12:
->
-0000000000000000
->
-[  534.732402] x11: 0000000000000000 x10: 0000000000000000 x9 :
->
-0000000000000000
->
-[  534.734432] x8 : 0000000000000000 x7 : 0000000000000000 x6 :
->
-ffff0000c0906e80
->
-[  534.735921] x5 : 0000000000000000 x4 : 0000000000000000 x3 :
->
-ffff80000aec3bf0
->
-[  534.737437] x2 : 0000000000000000 x1 : 0000000000000000 x0 :
->
-ffff0000c155a000
->
-[  534.738878] Call trace:
->
-[  534.739368]  0x0
->
-[  534.739713]  dev_attr_store+0x1c/0x30
->
-[  534.740186]  sysfs_kf_write+0x48/0x58
->
-[  534.740961]  kernfs_fop_write_iter+0x128/0x184
->
-[  534.741872]  new_sync_write+0xdc/0x158
->
-[  534.742706]  vfs_write+0x1ac/0x2a8
->
-[  534.743440]  ksys_write+0x68/0xf0
->
-[  534.744328]  __arm64_sys_write+0x1c/0x28
->
-[  534.745180]  invoke_syscall+0x44/0xf0
->
-[  534.745989]  el0_svc_common+0x4c/0xfc
->
-[  534.746661]  do_el0_svc+0x60/0xa8
->
-[  534.747378]  el0_svc+0x2c/0x78
->
-[  534.748066]  el0t_64_sync_handler+0xb8/0x12c
->
-[  534.748919]  el0t_64_sync+0x18c/0x190
->
-[  534.749629] Code: bad PC value
->
-[  534.750169] ---[ end trace 0000000000000000 ]---
->
->
-2. When I want to create x4 region with command: "cxl create-region -d
->
-decoder0.0 -w 4 -g 4096 -m mem0 mem1 mem2 mem3". I got below errors:
->
->
-cxl region: create_region: region0: failed to set target3 to mem3
->
-cxl region: cmd_create_region: created 0 regions
->
->
-And kernel log as below:
->
-[   60.536663] cxl_region region0: config state: 0
->
-[   60.536675] cxl_region region0: probe: -6
->
-[   60.536696] cxl_acpi ACPI0017:00: decoder0.0: created region0
->
-[   60.538251] cxl region0: mem0:endpoint3 decoder3.0 add:
->
-mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1
->
-[   60.538278] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
->
-mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1
->
-[   60.538295] cxl region0: ACPI0016:00:port1 decoder1.0 add:
->
-mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1
->
-[   60.538647] cxl region0: mem1:endpoint4 decoder4.0 add:
->
-mem1:decoder4.0 @ 1 next: none nr_eps: 1 nr_targets: 1
->
-[   60.538663] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
->
-mem1:decoder4.0 @ 1 next: mem1 nr_eps: 2 nr_targets: 2
->
-[   60.538675] cxl region0: ACPI0016:00:port1 decoder1.0 add:
->
-mem1:decoder4.0 @ 1 next: 0000:0d:00.0 nr_eps: 2 nr_targets: 1
->
-[   60.539311] cxl region0: mem2:endpoint5 decoder5.0 add:
->
-mem2:decoder5.0 @ 2 next: none nr_eps: 1 nr_targets: 1
->
-[   60.539332] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
->
-mem2:decoder5.0 @ 2 next: mem2 nr_eps: 3 nr_targets: 3
->
-[   60.539343] cxl region0: ACPI0016:00:port1 decoder1.0 add:
->
-mem2:decoder5.0 @ 2 next: 0000:0d:00.0 nr_eps: 3 nr_targets: 1
->
-[   60.539711] cxl region0: mem3:endpoint6 decoder6.0 add:
->
-mem3:decoder6.0 @ 3 next: none nr_eps: 1 nr_targets: 1
->
-[   60.539723] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
->
-mem3:decoder6.0 @ 3 next: mem3 nr_eps: 4 nr_targets: 4
->
-[   60.539735] cxl region0: ACPI0016:00:port1 decoder1.0 add:
->
-mem3:decoder6.0 @ 3 next: 0000:0d:00.0 nr_eps: 4 nr_targets: 1
->
-[   60.539742] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256
->
-[   60.539747] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0
->
-for mem0:decoder3.0 @ 0
->
-[   60.539754] cxl region0: 0000:0d:00.0:port2 iw: 4 ig: 512
-This looks like an off-by-one that should be fixed in the cxl/pending branch
-mentioned below.  That ig should be 256.  Note the fix was for a test case
-with a fat HB and no switch, but this certainly looks like the same issue.
-
->
-[   60.539758] cxl region0: 0000:0d:00.0:port2 target[0] =
->
-0000:0e:00.0 for mem0:decoder3.0 @ 0
->
-[   60.539764] cxl region0: ACPI0016:00:port1: cannot host mem1:decoder4.0 at
->
-1
->
->
-I have tried to write sysfs node manually, got same errors.
-When stepping through by hand, which sysfs write triggers the crash above?
-
-Not sure it's related, but I've just sent out a fix to the
-target register handling in QEMU.
-https://lore.kernel.org/linux-cxl/20220808122051.14822-1-Jonathan.Cameron@huawei.com/T/#m47ff985412ce44559e6b04d677c302f8cd371330
-I did have one instance last week of triggering what looked to be a race
-condition, but the stack trace doesn't look related to what you've hit.
-
-It will probably be a few days before I have time to take a look at replicating
-what you have seen.
-
-If you have time, try using the kernel.org cxl/pending branch as there are
-a few additional fixes on there since you sent this email.  Optimistic to hope
-this is covered by one of those, but at least it will mean we are trying to
-replicate on the same branch.
-
-Jonathan
-
-
->
->
-Hope I can get some helps here.
->
->
-Bob
-
-Hi Jonathan
-
-Thanks for your reply!
-
-On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
-<Jonathan.Cameron@huawei.com> wrote:
->
->
-Probably not related to your problem, but there is a disconnect in QEMU /
->
-kernel assumptionsaround the presence of an HDM decoder when a HB only
->
-has a single root port. Spec allows it to be provided or not as an
->
-implementation choice.
->
-Kernel assumes it isn't provide. Qemu assumes it is.
->
->
-The temporary solution is to throw in a second root port on the HB and not
->
-connect anything to it.  Longer term I may special case this so that the
->
-particular
->
-decoder defaults to pass through settings in QEMU if there is only one root
->
-port.
->
-You are right! After adding an extra HB in qemu, I can create an x1
-region successfully.
-But I get some errors from nvdimm:
-
-[   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
-[   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
-[   74.927470] nd_region region0: nmem0: is disabled, failing probe
-
-And the x4 region still fails with the same errors; using the latest
-cxl/preview branch doesn't work.
-I have picked the "Two CXL emulation fixes" patches in qemu; still not working.
-
-Bob
-
-On Tue, 9 Aug 2022 21:07:06 +0800
-Bobo WL <lmw.bobo@gmail.com> wrote:
-
->
-Hi Jonathan
->
->
-Thanks for your reply!
->
->
-On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
->
-<Jonathan.Cameron@huawei.com> wrote:
->
->
->
-> Probably not related to your problem, but there is a disconnect in QEMU /
->
-> kernel assumptionsaround the presence of an HDM decoder when a HB only
->
-> has a single root port. Spec allows it to be provided or not as an
->
-> implementation choice.
->
-> Kernel assumes it isn't provide. Qemu assumes it is.
->
->
->
-> The temporary solution is to throw in a second root port on the HB and not
->
-> connect anything to it.  Longer term I may special case this so that the
->
-> particular
->
-> decoder defaults to pass through settings in QEMU if there is only one root
->
-> port.
->
->
->
->
-You are right! After adding an extra HB in qemu, I can create a x1
->
-region successfully.
->
-But have some errors in Nvdimm:
->
->
-[   74.925838] Unknown online node for memory at 0x10000000000, assuming node 0
->
-[   74.925846] Unknown target node for memory at 0x10000000000, assuming node 0
->
-[   74.927470] nd_region region0: nmem0: is disabled, failing probe
-Ah. I've seen this one, but not chased it down yet.  It was on my todo list
-to chase down. Once I reach this state I can verify the HDM Decode is correct,
-which is what I've been using to test (which wasn't true until earlier this
-week). I'm currently testing via devmem, more for historical reasons than
-because it makes that much sense anymore.
-
->
->
-And x4 region still failed with same errors, using latest cxl/preview
->
-branch don't work.
->
-I have picked "Two CXL emulation fixes" patches in qemu, still not working.
->
->
-Bob
-
-On Tue, 9 Aug 2022 17:08:25 +0100
-Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
-
->
-On Tue, 9 Aug 2022 21:07:06 +0800
->
-Bobo WL <lmw.bobo@gmail.com> wrote:
->
->
-> Hi Jonathan
->
->
->
-> Thanks for your reply!
->
->
->
-> On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
->
-> <Jonathan.Cameron@huawei.com> wrote:
->
-> >
->
-> > Probably not related to your problem, but there is a disconnect in QEMU /
->
-> > kernel assumptionsaround the presence of an HDM decoder when a HB only
->
-> > has a single root port. Spec allows it to be provided or not as an
->
-> > implementation choice.
->
-> > Kernel assumes it isn't provide. Qemu assumes it is.
->
-> >
->
-> > The temporary solution is to throw in a second root port on the HB and not
->
-> > connect anything to it.  Longer term I may special case this so that the
->
-> > particular
->
-> > decoder defaults to pass through settings in QEMU if there is only one
->
-> > root port.
->
-> >
->
->
->
-> You are right! After adding an extra HB in qemu, I can create a x1
->
-> region successfully.
->
-> But have some errors in Nvdimm:
->
->
->
-> [   74.925838] Unknown online node for memory at 0x10000000000, assuming
->
-> node 0
->
-> [   74.925846] Unknown target node for memory at 0x10000000000, assuming
->
-> node 0
->
-> [   74.927470] nd_region region0: nmem0: is disabled, failing probe
->
->
-Ah. I've seen this one, but not chased it down yet.  Was on my todo list to
->
-chase
->
-down. Once I reach this state I can verify the HDM Decode is correct which is
->
-what
->
-I've been using to test (Which wasn't true until earlier this week).
->
-I'm currently testing via devmem, more for historical reasons than because it
->
-makes
->
-that much sense anymore.
-*embarrassed cough*.  We haven't fully hooked the LSA up in qemu yet.
-I'd forgotten that was still on the todo list. I don't think it will
-be particularly hard to do and will take a look in the next few days.
-
-Very very indirectly this error is causing a driver probe fail that means that
-we hit a code path that has a rather odd looking check on NDD_LABELING.
-Should not have gotten near that path though - hence the problem is actually
-when we call cxl_pmem_get_config_data() and it returns an error because
-we haven't fully connected up the command in QEMU.
-
-Jonathan
-
-
->
->
->
->
-> And x4 region still failed with same errors, using latest cxl/preview
->
-> branch don't work.
->
-> I have picked "Two CXL emulation fixes" patches in qemu, still not working.
->
->
->
-> Bob
-
-On Thu, 11 Aug 2022 18:08:57 +0100
-Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
-
->
-On Tue, 9 Aug 2022 17:08:25 +0100
->
-Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
->
->
-> On Tue, 9 Aug 2022 21:07:06 +0800
->
-> Bobo WL <lmw.bobo@gmail.com> wrote:
->
->
->
-> > Hi Jonathan
->
-> >
->
-> > Thanks for your reply!
->
-> >
->
-> > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
->
-> > <Jonathan.Cameron@huawei.com> wrote:
->
-> > >
->
-> > > Probably not related to your problem, but there is a disconnect in QEMU
->
-> > > /
->
-> > > kernel assumptionsaround the presence of an HDM decoder when a HB only
->
-> > > has a single root port. Spec allows it to be provided or not as an
->
-> > > implementation choice.
->
-> > > Kernel assumes it isn't provide. Qemu assumes it is.
->
-> > >
->
-> > > The temporary solution is to throw in a second root port on the HB and
->
-> > > not
->
-> > > connect anything to it.  Longer term I may special case this so that
->
-> > > the particular
->
-> > > decoder defaults to pass through settings in QEMU if there is only one
->
-> > > root port.
->
-> > >
->
-> >
->
-> > You are right! After adding an extra HB in qemu, I can create a x1
->
-> > region successfully.
->
-> > But have some errors in Nvdimm:
->
-> >
->
-> > [   74.925838] Unknown online node for memory at 0x10000000000, assuming
->
-> > node 0
->
-> > [   74.925846] Unknown target node for memory at 0x10000000000, assuming
->
-> > node 0
->
-> > [   74.927470] nd_region region0: nmem0: is disabled, failing probe
->
->
->
-> Ah. I've seen this one, but not chased it down yet.  Was on my todo list to
->
-> chase
->
-> down. Once I reach this state I can verify the HDM Decode is correct which
->
-> is what
->
-> I've been using to test (Which wasn't true until earlier this week).
->
-> I'm currently testing via devmem, more for historical reasons than because
->
-> it makes
->
-> that much sense anymore.
->
->
-*embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
->
-I'd forgotten that was still on the todo list. I don't think it will
->
-be particularly hard to do and will take a look in next few days.
->
->
-Very very indirectly this error is causing a driver probe fail that means that
->
-we hit a code path that has a rather odd looking check on NDD_LABELING.
->
-Should not have gotten near that path though - hence the problem is actually
->
-when we call cxl_pmem_get_config_data() and it returns an error because
->
-we haven't fully connected up the command in QEMU.
-So at least one bug in QEMU. We were not supporting variable-length payloads
-on mailbox inputs (but were on outputs).  That hasn't mattered until we get
-to LSA writes. We just need to relax the condition on the supplied length.
-
-diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
-index c352a935c4..fdda9529fe 100644
---- a/hw/cxl/cxl-mailbox-utils.c
-+++ b/hw/cxl/cxl-mailbox-utils.c
-@@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
-     cxl_cmd = &cxl_cmd_set[set][cmd];
-     h = cxl_cmd->handler;
-     if (h) {
--        if (len == cxl_cmd->in) {
-+        if (len == cxl_cmd->in || !cxl_cmd->in) {
-             cxl_cmd->payload = cxl_dstate->mbox_reg_state +
-                 A_CXL_DEV_CMD_PAYLOAD;
-             ret = (*h)(cxl_cmd, cxl_dstate, &len);
-
-
-This lets the nvdimm/region probe fine, but I'm getting some issues with
-namespace capacity so I'll look at what is causing that next.
-Unfortunately I'm not that familiar with the driver/nvdimm side of things
-so it'll take a while to figure out what kicks off what!
-
-Jonathan
-
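The effect of the one-line patch above can be modeled outside QEMU. The following is a minimal Python sketch of the two length checks only, not QEMU code: the function names `accepts_len_before` and `accepts_len_after` are hypothetical, and the only behavior taken from the patch is that an expected input length of 0 (`!cxl_cmd->in`) now accepts any supplied length.

```python
def accepts_len_before(supplied_len: int, expected_in: int) -> bool:
    """Model of the original check: `len == cxl_cmd->in`."""
    return supplied_len == expected_in


def accepts_len_after(supplied_len: int, expected_in: int) -> bool:
    """Model of the patched check: `len == cxl_cmd->in || !cxl_cmd->in`.

    An expected length of 0 is treated as "any length accepted", which is
    what lets a variable-length input payload reach the handler.
    """
    return supplied_len == expected_in or expected_in == 0


if __name__ == "__main__":
    # (supplied length, expected length) pairs chosen for illustration only.
    for supplied, expected in [(8, 8), (8, 4), (128, 0)]:
        print(supplied, expected,
              accepts_len_before(supplied, expected),
              accepts_len_after(supplied, expected))
```

Fixed-length commands behave the same under both checks; only commands whose expected input length is 0 change, which also means a zero-input command no longer rejects an unexpected payload.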
->
->
-Jonathan
->
->
->
->
->
-> >
->
-> > And x4 region still failed with same errors, using latest cxl/preview
->
-> > branch don't work.
->
-> > I have picked "Two CXL emulation fixes" patches in qemu, still not
->
-> > working.
->
-> >
->
-> > Bob
->
->
-
-Jonathan Cameron wrote:
->
-On Thu, 11 Aug 2022 18:08:57 +0100
->
-Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
->
->
-> On Tue, 9 Aug 2022 17:08:25 +0100
->
-> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
->
->
->
-> > On Tue, 9 Aug 2022 21:07:06 +0800
->
-> > Bobo WL <lmw.bobo@gmail.com> wrote:
->
-> >
->
-> > > Hi Jonathan
->
-> > >
->
-> > > Thanks for your reply!
->
-> > >
->
-> > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
->
-> > > <Jonathan.Cameron@huawei.com> wrote:
->
-> > > >
->
-> > > > Probably not related to your problem, but there is a disconnect in
->
-> > > > QEMU /
->
-> > > > kernel assumptionsaround the presence of an HDM decoder when a HB only
->
-> > > > has a single root port. Spec allows it to be provided or not as an
->
-> > > > implementation choice.
->
-> > > > Kernel assumes it isn't provide. Qemu assumes it is.
->
-> > > >
->
-> > > > The temporary solution is to throw in a second root port on the HB
->
-> > > > and not
->
-> > > > connect anything to it.  Longer term I may special case this so that
->
-> > > > the particular
->
-> > > > decoder defaults to pass through settings in QEMU if there is only
->
-> > > > one root port.
->
-> > > >
->
-> > >
->
-> > > You are right! After adding an extra HB in qemu, I can create a x1
->
-> > > region successfully.
->
-> > > But have some errors in Nvdimm:
->
-> > >
->
-> > > [   74.925838] Unknown online node for memory at 0x10000000000,
->
-> > > assuming node 0
->
-> > > [   74.925846] Unknown target node for memory at 0x10000000000,
->
-> > > assuming node 0
->
-> > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe
->
-> >
->
-> > Ah. I've seen this one, but not chased it down yet.  Was on my todo list
->
-> > to chase
->
-> > down. Once I reach this state I can verify the HDM Decode is correct
->
-> > which is what
->
-> > I've been using to test (Which wasn't true until earlier this week).
->
-> > I'm currently testing via devmem, more for historical reasons than
->
-> > because it makes
->
-> > that much sense anymore.
->
->
->
-> *embarassed cough*.  We haven't fully hooked the LSA up in qemu yet.
->
-> I'd forgotten that was still on the todo list. I don't think it will
->
-> be particularly hard to do and will take a look in next few days.
->
->
->
-> Very very indirectly this error is causing a driver probe fail that means
->
-> that
->
-> we hit a code path that has a rather odd looking check on NDD_LABELING.
->
-> Should not have gotten near that path though - hence the problem is actually
->
-> when we call cxl_pmem_get_config_data() and it returns an error because
->
-> we haven't fully connected up the command in QEMU.
->
->
-So a least one bug in QEMU. We were not supporting variable length payloads
->
-on mailbox
->
-inputs (but were on outputs).  That hasn't mattered until we get to LSA
->
-writes.
->
-We just need to relax condition on the supplied length.
->
->
-diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
->
-index c352a935c4..fdda9529fe 100644
->
---- a/hw/cxl/cxl-mailbox-utils.c
->
-+++ b/hw/cxl/cxl-mailbox-utils.c
->
-@@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
->
-cxl_cmd = &cxl_cmd_set[set][cmd];
->
-h = cxl_cmd->handler;
->
-if (h) {
->
--        if (len == cxl_cmd->in) {
->
-+        if (len == cxl_cmd->in || !cxl_cmd->in) {
->
-cxl_cmd->payload = cxl_dstate->mbox_reg_state +
->
-A_CXL_DEV_CMD_PAYLOAD;
->
-ret = (*h)(cxl_cmd, cxl_dstate, &len);
->
->
->
-This lets the nvdimm/region probe fine, but I'm getting some issues with
->
-namespace capacity so I'll look at what is causing that next.
->
-Unfortunately I'm not that familiar with the driver/nvdimm side of things
->
-so it's take a while to figure out what kicks off what!
-The whirlwind tour is that 'struct nd_region' instances that represent a
-persistent memory address range are composed of one or more mappings of
-'struct nvdimm' objects. The nvdimm object is driven by the dimm driver
-in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking
-the dimm (if locked) and interrogating the label area to look for
-namespace labels.
-
-The label command calls are routed to the '->ndctl()' callback that was
-registered when the CXL nvdimm_bus_descriptor was created. That callback
-handles both 'bus' scope calls, currently none for CXL, and per nvdimm
-calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands
-to CXL commands.
-
-The 'struct nvdimm' objects that the CXL side registers have the
-NDD_LABELING flag set which means that namespaces need to be explicitly
-created / provisioned from region capacity. Otherwise, if
-drivers/nvdimm/dimm.c does not find a namespace-label-index block then
-the region reverts to label-less mode and a default namespace equal to
-the size of the region is instantiated.
-
-If you are seeing small mismatches in namespace capacity then it may
-just be the fact that by default 'ndctl create-namespace' results in an
-'fsdax' mode namespace which just means that it is a block device where
-1.5% of the capacity is reserved for 'struct page' metadata. You should
-be able to see namespace capacity == region capacity by doing "ndctl
-create-namespace -m raw", and disable DAX operation.
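To put a rough number on the 'struct page' reservation described above, here is a minimal sketch. It assumes 4 KiB pages and a 64-byte 'struct page' (typical for 64-bit kernels, but not stated in this thread), which works out to about 1.56% of capacity. The function names are hypothetical, and the alignment padding that real tooling applies is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for illustration only: 4 KiB pages, 64-byte struct page. */
#define SKETCH_PAGE_SIZE        4096ULL
#define SKETCH_STRUCT_PAGE_SIZE 64ULL

/* Bytes of per-page metadata needed to describe capacity_bytes of memory. */
uint64_t fsdax_metadata_bytes(uint64_t capacity_bytes)
{
    return (capacity_bytes / SKETCH_PAGE_SIZE) * SKETCH_STRUCT_PAGE_SIZE;
}

/* Capacity left for data once the metadata has been carved out. */
uint64_t fsdax_usable_bytes(uint64_t capacity_bytes)
{
    return capacity_bytes - fsdax_metadata_bytes(capacity_bytes);
}
```

Under these assumptions a 1 GiB region gives up 16 MiB to metadata, which is the kind of small mismatch between region and namespace capacity mentioned above.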
-
-Hope that helps.
-
-On Fri, 12 Aug 2022 09:03:02 -0700
-Dan Williams <dan.j.williams@intel.com> wrote:
-
->
-Jonathan Cameron wrote:
->
-> On Thu, 11 Aug 2022 18:08:57 +0100
->
-> Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
->
->
->
-> > On Tue, 9 Aug 2022 17:08:25 +0100
->
-> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
->
-> >
->
-> > > On Tue, 9 Aug 2022 21:07:06 +0800
->
-> > > Bobo WL <lmw.bobo@gmail.com> wrote:
->
-> > >
->
-> > > > Hi Jonathan
->
-> > > >
->
-> > > > Thanks for your reply!
->
-> > > >
->
-> > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
->
-> > > > <Jonathan.Cameron@huawei.com> wrote:
->
-> > > > >
->
-> > > > > Probably not related to your problem, but there is a disconnect in
->
-> > > > > QEMU /
->
-> > > > > kernel assumptions around the presence of an HDM decoder when a HB
->
-> > > > > only
->
-> > > > > has a single root port. Spec allows it to be provided or not as an
->
-> > > > > implementation choice.
->
-> > > > > Kernel assumes it isn't provided. Qemu assumes it is.
->
-> > > > >
->
-> > > > > The temporary solution is to throw in a second root port on the HB
->
-> > > > > and not
->
-> > > > > connect anything to it.  Longer term I may special case this so
->
-> > > > > that the particular
->
-> > > > > decoder defaults to pass through settings in QEMU if there is only
->
-> > > > > one root port.
->
-> > > > >
->
-> > > >
->
-> > > > You are right! After adding an extra HB in qemu, I can create a x1
->
-> > > > region successfully.
->
-> > > > But have some errors in Nvdimm:
->
-> > > >
->
-> > > > [   74.925838] Unknown online node for memory at 0x10000000000,
->
-> > > > assuming node 0
->
-> > > > [   74.925846] Unknown target node for memory at 0x10000000000,
->
-> > > > assuming node 0
->
-> > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe
->
-> > > >
->
-> > >
->
-> > > Ah. I've seen this one, but not chased it down yet.  Was on my todo
->
-> > > list to chase
->
-> > > down. Once I reach this state I can verify the HDM Decode is correct
->
-> > > which is what
->
-> > > I've been using to test (Which wasn't true until earlier this week).
->
-> > > I'm currently testing via devmem, more for historical reasons than
->
-> > > because it makes
->
-> > > that much sense anymore.
->
-> >
->
-> > *embarrassed cough*.  We haven't fully hooked the LSA up in qemu yet.
->
-> > I'd forgotten that was still on the todo list. I don't think it will
->
-> > be particularly hard to do and will take a look in next few days.
->
-> >
->
-> > Very very indirectly this error is causing a driver probe fail that means
->
-> > that
->
-> > we hit a code path that has a rather odd looking check on NDD_LABELING.
->
-> > Should not have gotten near that path though - hence the problem is
->
-> > actually
->
-> > when we call cxl_pmem_get_config_data() and it returns an error because
->
-> > we haven't fully connected up the command in QEMU.
->
->
->
-> So at least one bug in QEMU. We were not supporting variable length payloads
->
-> on mailbox
->
-> inputs (but were on outputs).  That hasn't mattered until we get to LSA
->
-> writes.
->
-> We just need to relax condition on the supplied length.
->
->
->
-> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
->
-> index c352a935c4..fdda9529fe 100644
->
-> --- a/hw/cxl/cxl-mailbox-utils.c
->
-> +++ b/hw/cxl/cxl-mailbox-utils.c
->
-> @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
->
->      cxl_cmd = &cxl_cmd_set[set][cmd];
->
->      h = cxl_cmd->handler;
->
->      if (h) {
->
-> -        if (len == cxl_cmd->in) {
->
-> +        if (len == cxl_cmd->in || !cxl_cmd->in) {
->
->              cxl_cmd->payload = cxl_dstate->mbox_reg_state +
->
->                  A_CXL_DEV_CMD_PAYLOAD;
->
->              ret = (*h)(cxl_cmd, cxl_dstate, &len);
->
->
->
->
->
-> This lets the nvdimm/region probe fine, but I'm getting some issues with
->
-> namespace capacity so I'll look at what is causing that next.
->
-> Unfortunately I'm not that familiar with the driver/nvdimm side of things
->
-> so it'll take a while to figure out what kicks off what!
->
->
-The whirlwind tour is that 'struct nd_region' instances that represent a
->
-persistent memory address range are composed of one or more mappings of
->
-'struct nvdimm' objects. The nvdimm object is driven by the dimm driver
->
-in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking
->
-the dimm (if locked) and interrogating the label area to look for
->
-namespace labels.
->
->
-The label command calls are routed to the '->ndctl()' callback that was
->
-registered when the CXL nvdimm_bus_descriptor was created. That callback
->
-handles both 'bus' scope calls, currently none for CXL, and per nvdimm
->
-calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands
->
-to CXL commands.
->
->
-The 'struct nvdimm' objects that the CXL side registers have the
->
-NDD_LABELING flag set which means that namespaces need to be explicitly
->
-created / provisioned from region capacity. Otherwise, if
->
-drivers/nvdimm/dimm.c does not find a namespace-label-index block then
->
-the region reverts to label-less mode and a default namespace equal to
->
-the size of the region is instantiated.
->
->
-If you are seeing small mismatches in namespace capacity then it may
->
-just be the fact that by default 'ndctl create-namespace' results in an
->
-'fsdax' mode namespace which just means that it is a block device where
->
-1.5% of the capacity is reserved for 'struct page' metadata. You should
->
-be able to see namespace capacity == region capacity by doing "ndctl
->
-create-namespace -m raw", and disable DAX operation.
-Currently ndctl create-namespace crashes qemu ;)
-Which isn't ideal!
-
->
->
-Hope that helps.
-Got me looking at the right code. Thanks!
-
-Jonathan
-
-On Fri, 12 Aug 2022 17:15:09 +0100
-Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
-
->
-On Fri, 12 Aug 2022 09:03:02 -0700
->
-Dan Williams <dan.j.williams@intel.com> wrote:
->
->
-> Jonathan Cameron wrote:
->
-> > On Thu, 11 Aug 2022 18:08:57 +0100
->
-> > Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
->
-> >
->
-> > > On Tue, 9 Aug 2022 17:08:25 +0100
->
-> > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
->
-> > >
->
-> > > > On Tue, 9 Aug 2022 21:07:06 +0800
->
-> > > > Bobo WL <lmw.bobo@gmail.com> wrote:
->
-> > > >
->
-> > > > > Hi Jonathan
->
-> > > > >
->
-> > > > > Thanks for your reply!
->
-> > > > >
->
-> > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
->
-> > > > > <Jonathan.Cameron@huawei.com> wrote:
->
-> > > > > >
->
-> > > > > > Probably not related to your problem, but there is a disconnect
->
-> > > > > > in QEMU /
->
-> > > > > > kernel assumptions around the presence of an HDM decoder when a HB
->
-> > > > > > only
->
-> > > > > > has a single root port. Spec allows it to be provided or not as
->
-> > > > > > an implementation choice.
->
-> > > > > > Kernel assumes it isn't provided. Qemu assumes it is.
->
-> > > > > >
->
-> > > > > > The temporary solution is to throw in a second root port on the
->
-> > > > > > HB and not
->
-> > > > > > connect anything to it.  Longer term I may special case this so
->
-> > > > > > that the particular
->
-> > > > > > decoder defaults to pass through settings in QEMU if there is
->
-> > > > > > only one root port.
->
-> > > > > >
->
-> > > > >
->
-> > > > > You are right! After adding an extra HB in qemu, I can create a x1
->
-> > > > > region successfully.
->
-> > > > > But have some errors in Nvdimm:
->
-> > > > >
->
-> > > > > [   74.925838] Unknown online node for memory at 0x10000000000,
->
-> > > > > assuming node 0
->
-> > > > > [   74.925846] Unknown target node for memory at 0x10000000000,
->
-> > > > > assuming node 0
->
-> > > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe
->
-> > > > >
->
-> > > >
->
-> > > > Ah. I've seen this one, but not chased it down yet.  Was on my todo
->
-> > > > list to chase
->
-> > > > down. Once I reach this state I can verify the HDM Decode is correct
->
-> > > > which is what
->
-> > > > I've been using to test (Which wasn't true until earlier this week).
->
-> > > > I'm currently testing via devmem, more for historical reasons than
->
-> > > > because it makes
->
-> > > > that much sense anymore.
->
-> > >
->
-> > > *embarrassed cough*.  We haven't fully hooked the LSA up in qemu yet.
->
-> > > I'd forgotten that was still on the todo list. I don't think it will
->
-> > > be particularly hard to do and will take a look in next few days.
->
-> > >
->
-> > > Very very indirectly this error is causing a driver probe fail that
->
-> > > means that
->
-> > > we hit a code path that has a rather odd looking check on NDD_LABELING.
->
-> > > Should not have gotten near that path though - hence the problem is
->
-> > > actually
->
-> > > when we call cxl_pmem_get_config_data() and it returns an error because
->
-> > > we haven't fully connected up the command in QEMU.
->
-> >
->
-> > So at least one bug in QEMU. We were not supporting variable length
->
-> > payloads on mailbox
->
-> > inputs (but were on outputs).  That hasn't mattered until we get to LSA
->
-> > writes.
->
-> > We just need to relax condition on the supplied length.
->
-> >
->
-> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
->
-> > index c352a935c4..fdda9529fe 100644
->
-> > --- a/hw/cxl/cxl-mailbox-utils.c
->
-> > +++ b/hw/cxl/cxl-mailbox-utils.c
->
-> > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
->
-> >      cxl_cmd = &cxl_cmd_set[set][cmd];
->
-> >      h = cxl_cmd->handler;
->
-> >      if (h) {
->
-> > -        if (len == cxl_cmd->in) {
->
-> > +        if (len == cxl_cmd->in || !cxl_cmd->in) {
->
-> >              cxl_cmd->payload = cxl_dstate->mbox_reg_state +
->
-> >                  A_CXL_DEV_CMD_PAYLOAD;
->
-> >              ret = (*h)(cxl_cmd, cxl_dstate, &len);
->
-> >
->
-> >
->
-> > This lets the nvdimm/region probe fine, but I'm getting some issues with
->
-> > namespace capacity so I'll look at what is causing that next.
->
-> > Unfortunately I'm not that familiar with the driver/nvdimm side of things
->
-> > so it'll take a while to figure out what kicks off what!
->
->
->
-> The whirlwind tour is that 'struct nd_region' instances that represent a
->
-> persistent memory address range are composed of one or more mappings of
->
-> 'struct nvdimm' objects. The nvdimm object is driven by the dimm driver
->
-> in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking
->
-> the dimm (if locked) and interrogating the label area to look for
->
-> namespace labels.
->
->
->
-> The label command calls are routed to the '->ndctl()' callback that was
->
-> registered when the CXL nvdimm_bus_descriptor was created. That callback
->
-> handles both 'bus' scope calls, currently none for CXL, and per nvdimm
->
-> calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands
->
-> to CXL commands.
->
->
->
-> The 'struct nvdimm' objects that the CXL side registers have the
->
-> NDD_LABELING flag set which means that namespaces need to be explicitly
->
-> created / provisioned from region capacity. Otherwise, if
->
-> drivers/nvdimm/dimm.c does not find a namespace-label-index block then
->
-> the region reverts to label-less mode and a default namespace equal to
->
-> the size of the region is instantiated.
->
->
->
-> If you are seeing small mismatches in namespace capacity then it may
->
-> just be the fact that by default 'ndctl create-namespace' results in an
->
-> 'fsdax' mode namespace which just means that it is a block device where
->
-> 1.5% of the capacity is reserved for 'struct page' metadata. You should
->
-> be able to see namespace capacity == region capacity by doing "ndctl
->
-> create-namespace -m raw", and disable DAX operation.
->
->
-Currently ndctl create-namespace crashes qemu ;)
->
-Which isn't ideal!
->
-Found a cause for this one.  Mailbox payload may be as small as 256 bytes.
-We have code in the kernel sanity checking that the output payload fits in
-the mailbox, but nothing on the input payload.  Symptom is that we write just
-off the end, whatever size the payload is.  Note doing this shouldn't crash
-qemu - so I need to fix a range check somewhere.
-
-I think this is because cxl_pmem_get_config_size() returns the mailbox
-payload size as being the available LSA size, forgetting to remove the
-size of the headers on the set_lsa side of things.
-https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/tree/drivers/cxl/pmem.c?h=next#n110
-I've hacked the max_payload to be 8 bytes smaller.
-
-Now we still don't succeed in creating the namespace, but the bonus is that
-it doesn't crash any more.
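A minimal sketch of the size accounting described above: the label data that fits in one Set LSA command is the mailbox payload size minus the command's own header. The 8-byte header (a 32-bit offset plus a 32-bit reserved field) is an assumption chosen to match the 8-byte adjustment in the hack above; the struct and function names are hypothetical and are not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed Set LSA input header: 8 bytes ahead of the label data. */
struct set_lsa_header {
    uint32_t offset;    /* byte offset into the label storage area */
    uint32_t reserved;
};

/* Largest number of label bytes that fit in one mailbox command. */
size_t max_lsa_chunk(size_t mailbox_payload_size)
{
    assert(mailbox_payload_size >= sizeof(struct set_lsa_header));
    return mailbox_payload_size - sizeof(struct set_lsa_header);
}

/* Returns 1 if header plus data_len bytes fit in the mailbox, else 0. */
int set_lsa_fits(size_t mailbox_payload_size, size_t data_len)
{
    if (mailbox_payload_size < sizeof(struct set_lsa_header)) {
        return 0;
    }
    return data_len <= mailbox_payload_size - sizeof(struct set_lsa_header);
}
```

With a 256-byte mailbox this leaves 248 bytes for label data; advertising the full 256 as the available transfer size is what lets a write spill just past the end of the payload area.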
-
-
-Jonathan
-
-
-
->
->
->
-> Hope that helps.
->
-Got me looking at the right code. Thanks!
->
->
-Jonathan
->
->
-
-On Mon, 15 Aug 2022 15:18:09 +0100
-Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
-
->
-On Fri, 12 Aug 2022 17:15:09 +0100
->
-Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
->
->
-> On Fri, 12 Aug 2022 09:03:02 -0700
->
-> Dan Williams <dan.j.williams@intel.com> wrote:
->
->
->
-> > Jonathan Cameron wrote:
->
-> > > On Thu, 11 Aug 2022 18:08:57 +0100
->
-> > > Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
->
-> > >
->
-> > > > On Tue, 9 Aug 2022 17:08:25 +0100
->
-> > > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
->
-> > > >
->
-> > > > > On Tue, 9 Aug 2022 21:07:06 +0800
->
-> > > > > Bobo WL <lmw.bobo@gmail.com> wrote:
->
-> > > > >
->
-> > > > > > Hi Jonathan
->
-> > > > > >
->
-> > > > > > Thanks for your reply!
->
-> > > > > >
->
-> > > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
->
-> > > > > > <Jonathan.Cameron@huawei.com> wrote:
->
-> > > > > > >
->
-> > > > > > > Probably not related to your problem, but there is a disconnect
->
-> > > > > > > in QEMU /
->
-> > > > > > > kernel assumptions around the presence of an HDM decoder when a
->
-> > > > > > > HB only
->
-> > > > > > > has a single root port. Spec allows it to be provided or not as
->
-> > > > > > > an implementation choice.
->
-> > > > > > > Kernel assumes it isn't provided. Qemu assumes it is.
->
-> > > > > > >
->
-> > > > > > > The temporary solution is to throw in a second root port on the
->
-> > > > > > > HB and not
->
-> > > > > > > connect anything to it.  Longer term I may special case this so
->
-> > > > > > > that the particular
->
-> > > > > > > decoder defaults to pass through settings in QEMU if there is
->
-> > > > > > > only one root port.
->
-> > > > > > >
->
-> > > > > >
->
-> > > > > > You are right! After adding an extra HB in qemu, I can create a x1
->
-> > > > > > region successfully.
->
-> > > > > > But have some errors in Nvdimm:
->
-> > > > > >
->
-> > > > > > [   74.925838] Unknown online node for memory at 0x10000000000,
->
-> > > > > > assuming node 0
->
-> > > > > > [   74.925846] Unknown target node for memory at 0x10000000000,
->
-> > > > > > assuming node 0
->
-> > > > > > [   74.927470] nd_region region0: nmem0: is disabled, failing
->
-> > > > > > probe
->
-> > > > >
->
-> > > > > Ah. I've seen this one, but not chased it down yet.  Was on my todo
->
-> > > > > list to chase
->
-> > > > > down. Once I reach this state I can verify the HDM Decode is
->
-> > > > > correct which is what
->
-> > > > > I've been using to test (Which wasn't true until earlier this
->
-> > > > > week).
->
-> > > > > I'm currently testing via devmem, more for historical reasons than
->
-> > > > > because it makes
->
-> > > > > that much sense anymore.
->
-> > > >
->
-> > > > *embarrassed cough*.  We haven't fully hooked the LSA up in qemu yet.
->
-> > > > I'd forgotten that was still on the todo list. I don't think it will
->
-> > > > be particularly hard to do and will take a look in next few days.
->
-> > > >
->
-> > > > Very very indirectly this error is causing a driver probe fail that
->
-> > > > means that
->
-> > > > we hit a code path that has a rather odd looking check on
->
-> > > > NDD_LABELING.
->
-> > > > Should not have gotten near that path though - hence the problem is
->
-> > > > actually
->
-> > > > when we call cxl_pmem_get_config_data() and it returns an error
->
-> > > > because
->
-> > > > we haven't fully connected up the command in QEMU.
->
-> > >
->
-> > > So at least one bug in QEMU. We were not supporting variable length
->
-> > > payloads on mailbox
->
-> > > inputs (but were on outputs).  That hasn't mattered until we get to LSA
->
-> > > writes.
->
-> > > We just need to relax condition on the supplied length.
->
-> > >
->
-> > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
->
-> > > index c352a935c4..fdda9529fe 100644
->
-> > > --- a/hw/cxl/cxl-mailbox-utils.c
->
-> > > +++ b/hw/cxl/cxl-mailbox-utils.c
->
-> > > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
->
-> > >      cxl_cmd = &cxl_cmd_set[set][cmd];
->
-> > >      h = cxl_cmd->handler;
->
-> > >      if (h) {
->
-> > > -        if (len == cxl_cmd->in) {
->
-> > > +        if (len == cxl_cmd->in || !cxl_cmd->in) {
->
-> > >              cxl_cmd->payload = cxl_dstate->mbox_reg_state +
->
-> > >                  A_CXL_DEV_CMD_PAYLOAD;
->
-> > >              ret = (*h)(cxl_cmd, cxl_dstate, &len);
->
-> > >
->
-> > >
->
-> > > This lets the nvdimm/region probe fine, but I'm getting some issues with
->
-> > > namespace capacity so I'll look at what is causing that next.
->
-> > > Unfortunately I'm not that familiar with the driver/nvdimm side of
->
-> > > things
->
-> > > so it'll take a while to figure out what kicks off what!
->
-> >
->
-> > The whirlwind tour is that 'struct nd_region' instances that represent a
->
-> > persistent memory address range are composed of one or more mappings of
->
-> > 'struct nvdimm' objects. The nvdimm object is driven by the dimm driver
->
-> > in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking
->
-> > the dimm (if locked) and interrogating the label area to look for
->
-> > namespace labels.
->
-> >
->
-> > The label command calls are routed to the '->ndctl()' callback that was
->
-> > registered when the CXL nvdimm_bus_descriptor was created. That callback
->
-> > handles both 'bus' scope calls, currently none for CXL, and per nvdimm
->
-> > calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands
->
-> > to CXL commands.
->
-> >
->
-> > The 'struct nvdimm' objects that the CXL side registers have the
->
-> > NDD_LABELING flag set which means that namespaces need to be explicitly
->
-> > created / provisioned from region capacity. Otherwise, if
->
-> > drivers/nvdimm/dimm.c does not find a namespace-label-index block then
->
-> > the region reverts to label-less mode and a default namespace equal to
->
-> > the size of the region is instantiated.
->
-> >
->
-> > If you are seeing small mismatches in namespace capacity then it may
->
-> > just be the fact that by default 'ndctl create-namespace' results in an
->
-> > 'fsdax' mode namespace which just means that it is a block device where
->
-> > 1.5% of the capacity is reserved for 'struct page' metadata. You should
->
-> > be able to see namespace capacity == region capacity by doing "ndctl
->
-> > create-namespace -m raw", and disable DAX operation.
->
->
->
-> Currently ndctl create-namespace crashes qemu ;)
->
-> Which isn't ideal!
->
->
->
->
-Found a cause for this one.  Mailbox payload may be as small as 256 bytes.
->
-We have code in kernel sanity checking that output payload fits in the
->
-mailbox, but nothing on the input payload.  Symptom is that we write just
->
-off the end whatever size the payload is.  Note doing this shouldn't crash
->
-qemu - so I need to fix a range check somewhere.
->
->
-I think this is because cxl_pmem_get_config_size() returns the mailbox
->
-payload size as being the available LSA size, forgetting to remove the
->
-size of the headers on the set_lsa side of things.
->
-https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/tree/drivers/cxl/pmem.c?h=next#n110
->
->
-I've hacked the max_payload to be -8
->
->
-Now we still don't succeed in creating the namespace, but bonus is it doesn't
->
-crash any more.
-In the interests of defensive / correct handling from QEMU I took a
-look into why it was crashing.  Turns out that providing a NULL write
-callback for the memory device region (that the above overlarge write was
-spilling into) isn't a safe thing to do.  Needs a stub. Oops.
-
-On the plus side, we might never have noticed this was going wrong without
-the crash *silver lining in every cloud*
-
-Fix to follow...
-
-Jonathan
-
-
->
->
->
-Jonathan
->
->
->
->
-> >
->
-> > Hope that helps.
->
-> Got me looking at the right code. Thanks!
->
->
->
-> Jonathan
->
->
->
->
->
->
-
-On Mon, 15 Aug 2022 at 15:55, Jonathan Cameron via <qemu-arm@nongnu.org> wrote:
->
-In the interests of defensive / correct handling from QEMU I took a
->
-look into why it was crashing.  Turns out that providing a NULL write
->
-callback for
->
-the memory device region (that the above overlarge write was spilling into)
->
-isn't
->
-a safe thing to do.  Needs a stub. Oops.
-Yeah. We've talked before about adding an assert so that that kind of
-"missing function" bug is caught at device creation rather than only
-if the guest tries to access the device, but we never quite got around
-to it...
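The two ideas in this exchange (a stub write callback instead of NULL, and a completeness check at device creation) can be sketched as below. This is an illustration under stated assumptions only: 'struct region_ops' and every function name here are hypothetical stand-ins, not QEMU's actual memory API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback table for a device region (not QEMU's own type). */
struct region_ops {
    uint64_t (*read)(void *opaque, uint64_t addr, unsigned size);
    void (*write)(void *opaque, uint64_t addr, uint64_t val, unsigned size);
};

/* Trivial read callback so the table below has something to point at. */
uint64_t region_read_zero(void *opaque, uint64_t addr, unsigned size)
{
    (void)opaque; (void)addr; (void)size;
    return 0;
}

/* Stub for regions the guest should not write: count the write and drop it,
 * rather than leaving the callback NULL. opaque is an optional counter. */
void region_write_ignored(void *opaque, uint64_t addr, uint64_t val,
                          unsigned size)
{
    unsigned *dropped = opaque;
    (void)addr; (void)val; (void)size;
    if (dropped) {
        (*dropped)++;
    }
}

/* Returns 1 when both callbacks are present. Checking this at device
 * creation catches a missing function before any guest access. */
int region_ops_complete(const struct region_ops *ops)
{
    return ops != NULL && ops->read != NULL && ops->write != NULL;
}
```

A device model could assert region_ops_complete() when it registers the region, so the bug surfaces at start-up instead of on the first stray guest write.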
-
--- PMM
-
-On Fri, 12 Aug 2022 16:44:03 +0100
-Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
-
->
-On Thu, 11 Aug 2022 18:08:57 +0100
->
-Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
->
->
-> On Tue, 9 Aug 2022 17:08:25 +0100
->
-> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
->
->
->
-> > On Tue, 9 Aug 2022 21:07:06 +0800
->
-> > Bobo WL <lmw.bobo@gmail.com> wrote:
->
-> >
->
-> > > Hi Jonathan
->
-> > >
->
-> > > Thanks for your reply!
->
-> > >
->
-> > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
->
-> > > <Jonathan.Cameron@huawei.com> wrote:
->
-> > > >
->
-> > > > Probably not related to your problem, but there is a disconnect in
->
-> > > > QEMU /
->
-> > > > kernel assumptions around the presence of an HDM decoder when a HB only
->
-> > > > has a single root port. Spec allows it to be provided or not as an
->
-> > > > implementation choice.
->
-> > > > Kernel assumes it isn't provided. Qemu assumes it is.
->
-> > > >
->
-> > > > The temporary solution is to throw in a second root port on the HB
->
-> > > > and not
->
-> > > > connect anything to it.  Longer term I may special case this so that
->
-> > > > the particular
->
-> > > > decoder defaults to pass through settings in QEMU if there is only
->
-> > > > one root port.
->
-> > > >
->
-> > >
->
-> > > You are right! After adding an extra HB in qemu, I can create a x1
->
-> > > region successfully.
->
-> > > But have some errors in Nvdimm:
->
-> > >
->
-> > > [   74.925838] Unknown online node for memory at 0x10000000000,
->
-> > > assuming node 0
->
-> > > [   74.925846] Unknown target node for memory at 0x10000000000,
->
-> > > assuming node 0
->
-> > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe
->
-> > >
->
-> >
->
-> > Ah. I've seen this one, but not chased it down yet.  Was on my todo list
->
-> > to chase
->
-> > down. Once I reach this state I can verify the HDM Decode is correct
->
-> > which is what
->
-> > I've been using to test (Which wasn't true until earlier this week).
->
-> > I'm currently testing via devmem, more for historical reasons than
->
-> > because it makes
->
-> > that much sense anymore.
->
->
->
-> *embarrassed cough*.  We haven't fully hooked the LSA up in qemu yet.
->
-> I'd forgotten that was still on the todo list. I don't think it will
->
-> be particularly hard to do and will take a look in next few days.
->
->
->
-> Very very indirectly this error is causing a driver probe fail that means
->
-> that
->
-> we hit a code path that has a rather odd looking check on NDD_LABELING.
->
-> Should not have gotten near that path though - hence the problem is actually
->
-> when we call cxl_pmem_get_config_data() and it returns an error because
->
-> we haven't fully connected up the command in QEMU.
->
->
-So at least one bug in QEMU. We were not supporting variable length payloads
->
-on mailbox
->
-inputs (but were on outputs).  That hasn't mattered until we get to LSA
->
-writes.
->
-We just need to relax condition on the supplied length.
->
->
-diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
->
-index c352a935c4..fdda9529fe 100644
->
---- a/hw/cxl/cxl-mailbox-utils.c
->
-+++ b/hw/cxl/cxl-mailbox-utils.c
->
-@@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
->
-cxl_cmd = &cxl_cmd_set[set][cmd];
->
-h = cxl_cmd->handler;
->
-if (h) {
->
--        if (len == cxl_cmd->in) {
->
-+        if (len == cxl_cmd->in || !cxl_cmd->in) {
-Fix is wrong as we use ~0 as the placeholder for variable payload, not 0.
-
-With that fixed we hit new fun paths - after some errors we get the
-worrying splat below. I'm not totally sure, but it looks like a failure in an
-error cleanup path. I'll chase down the error source, but even then this is
-probably triggerable by a hardware problem or similar.  Some bonus prints in
-here from me chasing error paths, but it's otherwise just cxl/next + the fix
-I posted earlier today.
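A minimal sketch of the corrected input-length check, assuming (as stated above) that an all-ones value marks a variable-length input payload and that 0 means the command takes no input at all. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sentinel for "any input length is acceptable" (all bits set, i.e. ~0). */
#define PAYLOAD_LEN_VARIABLE (~(size_t)0)

/* Returns 1 if a command whose table entry declares expected_in input bytes
 * may be dispatched with len bytes supplied by the guest, else 0. */
int input_len_ok(size_t expected_in, size_t len)
{
    return expected_in == PAYLOAD_LEN_VARIABLE || len == expected_in;
}
```

Treating 0 as the wildcard, as the first attempt did, would also let commands that take no input accept arbitrary payloads, and would hide a table entry whose expected length is simply wrong.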
-
-[   69.919877] nd_bus ndbus0: START: nd_region.probe(region0)
-[   69.920108] nd_region_probe
-[   69.920623] ------------[ cut here ]------------
-[   69.920675] refcount_t: addition on 0; use-after-free.
-[   69.921314] WARNING: CPU: 3 PID: 710 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x144
-[   69.926949] Modules linked in: cxl_pmem cxl_mem cxl_pci cxl_port cxl_acpi cxl_core
-[   69.928830] CPU: 3 PID: 710 Comm: kworker/u8:9 Not tainted 5.19.0-rc3+ #399
-[   69.930596] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
-[   69.931482] Workqueue: events_unbound async_run_entry_fn
-[   69.932403] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
-[   69.934023] pc : refcount_warn_saturate+0xa0/0x144
-[   69.935161] lr : refcount_warn_saturate+0xa0/0x144
-[   69.936541] sp : ffff80000890b960
-[   69.937921] x29: ffff80000890b960 x28: 0000000000000000 x27: 0000000000000000
-[   69.940917] x26: ffffa54a90d5cb10 x25: ffffa54a90809e98 x24: 0000000000000000
-[   69.942537] x23: ffffa54a91a3d8d8 x22: ffff0000c5254800 x21: ffff0000c5254800
-[   69.944013] x20: ffff0000ce924180 x19: ffff0000c5254800 x18: ffffffffffffffff
-[   69.946100] x17: ffff5ab66e5ef000 x16: ffff80000801c000 x15: 0000000000000000
-[   69.947585] x14: 0000000000000001 x13: 0a2e656572662d72 x12: 657466612d657375
-[   69.948670] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : ffffa54a8f63d288
-[   69.950679] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : 00000000fffff31e
-[   69.952113] x5 : ffff0000ff61ba08 x4 : 00000000fffff31e x3 : ffff5ab66e5ef000
-[   69.954752] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c512e740
-[   69.957098] Call trace:
-[   69.957959]  refcount_warn_saturate+0xa0/0x144
-[   69.958773]  get_ndd+0x5c/0x80
-[   69.959294]  nd_region_register_namespaces+0xe4/0xe90
-[   69.960253]  nd_region_probe+0x100/0x290
-[   69.960796]  nvdimm_bus_probe+0xf4/0x1c0
-[   69.962087]  really_probe+0x19c/0x3f0
-[   69.962620]  __driver_probe_device+0x11c/0x190
-[   69.963258]  driver_probe_device+0x44/0xf4
-[   69.963773]  __device_attach_driver+0xa4/0x140
-[   69.964471]  bus_for_each_drv+0x84/0xe0
-[   69.965068]  __device_attach+0xb0/0x1f0
-[   69.966101]  device_initial_probe+0x20/0x30
-[   69.967142]  bus_probe_device+0xa4/0xb0
-[   69.968104]  device_add+0x3e8/0x910
-[   69.969111]  nd_async_device_register+0x24/0x74
-[   69.969928]  async_run_entry_fn+0x40/0x150
-[   69.970725]  process_one_work+0x1dc/0x450
-[   69.971796]  worker_thread+0x154/0x450
-[   69.972700]  kthread+0x118/0x120
-[   69.974141]  ret_from_fork+0x10/0x20
-[   69.975141] ---[ end trace 0000000000000000 ]---
-[   70.117887] Into nd_namespace_pmem_set_resource()
-
->
-cxl_cmd->payload = cxl_dstate->mbox_reg_state +
->
-A_CXL_DEV_CMD_PAYLOAD;
->
-ret = (*h)(cxl_cmd, cxl_dstate, &len);
->
->
->
-This lets the nvdimm/region probe fine, but I'm getting some issues with
->
-namespace capacity so I'll look at what is causing that next.
->
-Unfortunately I'm not that familiar with the driver/nvdimm side of things
->
-so it'll take a while to figure out what kicks off what!
->
->
-Jonathan
->
->
->
->
-> Jonathan
->
->
->
->
->
-> >
->
-> > >
->
-> > > And x4 region still failed with same errors, using latest cxl/preview
->
-> > > branch don't work.
->
-> > > I have picked "Two CXL emulation fixes" patches in qemu, still not
->
-> > > working.
->
-> > >
->
-> > > Bob
->
->
->
->
->
-
-On Mon, 15 Aug 2022 18:04:44 +0100
-Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
-
->
-On Fri, 12 Aug 2022 16:44:03 +0100
->
-Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
->
->
-> On Thu, 11 Aug 2022 18:08:57 +0100
->
-> Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
->
->
->
-> > On Tue, 9 Aug 2022 17:08:25 +0100
->
-> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
->
-> >
->
-> > > On Tue, 9 Aug 2022 21:07:06 +0800
->
-> > > Bobo WL <lmw.bobo@gmail.com> wrote:
->
-> > >
->
-> > > > Hi Jonathan
->
-> > > >
->
-> > > > Thanks for your reply!
->
-> > > >
->
-> > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
->
-> > > > <Jonathan.Cameron@huawei.com> wrote:
->
-> > > > >
->
-> > > > > Probably not related to your problem, but there is a disconnect in
->
-> > > > > QEMU /
->
-> > > > > kernel assumptions around the presence of an HDM decoder when a HB
->
-> > > > > only
->
-> > > > > has a single root port. Spec allows it to be provided or not as an
->
-> > > > > implementation choice.
->
-> > > > > Kernel assumes it isn't provided. Qemu assumes it is.
->
-> > > > >
->
-> > > > > The temporary solution is to throw in a second root port on the HB
->
-> > > > > and not
->
-> > > > > connect anything to it.  Longer term I may special case this so
->
-> > > > > that the particular
->
-> > > > > decoder defaults to pass through settings in QEMU if there is only
->
-> > > > > one root port.
->
-> > > > >
->
-> > > >
->
-> > > > You are right! After adding an extra HB in qemu, I can create a x1
->
-> > > > region successfully.
->
-> > > > But have some errors in Nvdimm:
->
-> > > >
->
-> > > > [   74.925838] Unknown online node for memory at 0x10000000000,
->
-> > > > assuming node 0
->
-> > > > [   74.925846] Unknown target node for memory at 0x10000000000,
->
-> > > > assuming node 0
->
-> > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe
->
-> > > >
->
-> > >
->
-> > > Ah. I've seen this one, but not chased it down yet.  Was on my todo
->
-> > > list to chase
->
-> > > down. Once I reach this state I can verify the HDM Decode is correct
->
-> > > which is what
->
-> > > I've been using to test (Which wasn't true until earlier this week).
->
-> > > I'm currently testing via devmem, more for historical reasons than
->
-> > > because it makes
->
-> > > that much sense anymore.
->
-> >
->
-> > *embarrassed cough*.  We haven't fully hooked the LSA up in qemu yet.
->
-> > I'd forgotten that was still on the todo list. I don't think it will
->
-> > be particularly hard to do and will take a look in next few days.
->
-> >
->
-> > Very very indirectly this error is causing a driver probe fail that means
->
-> > that
->
-> > we hit a code path that has a rather odd looking check on NDD_LABELING.
->
-> > Should not have gotten near that path though - hence the problem is
->
-> > actually
->
-> > when we call cxl_pmem_get_config_data() and it returns an error because
->
-> > we haven't fully connected up the command in QEMU.
->
->
->
-> So at least one bug in QEMU. We were not supporting variable length payloads
->
-> on mailbox
->
-> inputs (but were on outputs).  That hasn't mattered until we get to LSA
->
-> writes.
->
-> We just need to relax condition on the supplied length.
->
->
->
-> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
->
-> index c352a935c4..fdda9529fe 100644
->
-> --- a/hw/cxl/cxl-mailbox-utils.c
->
-> +++ b/hw/cxl/cxl-mailbox-utils.c
->
-> @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
->
->      cxl_cmd = &cxl_cmd_set[set][cmd];
->
->      h = cxl_cmd->handler;
->
->      if (h) {
->
-> -        if (len == cxl_cmd->in) {
->
-> +        if (len == cxl_cmd->in || !cxl_cmd->in) {
->
-The fix above is wrong, as we use ~0 as the placeholder for a variable payload, not 0.
-The cause of the error is a failure in GET_LSA.
-The reason is that the payload length is wrong in QEMU, but that was previously hidden
-by my wrong fix here.  It is probably still a good idea to inject an error in GET_LSA
-and chase down the refcount issue.
-
-
-diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
-index fdda9529fe..e8565fbd6e 100644
---- a/hw/cxl/cxl-mailbox-utils.c
-+++ b/hw/cxl/cxl-mailbox-utils.c
-@@ -489,7 +489,7 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
-         cmd_identify_memory_device, 0, 0 },
-     [CCLS][GET_PARTITION_INFO] = { "CCLS_GET_PARTITION_INFO",
-         cmd_ccls_get_partition_info, 0, 0 },
--    [CCLS][GET_LSA] = { "CCLS_GET_LSA", cmd_ccls_get_lsa, 0, 0 },
-+    [CCLS][GET_LSA] = { "CCLS_GET_LSA", cmd_ccls_get_lsa, 8, 0 },
-     [CCLS][SET_LSA] = { "CCLS_SET_LSA", cmd_ccls_set_lsa,
-         ~0, IMMEDIATE_CONFIG_CHANGE | IMMEDIATE_DATA_CHANGE },
-     [MEDIA_AND_POISON][GET_POISON_LIST] = { "MEDIA_AND_POISON_GET_POISON_LIST",
-@@ -510,12 +510,13 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
-     cxl_cmd = &cxl_cmd_set[set][cmd];
-     h = cxl_cmd->handler;
-     if (h) {
--        if (len == cxl_cmd->in || !cxl_cmd->in) {
-+        if (len == cxl_cmd->in || cxl_cmd->in == ~0) {
-             cxl_cmd->payload = cxl_dstate->mbox_reg_state +
-                 A_CXL_DEV_CMD_PAYLOAD;
-
-And woot, we get a namespace in the LSA :)
-
-I'll post QEMU fixes in the next day or two.  The kernel side now seems more or less
-fine, albeit with a suspicious refcount underflow.
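To illustrate why the first attempt hid the GET_LSA length problem, here is a minimal standalone sketch of the two length checks. It is not the QEMU code: `struct cmd_desc` and both helper names are hypothetical stand-ins for the `in` field of the command table entry and the condition in `cxl_process_mailbox()`, and the field type is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the length field of a command table entry. */
struct cmd_desc {
    int64_t in; /* expected input payload length, or ~0 meaning "variable" */
};

/* First attempt: any command declared with in == 0 accepts every length. */
static bool len_ok_first_attempt(const struct cmd_desc *c, uint64_t len)
{
    return len == (uint64_t)c->in || !c->in;
}

/* Corrected check: only the ~0 sentinel means "variable length". */
static bool len_ok_corrected(const struct cmd_desc *c, uint64_t len)
{
    return len == (uint64_t)c->in || c->in == ~0;
}
```

With GET_LSA declared as taking 0 bytes, the first check accepts an 8-byte request and so masks the wrong table entry, while the corrected check rejects it until the entry is changed to 8.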
-
->
->
-With that fixed we hit new fun paths - after some errors we get the
->
-worrying - not totally sure but looks like a failure on an error cleanup.
->
-I'll chase down the error source, but even then this is probably triggerable
->
-by
->
-hardware problem or similar.  Some bonus prints in here from me chasing
->
-error paths, but it's otherwise just cxl/next + the fix I posted earlier
->
-today.
->
->
-[   69.919877] nd_bus ndbus0: START: nd_region.probe(region0)
->
-[   69.920108] nd_region_probe
->
-[   69.920623] ------------[ cut here ]------------
->
-[   69.920675] refcount_t: addition on 0; use-after-free.
->
-[   69.921314] WARNING: CPU: 3 PID: 710 at lib/refcount.c:25
->
-refcount_warn_saturate+0xa0/0x144
->
-[   69.926949] Modules linked in: cxl_pmem cxl_mem cxl_pci cxl_port cxl_acpi
->
-cxl_core
->
-[   69.928830] CPU: 3 PID: 710 Comm: kworker/u8:9 Not tainted 5.19.0-rc3+ #399
->
-[   69.930596] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
->
-[   69.931482] Workqueue: events_unbound async_run_entry_fn
->
-[   69.932403] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
->
-[   69.934023] pc : refcount_warn_saturate+0xa0/0x144
->
-[   69.935161] lr : refcount_warn_saturate+0xa0/0x144
->
-[   69.936541] sp : ffff80000890b960
->
-[   69.937921] x29: ffff80000890b960 x28: 0000000000000000 x27:
->
-0000000000000000
->
-[   69.940917] x26: ffffa54a90d5cb10 x25: ffffa54a90809e98 x24:
->
-0000000000000000
->
-[   69.942537] x23: ffffa54a91a3d8d8 x22: ffff0000c5254800 x21:
->
-ffff0000c5254800
->
-[   69.944013] x20: ffff0000ce924180 x19: ffff0000c5254800 x18:
->
-ffffffffffffffff
->
-[   69.946100] x17: ffff5ab66e5ef000 x16: ffff80000801c000 x15:
->
-0000000000000000
->
-[   69.947585] x14: 0000000000000001 x13: 0a2e656572662d72 x12:
->
-657466612d657375
->
-[   69.948670] x11: 203b30206e6f206e x10: 6f69746964646120 x9 :
->
-ffffa54a8f63d288
->
-[   69.950679] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 :
->
-00000000fffff31e
->
-[   69.952113] x5 : ffff0000ff61ba08 x4 : 00000000fffff31e x3 :
->
-ffff5ab66e5ef000
->
-root@debian:/sys/bus/cxl/devices/decoder0.0/region0# [   69.954752] x2 :
->
-0000000000000000 x1 : 0000000000000000 x0 : ffff0000c512e740
->
-[   69.957098] Call trace:
->
-[   69.957959]  refcount_warn_saturate+0xa0/0x144
->
-[   69.958773]  get_ndd+0x5c/0x80
->
-[   69.959294]  nd_region_register_namespaces+0xe4/0xe90
->
-[   69.960253]  nd_region_probe+0x100/0x290
->
-[   69.960796]  nvdimm_bus_probe+0xf4/0x1c0
->
-[   69.962087]  really_probe+0x19c/0x3f0
->
-[   69.962620]  __driver_probe_device+0x11c/0x190
->
-[   69.963258]  driver_probe_device+0x44/0xf4
->
-[   69.963773]  __device_attach_driver+0xa4/0x140
->
-[   69.964471]  bus_for_each_drv+0x84/0xe0
->
-[   69.965068]  __device_attach+0xb0/0x1f0
->
-[   69.966101]  device_initial_probe+0x20/0x30
->
-[   69.967142]  bus_probe_device+0xa4/0xb0
->
-[   69.968104]  device_add+0x3e8/0x910
->
-[   69.969111]  nd_async_device_register+0x24/0x74
->
-[   69.969928]  async_run_entry_fn+0x40/0x150
->
-[   69.970725]  process_one_work+0x1dc/0x450
->
-[   69.971796]  worker_thread+0x154/0x450
->
-[   69.972700]  kthread+0x118/0x120
->
-[   69.974141]  ret_from_fork+0x10/0x20
->
-[   69.975141] ---[ end trace 0000000000000000 ]---
->
-[   70.117887] Into nd_namespace_pmem_set_resource()
->
->
->              cxl_cmd->payload = cxl_dstate->mbox_reg_state +
->
->                  A_CXL_DEV_CMD_PAYLOAD;
->
->              ret = (*h)(cxl_cmd, cxl_dstate, &len);
->
->
->
->
->
-> This lets the nvdimm/region probe fine, but I'm getting some issues with
->
-> namespace capacity so I'll look at what is causing that next.
->
-> Unfortunately I'm not that familiar with the driver/nvdimm side of things
->
-> so it's taking a while to figure out what kicks off what!
->
->
->
-> Jonathan
->
->
->
-> >
->
-> > Jonathan
->
-> >
->
-> >
->
-> > >
->
-> > > >
->
-> > > > And x4 region still failed with same errors, using latest cxl/preview
->
-> > > > branch don't work.
->
-> > > > I have picked "Two CXL emulation fixes" patches in qemu, still not
->
-> > > > working.
->
-> > > >
->
-> > > > Bob
->
-> >
->
-> >
->
->
->
-
-Jonathan Cameron wrote:
->
-On Fri, 12 Aug 2022 16:44:03 +0100
->
-Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
->
->
-> On Thu, 11 Aug 2022 18:08:57 +0100
->
-> Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
->
->
->
-> > On Tue, 9 Aug 2022 17:08:25 +0100
->
-> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
->
-> >
->
-> > > On Tue, 9 Aug 2022 21:07:06 +0800
->
-> > > Bobo WL <lmw.bobo@gmail.com> wrote:
->
-> > >
->
-> > > > Hi Jonathan
->
-> > > >
->
-> > > > Thanks for your reply!
->
-> > > >
->
-> > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
->
-> > > > <Jonathan.Cameron@huawei.com> wrote:
->
-> > > > >
->
-> > > > > Probably not related to your problem, but there is a disconnect in
->
-> > > > > QEMU /
->
-> > > > > kernel assumptions around the presence of an HDM decoder when a HB
->
-> > > > > only
->
-> > > > > has a single root port. Spec allows it to be provided or not as an
->
-> > > > > implementation choice.
->
-> > > > > Kernel assumes it isn't provided. Qemu assumes it is.
->
-> > > > >
->
-> > > > > The temporary solution is to throw in a second root port on the HB
->
-> > > > > and not
->
-> > > > > connect anything to it.  Longer term I may special case this so
->
-> > > > > that the particular
->
-> > > > > decoder defaults to pass through settings in QEMU if there is only
->
-> > > > > one root port.
->
-> > > > >
->
-> > > >
->
-> > > > You are right! After adding an extra HB in qemu, I can create a x1
->
-> > > > region successfully.
->
-> > > > But have some errors in Nvdimm:
->
-> > > >
->
-> > > > [   74.925838] Unknown online node for memory at 0x10000000000,
->
-> > > > assuming node 0
->
-> > > > [   74.925846] Unknown target node for memory at 0x10000000000,
->
-> > > > assuming node 0
->
-> > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe
->
-> > > >
->
-> > >
->
-> > > Ah. I've seen this one, but not chased it down yet.  Was on my todo
->
-> > > list to chase
->
-> > > down. Once I reach this state I can verify the HDM Decode is correct
->
-> > > which is what
->
-> > > I've been using to test (Which wasn't true until earlier this week).
->
-> > > I'm currently testing via devmem, more for historical reasons than
->
-> > > because it makes
->
-> > > that much sense anymore.
->
-> >
->
-> > *embarrassed cough*.  We haven't fully hooked the LSA up in qemu yet.
->
-> > I'd forgotten that was still on the todo list. I don't think it will
->
-> > be particularly hard to do and will take a look in next few days.
->
-> >
->
-> > Very very indirectly this error is causing a driver probe fail that means
->
-> > that
->
-> > we hit a code path that has a rather odd looking check on NDD_LABELING.
->
-> > Should not have gotten near that path though - hence the problem is
->
-> > actually
->
-> > when we call cxl_pmem_get_config_data() and it returns an error because
->
-> > we haven't fully connected up the command in QEMU.
->
->
->
-> So at least one bug in QEMU. We were not supporting variable length payloads
->
-> on mailbox
->
-> inputs (but were on outputs).  That hasn't mattered until we get to LSA
->
-> writes.
->
-> We just need to relax condition on the supplied length.
->
->
->
-> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
->
-> index c352a935c4..fdda9529fe 100644
->
-> --- a/hw/cxl/cxl-mailbox-utils.c
->
-> +++ b/hw/cxl/cxl-mailbox-utils.c
->
-> @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
->
->      cxl_cmd = &cxl_cmd_set[set][cmd];
->
->      h = cxl_cmd->handler;
->
->      if (h) {
->
-> -        if (len == cxl_cmd->in) {
->
-> +        if (len == cxl_cmd->in || !cxl_cmd->in) {
->
-Fix is wrong as we use ~0 as the placeholder for variable payload, not 0.
->
->
-With that fixed we hit new fun paths - after some errors we get the
->
-worrying - not totally sure but looks like a failure on an error cleanup.
->
-I'll chase down the error source, but even then this is probably triggerable
->
-by
->
-hardware problem or similar.  Some bonus prints in here from me chasing
->
-error paths, but it's otherwise just cxl/next + the fix I posted earlier
->
-today.
-One of the scenarios that I cannot rule out is nvdimm_probe() racing
-nd_region_probe(), but given all the work it takes to create a region I
-suspect all the nvdimm_probe() work to have completed...
-
-It is at least one potentially wrong hypothesis that needs to be chased
-down.
-
->
->
-[   69.919877] nd_bus ndbus0: START: nd_region.probe(region0)
->
-[   69.920108] nd_region_probe
->
-[   69.920623] ------------[ cut here ]------------
->
-[   69.920675] refcount_t: addition on 0; use-after-free.
->
-[   69.921314] WARNING: CPU: 3 PID: 710 at lib/refcount.c:25
->
-refcount_warn_saturate+0xa0/0x144
->
-[   69.926949] Modules linked in: cxl_pmem cxl_mem cxl_pci cxl_port cxl_acpi
->
-cxl_core
->
-[   69.928830] CPU: 3 PID: 710 Comm: kworker/u8:9 Not tainted 5.19.0-rc3+ #399
->
-[   69.930596] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
->
-[   69.931482] Workqueue: events_unbound async_run_entry_fn
->
-[   69.932403] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
->
-[   69.934023] pc : refcount_warn_saturate+0xa0/0x144
->
-[   69.935161] lr : refcount_warn_saturate+0xa0/0x144
->
-[   69.936541] sp : ffff80000890b960
->
-[   69.937921] x29: ffff80000890b960 x28: 0000000000000000 x27:
->
-0000000000000000
->
-[   69.940917] x26: ffffa54a90d5cb10 x25: ffffa54a90809e98 x24:
->
-0000000000000000
->
-[   69.942537] x23: ffffa54a91a3d8d8 x22: ffff0000c5254800 x21:
->
-ffff0000c5254800
->
-[   69.944013] x20: ffff0000ce924180 x19: ffff0000c5254800 x18:
->
-ffffffffffffffff
->
-[   69.946100] x17: ffff5ab66e5ef000 x16: ffff80000801c000 x15:
->
-0000000000000000
->
-[   69.947585] x14: 0000000000000001 x13: 0a2e656572662d72 x12:
->
-657466612d657375
->
-[   69.948670] x11: 203b30206e6f206e x10: 6f69746964646120 x9 :
->
-ffffa54a8f63d288
->
-[   69.950679] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 :
->
-00000000fffff31e
->
-[   69.952113] x5 : ffff0000ff61ba08 x4 : 00000000fffff31e x3 :
->
-ffff5ab66e5ef000
->
-root@debian:/sys/bus/cxl/devices/decoder0.0/region0# [   69.954752] x2 :
->
-0000000000000000 x1 : 0000000000000000 x0 : ffff0000c512e740
->
-[   69.957098] Call trace:
->
-[   69.957959]  refcount_warn_saturate+0xa0/0x144
->
-[   69.958773]  get_ndd+0x5c/0x80
->
-[   69.959294]  nd_region_register_namespaces+0xe4/0xe90
->
-[   69.960253]  nd_region_probe+0x100/0x290
->
-[   69.960796]  nvdimm_bus_probe+0xf4/0x1c0
->
-[   69.962087]  really_probe+0x19c/0x3f0
->
-[   69.962620]  __driver_probe_device+0x11c/0x190
->
-[   69.963258]  driver_probe_device+0x44/0xf4
->
-[   69.963773]  __device_attach_driver+0xa4/0x140
->
-[   69.964471]  bus_for_each_drv+0x84/0xe0
->
-[   69.965068]  __device_attach+0xb0/0x1f0
->
-[   69.966101]  device_initial_probe+0x20/0x30
->
-[   69.967142]  bus_probe_device+0xa4/0xb0
->
-[   69.968104]  device_add+0x3e8/0x910
->
-[   69.969111]  nd_async_device_register+0x24/0x74
->
-[   69.969928]  async_run_entry_fn+0x40/0x150
->
-[   69.970725]  process_one_work+0x1dc/0x450
->
-[   69.971796]  worker_thread+0x154/0x450
->
-[   69.972700]  kthread+0x118/0x120
->
-[   69.974141]  ret_from_fork+0x10/0x20
->
-[   69.975141] ---[ end trace 0000000000000000 ]---
->
-[   70.117887] Into nd_namespace_pmem_set_resource()
-
-On Mon, 15 Aug 2022 15:55:15 -0700
-Dan Williams <dan.j.williams@intel.com> wrote:
-
->
-Jonathan Cameron wrote:
->
-> On Fri, 12 Aug 2022 16:44:03 +0100
->
-> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
->
->
->
-> > On Thu, 11 Aug 2022 18:08:57 +0100
->
-> > Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
->
-> >
->
-> > > On Tue, 9 Aug 2022 17:08:25 +0100
->
-> > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
->
-> > >
->
-> > > > On Tue, 9 Aug 2022 21:07:06 +0800
->
-> > > > Bobo WL <lmw.bobo@gmail.com> wrote:
->
-> > > >
->
-> > > > > Hi Jonathan
->
-> > > > >
->
-> > > > > Thanks for your reply!
->
-> > > > >
->
-> > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron
->
-> > > > > <Jonathan.Cameron@huawei.com> wrote:
->
-> > > > > >
->
-> > > > > > Probably not related to your problem, but there is a disconnect
->
-> > > > > > in QEMU /
->
-> > > > > > kernel assumptions around the presence of an HDM decoder when a HB
->
-> > > > > > only
->
-> > > > > > has a single root port. Spec allows it to be provided or not as
->
-> > > > > > an implementation choice.
->
-> > > > > > Kernel assumes it isn't provided. Qemu assumes it is.
->
-> > > > > >
->
-> > > > > > The temporary solution is to throw in a second root port on the
->
-> > > > > > HB and not
->
-> > > > > > connect anything to it.  Longer term I may special case this so
->
-> > > > > > that the particular
->
-> > > > > > decoder defaults to pass through settings in QEMU if there is
->
-> > > > > > only one root port.
->
-> > > > > >
->
-> > > > >
->
-> > > > > You are right! After adding an extra HB in qemu, I can create a x1
->
-> > > > > region successfully.
->
-> > > > > But have some errors in Nvdimm:
->
-> > > > >
->
-> > > > > [   74.925838] Unknown online node for memory at 0x10000000000,
->
-> > > > > assuming node 0
->
-> > > > > [   74.925846] Unknown target node for memory at 0x10000000000,
->
-> > > > > assuming node 0
->
-> > > > > [   74.927470] nd_region region0: nmem0: is disabled, failing probe
->
-> > > > >
->
-> > > >
->
-> > > > Ah. I've seen this one, but not chased it down yet.  Was on my todo
->
-> > > > list to chase
->
-> > > > down. Once I reach this state I can verify the HDM Decode is correct
->
-> > > > which is what
->
-> > > > I've been using to test (Which wasn't true until earlier this week).
->
-> > > > I'm currently testing via devmem, more for historical reasons than
->
-> > > > because it makes
->
-> > > > that much sense anymore.
->
-> > >
->
-> > > *embarrassed cough*.  We haven't fully hooked the LSA up in qemu yet.
->
-> > > I'd forgotten that was still on the todo list. I don't think it will
->
-> > > be particularly hard to do and will take a look in next few days.
->
-> > >
->
-> > > Very very indirectly this error is causing a driver probe fail that
->
-> > > means that
->
-> > > we hit a code path that has a rather odd looking check on NDD_LABELING.
->
-> > > Should not have gotten near that path though - hence the problem is
->
-> > > actually
->
-> > > when we call cxl_pmem_get_config_data() and it returns an error because
->
-> > > we haven't fully connected up the command in QEMU.
->
-> >
->
-> > So at least one bug in QEMU. We were not supporting variable length
->
-> > payloads on mailbox
->
-> > inputs (but were on outputs).  That hasn't mattered until we get to LSA
->
-> > writes.
->
-> > We just need to relax condition on the supplied length.
->
-> >
->
-> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
->
-> > index c352a935c4..fdda9529fe 100644
->
-> > --- a/hw/cxl/cxl-mailbox-utils.c
->
-> > +++ b/hw/cxl/cxl-mailbox-utils.c
->
-> > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
->
-> >      cxl_cmd = &cxl_cmd_set[set][cmd];
->
-> >      h = cxl_cmd->handler;
->
-> >      if (h) {
->
-> > -        if (len == cxl_cmd->in) {
->
-> > +        if (len == cxl_cmd->in || !cxl_cmd->in) {
->
-> Fix is wrong as we use ~0 as the placeholder for variable payload, not 0.
->
->
->
-> With that fixed we hit new fun paths - after some errors we get the
->
-> worrying - not totally sure but looks like a failure on an error cleanup.
->
-> I'll chase down the error source, but even then this is probably
->
-> triggerable by
->
-> hardware problem or similar.  Some bonus prints in here from me chasing
->
-> error paths, but it's otherwise just cxl/next + the fix I posted earlier
->
-> today.
->
->
-One of the scenarios that I cannot rule out is nvdimm_probe() racing
->
-nd_region_probe(), but given all the work it takes to create a region I
->
-suspect all the nvdimm_probe() work to have completed...
->
->
-It is at least one potentially wrong hypothesis that needs to be chased
->
-down.
-Maybe there should be a special award for the non-intuitive 
-ndctl create-namespace command (modifies existing namespace and might create
-a different empty one...) I'm sure there is some interesting history behind 
-that one :)
-
-Upshot is I just threw a filesystem on fsdax and wrote some text files on it
-to allow easy grepping. The right data ends up in the memory and a plausible
-namespace description is stored in the LSA.
-
-So to some degree at least it's 'working' on an 8 way direct connected
-set of emulated devices.
-
-One snag is that serial number support isn't yet upstream in QEMU.
-(I have had it in my tree for a while but not posted it yet because of
- QEMU feature freeze)
-https://gitlab.com/jic23/qemu/-/commit/144c783ea8a5fbe169f46ea1ba92940157f42733
-That's needed for meaningful cookie generation.  Otherwise you can build the
-namespace once, but it won't work on next probe as the cookie is 0 and you
-hit some error paths.
-
-Maybe sensible to add a sanity check and fail namespace creation if
-cookie is 0?  (Silly side question, but is there a theoretical risk of
-a serial number / other data combination leading to a fletcher64()
-checksum that happens to be 0 - that would give a very odd bug report!)
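On the side question about a zero checksum, the following is a minimal sketch, assuming a plain Fletcher-style checksum over 32-bit words with wrapping sums. The function name and the sample input are hypothetical, and the kernel's own helper may differ in detail (for example in endianness handling).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fletcher-style checksum over 32-bit words: `lo` is a running sum that
 * wraps at 2^32, `hi` accumulates the running sums, and only the low
 * 32 bits of `hi` survive the final shift.
 */
static uint64_t fletcher64_sketch(const uint32_t *words, size_t n)
{
    uint32_t lo = 0;
    uint64_t hi = 0;

    for (size_t i = 0; i < n; i++) {
        lo += words[i]; /* wraps modulo 2^32 */
        hi += lo;
    }
    return (hi << 32) | lo;
}
```

Under these assumptions the three words 1, 0xFFFFFFFE, 1 give a checksum of 0 even though the input is non-zero, because both sums wrap to a multiple of 2^32. So a zero cookie from real data looks theoretically possible, if unlikely.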
-
-So to make it work the following is needed:
-
-1) The kernel fix for the mailbox buffer overflow.
-2) QEMU fix for the size of the arguments for get_lsa.
-3) QEMU fix to allow variable-size input arguments (for set_lsa).
-4) Serial number patch above + command lines to QEMU to set appropriate
-   serial numbers.
-
-I'll send out the QEMU fixes shortly and post the Serial number patch,
-though that almost certainly won't go in until next QEMU development
-cycle starts in a few weeks.
-
-Next up, run through same tests on some other topologies.
-
-Jonathan
-
->
->
->
->
-> [   69.919877] nd_bus ndbus0: START: nd_region.probe(region0)
->
-> [   69.920108] nd_region_probe
->
-> [   69.920623] ------------[ cut here ]------------
->
-> [   69.920675] refcount_t: addition on 0; use-after-free.
->
-> [   69.921314] WARNING: CPU: 3 PID: 710 at lib/refcount.c:25
->
-> refcount_warn_saturate+0xa0/0x144
->
-> [   69.926949] Modules linked in: cxl_pmem cxl_mem cxl_pci cxl_port
->
-> cxl_acpi cxl_core
->
-> [   69.928830] CPU: 3 PID: 710 Comm: kworker/u8:9 Not tainted 5.19.0-rc3+
->
-> #399
->
-> [   69.930596] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0
->
-> 02/06/2015
->
-> [   69.931482] Workqueue: events_unbound async_run_entry_fn
->
-> [   69.932403] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS
->
-> BTYPE=--)
->
-> [   69.934023] pc : refcount_warn_saturate+0xa0/0x144
->
-> [   69.935161] lr : refcount_warn_saturate+0xa0/0x144
->
-> [   69.936541] sp : ffff80000890b960
->
-> [   69.937921] x29: ffff80000890b960 x28: 0000000000000000 x27:
->
-> 0000000000000000
->
-> [   69.940917] x26: ffffa54a90d5cb10 x25: ffffa54a90809e98 x24:
->
-> 0000000000000000
->
-> [   69.942537] x23: ffffa54a91a3d8d8 x22: ffff0000c5254800 x21:
->
-> ffff0000c5254800
->
-> [   69.944013] x20: ffff0000ce924180 x19: ffff0000c5254800 x18:
->
-> ffffffffffffffff
->
-> [   69.946100] x17: ffff5ab66e5ef000 x16: ffff80000801c000 x15:
->
-> 0000000000000000
->
-> [   69.947585] x14: 0000000000000001 x13: 0a2e656572662d72 x12:
->
-> 657466612d657375
->
-> [   69.948670] x11: 203b30206e6f206e x10: 6f69746964646120 x9 :
->
-> ffffa54a8f63d288
->
-> [   69.950679] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 :
->
-> 00000000fffff31e
->
-> [   69.952113] x5 : ffff0000ff61ba08 x4 : 00000000fffff31e x3 :
->
-> ffff5ab66e5ef000
->
-> root@debian:/sys/bus/cxl/devices/decoder0.0/region0# [   69.954752] x2 :
->
-> 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c512e740
->
-> [   69.957098] Call trace:
->
-> [   69.957959]  refcount_warn_saturate+0xa0/0x144
->
-> [   69.958773]  get_ndd+0x5c/0x80
->
-> [   69.959294]  nd_region_register_namespaces+0xe4/0xe90
->
-> [   69.960253]  nd_region_probe+0x100/0x290
->
-> [   69.960796]  nvdimm_bus_probe+0xf4/0x1c0
->
-> [   69.962087]  really_probe+0x19c/0x3f0
->
-> [   69.962620]  __driver_probe_device+0x11c/0x190
->
-> [   69.963258]  driver_probe_device+0x44/0xf4
->
-> [   69.963773]  __device_attach_driver+0xa4/0x140
->
-> [   69.964471]  bus_for_each_drv+0x84/0xe0
->
-> [   69.965068]  __device_attach+0xb0/0x1f0
->
-> [   69.966101]  device_initial_probe+0x20/0x30
->
-> [   69.967142]  bus_probe_device+0xa4/0xb0
->
-> [   69.968104]  device_add+0x3e8/0x910
->
-> [   69.969111]  nd_async_device_register+0x24/0x74
->
-> [   69.969928]  async_run_entry_fn+0x40/0x150
->
-> [   69.970725]  process_one_work+0x1dc/0x450
->
-> [   69.971796]  worker_thread+0x154/0x450
->
-> [   69.972700]  kthread+0x118/0x120
->
-> [   69.974141]  ret_from_fork+0x10/0x20
->
-> [   69.975141] ---[ end trace 0000000000000000 ]---
->
-> [   70.117887] Into nd_namespace_pmem_set_resource()
-
-Bobo WL wrote:
->
-Hi list
->
->
-I want to test cxl functions in arm64, and found some problems I can't
->
-figure out.
->
->
-My test environment:
->
->
-1. build latest bios from
-https://github.com/tianocore/edk2.git
-master
->
-branch(cc2db6ebfb6d9d85ba4c7b35fba1fa37fffc0bc2)
->
-2. build latest qemu-system-aarch64 from git://git.qemu.org/qemu.git
->
-master branch(846dcf0ba4eff824c295f06550b8673ff3f31314). With cxl arm
->
-support patch:
->
-https://patchwork.kernel.org/project/cxl/cover/20220616141950.23374-1-Jonathan.Cameron@huawei.com/
->
-3. build Linux kernel from
->
-https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git
-preview
->
-branch(65fc1c3d26b96002a5aa1f4012fae4dc98fd5683)
->
-4. build latest ndctl tools from
-https://github.com/pmem/ndctl
->
-create_region branch(8558b394e449779e3a4f3ae90fae77ede0bca159)
->
->
-And my qemu test commands:
->
-sudo $QEMU_BIN -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 \
->
--cpu max -smp 8 -nographic -no-reboot \
->
--kernel $KERNEL -bios $BIOS_BIN \
->
--drive if=none,file=$ROOTFS,format=qcow2,id=hd \
->
--device virtio-blk-pci,drive=hd -append 'root=/dev/vda1
->
-nokaslr dyndbg="module cxl* +p"' \
->
--object memory-backend-ram,size=4G,id=mem0 \
->
--numa node,nodeid=0,cpus=0-7,memdev=mem0 \
->
--net nic -net user,hostfwd=tcp::2222-:22 -enable-kvm \
->
--object
->
-memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxltest.raw,size=256M
->
-\
->
--object
->
-memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest1.raw,size=256M
->
-\
->
--object
->
-memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M
->
-\
->
--object
->
-memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M
->
-\
->
--object
->
-memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M
->
-\
->
--object
->
-memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa1.raw,size=256M
->
-\
->
--object
->
-memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M
->
-\
->
--object
->
-memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M
->
-\
->
--device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
->
--device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
->
--device cxl-upstream,bus=root_port0,id=us0 \
->
--device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
->
--device
->
-cxl-type3,bus=swport0,memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \
->
--device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \
->
--device
->
-cxl-type3,bus=swport1,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1 \
->
--device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \
->
--device
->
-cxl-type3,bus=swport2,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2 \
->
--device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \
->
--device
->
-cxl-type3,bus=swport3,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3 \
->
--M
->
-cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k
->
->
-And I have got two problems.
->
-1. When I want to create x1 region with command: "cxl create-region -d
->
-decoder0.0 -w 1 -g 4096 mem0", kernel crashed with null pointer
->
-dereference. Crash log:
->
->
-[  534.697324] cxl_region region0: config state: 0
->
-[  534.697346] cxl_region region0: probe: -6
->
-[  534.697368] cxl_acpi ACPI0017:00: decoder0.0: created region0
->
-[  534.699115] cxl region0: mem0:endpoint3 decoder3.0 add:
->
-mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1
->
-[  534.699149] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
->
-mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1
->
-[  534.699167] cxl region0: ACPI0016:00:port1 decoder1.0 add:
->
-mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1
->
-[  534.699176] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256
->
-[  534.699182] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0
->
-for mem0:decoder3.0 @ 0
->
-[  534.699189] cxl region0: 0000:0d:00.0:port2 iw: 1 ig: 256
->
-[  534.699193] cxl region0: 0000:0d:00.0:port2 target[0] =
->
-0000:0e:00.0 for mem0:decoder3.0 @ 0
->
-[  534.699405] Unable to handle kernel NULL pointer dereference at
->
-virtual address 0000000000000000
->
-[  534.701474] Mem abort info:
->
-[  534.701994]   ESR = 0x0000000086000004
->
-[  534.702653]   EC = 0x21: IABT (current EL), IL = 32 bits
->
-[  534.703616]   SET = 0, FnV = 0
->
-[  534.704174]   EA = 0, S1PTW = 0
->
-[  534.704803]   FSC = 0x04: level 0 translation fault
->
-[  534.705694] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010144a000
->
-[  534.706875] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
->
-[  534.709855] Internal error: Oops: 86000004 [#1] PREEMPT SMP
->
-[  534.710301] Modules linked in:
->
-[  534.710546] CPU: 7 PID: 331 Comm: cxl Not tainted
->
-5.19.0-rc3-00064-g65fc1c3d26b9-dirty #11
->
-[  534.715393] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
->
-[  534.717179] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
->
-[  534.719190] pc : 0x0
->
-[  534.719928] lr : commit_store+0x118/0x2cc
->
-[  534.721007] sp : ffff80000aec3c30
->
-[  534.721793] x29: ffff80000aec3c30 x28: ffff0000da62e740 x27:
->
-ffff0000c0c06b30
->
-[  534.723875] x26: 0000000000000000 x25: ffff0000c0a2a400 x24:
->
-ffff0000c0a29400
->
-[  534.725440] x23: 0000000000000003 x22: 0000000000000000 x21:
->
-ffff0000c0c06800
->
-[  534.727312] x20: 0000000000000000 x19: ffff0000c1559800 x18:
->
-0000000000000000
->
-[  534.729138] x17: 0000000000000000 x16: 0000000000000000 x15:
->
-0000ffffd41fe838
->
-[  534.731046] x14: 0000000000000000 x13: 0000000000000000 x12:
->
-0000000000000000
->
-[  534.732402] x11: 0000000000000000 x10: 0000000000000000 x9 :
->
-0000000000000000
->
-[  534.734432] x8 : 0000000000000000 x7 : 0000000000000000 x6 :
->
-ffff0000c0906e80
->
-[  534.735921] x5 : 0000000000000000 x4 : 0000000000000000 x3 :
->
-ffff80000aec3bf0
->
-[  534.737437] x2 : 0000000000000000 x1 : 0000000000000000 x0 :
->
-ffff0000c155a000
->
-[  534.738878] Call trace:
->
-[  534.739368]  0x0
->
-[  534.739713]  dev_attr_store+0x1c/0x30
->
-[  534.740186]  sysfs_kf_write+0x48/0x58
->
-[  534.740961]  kernfs_fop_write_iter+0x128/0x184
->
-[  534.741872]  new_sync_write+0xdc/0x158
->
-[  534.742706]  vfs_write+0x1ac/0x2a8
->
-[  534.743440]  ksys_write+0x68/0xf0
->
-[  534.744328]  __arm64_sys_write+0x1c/0x28
->
-[  534.745180]  invoke_syscall+0x44/0xf0
->
-[  534.745989]  el0_svc_common+0x4c/0xfc
->
-[  534.746661]  do_el0_svc+0x60/0xa8
->
-[  534.747378]  el0_svc+0x2c/0x78
->
-[  534.748066]  el0t_64_sync_handler+0xb8/0x12c
->
-[  534.748919]  el0t_64_sync+0x18c/0x190
->
-[  534.749629] Code: bad PC value
->
-[  534.750169] ---[ end trace 0000000000000000 ]---
-What was the top kernel commit when you ran this test? What is the line
-number of "commit_store+0x118"?
-
->
-2. When I want to create x4 region with command: "cxl create-region -d
->
-decoder0.0 -w 4 -g 4096 -m mem0 mem1 mem2 mem3". I got below errors:
->
->
-cxl region: create_region: region0: failed to set target3 to mem3
->
-cxl region: cmd_create_region: created 0 regions
->
->
-And kernel log as below:
->
-[   60.536663] cxl_region region0: config state: 0
->
-[   60.536675] cxl_region region0: probe: -6
->
-[   60.536696] cxl_acpi ACPI0017:00: decoder0.0: created region0
->
-[   60.538251] cxl region0: mem0:endpoint3 decoder3.0 add:
->
-mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1
->
-[   60.538278] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
->
-mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1
->
-[   60.538295] cxl region0: ACPI0016:00:port1 decoder1.0 add:
->
-mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1
->
-[   60.538647] cxl region0: mem1:endpoint4 decoder4.0 add:
->
-mem1:decoder4.0 @ 1 next: none nr_eps: 1 nr_targets: 1
->
-[   60.538663] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
->
-mem1:decoder4.0 @ 1 next: mem1 nr_eps: 2 nr_targets: 2
->
-[   60.538675] cxl region0: ACPI0016:00:port1 decoder1.0 add:
->
-mem1:decoder4.0 @ 1 next: 0000:0d:00.0 nr_eps: 2 nr_targets: 1
->
-[   60.539311] cxl region0: mem2:endpoint5 decoder5.0 add:
->
-mem2:decoder5.0 @ 2 next: none nr_eps: 1 nr_targets: 1
->
-[   60.539332] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
->
-mem2:decoder5.0 @ 2 next: mem2 nr_eps: 3 nr_targets: 3
->
-[   60.539343] cxl region0: ACPI0016:00:port1 decoder1.0 add:
->
-mem2:decoder5.0 @ 2 next: 0000:0d:00.0 nr_eps: 3 nr_targets: 1
->
-[   60.539711] cxl region0: mem3:endpoint6 decoder6.0 add:
->
-mem3:decoder6.0 @ 3 next: none nr_eps: 1 nr_targets: 1
->
-[   60.539723] cxl region0: 0000:0d:00.0:port2 decoder2.0 add:
->
-mem3:decoder6.0 @ 3 next: mem3 nr_eps: 4 nr_targets: 4
->
-[   60.539735] cxl region0: ACPI0016:00:port1 decoder1.0 add:
->
-mem3:decoder6.0 @ 3 next: 0000:0d:00.0 nr_eps: 4 nr_targets: 1
->
-[   60.539742] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256
->
-[   60.539747] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0
->
-for mem0:decoder3.0 @ 0
->
-[   60.539754] cxl region0: 0000:0d:00.0:port2 iw: 4 ig: 512
->
-[   60.539758] cxl region0: 0000:0d:00.0:port2 target[0] =
->
-0000:0e:00.0 for mem0:decoder3.0 @ 0
->
-[   60.539764] cxl region0: ACPI0016:00:port1: cannot host mem1:decoder4.0 at
->
-1
->
->
-I have tried writing the sysfs nodes manually and got the same errors.
->
->
-I hope I can get some help here.
-What is the output of:
-
-    cxl list -MDTu -d decoder0.0
-
-...? It might be the case that mem1 cannot be mapped by decoder0.0, or
-at least not in the specified order, or that validation check is broken.
-
-Hi Dan,
-
-Thanks for your reply!
-
-On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> wrote:
->
->
-What is the output of:
->
->
-cxl list -MDTu -d decoder0.0
->
->
-...? It might be the case that mem1 cannot be mapped by decoder0.0, or
->
-at least not in the specified order, or that validation check is broken.
-Command "cxl list -MDTu -d decoder0.0" output:
-
-[
-  {
-    "memdevs":[
-      {
-        "memdev":"mem2",
-        "pmem_size":"256.00 MiB (268.44 MB)",
-        "ram_size":0,
-        "serial":"0",
-        "host":"0000:11:00.0"
-      },
-      {
-        "memdev":"mem1",
-        "pmem_size":"256.00 MiB (268.44 MB)",
-        "ram_size":0,
-        "serial":"0",
-        "host":"0000:10:00.0"
-      },
-      {
-        "memdev":"mem0",
-        "pmem_size":"256.00 MiB (268.44 MB)",
-        "ram_size":0,
-        "serial":"0",
-        "host":"0000:0f:00.0"
-      },
-      {
-        "memdev":"mem3",
-        "pmem_size":"256.00 MiB (268.44 MB)",
-        "ram_size":0,
-        "serial":"0",
-        "host":"0000:12:00.0"
-      }
-    ]
-  },
-  {
-    "root decoders":[
-      {
-        "decoder":"decoder0.0",
-        "resource":"0x10000000000",
-        "size":"4.00 GiB (4.29 GB)",
-        "pmem_capable":true,
-        "volatile_capable":true,
-        "accelmem_capable":true,
-        "nr_targets":1,
-        "targets":[
-          {
-            "target":"ACPI0016:01",
-            "alias":"pci0000:0c",
-            "position":0,
-            "id":"0xc"
-          }
-        ]
-      }
-    ]
-  }
-]
-
-Bobo WL wrote:
->
-Hi Dan,
->
->
-Thanks for your reply!
->
->
-On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> wrote:
->
->
->
-> What is the output of:
->
->
->
->     cxl list -MDTu -d decoder0.0
->
->
->
-> ...? It might be the case that mem1 cannot be mapped by decoder0.0, or
->
-> at least not in the specified order, or that validation check is broken.
->
->
-Command "cxl list -MDTu -d decoder0.0" output:
-Thanks for this, I think I know the problem, but will try some
-experiments with cxl_test first.
-
-Did the commit_store() crash stop reproducing with latest cxl/preview
-branch?
-
-On Tue, Aug 9, 2022 at 11:17 PM Dan Williams <dan.j.williams@intel.com> wrote:
->
->
-Bobo WL wrote:
->
-> Hi Dan,
->
->
->
-> Thanks for your reply!
->
->
->
-> On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com>
->
-> wrote:
->
-> >
->
-> > What is the output of:
->
-> >
->
-> >     cxl list -MDTu -d decoder0.0
->
-> >
->
-> > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or
->
-> > at least not in the specified order, or that validation check is broken.
->
->
->
-> Command "cxl list -MDTu -d decoder0.0" output:
->
->
-Thanks for this, I think I know the problem, but will try some
->
-experiments with cxl_test first.
->
->
-Did the commit_store() crash stop reproducing with latest cxl/preview
->
-branch?
-No, I still hit this bug if I don't add an extra HB device in QEMU.
-
-Dan Williams wrote:
->
-Bobo WL wrote:
->
-> Hi Dan,
->
->
->
-> Thanks for your reply!
->
->
->
-> On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com>
->
-> wrote:
->
-> >
->
-> > What is the output of:
->
-> >
->
-> >     cxl list -MDTu -d decoder0.0
->
-> >
->
-> > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or
->
-> > at least not in the specified order, or that validation check is broken.
->
->
->
-> Command "cxl list -MDTu -d decoder0.0" output:
->
->
-Thanks for this, I think I know the problem, but will try some
->
-experiments with cxl_test first.
-Hmm, so my cxl_test experiment unfortunately passed so I'm not
-reproducing the failure mode. This is the result of creating x4 region
-with devices directly attached to a single host-bridge:
-
-# cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s $((1<<30))
-{
-  "region":"region8",
-  "resource":"0xf1f0000000",
-  "size":"1024.00 MiB (1073.74 MB)",
-  "interleave_ways":4,
-  "interleave_granularity":256,
-  "decode_state":"commit",
-  "mappings":[
-    {
-      "position":3,
-      "memdev":"mem11",
-      "decoder":"decoder21.0"
-    },
-    {
-      "position":2,
-      "memdev":"mem9",
-      "decoder":"decoder19.0"
-    },
-    {
-      "position":1,
-      "memdev":"mem10",
-      "decoder":"decoder20.0"
-    },
-    {
-      "position":0,
-      "memdev":"mem12",
-      "decoder":"decoder22.0"
-    }
-  ]
-}
-cxl region: cmd_create_region: created 1 region
-
->
-Did the commit_store() crash stop reproducing with latest cxl/preview
->
-branch?
-I missed the answer to this question.
-
-All of these changes are now in Linus' tree; perhaps give that a try and
-post the debug log again?
-
-On Thu, 11 Aug 2022 17:46:55 -0700
-Dan Williams <dan.j.williams@intel.com> wrote:
-
->
-Dan Williams wrote:
->
-> Bobo WL wrote:
->
-> > Hi Dan,
->
-> >
->
-> > Thanks for your reply!
->
-> >
->
-> > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com>
->
-> > wrote:
->
-> > >
->
-> > > What is the output of:
->
-> > >
->
-> > >     cxl list -MDTu -d decoder0.0
->
-> > >
->
-> > > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or
->
-> > > at least not in the specified order, or that validation check is
->
-> > > broken.
->
-> >
->
-> > Command "cxl list -MDTu -d decoder0.0" output:
->
->
->
-> Thanks for this, I think I know the problem, but will try some
->
-> experiments with cxl_test first.
->
->
-Hmm, so my cxl_test experiment unfortunately passed so I'm not
->
-reproducing the failure mode. This is the result of creating x4 region
->
-with devices directly attached to a single host-bridge:
->
->
-# cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s $((1<<30))
->
-{
->
-"region":"region8",
->
-"resource":"0xf1f0000000",
->
-"size":"1024.00 MiB (1073.74 MB)",
->
-"interleave_ways":4,
->
-"interleave_granularity":256,
->
-"decode_state":"commit",
->
-"mappings":[
->
-{
->
-"position":3,
->
-"memdev":"mem11",
->
-"decoder":"decoder21.0"
->
-},
->
-{
->
-"position":2,
->
-"memdev":"mem9",
->
-"decoder":"decoder19.0"
->
-},
->
-{
->
-"position":1,
->
-"memdev":"mem10",
->
-"decoder":"decoder20.0"
->
-},
->
-{
->
-"position":0,
->
-"memdev":"mem12",
->
-"decoder":"decoder22.0"
->
-}
->
-]
->
-}
->
-cxl region: cmd_create_region: created 1 region
->
->
-> Did the commit_store() crash stop reproducing with latest cxl/preview
->
-> branch?
->
->
-I missed the answer to this question.
->
->
-All of these changes are now in Linus' tree perhaps give that a try and
->
-post the debug log again?
-Hi Dan,
-
-I've moved onto looking at this one.
-1 HB, 2 RP (to make it configure the HDM decoder in the QEMU HB; I'll tidy
-that up at some stage), 1 switch, 4 downstream switch ports, each with a
-type 3 device
-
-I'm not getting a crash, but can't successfully set up a region.
-Upon adding the final target,
-it's failing in check_last_peer() as pos < distance.
-Seems distance is 4, which makes me think it's using the wrong level of the
-hierarchy for
-some reason or that the distance check is wrong.
-Wasn't a good idea to just skip that step though as it goes boom - though
-stack trace is not useful.
-
-Jonathan
-
-On Wed, 17 Aug 2022 17:16:19 +0100
-Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
-
->
-On Thu, 11 Aug 2022 17:46:55 -0700
->
-Dan Williams <dan.j.williams@intel.com> wrote:
->
->
-> Dan Williams wrote:
->
-> > Bobo WL wrote:
->
-> > > Hi Dan,
->
-> > >
->
-> > > Thanks for your reply!
->
-> > >
->
-> > > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com>
->
-> > > wrote:
->
-> > > >
->
-> > > > What is the output of:
->
-> > > >
->
-> > > >     cxl list -MDTu -d decoder0.0
->
-> > > >
->
-> > > > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or
->
-> > > > at least not in the specified order, or that validation check is
->
-> > > > broken.
->
-> > >
->
-> > > Command "cxl list -MDTu -d decoder0.0" output:
->
-> >
->
-> > Thanks for this, I think I know the problem, but will try some
->
-> > experiments with cxl_test first.
->
->
->
-> Hmm, so my cxl_test experiment unfortunately passed so I'm not
->
-> reproducing the failure mode. This is the result of creating x4 region
->
-> with devices directly attached to a single host-bridge:
->
->
->
-> # cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s
->
-> $((1<<30))
->
-> {
->
->   "region":"region8",
->
->   "resource":"0xf1f0000000",
->
->   "size":"1024.00 MiB (1073.74 MB)",
->
->   "interleave_ways":4,
->
->   "interleave_granularity":256,
->
->   "decode_state":"commit",
->
->   "mappings":[
->
->     {
->
->       "position":3,
->
->       "memdev":"mem11",
->
->       "decoder":"decoder21.0"
->
->     },
->
->     {
->
->       "position":2,
->
->       "memdev":"mem9",
->
->       "decoder":"decoder19.0"
->
->     },
->
->     {
->
->       "position":1,
->
->       "memdev":"mem10",
->
->       "decoder":"decoder20.0"
->
->     },
->
->     {
->
->       "position":0,
->
->       "memdev":"mem12",
->
->       "decoder":"decoder22.0"
->
->     }
->
->   ]
->
-> }
->
-> cxl region: cmd_create_region: created 1 region
->
->
->
-> > Did the commit_store() crash stop reproducing with latest cxl/preview
->
-> > branch?
->
->
->
-> I missed the answer to this question.
->
->
->
-> All of these changes are now in Linus' tree perhaps give that a try and
->
-> post the debug log again?
->
->
-Hi Dan,
->
->
-I've moved onto looking at this one.
->
-1 HB, 2RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy
->
-that up
->
-at some stage), 1 switch, 4 downstream switch ports each with a type 3
->
->
-I'm not getting a crash, but can't successfully setup a region.
->
-Upon adding the final target
->
-It's failing in check_last_peer() as pos < distance.
->
-Seems distance is 4 which makes me think it's using the wrong level of the
->
-hierarchy for
->
-some reason or that distance check is wrong.
->
-Wasn't a good idea to just skip that step though as it goes boom - though
->
-stack trace is not useful.
-Turns out really weird corruption happens if you accidentally back two type3 
-devices
-with the same memory device. Who would have thought it :)
-
-That aside, ignoring the check_last_peer() failure seems to make everything work
-for this
-topology.  I'm not seeing the crash, so my guess is we fixed it somewhere along 
-the way.
-
-Now for the fun one.  I've replicated the crash if we have
-
-1HB 1*RP 1SW, 4SW-DSP, 4Type3
-
-Now, I'd expect to see it not 'work' because the QEMU HDM decoder won't be 
-programmed
-but the null pointer dereference isn't related to that.
-
-The bug is straightforward.  Not all decoders have commit callbacks... Will
-send out
-a possible fix shortly.
-
-Jonathan
-
-
-
->
->
-Jonathan
->
->
->
->
->
->
-
-On Thu, 18 Aug 2022 17:37:40 +0100
-Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
-
->
-On Wed, 17 Aug 2022 17:16:19 +0100
->
-Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
->
->
-> On Thu, 11 Aug 2022 17:46:55 -0700
->
-> Dan Williams <dan.j.williams@intel.com> wrote:
->
->
->
-> > Dan Williams wrote:
->
-> > > Bobo WL wrote:
->
-> > > > Hi Dan,
->
-> > > >
->
-> > > > Thanks for your reply!
->
-> > > >
->
-> > > > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams
->
-> > > > <dan.j.williams@intel.com> wrote:
->
-> > > > >
->
-> > > > > What is the output of:
->
-> > > > >
->
-> > > > >     cxl list -MDTu -d decoder0.0
->
-> > > > >
->
-> > > > > ...? It might be the case that mem1 cannot be mapped by decoder0.0,
->
-> > > > > or
->
-> > > > > at least not in the specified order, or that validation check is
->
-> > > > > broken.
->
-> > > >
->
-> > > > Command "cxl list -MDTu -d decoder0.0" output:
->
-> > >
->
-> > > Thanks for this, I think I know the problem, but will try some
->
-> > > experiments with cxl_test first.
->
-> >
->
-> > Hmm, so my cxl_test experiment unfortunately passed so I'm not
->
-> > reproducing the failure mode. This is the result of creating x4 region
->
-> > with devices directly attached to a single host-bridge:
->
-> >
->
-> > # cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s
->
-> > $((1<<30))
->
-> > {
->
-> >   "region":"region8",
->
-> >   "resource":"0xf1f0000000",
->
-> >   "size":"1024.00 MiB (1073.74 MB)",
->
-> >   "interleave_ways":4,
->
-> >   "interleave_granularity":256,
->
-> >   "decode_state":"commit",
->
-> >   "mappings":[
->
-> >     {
->
-> >       "position":3,
->
-> >       "memdev":"mem11",
->
-> >       "decoder":"decoder21.0"
->
-> >     },
->
-> >     {
->
-> >       "position":2,
->
-> >       "memdev":"mem9",
->
-> >       "decoder":"decoder19.0"
->
-> >     },
->
-> >     {
->
-> >       "position":1,
->
-> >       "memdev":"mem10",
->
-> >       "decoder":"decoder20.0"
->
-> >     },
->
-> >     {
->
-> >       "position":0,
->
-> >       "memdev":"mem12",
->
-> >       "decoder":"decoder22.0"
->
-> >     }
->
-> >   ]
->
-> > }
->
-> > cxl region: cmd_create_region: created 1 region
->
-> >
->
-> > > Did the commit_store() crash stop reproducing with latest cxl/preview
->
-> > > branch?
->
-> >
->
-> > I missed the answer to this question.
->
-> >
->
-> > All of these changes are now in Linus' tree perhaps give that a try and
->
-> > post the debug log again?
->
->
->
-> Hi Dan,
->
->
->
-> I've moved onto looking at this one.
->
-> 1 HB, 2RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy
->
-> that up
->
-> at some stage), 1 switch, 4 downstream switch ports each with a type 3
->
->
->
-> I'm not getting a crash, but can't successfully setup a region.
->
-> Upon adding the final target
->
-> It's failing in check_last_peer() as pos < distance.
->
-> Seems distance is 4 which makes me think it's using the wrong level of the
->
-> hierarchy for
->
-> some reason or that distance check is wrong.
->
-> Wasn't a good idea to just skip that step though as it goes boom - though
->
-> stack trace is not useful.
->
->
-Turns out really weird corruption happens if you accidentally back two type3
->
-devices
->
-with the same memory device. Who would have thought it :)
->
->
-That aside ignoring the check_last_peer() failure seems to make everything
->
-work for this
->
-topology.  I'm not seeing the crash, so my guess is we fixed it somewhere
->
-along the way.
->
->
-Now for the fun one.  I've replicated the crash if we have
->
->
-1HB 1*RP 1SW, 4SW-DSP, 4Type3
->
->
-Now, I'd expect to see it not 'work' because the QEMU HDM decoder won't be
->
-programmed
->
-but the null pointer dereference isn't related to that.
->
->
-The bug is straightforward.  Not all decoders have commit callbacks... Will
->
-send out
->
-a possible fix shortly.
->
-For completeness I'm carrying this hack because I haven't gotten my head
-around the right fix for check_last_peer() failing on this test topology.
-
-diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
-index c49d9a5f1091..275e143bd748 100644
---- a/drivers/cxl/core/region.c
-+++ b/drivers/cxl/core/region.c
-@@ -978,7 +978,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
-                                rc = check_last_peer(cxled, ep, cxl_rr,
-                                                     distance);
-                                if (rc)
--                                       return rc;
-+                                       //      return rc;
-                                goto out_target_set;
-                        }
-                goto add_target;
---
-
-I might find more bugs with more testing, but these are all the ones I've
-seen so far, plus those in Bobo's reports.  The QEMU fixes are now upstream,
-so they will be in the release.
-
-As a reminder, testing on QEMU has a few corners...
-
-Need a patch to add serial number ECAP support. It is on list for review,
-but will have to wait until after the QEMU 7.1 release (which may be next week).
-
-QEMU still assumes HDM decoder on the host bridge will be programmed.
-So if you want anything to work there should be at least
-2 RP below the HB (no need to plug anything in to one of them).
-
-I don't want to add a commandline parameter to hide the decoder in QEMU
-and detecting there is only one RP would require moving a bunch of static
-stuff into runtime code (I think).
-
-I still think we should make the kernel check to see if there is a decoder,
-but if not I might see how bad a hack it is to have QEMU ignore that decoder
-if not committed in this one special case (HB HDM decoder with only one place
-it can send stuff). Obviously that would be a break from specification
-so less than ideal!
-
-Thanks,
-
-Jonathan
-
-On Fri, 19 Aug 2022 09:46:55 +0100
-Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
-
->
-On Thu, 18 Aug 2022 17:37:40 +0100
->
-Jonathan Cameron via <qemu-devel@nongnu.org> wrote:
->
->
-> On Wed, 17 Aug 2022 17:16:19 +0100
->
-> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
->
->
->
-> > On Thu, 11 Aug 2022 17:46:55 -0700
->
-> > Dan Williams <dan.j.williams@intel.com> wrote:
->
-> >
->
-> > > Dan Williams wrote:
->
-> > > > Bobo WL wrote:
->
-> > > > > Hi Dan,
->
-> > > > >
->
-> > > > > Thanks for your reply!
->
-> > > > >
->
-> > > > > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams
->
-> > > > > <dan.j.williams@intel.com> wrote:
->
-> > > > > >
->
-> > > > > > What is the output of:
->
-> > > > > >
->
-> > > > > >     cxl list -MDTu -d decoder0.0
->
-> > > > > >
->
-> > > > > > ...? It might be the case that mem1 cannot be mapped by
->
-> > > > > > decoder0.0, or
->
-> > > > > > at least not in the specified order, or that validation check is
->
-> > > > > > broken.
->
-> > > > >
->
-> > > > > Command "cxl list -MDTu -d decoder0.0" output:
->
-> > > >
->
-> > > > Thanks for this, I think I know the problem, but will try some
->
-> > > > experiments with cxl_test first.
->
-> > >
->
-> > > Hmm, so my cxl_test experiment unfortunately passed so I'm not
->
-> > > reproducing the failure mode. This is the result of creating x4 region
->
-> > > with devices directly attached to a single host-bridge:
->
-> > >
->
-> > > # cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s
->
-> > > $((1<<30))
->
-> > > {
->
-> > >   "region":"region8",
->
-> > >   "resource":"0xf1f0000000",
->
-> > >   "size":"1024.00 MiB (1073.74 MB)",
->
-> > >   "interleave_ways":4,
->
-> > >   "interleave_granularity":256,
->
-> > >   "decode_state":"commit",
->
-> > >   "mappings":[
->
-> > >     {
->
-> > >       "position":3,
->
-> > >       "memdev":"mem11",
->
-> > >       "decoder":"decoder21.0"
->
-> > >     },
->
-> > >     {
->
-> > >       "position":2,
->
-> > >       "memdev":"mem9",
->
-> > >       "decoder":"decoder19.0"
->
-> > >     },
->
-> > >     {
->
-> > >       "position":1,
->
-> > >       "memdev":"mem10",
->
-> > >       "decoder":"decoder20.0"
->
-> > >     },
->
-> > >     {
->
-> > >       "position":0,
->
-> > >       "memdev":"mem12",
->
-> > >       "decoder":"decoder22.0"
->
-> > >     }
->
-> > >   ]
->
-> > > }
->
-> > > cxl region: cmd_create_region: created 1 region
->
-> > >
->
-> > > > Did the commit_store() crash stop reproducing with latest cxl/preview
->
-> > > > branch?
->
-> > >
->
-> > > I missed the answer to this question.
->
-> > >
->
-> > > All of these changes are now in Linus' tree perhaps give that a try and
->
-> > > post the debug log again?
->
-> >
->
-> > Hi Dan,
->
-> >
->
-> > I've moved onto looking at this one.
->
-> > 1 HB, 2RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy
->
-> > that up
->
-> > at some stage), 1 switch, 4 downstream switch ports each with a type 3
->
-> >
->
-> > I'm not getting a crash, but can't successfully setup a region.
->
-> > Upon adding the final target
->
-> > It's failing in check_last_peer() as pos < distance.
->
-> > Seems distance is 4 which makes me think it's using the wrong level of
->
-> > the hierarchy for
->
-> > some reason or that distance check is wrong.
->
-> > Wasn't a good idea to just skip that step though as it goes boom - though
->
-> > stack trace is not useful.
->
->
->
-> Turns out really weird corruption happens if you accidentally back two
->
-> type3 devices
->
-> with the same memory device. Who would have thought it :)
->
->
->
-> That aside ignoring the check_last_peer() failure seems to make everything
->
-> work for this
->
-> topology.  I'm not seeing the crash, so my guess is we fixed it somewhere
->
-> along the way.
->
->
->
-> Now for the fun one.  I've replicated the crash if we have
->
->
->
-> 1HB 1*RP 1SW, 4SW-DSP, 4Type3
->
->
->
-> Now, I'd expect to see it not 'work' because the QEMU HDM decoder won't be
->
-> programmed
->
-> but the null pointer dereference isn't related to that.
->
->
->
-> The bug is straightforward.  Not all decoders have commit callbacks...
->
-> Will send out
->
-> a possible fix shortly.
->
->
->
-For completeness I'm carrying this hack because I haven't gotten my head
->
-around the right fix for check_last_peer() failing on this test topology.
->
->
-diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
->
-index c49d9a5f1091..275e143bd748 100644
->
---- a/drivers/cxl/core/region.c
->
-+++ b/drivers/cxl/core/region.c
->
-@@ -978,7 +978,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
->
-rc = check_last_peer(cxled, ep, cxl_rr,
->
-distance);
->
-if (rc)
->
--                                       return rc;
->
-+                                       //      return rc;
->
-goto out_target_set;
->
-}
->
-goto add_target;
-I'm still carrying this hack and still haven't worked out the right fix.
-
-Suggestions welcome!  If not I'll hopefully get some time on this
-towards the end of the week.
-
-Jonathan
-
diff --git a/results/classifier/008/other/35170175 b/results/classifier/008/other/35170175
deleted file mode 100644
index 50cc6a327..000000000
--- a/results/classifier/008/other/35170175
+++ /dev/null
@@ -1,531 +0,0 @@
-other: 0.933
-permissions: 0.870
-graphic: 0.844
-debug: 0.830
-performance: 0.818
-semantic: 0.798
-device: 0.787
-boot: 0.719
-KVM: 0.709
-files: 0.706
-PID: 0.699
-socket: 0.681
-network: 0.666
-vnc: 0.633
-
-[Qemu-devel] [BUG] QEMU crashes with dpdk virtio pmd
-
-QEMU crashes under the following precondition:
-the VM XML config enables multiqueue, and the VM's virtio-net driver supports
-multiqueue.
-
-Steps to reproduce:
-i. start dpdk testpmd in VM with the virtio nic
-ii. stop testpmd
-iii. reboot the VM
-
-The crash appears after commit "f9d6dbf0  remove virtio queues if the guest doesn't support
-multiqueue" was introduced.
-
-Qemu version: QEMU emulator version 2.9.50 (v2.9.0-137-g32c7e0a)
-VM DPDK version:  DPDK-1.6.1
-
-Call Trace:
-#0  0x00007f60881fe5d7 in raise () from /usr/lib64/libc.so.6
-#1  0x00007f60881ffcc8 in abort () from /usr/lib64/libc.so.6
-#2  0x00007f608823e2f7 in __libc_message () from /usr/lib64/libc.so.6
-#3  0x00007f60882456d3 in _int_free () from /usr/lib64/libc.so.6
-#4  0x00007f608900158f in g_free () from /usr/lib64/libglib-2.0.so.0
-#5  0x00007f6088fea32c in iter_remove_or_steal () from 
-/usr/lib64/libglib-2.0.so.0
-#6  0x00007f608edc0986 in object_property_del_all (obj=0x7f6091e74800) at 
-qom/object.c:410
-#7  object_finalize (data=0x7f6091e74800) at qom/object.c:467
-#8  object_unref (address@hidden) at qom/object.c:903
-#9  0x00007f608eaf1fd3 in phys_section_destroy (mr=0x7f6091e74800) at 
-git/qemu/exec.c:1154
-#10 phys_sections_free (map=0x7f6090b72bb0) at git/qemu/exec.c:1163
-#11 address_space_dispatch_free (d=0x7f6090b72b90) at git/qemu/exec.c:2514
-#12 0x00007f608ee91ace in call_rcu_thread (opaque=<optimized out>) at 
-util/rcu.c:272
-#13 0x00007f6089b0ddc5 in start_thread () from /usr/lib64/libpthread.so.0
-#14 0x00007f60882bf71d in clone () from /usr/lib64/libc.so.6
-
-Call Trace:
-#0  0x00007fdccaeb9790 in ?? ()
-#1  0x00007fdcd82d09fc in object_property_del_all (obj=0x7fdcdb8acf60) at 
-qom/object.c:405
-#2  object_finalize (data=0x7fdcdb8acf60) at qom/object.c:467
-#3  object_unref (address@hidden) at qom/object.c:903
-#4  0x00007fdcd8001fd3 in phys_section_destroy (mr=0x7fdcdb8acf60) at 
-git/qemu/exec.c:1154
-#5  phys_sections_free (map=0x7fdcdc86aa00) at git/qemu/exec.c:1163
-#6  address_space_dispatch_free (d=0x7fdcdc86a9e0) at git/qemu/exec.c:2514
-#7  0x00007fdcd83a1ace in call_rcu_thread (opaque=<optimized out>) at 
-util/rcu.c:272
-#8  0x00007fdcd301ddc5 in start_thread () from /usr/lib64/libpthread.so.0
-#9  0x00007fdcd17cf71d in clone () from /usr/lib64/libc.so.6
-
-The q->tx_bh is freed in the virtio_net_del_queue() function when virtio
-queues are removed
-if the guest doesn't support multiqueue. But it might still be referenced
-elsewhere (e.g. virtio_net_set_status()),
-so it needs to be set to NULL.
-
-diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
-index 7d091c9..98bd683 100644
---- a/hw/net/virtio-net.c
-+++ b/hw/net/virtio-net.c
-@@ -1522,9 +1522,12 @@ static void virtio_net_del_queue(VirtIONet *n, int index)
-     if (q->tx_timer) {
-         timer_del(q->tx_timer);
-         timer_free(q->tx_timer);
-+        q->tx_timer = NULL;
-     } else {
-         qemu_bh_delete(q->tx_bh);
-+        q->tx_bh = NULL;
-     }
-+    q->tx_waiting = 0;
-     virtio_del_queue(vdev, index * 2 + 1);
- }
-
-From: wangyunjian 
-Sent: Monday, April 24, 2017 6:10 PM
-To: address@hidden; Michael S. Tsirkin <address@hidden>; 'Jason Wang' 
-<address@hidden>
-Cc: wangyunjian <address@hidden>; caihe <address@hidden>
-Subject: [Qemu-devel][BUG] QEMU crashes with dpdk virtio pmd 
-
-QEMU crashes under the following precondition:
-the VM XML config enables multiqueue, and the VM's virtio-net driver supports
-multiqueue.
-
-Steps to reproduce:
-i. start dpdk testpmd in VM with the virtio nic
-ii. stop testpmd
-iii. reboot the VM
-
-The crash appears after commit "f9d6dbf0  remove virtio queues if the guest doesn't support
-multiqueue" was introduced.
-
-Qemu version: QEMU emulator version 2.9.50 (v2.9.0-137-g32c7e0a)
-VM DPDK version:  DPDK-1.6.1
-
-Call Trace:
-#0  0x00007f60881fe5d7 in raise () from /usr/lib64/libc.so.6
-#1  0x00007f60881ffcc8 in abort () from /usr/lib64/libc.so.6
-#2  0x00007f608823e2f7 in __libc_message () from /usr/lib64/libc.so.6
-#3  0x00007f60882456d3 in _int_free () from /usr/lib64/libc.so.6
-#4  0x00007f608900158f in g_free () from /usr/lib64/libglib-2.0.so.0
-#5  0x00007f6088fea32c in iter_remove_or_steal () from 
-/usr/lib64/libglib-2.0.so.0
-#6  0x00007f608edc0986 in object_property_del_all (obj=0x7f6091e74800) at 
-qom/object.c:410
-#7  object_finalize (data=0x7f6091e74800) at qom/object.c:467
-#8  object_unref (address@hidden) at qom/object.c:903
-#9  0x00007f608eaf1fd3 in phys_section_destroy (mr=0x7f6091e74800) at 
-git/qemu/exec.c:1154
-#10 phys_sections_free (map=0x7f6090b72bb0) at git/qemu/exec.c:1163
-#11 address_space_dispatch_free (d=0x7f6090b72b90) at git/qemu/exec.c:2514
-#12 0x00007f608ee91ace in call_rcu_thread (opaque=<optimized out>) at 
-util/rcu.c:272
-#13 0x00007f6089b0ddc5 in start_thread () from /usr/lib64/libpthread.so.0
-#14 0x00007f60882bf71d in clone () from /usr/lib64/libc.so.6
-
-Call Trace:
-#0  0x00007fdccaeb9790 in ?? ()
-#1  0x00007fdcd82d09fc in object_property_del_all (obj=0x7fdcdb8acf60) at 
-qom/object.c:405
-#2  object_finalize (data=0x7fdcdb8acf60) at qom/object.c:467
-#3  object_unref (address@hidden) at qom/object.c:903
-#4  0x00007fdcd8001fd3 in phys_section_destroy (mr=0x7fdcdb8acf60) at 
-git/qemu/exec.c:1154
-#5  phys_sections_free (map=0x7fdcdc86aa00) at git/qemu/exec.c:1163
-#6  address_space_dispatch_free (d=0x7fdcdc86a9e0) at git/qemu/exec.c:2514
-#7  0x00007fdcd83a1ace in call_rcu_thread (opaque=<optimized out>) at 
-util/rcu.c:272
-#8  0x00007fdcd301ddc5 in start_thread () from /usr/lib64/libpthread.so.0
-#9  0x00007fdcd17cf71d in clone () from /usr/lib64/libc.so.6
-
-On 2017年04月25日 19:37, wangyunjian wrote:
-The q->tx_bh is freed in the virtio_net_del_queue() function when virtio
-queues are removed
-if the guest doesn't support multiqueue. But it might still be referenced
-elsewhere (e.g. virtio_net_set_status()),
-so it needs to be set to NULL.
-
-diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
-index 7d091c9..98bd683 100644
---- a/hw/net/virtio-net.c
-+++ b/hw/net/virtio-net.c
-@@ -1522,9 +1522,12 @@ static void virtio_net_del_queue(VirtIONet *n, int index)
-      if (q->tx_timer) {
-          timer_del(q->tx_timer);
-          timer_free(q->tx_timer);
-+        q->tx_timer = NULL;
-      } else {
-          qemu_bh_delete(q->tx_bh);
-+        q->tx_bh = NULL;
-      }
-+    q->tx_waiting = 0;
-      virtio_del_queue(vdev, index * 2 + 1);
-  }
-Thanks a lot for the fix.
-
-Two questions:
-- If virtio_net_set_status() is the only function that may access tx_bh,
-it looks like setting tx_waiting to zero is sufficient?
-- Can you post a formal patch for this?
-
-Thanks
-From: wangyunjian
-Sent: Monday, April 24, 2017 6:10 PM
-To: address@hidden; Michael S. Tsirkin <address@hidden>; 'Jason Wang' 
-<address@hidden>
-Cc: wangyunjian <address@hidden>; caihe <address@hidden>
-Subject: [Qemu-devel][BUG] QEMU crashes with dpdk virtio pmd
-
-QEMU crashes under the following precondition:
-the VM XML config enables multiqueue, and the VM's virtio-net driver supports
-multiqueue.
-
-Steps to reproduce:
-i. start dpdk testpmd in VM with the virtio nic
-ii. stop testpmd
-iii. reboot the VM
-
-The crash appears after commit "f9d6dbf0  remove virtio queues if the guest doesn't support
-multiqueue" was introduced.
-
-Qemu version: QEMU emulator version 2.9.50 (v2.9.0-137-g32c7e0a)
-VM DPDK version:  DPDK-1.6.1
-
-Call Trace:
-#0  0x00007f60881fe5d7 in raise () from /usr/lib64/libc.so.6
-#1  0x00007f60881ffcc8 in abort () from /usr/lib64/libc.so.6
-#2  0x00007f608823e2f7 in __libc_message () from /usr/lib64/libc.so.6
-#3  0x00007f60882456d3 in _int_free () from /usr/lib64/libc.so.6
-#4  0x00007f608900158f in g_free () from /usr/lib64/libglib-2.0.so.0
-#5  0x00007f6088fea32c in iter_remove_or_steal () from 
-/usr/lib64/libglib-2.0.so.0
-#6  0x00007f608edc0986 in object_property_del_all (obj=0x7f6091e74800) at 
-qom/object.c:410
-#7  object_finalize (data=0x7f6091e74800) at qom/object.c:467
-#8  object_unref (address@hidden) at qom/object.c:903
-#9  0x00007f608eaf1fd3 in phys_section_destroy (mr=0x7f6091e74800) at 
-git/qemu/exec.c:1154
-#10 phys_sections_free (map=0x7f6090b72bb0) at git/qemu/exec.c:1163
-#11 address_space_dispatch_free (d=0x7f6090b72b90) at git/qemu/exec.c:2514
-#12 0x00007f608ee91ace in call_rcu_thread (opaque=<optimized out>) at 
-util/rcu.c:272
-#13 0x00007f6089b0ddc5 in start_thread () from /usr/lib64/libpthread.so.0
-#14 0x00007f60882bf71d in clone () from /usr/lib64/libc.so.6
-
-Call Trace:
-#0  0x00007fdccaeb9790 in ?? ()
-#1  0x00007fdcd82d09fc in object_property_del_all (obj=0x7fdcdb8acf60) at 
-qom/object.c:405
-#2  object_finalize (data=0x7fdcdb8acf60) at qom/object.c:467
-#3  object_unref (address@hidden) at qom/object.c:903
-#4  0x00007fdcd8001fd3 in phys_section_destroy (mr=0x7fdcdb8acf60) at 
-git/qemu/exec.c:1154
-#5  phys_sections_free (map=0x7fdcdc86aa00) at git/qemu/exec.c:1163
-#6  address_space_dispatch_free (d=0x7fdcdc86a9e0) at git/qemu/exec.c:2514
-#7  0x00007fdcd83a1ace in call_rcu_thread (opaque=<optimized out>) at 
-util/rcu.c:272
-#8  0x00007fdcd301ddc5 in start_thread () from /usr/lib64/libpthread.so.0
-#9  0x00007fdcd17cf71d in clone () from /usr/lib64/libc.so.6
-
-CCing Paolo and Stefan, since it has a relationship with bh in Qemu.
-
->
------Original Message-----
->
-From: Jason Wang [
-mailto:address@hidden
->
->
->
-On 2017年04月25日 19:37, wangyunjian wrote:
->
-> The q->tx_bh will free in virtio_net_del_queue() function, when remove
->
-> virtio
->
-queues
->
-> if the guest doesn't support multiqueue. But it might be still referenced by
->
-others (eg . virtio_net_set_status()),
->
-> which need so set NULL.
->
->
->
-> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
->
-> index 7d091c9..98bd683 100644
->
-> --- a/hw/net/virtio-net.c
->
-> +++ b/hw/net/virtio-net.c
->
-> @@ -1522,9 +1522,12 @@ static void virtio_net_del_queue(VirtIONet *n,
->
-int index)
->
->       if (q->tx_timer) {
->
->           timer_del(q->tx_timer);
->
->           timer_free(q->tx_timer);
->
-> +        q->tx_timer = NULL;
->
->       } else {
->
->           qemu_bh_delete(q->tx_bh);
->
-> +        q->tx_bh = NULL;
->
->       }
->
-> +    q->tx_waiting = 0;
->
->       virtio_del_queue(vdev, index * 2 + 1);
->
->   }
->
->
-Thanks a lot for the fix.
->
->
-Two questions:
->
->
-- If virtio_net_set_status() is the only function that may access tx_bh,
->
-it looks like setting tx_waiting to zero is sufficient?
-Currently yes, but we cannot be sure that holds for all scenarios, so
-we also set tx_bh and tx_timer to NULL to avoid dereferencing a dangling pointer,
-which is the common practice for bh usage in Qemu.
-
-I have another question about the root cause of this issue.
-
-The trace below shows the path that sets tx_waiting to one in 
-virtio_net_handle_tx_bh():
-
-Breakpoint 1, virtio_net_handle_tx_bh (vdev=0x0, vq=0x7f335ad13900) at 
-/data/wyj/git/qemu/hw/net/virtio-net.c:1398
-1398    {
-(gdb) bt
-#0  virtio_net_handle_tx_bh (vdev=0x0, vq=0x7f335ad13900) at 
-/data/wyj/git/qemu/hw/net/virtio-net.c:1398
-#1  0x00007f3357bddf9c in virtio_bus_set_host_notifier (bus=<optimized out>, 
-address@hidden, address@hidden) at hw/virtio/virtio-bus.c:297
-#2  0x00007f3357a0055d in vhost_dev_disable_notifiers (address@hidden, 
-address@hidden) at /data/wyj/git/qemu/hw/virtio/vhost.c:1422
-#3  0x00007f33579e3373 in vhost_net_stop_one (net=0x7f335ad84dc0, 
-dev=0x7f335c6f5f90) at /data/wyj/git/qemu/hw/net/vhost_net.c:289
-#4  0x00007f33579e385b in vhost_net_stop (address@hidden, ncs=<optimized out>, 
-address@hidden) at /data/wyj/git/qemu/hw/net/vhost_net.c:367
-#5  0x00007f33579e15de in virtio_net_vhost_status (status=<optimized out>, 
-n=0x7f335c6f5f90) at /data/wyj/git/qemu/hw/net/virtio-net.c:176
-#6  virtio_net_set_status (vdev=0x7f335c6f5f90, status=0 '\000') at 
-/data/wyj/git/qemu/hw/net/virtio-net.c:250
-#7  0x00007f33579f8dc6 in virtio_set_status (address@hidden, address@hidden 
-'\000') at /data/wyj/git/qemu/hw/virtio/virtio.c:1146
-#8  0x00007f3357bdd3cc in virtio_ioport_write (val=0, addr=18, 
-opaque=0x7f335c6eda80) at hw/virtio/virtio-pci.c:387
-#9  virtio_pci_config_write (opaque=0x7f335c6eda80, addr=18, val=0, 
-size=<optimized out>) at hw/virtio/virtio-pci.c:511
-#10 0x00007f33579b2155 in memory_region_write_accessor (mr=0x7f335c6ee470, 
-addr=18, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized 
-out>, attrs=...) at /data/wyj/git/qemu/memory.c:526
-#11 0x00007f33579af2e9 in access_with_adjusted_size (address@hidden, 
-address@hidden, address@hidden, access_size_min=<optimized out>, 
-access_size_max=<optimized out>, address@hidden
-    0x7f33579b20f0 <memory_region_write_accessor>, address@hidden, 
-address@hidden) at /data/wyj/git/qemu/memory.c:592
-#12 0x00007f33579b2e15 in memory_region_dispatch_write (address@hidden, 
-address@hidden, data=0, address@hidden, address@hidden) at 
-/data/wyj/git/qemu/memory.c:1319
-#13 0x00007f335796cd93 in address_space_write_continue (mr=0x7f335c6ee470, l=1, 
-addr1=18, len=1, buf=0x7f335773d000 "", attrs=..., addr=49170, 
-as=0x7f3358317060 <address_space_io>) at /data/wyj/git/qemu/exec.c:2834
-#14 address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., 
-buf=<optimized out>, len=<optimized out>) at /data/wyj/git/qemu/exec.c:2879
-#15 0x00007f335796d3ad in address_space_rw (as=<optimized out>, address@hidden, 
-attrs=..., address@hidden, buf=<optimized out>, address@hidden, address@hidden) 
-at /data/wyj/git/qemu/exec.c:2981
-#16 0x00007f33579ae226 in kvm_handle_io (count=1, size=1, direction=<optimized 
-out>, data=<optimized out>, attrs=..., port=49170) at 
-/data/wyj/git/qemu/kvm-all.c:1803
-#17 kvm_cpu_exec (address@hidden) at /data/wyj/git/qemu/kvm-all.c:2032
-#18 0x00007f335799b632 in qemu_kvm_cpu_thread_fn (arg=0x7f335ae82070) at 
-/data/wyj/git/qemu/cpus.c:1118
-#19 0x00007f3352983dc5 in start_thread () from /usr/lib64/libpthread.so.0
-#20 0x00007f335113571d in clone () from /usr/lib64/libc.so.6
-
-It calls qemu_bh_schedule(q->tx_bh) at the bottom of virtio_net_handle_tx_bh(),
-but I don't know why virtio_net_tx_bh() is never invoked afterwards, which leaves 
-q->tx_waiting non-zero.
-[ps: we added logs in virtio_net_tx_bh() to verify that]
-
-Some other information: 
-
-It won't crash if we don't use vhost-net.
-
-
-Thanks,
--Gonglei
-
->
-- Can you post a formal patch for this?
->
->
-Thanks
->
->
-> From: wangyunjian
->
-> Sent: Monday, April 24, 2017 6:10 PM
->
-> To: address@hidden; Michael S. Tsirkin <address@hidden>; 'Jason
->
-Wang' <address@hidden>
->
-> Cc: wangyunjian <address@hidden>; caihe <address@hidden>
->
-> Subject: [Qemu-devel][BUG] QEMU crashes with dpdk virtio pmd
->
->
->
-> Qemu crashes, with pre-condition:
->
-> vm xml config with multiqueue, and the vm's driver virtio-net support
->
-multi-queue
->
->
->
-> reproduce steps:
->
-> i. start dpdk testpmd in VM with the virtio nic
->
-> ii. stop testpmd
->
-> iii. reboot the VM
->
->
->
-> This commit "f9d6dbf0  remove virtio queues if the guest doesn't support
->
-multiqueue" is introduced.
->
->
->
-> Qemu version: QEMU emulator version 2.9.50 (v2.9.0-137-g32c7e0a)
->
-> VM DPDK version:  DPDK-1.6.1
->
->
->
-> Call Trace:
->
-> #0  0x00007f60881fe5d7 in raise () from /usr/lib64/libc.so.6
->
-> #1  0x00007f60881ffcc8 in abort () from /usr/lib64/libc.so.6
->
-> #2  0x00007f608823e2f7 in __libc_message () from /usr/lib64/libc.so.6
->
-> #3  0x00007f60882456d3 in _int_free () from /usr/lib64/libc.so.6
->
-> #4  0x00007f608900158f in g_free () from /usr/lib64/libglib-2.0.so.0
->
-> #5  0x00007f6088fea32c in iter_remove_or_steal () from
->
-/usr/lib64/libglib-2.0.so.0
->
-> #6  0x00007f608edc0986 in object_property_del_all (obj=0x7f6091e74800)
->
-at qom/object.c:410
->
-> #7  object_finalize (data=0x7f6091e74800) at qom/object.c:467
->
-> #8  object_unref (address@hidden) at qom/object.c:903
->
-> #9  0x00007f608eaf1fd3 in phys_section_destroy (mr=0x7f6091e74800) at
->
-git/qemu/exec.c:1154
->
-> #10 phys_sections_free (map=0x7f6090b72bb0) at git/qemu/exec.c:1163
->
-> #11 address_space_dispatch_free (d=0x7f6090b72b90) at
->
-git/qemu/exec.c:2514
->
-> #12 0x00007f608ee91ace in call_rcu_thread (opaque=<optimized out>) at
->
-util/rcu.c:272
->
-> #13 0x00007f6089b0ddc5 in start_thread () from /usr/lib64/libpthread.so.0
->
-> #14 0x00007f60882bf71d in clone () from /usr/lib64/libc.so.6
->
->
->
-> Call Trace:
->
-> #0  0x00007fdccaeb9790 in ?? ()
->
-> #1  0x00007fdcd82d09fc in object_property_del_all (obj=0x7fdcdb8acf60) at
->
-qom/object.c:405
->
-> #2  object_finalize (data=0x7fdcdb8acf60) at qom/object.c:467
->
-> #3  object_unref (address@hidden) at qom/object.c:903
->
-> #4  0x00007fdcd8001fd3 in phys_section_destroy (mr=0x7fdcdb8acf60) at
->
-git/qemu/exec.c:1154
->
-> #5  phys_sections_free (map=0x7fdcdc86aa00) at git/qemu/exec.c:1163
->
-> #6  address_space_dispatch_free (d=0x7fdcdc86a9e0) at
->
-git/qemu/exec.c:2514
->
-> #7  0x00007fdcd83a1ace in call_rcu_thread (opaque=<optimized out>) at
->
-util/rcu.c:272
->
-> #8  0x00007fdcd301ddc5 in start_thread () from /usr/lib64/libpthread.so.0
->
-> #9  0x00007fdcd17cf71d in clone () from /usr/lib64/libc.so.6
->
->
->
->
-
-On 25/04/2017 14:02, Jason Wang wrote:
->
->
-Thanks a lot for the fix.
->
->
-Two questions:
->
->
-- If virtio_net_set_status() is the only function that may access tx_bh,
->
-it looks like setting tx_waiting to zero is sufficient?
-I think clearing tx_bh is better anyway, as leaving a dangling pointer
-is not very hygienic.
-
-Paolo
-
->
-- Can you post a formal patch for this?
-
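The thread above argues for clearing tx_bh and tx_waiting when a queue is torn down, so that later callers cannot touch freed state. A minimal sketch of that pattern follows. It assumes a hypothetical DemoQueue type and demo_* helpers invented for illustration; none of these are QEMU's types or functions, and a heap-allocated int stands in for the bottom-half handle.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-queue TX state; not QEMU's VirtIONetQueue. */
typedef struct DemoQueue {
    int *tx_bh;     /* stand-in for the deferred-work (bottom half) handle */
    int tx_waiting; /* non-zero while work is scheduled but has not run */
} DemoQueue;

static void demo_queue_init(DemoQueue *q)
{
    q->tx_bh = malloc(sizeof(*q->tx_bh));
    assert(q->tx_bh != NULL);
    *q->tx_bh = 0;
    q->tx_waiting = 0;
}

/* Mark work as pending, loosely mirroring "schedule the bottom half". */
static void demo_queue_kick(DemoQueue *q)
{
    if (q->tx_bh) {
        q->tx_waiting = 1;
    }
}

/* Tear down: free the handle AND reset every field that refers to it. */
static void demo_queue_del(DemoQueue *q)
{
    free(q->tx_bh);
    q->tx_bh = NULL;   /* no dangling pointer left behind */
    q->tx_waiting = 0; /* no stale "work pending" flag either */
}

/*
 * A later status change that may flush pending work.
 * Returns 1 if it used the handle, 0 if there was nothing safe to do.
 */
static int demo_queue_set_status(DemoQueue *q)
{
    if (!q->tx_waiting || !q->tx_bh) {
        return 0;
    }
    *q->tx_bh += 1; /* would be a use-after-free if tx_bh were stale */
    q->tx_waiting = 0;
    return 1;
}
```

With both resets in demo_queue_del(), a status change after deletion returns 0 instead of dereferencing freed memory. Resetting only tx_waiting would also protect this one caller, but leaves the stale pointer for any other code path, which is the concern raised in the replies.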
diff --git a/results/classifier/008/other/42974450 b/results/classifier/008/other/42974450
deleted file mode 100644
index fe2110a2f..000000000
--- a/results/classifier/008/other/42974450
+++ /dev/null
@@ -1,439 +0,0 @@
-other: 0.940
-debug: 0.924
-device: 0.921
-permissions: 0.918
-semantic: 0.917
-performance: 0.914
-PID: 0.913
-boot: 0.909
-network: 0.907
-graphic: 0.906
-KVM: 0.901
-socket: 0.899
-files: 0.877
-vnc: 0.869
-
-[Bug Report] Possible Missing Endianness Conversion
-
-The virtio packed virtqueue support patch[1] suggests converting
-endianness by lines:
-
-virtio_tswap16s(vdev, &e->off_wrap);
-virtio_tswap16s(vdev, &e->flags);
-
-Though both of these conversion statements aren't present in the
-latest qemu code here[2]
-
-Is this intentional?
-
-[1]:
-https://mail.gnu.org/archive/html/qemu-block/2019-10/msg01492.html
-[2]:
-https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L314
-
-CCing Jason.
-
-On Mon, Jun 24, 2024 at 4:30 PM Xoykie <xoykie@gmail.com> wrote:
->
->
-The virtio packed virtqueue support patch[1] suggests converting
->
-endianness by lines:
->
->
-virtio_tswap16s(vdev, &e->off_wrap);
->
-virtio_tswap16s(vdev, &e->flags);
->
->
-Though both of these conversion statements aren't present in the
->
-latest qemu code here[2]
->
->
-Is this intentional?
-Good catch!
-
-It looks like it was removed (maybe by mistake) by commit
-d152cdd6f6 ("virtio: use virtio accessor to access packed event")
-
-Jason can you confirm that?
-
-Thanks,
-Stefano
-
->
->
-[1]:
-https://mail.gnu.org/archive/html/qemu-block/2019-10/msg01492.html
->
-[2]:
-https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L314
->
-
-On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote:
->
->
-CCing Jason.
->
->
-On Mon, Jun 24, 2024 at 4:30 PM Xoykie <xoykie@gmail.com> wrote:
->
->
->
-> The virtio packed virtqueue support patch[1] suggests converting
->
-> endianness by lines:
->
->
->
-> virtio_tswap16s(vdev, &e->off_wrap);
->
-> virtio_tswap16s(vdev, &e->flags);
->
->
->
-> Though both of these conversion statements aren't present in the
->
-> latest qemu code here[2]
->
->
->
-> Is this intentional?
->
->
-Good catch!
->
->
-It looks like it was removed (maybe by mistake) by commit
->
-d152cdd6f6 ("virtio: use virtio accessor to access packed event")
-That commit changes from:
-
--    address_space_read_cached(cache, off_off, &e->off_wrap,
--                              sizeof(e->off_wrap));
--    virtio_tswap16s(vdev, &e->off_wrap);
-
-which does a byte read of 2 bytes and then swaps the bytes
-depending on the host endianness and the value of
-virtio_access_is_big_endian()
-
-to this:
-
-+    e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
-
-virtio_lduw_phys_cached() is a small function which calls
-either lduw_be_phys_cached() or lduw_le_phys_cached()
-depending on the value of virtio_access_is_big_endian().
-(And lduw_be_phys_cached() and lduw_le_phys_cached() do
-the right thing for the host-endianness to do a "load
-a specifically big or little endian 16-bit value".)
-
-Which is to say that because we use a load/store function that's
-explicit about the size of the data type it is accessing, the
-function itself can handle doing the load as big or little
-endian, rather than the calling code having to do a manual swap after
-it has done a load-as-bag-of-bytes. This is generally preferable
-as it's less error-prone.
-
-(Explicit swap-after-loading still has a place where the
-code is doing a load of a whole structure out of the
-guest and then swapping each struct field after the fact,
-because it means we can do a single load-from-guest-memory
-rather than a whole sequence of calls all the way down
-through the memory subsystem.)
-
-thanks
--- PMM
-
-On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote:
-On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote:
-CCing Jason.
-
-On Mon, Jun 24, 2024 at 4:30 PM Xoykie <xoykie@gmail.com> wrote:
->
-> The virtio packed virtqueue support patch[1] suggests converting
-> endianness by lines:
->
-> virtio_tswap16s(vdev, &e->off_wrap);
-> virtio_tswap16s(vdev, &e->flags);
->
-> Though both of these conversion statements aren't present in the
-> latest qemu code here[2]
->
-> Is this intentional?
-
-Good catch!
-
-It looks like it was removed (maybe by mistake) by commit
-d152cdd6f6 ("virtio: use virtio accessor to access packed event")
-That commit changes from:
-
--    address_space_read_cached(cache, off_off, &e->off_wrap,
--                              sizeof(e->off_wrap));
--    virtio_tswap16s(vdev, &e->off_wrap);
-
-which does a byte read of 2 bytes and then swaps the bytes
-depending on the host endianness and the value of
-virtio_access_is_big_endian()
-
-to this:
-
-+    e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
-
-virtio_lduw_phys_cached() is a small function which calls
-either lduw_be_phys_cached() or lduw_le_phys_cached()
-depending on the value of virtio_access_is_big_endian().
-(And lduw_be_phys_cached() and lduw_le_phys_cached() do
-the right thing for the host-endianness to do a "load
-a specifically big or little endian 16-bit value".)
-
-Which is to say that because we use a load/store function that's
-explicit about the size of the data type it is accessing, the
-function itself can handle doing the load as big or little
-endian, rather than the calling code having to do a manual swap after
-it has done a load-as-bag-of-bytes. This is generally preferable
-as it's less error-prone.
-Thanks for the details!
-
-So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ?
-
-I mean:
-diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
-index 893a072c9d..2e5e67bdb9 100644
---- a/hw/virtio/virtio.c
-+++ b/hw/virtio/virtio.c
-@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
-     /* Make sure flags is seen before off_wrap */
-     smp_rmb();
-     e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
--    virtio_tswap16s(vdev, &e->flags);
- }
-
- static void vring_packed_off_wrap_write(VirtIODevice *vdev,
-
-Thanks,
-Stefano
-(Explicit swap-after-loading still has a place where the
-code is doing a load of a whole structure out of the
-guest and then swapping each struct field after the fact,
-because it means we can do a single load-from-guest-memory
-rather than a whole sequence of calls all the way down
-through the memory subsystem.)
-
-thanks
--- PMM
-
-On Tue, 25 Jun 2024 at 08:18, Stefano Garzarella <sgarzare@redhat.com> wrote:
->
->
-On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote:
->
->On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote:
->
->>
->
->> CCing Jason.
->
->>
->
->> On Mon, Jun 24, 2024 at 4:30 PM Xoykie <xoykie@gmail.com> wrote:
->
->> >
->
->> > The virtio packed virtqueue support patch[1] suggests converting
->
->> > endianness by lines:
->
->> >
->
->> > virtio_tswap16s(vdev, &e->off_wrap);
->
->> > virtio_tswap16s(vdev, &e->flags);
->
->> >
->
->> > Though both of these conversion statements aren't present in the
->
->> > latest qemu code here[2]
->
->> >
->
->> > Is this intentional?
->
->>
->
->> Good catch!
->
->>
->
->> It looks like it was removed (maybe by mistake) by commit
->
->> d152cdd6f6 ("virtio: use virtio accessor to access packed event")
->
->
->
->That commit changes from:
->
->
->
->-    address_space_read_cached(cache, off_off, &e->off_wrap,
->
->-                              sizeof(e->off_wrap));
->
->-    virtio_tswap16s(vdev, &e->off_wrap);
->
->
->
->which does a byte read of 2 bytes and then swaps the bytes
->
->depending on the host endianness and the value of
->
->virtio_access_is_big_endian()
->
->
->
->to this:
->
->
->
->+    e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
->
->
->
->virtio_lduw_phys_cached() is a small function which calls
->
->either lduw_be_phys_cached() or lduw_le_phys_cached()
->
->depending on the value of virtio_access_is_big_endian().
->
->(And lduw_be_phys_cached() and lduw_le_phys_cached() do
->
->the right thing for the host-endianness to do a "load
->
->a specifically big or little endian 16-bit value".)
->
->
->
->Which is to say that because we use a load/store function that's
->
->explicit about the size of the data type it is accessing, the
->
->function itself can handle doing the load as big or little
->
->endian, rather than the calling code having to do a manual swap after
->
->it has done a load-as-bag-of-bytes. This is generally preferable
->
->as it's less error-prone.
->
->
-Thanks for the details!
->
->
-So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ?
->
->
-I mean:
->
-diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
->
-index 893a072c9d..2e5e67bdb9 100644
->
---- a/hw/virtio/virtio.c
->
-+++ b/hw/virtio/virtio.c
->
-@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
->
-/* Make sure flags is seen before off_wrap */
->
-smp_rmb();
->
-e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
->
--    virtio_tswap16s(vdev, &e->flags);
->
-}
-That definitely looks like it's probably not correct...
-
--- PMM
-
-On Fri, Jun 28, 2024 at 03:53:09PM GMT, Peter Maydell wrote:
-On Tue, 25 Jun 2024 at 08:18, Stefano Garzarella <sgarzare@redhat.com> wrote:
-On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote:
->On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote:
->>
->> CCing Jason.
->>
->> On Mon, Jun 24, 2024 at 4:30 PM Xoykie <xoykie@gmail.com> wrote:
->> >
->> > The virtio packed virtqueue support patch[1] suggests converting
->> > endianness by lines:
->> >
->> > virtio_tswap16s(vdev, &e->off_wrap);
->> > virtio_tswap16s(vdev, &e->flags);
->> >
->> > Though both of these conversion statements aren't present in the
->> > latest qemu code here[2]
->> >
->> > Is this intentional?
->>
->> Good catch!
->>
->> It looks like it was removed (maybe by mistake) by commit
->> d152cdd6f6 ("virtio: use virtio accessor to access packed event")
->
->That commit changes from:
->
->-    address_space_read_cached(cache, off_off, &e->off_wrap,
->-                              sizeof(e->off_wrap));
->-    virtio_tswap16s(vdev, &e->off_wrap);
->
->which does a byte read of 2 bytes and then swaps the bytes
->depending on the host endianness and the value of
->virtio_access_is_big_endian()
->
->to this:
->
->+    e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
->
->virtio_lduw_phys_cached() is a small function which calls
->either lduw_be_phys_cached() or lduw_le_phys_cached()
->depending on the value of virtio_access_is_big_endian().
->(And lduw_be_phys_cached() and lduw_le_phys_cached() do
->the right thing for the host-endianness to do a "load
->a specifically big or little endian 16-bit value".)
->
->Which is to say that because we use a load/store function that's
->explicit about the size of the data type it is accessing, the
->function itself can handle doing the load as big or little
->endian, rather than the calling code having to do a manual swap after
->it has done a load-as-bag-of-bytes. This is generally preferable
->as it's less error-prone.
-
-Thanks for the details!
-
-So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ?
-
-I mean:
-diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
-index 893a072c9d..2e5e67bdb9 100644
---- a/hw/virtio/virtio.c
-+++ b/hw/virtio/virtio.c
-@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
-      /* Make sure flags is seen before off_wrap */
-      smp_rmb();
-      e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
--    virtio_tswap16s(vdev, &e->flags);
-  }
-That definitely looks like it's probably not correct...
-Yeah, I just sent that patch:
-
-https://lore.kernel.org/qemu-devel/20240701075208.19634-1-sgarzare@redhat.com
-
-We can continue the discussion there.
-
-Thanks,
-Stefano
-
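The thread above contrasts two ways of reading a 16-bit guest field: copy the raw bytes and swap afterwards, or use a load that is explicit about endianness. A minimal sketch of why the two agree, and why a leftover swap after an explicit load is suspicious, is shown below. All demo_* names are hypothetical helpers written for this illustration; they are not the QEMU accessors named in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Endian-explicit loads: the result does not depend on the host. */
static uint16_t demo_lduw_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint16_t demo_lduw_be(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Pick the load by device endianness, as a single accessor would. */
static uint16_t demo_lduw(const uint8_t *p, bool device_big_endian)
{
    return device_big_endian ? demo_lduw_be(p) : demo_lduw_le(p);
}

static bool demo_host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 0;
}

static uint16_t demo_bswap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Older style: load as a bag of bytes, then swap if device != host. */
static uint16_t demo_read_then_swap(const uint8_t *p, bool device_big_endian)
{
    uint16_t v;
    memcpy(&v, p, sizeof(v));
    if (device_big_endian != demo_host_is_big_endian()) {
        v = demo_bswap16(v);
    }
    return v;
}

/* Both styles must agree for either device endianness. */
static void demo_selfcheck(void)
{
    const uint8_t buf[2] = { 0x34, 0x12 };
    assert(demo_read_then_swap(buf, false) == demo_lduw(buf, false));
    assert(demo_read_then_swap(buf, true) == demo_lduw(buf, true));
}
```

Since the explicit load already returns the converted value, applying a conditional swap to it again would undo the conversion whenever device and host endianness differ. That is the reason the remaining swap on e->flags was questioned in the discussion.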
diff --git a/results/classifier/008/other/43643137 b/results/classifier/008/other/43643137
deleted file mode 100644
index 6604b50af..000000000
--- a/results/classifier/008/other/43643137
+++ /dev/null
@@ -1,548 +0,0 @@
-KVM: 0.794
-performance: 0.784
-other: 0.781
-debug: 0.775
-semantic: 0.764
-device: 0.760
-permissions: 0.755
-PID: 0.742
-vnc: 0.742
-graphic: 0.721
-network: 0.709
-socket: 0.674
-boot: 0.652
-files: 0.612
-
-[Qemu-devel] [BUG/RFC] INIT IPI lost when VM starts
-
-Hi,
-We encountered a problem that when a domain starts, seabios failed to online a 
-vCPU.
-
-After investigation, we found that the reason is in kvm-kmod, KVM_APIC_INIT bit 
-in
-vcpu->arch.apic->pending_events was overwritten by qemu, and thus an INIT IPI 
-sent
-to AP was lost. Qemu does this since libvirtd sends a ‘query-cpus’ qmp command 
-to qemu
-on VM start.
-
-In qemu, qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state->
-do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from kvm-kmod and
-sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call
-kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus pending_events is
-overwritten by qemu.
-
-I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true after 
-‘query-cpus’,
-and  kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am not sure 
-whether
-it is OK for qemu to set cpu->kvm_vcpu_dirty in do_kvm_cpu_synchronize_state in 
-each caller.
-
-What’s your opinion?
-
-Let me explain more clearly. The sequence is: qemu handles the ‘query-cpus’ qmp 
-command, and vcpu 1 (and vcpu 0) gets registers from kvm-kmod (qmp_query_cpus-> 
-cpu_synchronize_state-> kvm_cpu_synchronize_state->
-do_kvm_cpu_synchronize_state-> kvm_arch_get_registers). Then vcpu 0 (BSP) 
-sends INIT-SIPI to vcpu 1 (AP), and in kvm-kmod the KVM_APIC_INIT bit in vcpu 1’s 
-pending_events is set.
-Then vcpu 1 continues running: the vcpu 1 thread in qemu calls 
-kvm_arch_put_registers-> kvm_put_vcpu_events, so the KVM_APIC_INIT bit in vcpu 1’s 
-pending_events is cleared, i.e., lost.
-
-In kvm-kmod, besides pending_events, sipi_vector may also be overwritten, 
-so I am not sure whether other fields/registers are in danger, i.e., those that may 
-be modified asynchronously to the vcpu thread itself.
-
-BTW, adding a sleep like the following reliably reproduces this problem, if the VM 
-has more than 2 vcpus and is started via libvirtd.
-
-diff --git a/target/i386/kvm.c b/target/i386/kvm.c
-index 55865db..5099290 100644
---- a/target/i386/kvm.c
-+++ b/target/i386/kvm.c
-@@ -2534,6 +2534,11 @@ static int kvm_put_vcpu_events(X86CPU *cpu, int level)
-             KVM_VCPUEVENT_VALID_NMI_PENDING | KVM_VCPUEVENT_VALID_SIPI_VECTOR;
-     }
-
-+    if (CPU(cpu)->cpu_index == 1) {
-+        fprintf(stderr, "vcpu 1 sleep!!!!\n");
-+        sleep(10);
-+    }
-+
-     return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
- }
-
-
-On 2017/3/20 22:21, Herongguang (Stephen) wrote:
-Hi,
-We encountered a problem that when a domain starts, seabios failed to online a 
-vCPU.
-
-After investigation, we found that the reason is in kvm-kmod, KVM_APIC_INIT bit 
-in
-vcpu->arch.apic->pending_events was overwritten by qemu, and thus an INIT IPI 
-sent
-to AP was lost. Qemu does this since libvirtd sends a ‘query-cpus’ qmp command 
-to qemu
-on VM start.
-
-In qemu, qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state->
-do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from kvm-kmod and
-sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call
-kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus pending_events is
-overwritten by qemu.
-
-I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true after 
-‘query-cpus’,
-and  kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am not sure 
-whether
-it is OK for qemu to set cpu->kvm_vcpu_dirty in do_kvm_cpu_synchronize_state in 
-each caller.
-
-What’s your opinion?
-
-On 20/03/2017 15:21, Herongguang (Stephen) wrote:
->
->
-We encountered a problem that when a domain starts, seabios failed to
->
-online a vCPU.
->
->
-After investigation, we found that the reason is in kvm-kmod,
->
-KVM_APIC_INIT bit in
->
-vcpu->arch.apic->pending_events was overwritten by qemu, and thus an
->
-INIT IPI sent
->
-to AP was lost. Qemu does this since libvirtd sends a ‘query-cpus’ qmp
->
-command to qemu
->
-on VM start.
->
->
-In qemu, qmp_query_cpus-> cpu_synchronize_state->
->
-kvm_cpu_synchronize_state->
->
-do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from
->
-kvm-kmod and
->
-sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call
->
-kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus
->
-pending_events is
->
-overwritten by qemu.
->
->
-I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true
->
-after ‘query-cpus’,
->
-and  kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am
->
-not sure whether
->
-it is OK for qemu to set cpu->kvm_vcpu_dirty in
->
-do_kvm_cpu_synchronize_state in each caller.
->
->
-What’s your opinion?
-Hi Rongguang,
-
-sorry for the late response.
-
-Where exactly is KVM_APIC_INIT dropped?  kvm_get_mp_state does clear the
-bit, but the result of the INIT is stored in mp_state.
-
-kvm_get_vcpu_events is called after kvm_get_mp_state; it retrieves
-KVM_APIC_INIT in events.smi.latched_init and kvm_set_vcpu_events passes
-it back.  Maybe it should ignore events.smi.latched_init if not in SMM,
-but I would like to understand the exact sequence of events.
-
-Thanks,
-
-paolo
-
-On 2017/4/6 0:16, Paolo Bonzini wrote:
-On 20/03/2017 15:21, Herongguang (Stephen) wrote:
-We encountered a problem that when a domain starts, seabios failed to
-online a vCPU.
-
-After investigation, we found that the reason is in kvm-kmod,
-KVM_APIC_INIT bit in
-vcpu->arch.apic->pending_events was overwritten by qemu, and thus an
-INIT IPI sent
-to AP was lost. Qemu does this since libvirtd sends a ‘query-cpus’ qmp
-command to qemu
-on VM start.
-
-In qemu, qmp_query_cpus-> cpu_synchronize_state->
-kvm_cpu_synchronize_state->
-do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from
-kvm-kmod and
-sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call
-kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus
-pending_events is
-overwritten by qemu.
-
-I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true
-after ‘query-cpus’,
-and  kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am
-not sure whether
-it is OK for qemu to set cpu->kvm_vcpu_dirty in
-do_kvm_cpu_synchronize_state in each caller.
-
-What’s your opinion?
-Hi Rongguang,
-
-sorry for the late response.
-
-Where exactly is KVM_APIC_INIT dropped?  kvm_get_mp_state does clear the
-bit, but the result of the INIT is stored in mp_state.
-It's dropped in KVM_SET_VCPU_EVENTS, see below.
-kvm_get_vcpu_events is called after kvm_get_mp_state; it retrieves
-KVM_APIC_INIT in events.smi.latched_init and kvm_set_vcpu_events passes
-it back.  Maybe it should ignore events.smi.latched_init if not in SMM,
-but I would like to understand the exact sequence of events.
-time0:
-vcpu1:
-qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state->
-> do_kvm_cpu_synchronize_state(and set vcpu1's cpu->kvm_vcpu_dirty to true)-> 
-kvm_arch_get_registers(KVM_APIC_INIT bit in vcpu->arch.apic->pending_events was not set)
-
-time1:
-vcpu0:
-send INIT-SIPI to all AP->(in vcpu 0's context)__apic_accept_irq(KVM_APIC_INIT bit 
-in vcpu1's arch.apic->pending_events is set)
-
-time2:
-vcpu1:
-kvm_cpu_exec->(if cpu->kvm_vcpu_dirty is 
-true)kvm_arch_put_registers->kvm_put_vcpu_events(overwritten KVM_APIC_INIT bit in 
-vcpu->arch.apic->pending_events!)
-
-So it's a race between vcpu1's get/put of registers and kvm/other vcpus changing 
-vcpu1's status/structure fields in the meantime. I am worried that 
-other fields may be overwritten as well;
-sipi_vector is one.
-
-also see:
-https://www.mail-archive.com/address@hidden/msg438675.html
-Thanks,
-
-paolo
-
-.
-
-Hi Paolo,
-
-What's your opinion about this patch? We found it just before finishing patches 
-for the past two days.
-
-
-Thanks,
--Gonglei
-
-
->
------Original Message-----
->
-From: address@hidden [
-mailto:address@hidden
-On
->
-Behalf Of Herongguang (Stephen)
->
-Sent: Thursday, April 06, 2017 9:47 AM
->
-To: Paolo Bonzini; address@hidden; address@hidden;
->
-address@hidden; address@hidden; address@hidden;
->
-wangxin (U); Huangweidong (C)
->
-Subject: Re: [BUG/RFC] INIT IPI lost when VM starts
->
->
->
->
-On 2017/4/6 0:16, Paolo Bonzini wrote:
->
->
->
-> On 20/03/2017 15:21, Herongguang (Stephen) wrote:
->
->> We encountered a problem that when a domain starts, seabios failed to
->
->> online a vCPU.
->
->>
->
->> After investigation, we found that the reason is in kvm-kmod,
->
->> KVM_APIC_INIT bit in
->
->> vcpu->arch.apic->pending_events was overwritten by qemu, and thus an
->
->> INIT IPI sent
->
->> to AP was lost. Qemu does this since libvirtd sends a ‘query-cpus’ qmp
->
->> command to qemu
->
->> on VM start.
->
->>
->
->> In qemu, qmp_query_cpus-> cpu_synchronize_state->
->
->> kvm_cpu_synchronize_state->
->
->> do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from
->
->> kvm-kmod and
->
->> sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call
->
->> kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus
->
->> pending_events is
->
->> overwritten by qemu.
->
->>
->
->> I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true
->
->> after ‘query-cpus’,
->
->> and  kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am
->
->> not sure whether
->
->> it is OK for qemu to set cpu->kvm_vcpu_dirty in
->
->> do_kvm_cpu_synchronize_state in each caller.
->
->>
->
->> What’s your opinion?
->
-> Hi Rongguang,
->
->
->
-> sorry for the late response.
->
->
->
-> Where exactly is KVM_APIC_INIT dropped?  kvm_get_mp_state does clear
->
-the
->
-> bit, but the result of the INIT is stored in mp_state.
->
->
-It's dropped in KVM_SET_VCPU_EVENTS, see below.
->
->
->
->
-> kvm_get_vcpu_events is called after kvm_get_mp_state; it retrieves
->
-> KVM_APIC_INIT in events.smi.latched_init and kvm_set_vcpu_events passes
->
-> it back.  Maybe it should ignore events.smi.latched_init if not in SMM,
->
-> but I would like to understand the exact sequence of events.
->
->
-time0:
->
-vcpu1:
->
-qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state->
->
-> do_kvm_cpu_synchronize_state(and set vcpu1's cpu->kvm_vcpu_dirty to
->
-true)-> kvm_arch_get_registers(KVM_APIC_INIT bit in
->
-vcpu->arch.apic->pending_events was not set)
->
->
-time1:
->
-vcpu0:
->
-send INIT-SIPI to all AP->(in vcpu 0's
->
-context)__apic_accept_irq(KVM_APIC_INIT bit in vcpu1's
->
-arch.apic->pending_events is set)
->
->
-time2:
->
-vcpu1:
->
-kvm_cpu_exec->(if cpu->kvm_vcpu_dirty is
->
-true)kvm_arch_put_registers->kvm_put_vcpu_events(overwritten
->
-KVM_APIC_INIT bit in vcpu->arch.apic->pending_events!)
->
->
-So it's a race between vcpu1 get/put registers with kvm/other vcpus changing
->
-vcpu1's status/structure fields in the mean time, I am in worry of if there
->
-are
->
-other fields may be overwritten,
->
-sipi_vector is one.
->
->
-also see:
->
-https://www.mail-archive.com/address@hidden/msg438675.html
->
->
-> Thanks,
->
->
->
-> paolo
->
->
->
-> .
->
->
->
-
-2017-11-20 06:57+0000, Gonglei (Arei):
->
-Hi Paolo,
->
->
-What's your opinion about this patch? We found it just before finishing
->
-patches
->
-for the past two days.
-I think your case was fixed by f4ef19108608 ("KVM: X86: Fix loss of
-pending INIT due to race"), but that patch didn't fix it perfectly, so
-maybe you're hitting a similar case that happens in SMM ...
-
->
-> -----Original Message-----
->
-> From: address@hidden [
-mailto:address@hidden
-On
->
-> Behalf Of Herongguang (Stephen)
->
-> On 2017/4/6 0:16, Paolo Bonzini wrote:
->
-> > Hi Rongguang,
->
-> >
->
-> > sorry for the late response.
->
-> >
->
-> > Where exactly is KVM_APIC_INIT dropped?  kvm_get_mp_state does clear
->
-> the
->
-> > bit, but the result of the INIT is stored in mp_state.
->
->
->
-> It's dropped in KVM_SET_VCPU_EVENTS, see below.
->
->
->
-> >
->
-> > kvm_get_vcpu_events is called after kvm_get_mp_state; it retrieves
->
-> > KVM_APIC_INIT in events.smi.latched_init and kvm_set_vcpu_events passes
->
-> > it back.  Maybe it should ignore events.smi.latched_init if not in SMM,
->
-> > but I would like to understand the exact sequence of events.
->
->
->
-> time0:
->
-> vcpu1:
->
-> qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state->
->
->  > do_kvm_cpu_synchronize_state(and set vcpu1's cpu->kvm_vcpu_dirty to
->
-> true)-> kvm_arch_get_registers(KVM_APIC_INIT bit in
->
-> vcpu->arch.apic->pending_events was not set)
->
->
->
-> time1:
->
-> vcpu0:
->
-> send INIT-SIPI to all AP->(in vcpu 0's
->
-> context)__apic_accept_irq(KVM_APIC_INIT bit in vcpu1's
->
-> arch.apic->pending_events is set)
->
->
->
-> time2:
->
-> vcpu1:
->
-> kvm_cpu_exec->(if cpu->kvm_vcpu_dirty is
->
-> true)kvm_arch_put_registers->kvm_put_vcpu_events(overwritten
->
-> KVM_APIC_INIT bit in vcpu->arch.apic->pending_events!)
->
->
->
-> So it's a race between vcpu1 get/put registers with kvm/other vcpus changing
->
-> vcpu1's status/structure fields in the mean time, I am in worry of if there
->
-> are
->
-> other fields may be overwritten,
->
-> sipi_vector is one.
-Fields that can be asynchronously written by other VCPUs (like SIPI,
-NMI) must not be SET if other VCPUs were not paused since the last GET.
-(Looking at the interface, we can currently lose pending SMI.)
-
-INIT is one of the restricted fields, but the API unconditionally
-couples SMM with latched INIT, which means that we can lose an INIT if
-the VCPU is in SMM mode -- do you see SMM in kvm_vcpu_events?
-
-Thanks.
-
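The time0/time1/time2 sequence described in the thread above can be reproduced with a toy model. This is only a sketch under stated assumptions: the class names, the `skip_async_fields` switch, and the bit value of `KVM_APIC_INIT` are hypothetical and are not taken from QEMU or the kernel. It models nothing but the GET, asynchronous update, PUT ordering.

```python
from dataclasses import dataclass, replace

KVM_APIC_INIT = 1 << 0  # illustrative bit value, not taken from kernel headers


@dataclass
class VcpuKernelState:
    """Toy stand-in for the kernel-side state of one vCPU."""
    pending_events: int = 0
    rax: int = 0


class UserspaceVcpu:
    """Toy stand-in for the userspace cached copy of a vCPU's state."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.cache = None
        self.dirty = False

    def synchronize_state(self):
        # GET: snapshot the kernel state and mark the cache dirty.
        self.cache = replace(self.kernel)
        self.dirty = True

    def put_registers(self, skip_async_fields):
        # PUT: write the cached snapshot back before re-entering the guest.
        if not self.dirty:
            return
        self.kernel.rax = self.cache.rax
        if not skip_async_fields:
            # Unconditional write-back clobbers anything set since the GET.
            self.kernel.pending_events = self.cache.pending_events
        self.dirty = False


def run_sequence(skip_async_fields):
    """Return True if the INIT bit survives the get/put cycle."""
    kernel = VcpuKernelState()
    vcpu1 = UserspaceVcpu(kernel)
    vcpu1.synchronize_state()                # time0: query-cpus on vcpu1
    kernel.pending_events |= KVM_APIC_INIT   # time1: vcpu0 sends INIT
    vcpu1.put_registers(skip_async_fields)   # time2: vcpu1 resumes
    return bool(kernel.pending_events & KVM_APIC_INIT)


print("INIT kept, unconditional put:", run_sequence(skip_async_fields=False))
print("INIT kept, async fields skipped:", run_sequence(skip_async_fields=True))
```

In this model the pending INIT only survives when the write-back leaves asynchronously written fields alone, which mirrors the rule stated in the last reply of the thread.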
diff --git a/results/classifier/008/other/50773216 b/results/classifier/008/other/50773216
deleted file mode 100644
index b7f8c0d20..000000000
--- a/results/classifier/008/other/50773216
+++ /dev/null
@@ -1,120 +0,0 @@
-permissions: 0.813
-device: 0.764
-other: 0.737
-graphic: 0.723
-semantic: 0.669
-files: 0.666
-debug: 0.659
-vnc: 0.656
-socket: 0.652
-boot: 0.637
-PID: 0.636
-performance: 0.628
-network: 0.606
-KVM: 0.601
-
-[Qemu-devel] Can I have someone's feedback on [bug 1809075] Concurrency bug on keyboard events: capslock LED messing up keycode streams causes character misses at guest kernel
-
-Hi everyone.
-Can I please have someone's feedback on this bug?
-https://bugs.launchpad.net/qemu/+bug/1809075
-Briefly, guest OS loses characters sent to it via vnc. And I spot the
-bug in relation to ps2 driver.
-I'm thinking of possible fixes and I might want to use a memory barrier.
-But I would really like to have some suggestion from a qemu developer
-first. For example, can we brutally drop capslock LED key events in ps2
-queue?
-It is actually relevant to openQA, an automated QA tool for openSUSE.
-And this bug blocks a few test cases for us.
-Thank you in advance!
-
-Kind regards,
-Gao Zhiyuan
-
-Cc'ing Marc-André & Gerd.
-
-On 12/19/18 10:31 AM, Gao Zhiyuan wrote:
->
-Hi everyone.
->
->
-Can I please have someone's feedback on this bug?
->
-https://bugs.launchpad.net/qemu/+bug/1809075
->
-Briefly, guest OS loses characters sent to it via vnc. And I spot the
->
-bug in relation to ps2 driver.
->
->
-I'm thinking of possible fixes and I might want to use a memory barrier.
->
-But I would really like to have some suggestion from a qemu developer
->
-first. For example, can we brutally drop capslock LED key events in ps2
->
-queue?
->
->
-It is actually relevant to openQA, an automated QA tool for openSUSE.
->
-And this bug blocks a few test cases for us.
->
->
-Thank you in advance!
->
->
-Kind regards,
->
-Gao Zhiyuan
->
-
-On Thu, Jan 03, 2019 at 12:05:54PM +0100, Philippe Mathieu-Daudé wrote:
->
-Cc'ing Marc-André & Gerd.
->
->
-On 12/19/18 10:31 AM, Gao Zhiyuan wrote:
->
-> Hi everyone.
->
->
->
-> Can I please have someone's feedback on this bug?
->
->
-https://bugs.launchpad.net/qemu/+bug/1809075
->
-> Briefly, guest OS loses characters sent to it via vnc. And I spot the
->
-> bug in relation to ps2 driver.
->
->
->
-> I'm thinking of possible fixes and I might want to use a memory barrier.
->
-> But I would really like to have some suggestion from a qemu developer
->
-> first. For example, can we brutally drop capslock LED key events in ps2
->
-> queue?
-There is no "capslock LED key event".  0xfa is KBD_REPLY_ACK, and the
-device queues it in response to guest port writes.  Yes, the ack can
-race with actual key events.  But IMO that isn't a bug in qemu.
-
-Probably the linux kernel just throws away everything until it got the
-ack for the port write, and that way the key event gets lost.  On
-physical hardware you will not notice because it is next to impossible
-to type fast enough to hit the race window.
-
-So, go fix the kernel.
-
-Alternatively fix vncdotool to send uppercase letters properly with
-shift key pressed.  Then qemu wouldn't generate capslock key events
-(that happens because qemu thinks guest and host capslock state is out
-of sync) and the guest's capslock LED update request wouldn't get in
-the way.
-
-cheers,
-  Gerd
-
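Gerd's hypothesis in the thread above (the guest discards bytes until it sees the ack for its port write) can be illustrated with a toy model. This is a sketch of that hypothesis only, not of the actual Linux driver: `guest_reads` and its `waiting_for_ack` flag are hypothetical, and the scancodes are just sample values for one key press and release.

```python
KBD_REPLY_ACK = 0xFA  # the ack byte named in the thread

KEY_PRESS = 0x1E    # sample scancode, used only for illustration
KEY_RELEASE = 0x9E  # sample scancode, used only for illustration


def guest_reads(device_queue, waiting_for_ack):
    """Return the bytes a hypothetical guest driver keeps as key data."""
    kept = []
    for byte in device_queue:
        if waiting_for_ack:
            if byte == KBD_REPLY_ACK:
                waiting_for_ack = False
            # Anything else seen while waiting for the ack is discarded.
            continue
        kept.append(byte)
    return kept


# The ack arrives first: both key bytes reach the guest.
no_race = guest_reads([KBD_REPLY_ACK, KEY_PRESS, KEY_RELEASE], True)

# The key press is queued ahead of the ack: the press is thrown away.
race = guest_reads([KEY_PRESS, KBD_REPLY_ACK, KEY_RELEASE], True)

print([hex(b) for b in no_race])
print([hex(b) for b in race])
```

Under this assumption the character is lost only when a key event lands in the queue between the guest's port write and the device's ack, which is the window the thread says is nearly impossible to hit by hand.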
diff --git a/results/classifier/008/other/55367348 b/results/classifier/008/other/55367348
deleted file mode 100644
index 571cb425b..000000000
--- a/results/classifier/008/other/55367348
+++ /dev/null
@@ -1,542 +0,0 @@
-other: 0.626
-permissions: 0.595
-device: 0.586
-PID: 0.559
-semantic: 0.555
-performance: 0.546
-graphic: 0.532
-network: 0.518
-debug: 0.516
-socket: 0.501
-files: 0.490
-boot: 0.486
-KVM: 0.470
-vnc: 0.465
-
-[Qemu-devel] [Bug] Docs build fails at interop.rst
-
-https://paste.fedoraproject.org/paste/kOPx4jhtUli---TmxSLrlw
-running python3-sphinx-2.0.1-1.fc31.noarch on Fedora release 31
-(Rawhide)
-
-uname - a
-Linux iouring 5.1.0-0.rc6.git3.1.fc31.x86_64 #1 SMP Thu Apr 25 14:25:32
-UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
-
-Reverting commit 90edef80a0852cf8a3d2668898ee40e8970e431
-allows for the build to occur
-
-Regards
-Aarushi Mehta
-
-On 5/20/19 7:30 AM, Aarushi Mehta wrote:
->
-https://paste.fedoraproject.org/paste/kOPx4jhtUli---TmxSLrlw
->
-running python3-sphinx-2.0.1-1.fc31.noarch on Fedora release 31
->
-(Rawhide)
->
->
-uname - a
->
-Linux iouring 5.1.0-0.rc6.git3.1.fc31.x86_64 #1 SMP Thu Apr 25 14:25:32
->
-UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
->
->
-Reverting commit 90edef80a0852cf8a3d2668898ee40e8970e431
->
-allows for the build to occur
->
->
-Regards
->
-Aarushi Mehta
->
->
-Ah, dang. The blocks aren't strictly conforming json, but the version I
-tested this under didn't seem to care. Your version is much newer. (I
-was using 1.7 as provided by Fedora 29.)
-
-For now, try reverting 9e5b6cb87db66dfb606604fe6cf40e5ddf1ef0e7 instead,
-which should at least turn off the "warnings as errors" option, but I
-don't think that reverting -n will turn off this warning.
-
-I'll try to get ahold of this newer version and see if I can't fix it
-more appropriately.
-
---js
-
-On 5/20/19 12:37 PM, John Snow wrote:
->
->
->
-On 5/20/19 7:30 AM, Aarushi Mehta wrote:
->
->
-https://paste.fedoraproject.org/paste/kOPx4jhtUli---TmxSLrlw
->
-> running python3-sphinx-2.0.1-1.fc31.noarch on Fedora release 31
->
-> (Rawhide)
->
->
->
-> uname - a
->
-> Linux iouring 5.1.0-0.rc6.git3.1.fc31.x86_64 #1 SMP Thu Apr 25 14:25:32
->
-> UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
->
->
->
-> Reverting commit 90edef80a0852cf8a3d2668898ee40e8970e431
->
-> allows for the build to occur
->
->
->
-> Regards
->
-> Aarushi Mehta
->
->
->
->
->
->
-Ah, dang. The blocks aren't strictly conforming json, but the version I
->
-tested this under didn't seem to care. Your version is much newer. (I
->
-was using 1.7 as provided by Fedora 29.)
->
->
-For now, try reverting 9e5b6cb87db66dfb606604fe6cf40e5ddf1ef0e7 instead,
->
-which should at least turn off the "warnings as errors" option, but I
->
-don't think that reverting -n will turn off this warning.
->
->
-I'll try to get ahold of this newer version and see if I can't fix it
->
-more appropriately.
->
->
---js
->
-...Sigh, okay.
-
-So, I am still not actually sure what changed from pygments 2.2 and
-sphinx 1.7 to pygments 2.4 and sphinx 2.0.1, but it appears as if Sphinx
-by default always tries to add a filter to the pygments lexer that
-raises an error on highlighting failure, instead of the default behavior
-which is to just highlight those errors in the output. There is no
-option to Sphinx that I am aware of to retain this lexing behavior.
-(Effectively, it's strict or nothing.)
-
-This approach, apparently, is broken in Sphinx 1.7/Pygments 2.2, so the
-build works with our malformed json.
-
-There are a few options:
-
-1. Update conf.py to ignore these warnings (and all future lexing
-errors), and settle for the fact that there will be no QMP highlighting
-wherever we use the directionality indicators ('->', '<-').
-
-2. Update bitmaps.rst to remove the directionality indicators.
-
-3. Update bitmaps.rst to format the QMP blocks as raw text instead of JSON.
-
-4. Update bitmaps.rst to remove the "json" specification from the code
-block. This will cause sphinx to "guess" the formatting, and the
-pygments guesser will decide it's Python3.
-
-This will parse well enough, but will mis-highlight 'true' and 'false'
-which are not python keywords. This approach may break in the future if
-the Python3 lexer is upgraded to be stricter (because '->' and '<-' are
-still invalid), and leaves us at the mercy of both the guesser and the
-lexer.
-
-I'm not actually sure what I dislike the least; I think I dislike #1 the
-most. #4 gets us most of what we want but is perhaps porcelain.
-
-I suspect if we attempt to move more of our documentation to ReST and
-Sphinx that we will need to answer for ourselves how we intend to
-document QMP code flow examples.
-
---js
-
-On Mon, May 20, 2019 at 05:25:28PM -0400, John Snow wrote:
->
->
->
-On 5/20/19 12:37 PM, John Snow wrote:
->
->
->
->
->
-> On 5/20/19 7:30 AM, Aarushi Mehta wrote:
->
->>
-https://paste.fedoraproject.org/paste/kOPx4jhtUli---TmxSLrlw
->
->> running python3-sphinx-2.0.1-1.fc31.noarch on Fedora release 31
->
->> (Rawhide)
->
->>
->
->> uname - a
->
->> Linux iouring 5.1.0-0.rc6.git3.1.fc31.x86_64 #1 SMP Thu Apr 25 14:25:32
->
->> UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
->
->>
->
->> Reverting commit 90edef80a0852cf8a3d2668898ee40e8970e431
->
->> allows for the build to occur
->
->>
->
->> Regards
->
->> Aarushi Mehta
->
->>
->
->>
->
->
->
-> Ah, dang. The blocks aren't strictly conforming json, but the version I
->
-> tested this under didn't seem to care. Your version is much newer. (I
->
-> was using 1.7 as provided by Fedora 29.)
->
->
->
-> For now, try reverting 9e5b6cb87db66dfb606604fe6cf40e5ddf1ef0e7 instead,
->
-> which should at least turn off the "warnings as errors" option, but I
->
-> don't think that reverting -n will turn off this warning.
->
->
->
-> I'll try to get ahold of this newer version and see if I can't fix it
->
-> more appropriately.
->
->
->
-> --js
->
->
->
->
-...Sigh, okay.
->
->
-So, I am still not actually sure what changed from pygments 2.2 and
->
-sphinx 1.7 to pygments 2.4 and sphinx 2.0.1, but it appears as if Sphinx
->
-by default always tries to add a filter to the pygments lexer that
->
-raises an error on highlighting failure, instead of the default behavior
->
-which is to just highlight those errors in the output. There is no
->
-option to Sphinx that I am aware of to retain this lexing behavior.
->
-(Effectively, it's strict or nothing.)
->
->
-This approach, apparently, is broken in Sphinx 1.7/Pygments 2.2, so the
->
-build works with our malformed json.
->
->
-There are a few options:
->
->
-1. Update conf.py to ignore these warnings (and all future lexing
->
-errors), and settle for the fact that there will be no QMP highlighting
->
-wherever we use the directionality indicators ('->', '<-').
->
->
-2. Update bitmaps.rst to remove the directionality indicators.
->
->
-3. Update bitmaps.rst to format the QMP blocks as raw text instead of JSON.
->
->
-4. Update bitmaps.rst to remove the "json" specification from the code
->
-block. This will cause sphinx to "guess" the formatting, and the
->
-pygments guesser will decide it's Python3.
->
->
-This will parse well enough, but will mis-highlight 'true' and 'false'
->
-which are not python keywords. This approach may break in the future if
->
-the Python3 lexer is upgraded to be stricter (because '->' and '<-' are
->
-still invalid), and leaves us at the mercy of both the guesser and the
->
-lexer.
->
->
-I'm not actually sure what I dislike the least; I think I dislike #1 the
->
-most. #4 gets us most of what we want but is perhaps porcelain.
->
->
-I suspect if we attempt to move more of our documentation to ReST and
->
-Sphinx that we will need to answer for ourselves how we intend to
->
-document QMP code flow examples.
-Writing a custom lexer that handles "<-" and "->" was simple (see below).
-
-Now, is it possible to convince Sphinx to register and use a custom lexer?
-
-$ cat > /tmp/lexer.py <<EOF
-from pygments.lexer import RegexLexer, DelegatingLexer
-from pygments.lexers.data import JsonLexer
-import re
-from pygments.token import *
-
-class QMPExampleMarkersLexer(RegexLexer):
-    tokens = {
-        'root': [
-            (r' *-> *', Generic.Prompt),
-            (r' *<- *', Generic.Output),
-        ]
-    }
-
-class QMPExampleLexer(DelegatingLexer):
-    def __init__(self, **options):
-        super(QMPExampleLexer, self).__init__(JsonLexer, 
-QMPExampleMarkersLexer, Error, **options)
-EOF
-$ pygmentize -l /tmp/lexer.py:QMPExampleLexer -x -f html <<EOF
-    -> {
-         "execute": "drive-backup",
-         "arguments": {
-           "device": "drive0",
-           "bitmap": "bitmap0",
-           "target": "drive0.inc0.qcow2",
-           "format": "qcow2",
-           "sync": "incremental",
-           "mode": "existing"
-         }
-       }
-
-    <- { "return": {} }
-EOF
-<div class="highlight"><pre><span></span><span class="gp">    -&gt; 
-</span><span class="p">{</span>
-         <span class="nt">&quot;execute&quot;</span><span class="p">:</span> 
-<span class="s2">&quot;drive-backup&quot;</span><span class="p">,</span>
-         <span class="nt">&quot;arguments&quot;</span><span class="p">:</span> 
-<span class="p">{</span>
-           <span class="nt">&quot;device&quot;</span><span class="p">:</span> 
-<span class="s2">&quot;drive0&quot;</span><span class="p">,</span>
-           <span class="nt">&quot;bitmap&quot;</span><span class="p">:</span> 
-<span class="s2">&quot;bitmap0&quot;</span><span class="p">,</span>
-           <span class="nt">&quot;target&quot;</span><span class="p">:</span> 
-<span class="s2">&quot;drive0.inc0.qcow2&quot;</span><span class="p">,</span>
-           <span class="nt">&quot;format&quot;</span><span class="p">:</span> 
-<span class="s2">&quot;qcow2&quot;</span><span class="p">,</span>
-           <span class="nt">&quot;sync&quot;</span><span class="p">:</span> 
-<span class="s2">&quot;incremental&quot;</span><span class="p">,</span>
-           <span class="nt">&quot;mode&quot;</span><span class="p">:</span> 
-<span class="s2">&quot;existing&quot;</span>
-         <span class="p">}</span>
-       <span class="p">}</span>
-
-<span class="go">    &lt;- </span><span class="p">{</span> <span 
-class="nt">&quot;return&quot;</span><span class="p">:</span> <span 
-class="p">{}</span> <span class="p">}</span>
-</pre></div>
-$ 
-
-
--- 
-Eduardo
-
-On 5/20/19 7:04 PM, Eduardo Habkost wrote:
->
-On Mon, May 20, 2019 at 05:25:28PM -0400, John Snow wrote:
->
->
->
->
->
-> On 5/20/19 12:37 PM, John Snow wrote:
->
->>
->
->>
->
->> On 5/20/19 7:30 AM, Aarushi Mehta wrote:
->
->>>
-https://paste.fedoraproject.org/paste/kOPx4jhtUli---TmxSLrlw
->
->>> running python3-sphinx-2.0.1-1.fc31.noarch on Fedora release 31
->
->>> (Rawhide)
->
->>>
->
->>> uname - a
->
->>> Linux iouring 5.1.0-0.rc6.git3.1.fc31.x86_64 #1 SMP Thu Apr 25 14:25:32
->
->>> UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
->
->>>
->
->>> Reverting commit 90edef80a0852cf8a3d2668898ee40e8970e431
->
->>> allows for the build to occur
->
->>>
->
->>> Regards
->
->>> Aarushi Mehta
->
->>>
->
->>>
->
->>
->
->> Ah, dang. The blocks aren't strictly conforming json, but the version I
->
->> tested this under didn't seem to care. Your version is much newer. (I
->
->> was using 1.7 as provided by Fedora 29.)
->
->>
->
->> For now, try reverting 9e5b6cb87db66dfb606604fe6cf40e5ddf1ef0e7 instead,
->
->> which should at least turn off the "warnings as errors" option, but I
->
->> don't think that reverting -n will turn off this warning.
->
->>
->
->> I'll try to get ahold of this newer version and see if I can't fix it
->
->> more appropriately.
->
->>
->
->> --js
->
->>
->
->
->
-> ...Sigh, okay.
->
->
->
-> So, I am still not actually sure what changed from pygments 2.2 and
->
-> sphinx 1.7 to pygments 2.4 and sphinx 2.0.1, but it appears as if Sphinx
->
-> by default always tries to add a filter to the pygments lexer that
->
-> raises an error on highlighting failure, instead of the default behavior
->
-> which is to just highlight those errors in the output. There is no
->
-> option to Sphinx that I am aware of to retain this lexing behavior.
->
-> (Effectively, it's strict or nothing.)
->
->
->
-> This approach, apparently, is broken in Sphinx 1.7/Pygments 2.2, so the
->
-> build works with our malformed json.
->
->
->
-> There are a few options:
->
->
->
-> 1. Update conf.py to ignore these warnings (and all future lexing
->
-> errors), and settle for the fact that there will be no QMP highlighting
->
-> wherever we use the directionality indicators ('->', '<-').
->
->
->
-> 2. Update bitmaps.rst to remove the directionality indicators.
->
->
->
-> 3. Update bitmaps.rst to format the QMP blocks as raw text instead of JSON.
->
->
->
-> 4. Update bitmaps.rst to remove the "json" specification from the code
->
-> block. This will cause sphinx to "guess" the formatting, and the
->
-> pygments guesser will decide it's Python3.
->
->
->
-> This will parse well enough, but will mis-highlight 'true' and 'false'
->
-> which are not python keywords. This approach may break in the future if
->
-> the Python3 lexer is upgraded to be stricter (because '->' and '<-' are
->
-> still invalid), and leaves us at the mercy of both the guesser and the
->
-> lexer.
->
->
->
-> I'm not actually sure what I dislike the least; I think I dislike #1 the
->
-> most. #4 gets us most of what we want but is perhaps porcelain.
->
->
->
-> I suspect if we attempt to move more of our documentation to ReST and
->
-> Sphinx that we will need to answer for ourselves how we intend to
->
-> document QMP code flow examples.
->
->
-Writing a custom lexer that handles "<-" and "->" was simple (see below).
->
->
-Now, is it possible to convince Sphinx to register and use a custom lexer?
->
-Spoilers, yes, and I've sent a patch to list. Thanks for your help!
-
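The root cause discussed in the thread above (the `->` and `<-` markers make the blocks invalid JSON) can be checked with the standard library alone. This is a minimal sketch, separate from the Pygments lexer posted in the thread: `split_qmp_example` and `MARKER` are hypothetical helpers, and the sample text is a shortened form of the `drive-backup` example quoted earlier.

```python
import json
import re

# A direction marker at the start of a line, with optional indentation.
MARKER = re.compile(r"^\s*(->|<-)\s*", re.MULTILINE)

EXAMPLE = """
    -> {
         "execute": "drive-backup",
         "arguments": {"device": "drive0", "sync": "incremental"}
       }

    <- { "return": {} }
"""


def split_qmp_example(text):
    """Split a QMP example into (direction, parsed_json) pairs."""
    messages = []
    matches = list(MARKER.finditer(text))
    for i, match in enumerate(matches):
        # Each payload runs from the end of its marker to the next marker.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        payload = text[match.end():end]
        messages.append((match.group(1), json.loads(payload)))
    return messages


try:
    json.loads(EXAMPLE)
except json.JSONDecodeError as err:
    # The raw block is rejected because of the leading marker.
    print("raw block is not JSON:", err.msg)

for direction, message in split_qmp_example(EXAMPLE):
    print(direction, message)
```

Once the markers are split off, each payload parses as ordinary JSON, which is the same division of labour the delegating lexer in the thread relies on.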
diff --git a/results/classifier/008/other/55753058 b/results/classifier/008/other/55753058
deleted file mode 100644
index 1e6b20aa7..000000000
--- a/results/classifier/008/other/55753058
+++ /dev/null
@@ -1,303 +0,0 @@
-other: 0.734
-KVM: 0.713
-vnc: 0.682
-graphic: 0.630
-device: 0.623
-debug: 0.611
-performance: 0.591
-permissions: 0.580
-semantic: 0.577
-network: 0.525
-PID: 0.512
-boot: 0.478
-socket: 0.462
-files: 0.459
-
-[RESEND][BUG FIX HELP] QEMU main thread endlessly hangs in __ppoll()
-
-Hi Genius,
-I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may still
-exist in the mainline.
-Thanks in advance to heroes who can take a look and share understanding.
-
-The qemu main thread endlessly hangs in the handle of the qmp statement:
-{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
-'drive_del replication0' } }
-and we have the call trace looks like:
-#0 0x00007f3c22045bf6 in __ppoll (fds=0x555611328410, nfds=1,
-timeout=<optimized out>, timeout@entry=0x7ffc56c66db0,
-sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
-#1 0x000055561021f415 in ppoll (__ss=0x0, __timeout=0x7ffc56c66db0,
-__nfds=<optimized out>, __fds=<optimized out>)
-at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
-#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
-timeout=<optimized out>) at util/qemu-timer.c:348
-#3 0x0000555610221430 in aio_poll (ctx=ctx@entry=0x5556113010f0,
-blocking=blocking@entry=true) at util/aio-posix.c:669
-#4 0x000055561019268d in bdrv_do_drained_begin (poll=true,
-ignore_bds_parents=false, parent=0x0, recursive=false,
-bs=0x55561138b0a0) at block/io.c:430
-#5 bdrv_do_drained_begin (bs=0x55561138b0a0, recursive=<optimized out>,
-parent=0x0, ignore_bds_parents=<optimized out>,
-poll=<optimized out>) at block/io.c:396
-#6 0x000055561017b60b in quorum_del_child (bs=0x55561138b0a0,
-child=0x7f36dc0ce380, errp=<optimized out>)
-at block/quorum.c:1063
-#7 0x000055560ff5836b in qmp_x_blockdev_change (parent=0x555612373120
-"colo-disk0", has_child=<optimized out>,
-child=0x5556112df3e0 "children.1", has_node=<optimized out>, node=0x0,
-errp=0x7ffc56c66f98) at blockdev.c:4494
-#8 0x00005556100f8f57 in qmp_marshal_x_blockdev_change (args=<optimized
-out>, ret=<optimized out>, errp=0x7ffc56c67018)
-at qapi/qapi-commands-block-core.c:1538
-#9 0x00005556101d8290 in do_qmp_dispatch (errp=0x7ffc56c67010,
-allow_oob=<optimized out>, request=<optimized out>,
-cmds=0x5556109c69a0 <qmp_commands>) at qapi/qmp-dispatch.c:132
-#10 qmp_dispatch (cmds=0x5556109c69a0 <qmp_commands>, request=<optimized
-out>, allow_oob=<optimized out>)
-at qapi/qmp-dispatch.c:175
-#11 0x00005556100d4c4d in monitor_qmp_dispatch (mon=0x5556113a6f40,
-req=<optimized out>) at monitor/qmp.c:145
-#12 0x00005556100d5437 in monitor_qmp_bh_dispatcher (data=<optimized out>)
-at monitor/qmp.c:234
-#13 0x000055561021dbec in aio_bh_call (bh=0x5556112164b0) at
-util/async.c:117
-#14 aio_bh_poll (ctx=ctx@entry=0x5556112151b0) at util/async.c:117
-#15 0x00005556102212c4 in aio_dispatch (ctx=0x5556112151b0) at
-util/aio-posix.c:459
-#16 0x000055561021dab2 in aio_ctx_dispatch (source=<optimized out>,
-callback=<optimized out>, user_data=<optimized out>)
-at util/async.c:260
-#17 0x00007f3c22302fbd in g_main_context_dispatch () from
-/lib/x86_64-linux-gnu/libglib-2.0.so.0
-#18 0x0000555610220358 in glib_pollfds_poll () at util/main-loop.c:219
-#19 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
-#20 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
-#21 0x000055560ff600fe in main_loop () at vl.c:1814
-#22 0x000055560fddbce9 in main (argc=<optimized out>, argv=<optimized out>,
-envp=<optimized out>) at vl.c:4503
-We found that we're doing endless check in the line of
-block/io.c:bdrv_do_drained_begin():
-BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent));
-and it turns out that the bdrv_drain_poll() always get true from:
-- bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents)
-- AND atomic_read(&bs->in_flight)
-
-I personally think this is a deadlock issue in the QEMU block layer
-(as we know, we have some #FIXME comments in related code, such as block
-permission update).
-Any comments are welcome and appreciated.
-
----
-thx,likexu
-
-On 2/28/21 9:39 PM, Like Xu wrote:
-Hi Genius,
-I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may
-still exist in the mainline.
-Thanks in advance to heroes who can take a look and share understanding.
-Do you have a test case that reproduces on 5.2? It'd be nice to know if
-it was still a problem in the latest source tree or not.
---js
-The qemu main thread endlessly hangs in the handle of the qmp statement:
-{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
-'drive_del replication0' } }
-and we have the call trace looks like:
-#0 0x00007f3c22045bf6 in __ppoll (fds=0x555611328410, nfds=1,
-timeout=<optimized out>, timeout@entry=0x7ffc56c66db0,
-sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
-#1 0x000055561021f415 in ppoll (__ss=0x0, __timeout=0x7ffc56c66db0,
-__nfds=<optimized out>, __fds=<optimized out>)
-at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
-#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
-timeout=<optimized out>) at util/qemu-timer.c:348
-#3 0x0000555610221430 in aio_poll (ctx=ctx@entry=0x5556113010f0,
-blocking=blocking@entry=true) at util/aio-posix.c:669
-#4 0x000055561019268d in bdrv_do_drained_begin (poll=true,
-ignore_bds_parents=false, parent=0x0, recursive=false,
-bs=0x55561138b0a0) at block/io.c:430
-#5 bdrv_do_drained_begin (bs=0x55561138b0a0, recursive=<optimized out>,
-parent=0x0, ignore_bds_parents=<optimized out>,
-poll=<optimized out>) at block/io.c:396
-#6 0x000055561017b60b in quorum_del_child (bs=0x55561138b0a0,
-child=0x7f36dc0ce380, errp=<optimized out>)
-at block/quorum.c:1063
-#7 0x000055560ff5836b in qmp_x_blockdev_change (parent=0x555612373120
-"colo-disk0", has_child=<optimized out>,
-child=0x5556112df3e0 "children.1", has_node=<optimized out>, node=0x0,
-errp=0x7ffc56c66f98) at blockdev.c:4494
-#8 0x00005556100f8f57 in qmp_marshal_x_blockdev_change (args=<optimized
-out>, ret=<optimized out>, errp=0x7ffc56c67018)
-at qapi/qapi-commands-block-core.c:1538
-#9 0x00005556101d8290 in do_qmp_dispatch (errp=0x7ffc56c67010,
-allow_oob=<optimized out>, request=<optimized out>,
-cmds=0x5556109c69a0 <qmp_commands>) at qapi/qmp-dispatch.c:132
-#10 qmp_dispatch (cmds=0x5556109c69a0 <qmp_commands>, request=<optimized
-out>, allow_oob=<optimized out>)
-at qapi/qmp-dispatch.c:175
-#11 0x00005556100d4c4d in monitor_qmp_dispatch (mon=0x5556113a6f40,
-req=<optimized out>) at monitor/qmp.c:145
-#12 0x00005556100d5437 in monitor_qmp_bh_dispatcher (data=<optimized
-out>) at monitor/qmp.c:234
-#13 0x000055561021dbec in aio_bh_call (bh=0x5556112164b0) at
-util/async.c:117
-#14 aio_bh_poll (ctx=ctx@entry=0x5556112151b0) at util/async.c:117
-#15 0x00005556102212c4 in aio_dispatch (ctx=0x5556112151b0) at
-util/aio-posix.c:459
-#16 0x000055561021dab2 in aio_ctx_dispatch (source=<optimized out>,
-callback=<optimized out>, user_data=<optimized out>)
-at util/async.c:260
-#17 0x00007f3c22302fbd in g_main_context_dispatch () from
-/lib/x86_64-linux-gnu/libglib-2.0.so.0
-#18 0x0000555610220358 in glib_pollfds_poll () at util/main-loop.c:219
-#19 os_host_main_loop_wait (timeout=<optimized out>) at
-util/main-loop.c:242
-#20 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
-#21 0x000055560ff600fe in main_loop () at vl.c:1814
-#22 0x000055560fddbce9 in main (argc=<optimized out>, argv=<optimized
-out>, envp=<optimized out>) at vl.c:4503
-We found that we're doing endless check in the line of
-block/io.c:bdrv_do_drained_begin():
-    BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent));
-and it turns out that the bdrv_drain_poll() always get true from:
-- bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents)
-- AND atomic_read(&bs->in_flight)
-
-I personally think this is a deadlock issue in the QEMU block layer
-(as we know, we have some #FIXME comments in related code, such as
-block permission update).
-Any comments are welcome and appreciated.
-
----
-thx,likexu
-
-Hi John,
-
-Thanks for your comment.
-
-On 2021/3/5 7:53, John Snow wrote:
-On 2/28/21 9:39 PM, Like Xu wrote:
-Hi Genius,
-I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may
-still exist in the mainline.
-Thanks in advance to heroes who can take a look and share understanding.
-Do you have a test case that reproduces on 5.2? It'd be nice to know if it
-was still a problem in the latest source tree or not.
-We narrowed down the source of the bug, which basically came from
-the following qmp usage:
-{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
-'drive_del replication0' } }
-One of the test cases is the COLO usage (docs/colo-proxy.txt).
-
-This issue is sporadic; the probability may be 1/15 for an I/O-heavy guest.
-
-I believe it's reproducible on 5.2 and the latest tree.
---js
-The qemu main thread endlessly hangs in the handle of the qmp statement:
-{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
-'drive_del replication0' } }
-and we have the call trace looks like:
-#0 0x00007f3c22045bf6 in __ppoll (fds=0x555611328410, nfds=1,
-timeout=<optimized out>, timeout@entry=0x7ffc56c66db0,
-sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
-#1 0x000055561021f415 in ppoll (__ss=0x0, __timeout=0x7ffc56c66db0,
-__nfds=<optimized out>, __fds=<optimized out>)
-at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
-#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
-timeout=<optimized out>) at util/qemu-timer.c:348
-#3 0x0000555610221430 in aio_poll (ctx=ctx@entry=0x5556113010f0,
-blocking=blocking@entry=true) at util/aio-posix.c:669
-#4 0x000055561019268d in bdrv_do_drained_begin (poll=true,
-ignore_bds_parents=false, parent=0x0, recursive=false,
-bs=0x55561138b0a0) at block/io.c:430
-#5 bdrv_do_drained_begin (bs=0x55561138b0a0, recursive=<optimized out>,
-parent=0x0, ignore_bds_parents=<optimized out>,
-poll=<optimized out>) at block/io.c:396
-#6 0x000055561017b60b in quorum_del_child (bs=0x55561138b0a0,
-child=0x7f36dc0ce380, errp=<optimized out>)
-at block/quorum.c:1063
-#7 0x000055560ff5836b in qmp_x_blockdev_change (parent=0x555612373120
-"colo-disk0", has_child=<optimized out>,
-child=0x5556112df3e0 "children.1", has_node=<optimized out>, node=0x0,
-errp=0x7ffc56c66f98) at blockdev.c:4494
-#8 0x00005556100f8f57 in qmp_marshal_x_blockdev_change (args=<optimized
-out>, ret=<optimized out>, errp=0x7ffc56c67018)
-at qapi/qapi-commands-block-core.c:1538
-#9 0x00005556101d8290 in do_qmp_dispatch (errp=0x7ffc56c67010,
-allow_oob=<optimized out>, request=<optimized out>,
-cmds=0x5556109c69a0 <qmp_commands>) at qapi/qmp-dispatch.c:132
-#10 qmp_dispatch (cmds=0x5556109c69a0 <qmp_commands>, request=<optimized
-out>, allow_oob=<optimized out>)
-at qapi/qmp-dispatch.c:175
-#11 0x00005556100d4c4d in monitor_qmp_dispatch (mon=0x5556113a6f40,
-req=<optimized out>) at monitor/qmp.c:145
-#12 0x00005556100d5437 in monitor_qmp_bh_dispatcher (data=<optimized
-out>) at monitor/qmp.c:234
-#13 0x000055561021dbec in aio_bh_call (bh=0x5556112164b0) at
-util/async.c:117
-#14 aio_bh_poll (ctx=ctx@entry=0x5556112151b0) at util/async.c:117
-#15 0x00005556102212c4 in aio_dispatch (ctx=0x5556112151b0) at
-util/aio-posix.c:459
-#16 0x000055561021dab2 in aio_ctx_dispatch (source=<optimized out>,
-callback=<optimized out>, user_data=<optimized out>)
-at util/async.c:260
-#17 0x00007f3c22302fbd in g_main_context_dispatch () from
-/lib/x86_64-linux-gnu/libglib-2.0.so.0
-#18 0x0000555610220358 in glib_pollfds_poll () at util/main-loop.c:219
-#19 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
-#20 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
-#21 0x000055560ff600fe in main_loop () at vl.c:1814
-#22 0x000055560fddbce9 in main (argc=<optimized out>, argv=<optimized
-out>, envp=<optimized out>) at vl.c:4503
-We found that we're doing endless check in the line of
-block/io.c:bdrv_do_drained_begin():
-     BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent));
-and it turns out that the bdrv_drain_poll() always get true from:
-- bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents)
-- AND atomic_read(&bs->in_flight)
-
-I personally think this is a deadlock issue in the a QEMU block layer
-(as we know, we have some #FIXME comments in related codes, such as block
-permisson update).
-Any comments are welcome and appreciated.
-
----
-thx,likexu
-
-On 3/4/21 10:08 PM, Like Xu wrote:
-Hi John,
-
-Thanks for your comment.
-
-On 2021/3/5 7:53, John Snow wrote:
-On 2/28/21 9:39 PM, Like Xu wrote:
-Hi Genius,
-I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may
-still exist in the mainline.
-Thanks in advance to heroes who can take a look and share understanding.
-Do you have a test case that reproduces on 5.2? It'd be nice to know
-if it was still a problem in the latest source tree or not.
-We narrowed down the source of the bug, which basically came from
-the following qmp usage:
-{'execute': 'human-monitor-command', 'arguments':{ 'command-line':
-'drive_del replication0' } }
-One of the test cases is the COLO usage (docs/colo-proxy.txt).
-
-This issue is sporadic,the probability may be 1/15 for a io-heavy guest.
-
-I believe it's reproducible on 5.2 and the latest tree.
-Can you please test and confirm that this is the case, and then file a
-bug report on the LP:
-https://launchpad.net/qemu
-and include:
-- The exact commit you used (current origin/master debug build would be
-the most ideal.)
-- Which QEMU binary you are using (qemu-system-x86_64?)
-- The shortest command line you are aware of that reproduces the problem
-- The host OS and kernel version
-- An updated call trace
-- Any relevant commands issued prior to the one that caused the hang; or
-detailed reproduction steps if possible.
-Thanks,
---js
-
diff --git a/results/classifier/008/other/55961334 b/results/classifier/008/other/55961334
deleted file mode 100644
index 49ed3d129..000000000
--- a/results/classifier/008/other/55961334
+++ /dev/null
@@ -1,49 +0,0 @@
-graphic: 0.909
-KVM: 0.881
-semantic: 0.775
-files: 0.766
-device: 0.764
-performance: 0.758
-other: 0.715
-vnc: 0.709
-network: 0.697
-PID: 0.683
-socket: 0.636
-debug: 0.604
-boot: 0.569
-permissions: 0.560
-
-[Bug] "-ht" flag ignored under KVM - guest still reports HT
-
-Hi Community,
-We have observed that the 'ht' feature bit cannot be disabled when QEMU runs
-with KVM acceleration.
-qemu-system-x86_64 \
-  --enable-kvm \
-  -machine q35 \
-  -cpu host,-ht \
-  -smp 4 \
-  -m 4G \
-  -drive file=rootfs.img,format=raw \
-  -nographic \
-  -append 'console=ttyS0 root=/dev/sda rw'
-Because '-ht' is specified, the guest should expose no HT capability
-(cpuid.1.edx[28] = 0), and /proc/cpuinfo shouldn't show HT feature, but we still
-saw ht in linux guest when run 'cat /proc/cpuinfo'.
-XiaoYao mentioned that:
-
-It has been the behavior of QEMU since
-
-  commit 400281af34e5ee6aa9f5496b53d8f82c6fef9319
-  Author: Andre Przywara <andre.przywara@amd.com>
-  Date:   Wed Aug 19 15:42:42 2009 +0200
-
-    set CPUID bits to present cores and threads topology
-
-that we cannot remove HT CPUID bit from guest via "-cpu xxx,-ht" if the
-VM has >= 2 vcpus.
-I'd like to know whether there's a plan to address this issue, or if the current
-behaviour is considered acceptable.
-Best regards,
-Ewan.
-
diff --git a/results/classifier/008/other/56309929 b/results/classifier/008/other/56309929
deleted file mode 100644
index 54419195f..000000000
--- a/results/classifier/008/other/56309929
+++ /dev/null
@@ -1,190 +0,0 @@
-other: 0.690
-device: 0.646
-KVM: 0.636
-performance: 0.608
-vnc: 0.600
-network: 0.589
-permissions: 0.587
-debug: 0.585
-boot: 0.578
-graphic: 0.570
-PID: 0.561
-semantic: 0.521
-socket: 0.516
-files: 0.311
-
-[Qemu-devel] [BUG 2.6] Broken CONFIG_TPM?
-
-A compilation test with clang -Weverything reported this problem:
-
-config-host.h:112:20: warning: '$' in identifier
-[-Wdollar-in-identifier-extension]
-
-The line of code looks like this:
-
-#define CONFIG_TPM $(CONFIG_SOFTMMU)
-
-This is fine for Makefile code, but won't work as expected in C code.
-
-Am 28.04.2016 um 22:33 schrieb Stefan Weil:
->
-A compilation test with clang -Weverything reported this problem:
->
->
-config-host.h:112:20: warning: '$' in identifier
->
-[-Wdollar-in-identifier-extension]
->
->
-The line of code looks like this:
->
->
-#define CONFIG_TPM $(CONFIG_SOFTMMU)
->
->
-This is fine for Makefile code, but won't work as expected in C code.
->
-A complete 64 bit build with clang -Weverything creates a log file of
-1.7 GB.
-Here are the uniq warnings sorted by their frequency:
-
-      1 -Wflexible-array-extensions
-      1 -Wgnu-folding-constant
-      1 -Wunknown-pragmas
-      1 -Wunknown-warning-option
-      1 -Wunreachable-code-loop-increment
-      2 -Warray-bounds-pointer-arithmetic
-      2 -Wdollar-in-identifier-extension
-      3 -Woverlength-strings
-      3 -Wweak-vtables
-      4 -Wgnu-empty-struct
-      4 -Wstring-conversion
-      6 -Wclass-varargs
-      7 -Wc99-extensions
-      7 -Wc++-compat
-      8 -Wfloat-equal
-     11 -Wformat-nonliteral
-     16 -Wshift-negative-value
-     19 -Wglobal-constructors
-     28 -Wc++11-long-long
-     29 -Wembedded-directive
-     38 -Wvla
-     40 -Wcovered-switch-default
-     40 -Wmissing-variable-declarations
-     49 -Wold-style-cast
-     53 -Wgnu-conditional-omitted-operand
-     56 -Wformat-pedantic
-     61 -Wvariadic-macros
-     77 -Wc++11-extensions
-     83 -Wgnu-flexible-array-initializer
-     83 -Wzero-length-array
-     96 -Wgnu-designator
-    102 -Wmissing-noreturn
-    103 -Wconditional-uninitialized
-    107 -Wdisabled-macro-expansion
-    115 -Wunreachable-code-return
-    134 -Wunreachable-code
-    243 -Wunreachable-code-break
-    257 -Wfloat-conversion
-    280 -Wswitch-enum
-    291 -Wpointer-arith
-    298 -Wshadow
-    378 -Wassign-enum
-    395 -Wused-but-marked-unused
-    420 -Wreserved-id-macro
-    493 -Wdocumentation
-    510 -Wshift-sign-overflow
-    565 -Wgnu-case-range
-    566 -Wgnu-zero-variadic-macro-arguments
-    650 -Wbad-function-cast
-    705 -Wmissing-field-initializers
-    817 -Wgnu-statement-expression
-    968 -Wdocumentation-unknown-command
-   1021 -Wextra-semi
-   1112 -Wgnu-empty-initializer
-   1138 -Wcast-qual
-   1509 -Wcast-align
-   1766 -Wextended-offsetof
-   1937 -Wsign-compare
-   2130 -Wpacked
-   2404 -Wunused-macros
-   3081 -Wpadded
-   4182 -Wconversion
-   5430 -Wlanguage-extension-token
-   6655 -Wshorten-64-to-32
-   6995 -Wpedantic
-   7354 -Wunused-parameter
-  27659 -Wsign-conversion
-
-Stefan Weil <address@hidden> writes:
-
->
-A compilation test with clang -Weverything reported this problem:
->
->
-config-host.h:112:20: warning: '$' in identifier
->
-[-Wdollar-in-identifier-extension]
->
->
-The line of code looks like this:
->
->
-#define CONFIG_TPM $(CONFIG_SOFTMMU)
->
->
-This is fine for Makefile code, but won't work as expected in C code.
-Broken in commit 3b8acc1 "configure: fix TPM logic".  Cc'ing Paolo.
-
-Impact: #ifdef CONFIG_TPM never disables code.  There are no other uses
-of CONFIG_TPM in C code.
-
-I had a quick peek at configure and create_config, but refrained from
-attempting to fix this, since I don't understand when exactly CONFIG_TPM
-should be defined.
-
-On 29 April 2016 at 08:42, Markus Armbruster <address@hidden> wrote:
->
-Stefan Weil <address@hidden> writes:
->
->
-> A compilation test with clang -Weverything reported this problem:
->
->
->
-> config-host.h:112:20: warning: '$' in identifier
->
-> [-Wdollar-in-identifier-extension]
->
->
->
-> The line of code looks like this:
->
->
->
-> #define CONFIG_TPM $(CONFIG_SOFTMMU)
->
->
->
-> This is fine for Makefile code, but won't work as expected in C code.
->
->
-Broken in commit 3b8acc1 "configure: fix TPM logic".  Cc'ing Paolo.
->
->
-Impact: #ifdef CONFIG_TPM never disables code.  There are no other uses
->
-of CONFIG_TPM in C code.
->
->
-I had a quick peek at configure and create_config, but refrained from
->
-attempting to fix this, since I don't understand when exactly CONFIG_TPM
->
-should be defined.
-Looking at 'git blame' suggests this has been wrong like this for
-some years, so we don't need to scramble to fix it for 2.6.
-
-thanks
--- PMM
-
diff --git a/results/classifier/008/other/56937788 b/results/classifier/008/other/56937788
deleted file mode 100644
index 026b838bf..000000000
--- a/results/classifier/008/other/56937788
+++ /dev/null
@@ -1,354 +0,0 @@
-other: 0.791
-KVM: 0.755
-vnc: 0.743
-debug: 0.723
-graphic: 0.720
-semantic: 0.705
-device: 0.697
-performance: 0.692
-permissions: 0.685
-files: 0.680
-boot: 0.636
-network: 0.633
-PID: 0.620
-socket: 0.613
-
-[Qemu-devel] [Bug] virtio-blk: qemu will crash if hotplug virtio-blk device failed
-
-I found that hotplug virtio-blk device will lead to qemu crash.
-
-Re-production steps:
-
-1.       Run VM named vm001
-
-2.       Create a virtio-blk.xml which contains wrong configurations:
-<disk device="lun" rawio="yes" type="block">
-  <driver cache="none" io="native" name="qemu" type="raw" />
-  <source dev="/dev/mapper/11-dm" />
-  <target bus="virtio" dev="vdx" />
-</disk>
-
-3.       Run command : virsh attach-device vm001 vm001
-
-Libvirt will return err msg:
-
-error: Failed to attach device from blk-scsi.xml
-
-error: internal error: unable to execute QEMU command 'device_add': Please set 
-scsi=off for virtio-blk devices in order to use virtio 1.0
-
-it means hotplug virtio-blk device failed.
-
-4.       Suspend or shutdown VM will leads to qemu crash
-
-
-
-from gdb:
-
-
-(gdb) bt
-#0  object_get_class (address@hidden) at qom/object.c:750
-#1  0x00007f9a72582e01 in virtio_vmstate_change (opaque=0x7f9a73d10960, 
-running=0, state=<optimized out>) at 
-/mnt/sdb/lzc/code/open/qemu/hw/virtio/virtio.c:2203
-#2  0x00007f9a7261ef52 in vm_state_notify (address@hidden, address@hidden) at 
-vl.c:1685
-#3  0x00007f9a7252603a in do_vm_stop (state=RUN_STATE_PAUSED) at 
-/mnt/sdb/lzc/code/open/qemu/cpus.c:941
-#4  vm_stop (address@hidden) at /mnt/sdb/lzc/code/open/qemu/cpus.c:1807
-#5  0x00007f9a7262eb1b in qmp_stop (address@hidden) at qmp.c:102
-#6  0x00007f9a7262c70a in qmp_marshal_stop (args=<optimized out>, 
-ret=<optimized out>, errp=0x7ffe63e255d8) at qmp-marshal.c:5854
-#7  0x00007f9a72897e79 in do_qmp_dispatch (errp=0x7ffe63e255d0, 
-request=0x7f9a76510120, cmds=0x7f9a72ee7980 <qmp_commands>) at 
-qapi/qmp-dispatch.c:104
-#8  qmp_dispatch (cmds=0x7f9a72ee7980 <qmp_commands>, address@hidden) at 
-qapi/qmp-dispatch.c:131
-#9  0x00007f9a725288d5 in handle_qmp_command (parser=<optimized out>, 
-tokens=<optimized out>) at /mnt/sdb/lzc/code/open/qemu/monitor.c:3852
-#10 0x00007f9a7289d514 in json_message_process_token (lexer=0x7f9a73ce4498, 
-input=0x7f9a73cc6880, type=JSON_RCURLY, x=36, y=17) at 
-qobject/json-streamer.c:105
-#11 0x00007f9a728bb69b in json_lexer_feed_char (address@hidden, ch=125 '}', 
-address@hidden) at qobject/json-lexer.c:323
-#12 0x00007f9a728bb75e in json_lexer_feed (lexer=0x7f9a73ce4498, 
-buffer=<optimized out>, size=<optimized out>) at qobject/json-lexer.c:373
-#13 0x00007f9a7289d5d9 in json_message_parser_feed (parser=<optimized out>, 
-buffer=<optimized out>, size=<optimized out>) at qobject/json-streamer.c:124
-#14 0x00007f9a7252722e in monitor_qmp_read (opaque=<optimized out>, 
-buf=<optimized out>, size=<optimized out>) at 
-/mnt/sdb/lzc/code/open/qemu/monitor.c:3894
-#15 0x00007f9a7284ee1b in tcp_chr_read (chan=<optimized out>, cond=<optimized 
-out>, opaque=<optimized out>) at chardev/char-socket.c:441
-#16 0x00007f9a6e03e99a in g_main_context_dispatch () from 
-/usr/lib64/libglib-2.0.so.0
-#17 0x00007f9a728a342c in glib_pollfds_poll () at util/main-loop.c:214
-#18 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
-#19 main_loop_wait (address@hidden) at util/main-loop.c:515
-#20 0x00007f9a724e7547 in main_loop () at vl.c:1999
-#21 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at 
-vl.c:4877
-
-Problem happens in virtio_vmstate_change which is called by vm_state_notify,
-static void virtio_vmstate_change(void *opaque, int running, RunState state)
-{
-    VirtIODevice *vdev = opaque;
-    BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
-    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
-    bool backend_run = running && (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK);
-    vdev->vm_running = running;
-
-    if (backend_run) {
-        virtio_set_status(vdev, vdev->status);
-    }
-
-    if (k->vmstate_change) {
-        k->vmstate_change(qbus->parent, backend_run);
-    }
-
-    if (!backend_run) {
-        virtio_set_status(vdev, vdev->status);
-    }
-}
-
-Vdev's parent_bus is NULL, so qdev_get_parent_bus(DEVICE(vdev)) will crash.
-virtio_vmstate_change is added to the list vm_change_state_head at 
-virtio_blk_device_realize(virtio_init),
-but after hotplug virtio-blk failed, virtio_vmstate_change will not be removed 
-from vm_change_state_head.
-
-
-I apply a patch as follews:
-
-diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
-index 5884ce3..ea532dc 100644
---- a/hw/virtio/virtio.c
-+++ b/hw/virtio/virtio.c
-@@ -2491,6 +2491,7 @@ static void virtio_device_realize(DeviceState *dev, Error 
-**errp)
-     virtio_bus_device_plugged(vdev, &err);
-     if (err != NULL) {
-         error_propagate(errp, err);
-+        vdc->unrealize(dev, NULL);
-         return;
-     }
-
-On Tue, Oct 31, 2017 at 05:19:08AM +0000, linzhecheng wrote:
->
-I found that hotplug virtio-blk device will lead to qemu crash.
-The author posted a patch in a separate email thread.  Please see
-"[PATCH] fix: unrealize virtio device if we fail to hotplug it".
-
->
-Re-production steps:
->
->
-1.       Run VM named vm001
->
->
-2.       Create a virtio-blk.xml which contains wrong configurations:
->
-<disk device="lun" rawio="yes" type="block">
->
-<driver cache="none" io="native" name="qemu" type="raw" />
->
-<source dev="/dev/mapper/11-dm" />
->
-<target bus="virtio" dev="vdx" />
->
-</disk>
->
->
-3.       Run command : virsh attach-device vm001 vm001
->
->
-Libvirt will return err msg:
->
->
-error: Failed to attach device from blk-scsi.xml
->
->
-error: internal error: unable to execute QEMU command 'device_add': Please
->
-set scsi=off for virtio-blk devices in order to use virtio 1.0
->
->
-it means hotplug virtio-blk device failed.
->
->
-4.       Suspend or shutdown VM will leads to qemu crash
->
->
->
->
-from gdb:
->
->
->
-(gdb) bt
->
-#0  object_get_class (address@hidden) at qom/object.c:750
->
-#1  0x00007f9a72582e01 in virtio_vmstate_change (opaque=0x7f9a73d10960,
->
-running=0, state=<optimized out>) at
->
-/mnt/sdb/lzc/code/open/qemu/hw/virtio/virtio.c:2203
->
-#2  0x00007f9a7261ef52 in vm_state_notify (address@hidden, address@hidden) at
->
-vl.c:1685
->
-#3  0x00007f9a7252603a in do_vm_stop (state=RUN_STATE_PAUSED) at
->
-/mnt/sdb/lzc/code/open/qemu/cpus.c:941
->
-#4  vm_stop (address@hidden) at /mnt/sdb/lzc/code/open/qemu/cpus.c:1807
->
-#5  0x00007f9a7262eb1b in qmp_stop (address@hidden) at qmp.c:102
->
-#6  0x00007f9a7262c70a in qmp_marshal_stop (args=<optimized out>,
->
-ret=<optimized out>, errp=0x7ffe63e255d8) at qmp-marshal.c:5854
->
-#7  0x00007f9a72897e79 in do_qmp_dispatch (errp=0x7ffe63e255d0,
->
-request=0x7f9a76510120, cmds=0x7f9a72ee7980 <qmp_commands>) at
->
-qapi/qmp-dispatch.c:104
->
-#8  qmp_dispatch (cmds=0x7f9a72ee7980 <qmp_commands>, address@hidden) at
->
-qapi/qmp-dispatch.c:131
->
-#9  0x00007f9a725288d5 in handle_qmp_command (parser=<optimized out>,
->
-tokens=<optimized out>) at /mnt/sdb/lzc/code/open/qemu/monitor.c:3852
->
-#10 0x00007f9a7289d514 in json_message_process_token (lexer=0x7f9a73ce4498,
->
-input=0x7f9a73cc6880, type=JSON_RCURLY, x=36, y=17) at
->
-qobject/json-streamer.c:105
->
-#11 0x00007f9a728bb69b in json_lexer_feed_char (address@hidden, ch=125 '}',
->
-address@hidden) at qobject/json-lexer.c:323
->
-#12 0x00007f9a728bb75e in json_lexer_feed (lexer=0x7f9a73ce4498,
->
-buffer=<optimized out>, size=<optimized out>) at qobject/json-lexer.c:373
->
-#13 0x00007f9a7289d5d9 in json_message_parser_feed (parser=<optimized out>,
->
-buffer=<optimized out>, size=<optimized out>) at qobject/json-streamer.c:124
->
-#14 0x00007f9a7252722e in monitor_qmp_read (opaque=<optimized out>,
->
-buf=<optimized out>, size=<optimized out>) at
->
-/mnt/sdb/lzc/code/open/qemu/monitor.c:3894
->
-#15 0x00007f9a7284ee1b in tcp_chr_read (chan=<optimized out>, cond=<optimized
->
-out>, opaque=<optimized out>) at chardev/char-socket.c:441
->
-#16 0x00007f9a6e03e99a in g_main_context_dispatch () from
->
-/usr/lib64/libglib-2.0.so.0
->
-#17 0x00007f9a728a342c in glib_pollfds_poll () at util/main-loop.c:214
->
-#18 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
->
-#19 main_loop_wait (address@hidden) at util/main-loop.c:515
->
-#20 0x00007f9a724e7547 in main_loop () at vl.c:1999
->
-#21 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
->
-at vl.c:4877
->
->
-Problem happens in virtio_vmstate_change which is called by vm_state_notify,
->
-static void virtio_vmstate_change(void *opaque, int running, RunState state)
->
-{
->
-VirtIODevice *vdev = opaque;
->
-BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
->
-VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
->
-bool backend_run = running && (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK);
->
-vdev->vm_running = running;
->
->
-if (backend_run) {
->
-virtio_set_status(vdev, vdev->status);
->
-}
->
->
-if (k->vmstate_change) {
->
-k->vmstate_change(qbus->parent, backend_run);
->
-}
->
->
-if (!backend_run) {
->
-virtio_set_status(vdev, vdev->status);
->
-}
->
-}
->
->
-Vdev's parent_bus is NULL, so qdev_get_parent_bus(DEVICE(vdev)) will crash.
->
-virtio_vmstate_change is added to the list vm_change_state_head at
->
-virtio_blk_device_realize(virtio_init),
->
-but after hotplug virtio-blk failed, virtio_vmstate_change will not be
->
-removed from vm_change_state_head.
->
->
->
-I apply a patch as follews:
->
->
-diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
->
-index 5884ce3..ea532dc 100644
->
---- a/hw/virtio/virtio.c
->
-+++ b/hw/virtio/virtio.c
->
-@@ -2491,6 +2491,7 @@ static void virtio_device_realize(DeviceState *dev,
->
-Error **errp)
->
-virtio_bus_device_plugged(vdev, &err);
->
-if (err != NULL) {
->
-error_propagate(errp, err);
->
-+        vdc->unrealize(dev, NULL);
->
-return;
->
-}
-signature.asc
-Description:
-PGP signature
-
diff --git a/results/classifier/008/other/57195159 b/results/classifier/008/other/57195159
deleted file mode 100644
index f913074b5..000000000
--- a/results/classifier/008/other/57195159
+++ /dev/null
@@ -1,325 +0,0 @@
-device: 0.877
-other: 0.868
-graphic: 0.861
-permissions: 0.857
-performance: 0.855
-debug: 0.821
-semantic: 0.794
-boot: 0.781
-PID: 0.755
-KVM: 0.752
-socket: 0.750
-files: 0.697
-network: 0.687
-vnc: 0.626
-
-[BUG Report] Got a use-after-free error while start arm64 VM with lots of pci controllers
-
-Hi,
-
-We got a use-after-free report in our Euler Robot Test, it is can be reproduced 
-quite easily,
-It can be reproduced by start VM with lots of pci controller and virtio-scsi 
-devices.
-You can find the full qemu log from attachment.
-We have analyzed the log and got the rough process how it happened, but don't 
-know how to fix it.
-
-Could anyone help to fix it ?
-
-The key message shows bellow:
-har device redirected to /dev/pts/1 (label charserial0)
-==1517174==WARNING: ASan doesn't fully support makecontext/swapcontext 
-functions and may produce false positives in some cases!
-=================================================================
-==1517174==ERROR: AddressSanitizer: heap-use-after-free on address 
-0xfffc31a002a0 at pc 0xaaad73e1f668 bp 0xfffc319fddb0 sp 0xfffc319fddd0
-READ of size 8 at 0xfffc31a002a0 thread T1
-    #0 0xaaad73e1f667 in memory_region_unref /home/qemu/memory.c:1771
-    #1 0xaaad73e1f667 in flatview_destroy /home/qemu/memory.c:291
-    #2 0xaaad74adc85b in call_rcu_thread util/rcu.c:283
-    #3 0xaaad74ab31db in qemu_thread_start util/qemu-thread-posix.c:519
-    #4 0xfffc3a1678bb  (/lib64/libpthread.so.0+0x78bb)
-    #5 0xfffc3a0a616b  (/lib64/libc.so.6+0xd616b)
-
-0xfffc31a002a0 is located 544 bytes inside of 1440-byte region 
-[0xfffc31a00080,0xfffc31a00620)
-freed by thread T37 (CPU 0/KVM) here:
-    #0 0xfffc3c102e23 in free (/lib64/libasan.so.4+0xd2e23)
-    #1 0xfffc3bbc729f in g_free (/lib64/libglib-2.0.so.0+0x5729f)
-    #2 0xaaad745cce03 in pci_bridge_update_mappings hw/pci/pci_bridge.c:245
-    #3 0xaaad745ccf33 in pci_bridge_write_config hw/pci/pci_bridge.c:271
-    #4 0xaaad745ba867 in pci_bridge_dev_write_config 
-hw/pci-bridge/pci_bridge_dev.c:153
-    #5 0xaaad745d6013 in pci_host_config_write_common hw/pci/pci_host.c:81
-    #6 0xaaad73e2346f in memory_region_write_accessor /home/qemu/memory.c:483
-    #7 0xaaad73e1d9ff in access_with_adjusted_size /home/qemu/memory.c:544
-    #8 0xaaad73e28d1f in memory_region_dispatch_write /home/qemu/memory.c:1482
-    #9 0xaaad73d7274f in flatview_write_continue /home/qemu/exec.c:3167
-    #10 0xaaad73d72a53 in flatview_write /home/qemu/exec.c:3207
-    #11 0xaaad73d7c8c3 in address_space_write /home/qemu/exec.c:3297
-    #12 0xaaad73e5059b in kvm_cpu_exec /home/qemu/accel/kvm/kvm-all.c:2386
-    #13 0xaaad73e07ac7 in qemu_kvm_cpu_thread_fn /home/qemu/cpus.c:1246
-    #14 0xaaad74ab31db in qemu_thread_start util/qemu-thread-posix.c:519
-    #15 0xfffc3a1678bb  (/lib64/libpthread.so.0+0x78bb)
-    #16 0xfffc3a0a616b  (/lib64/libc.so.6+0xd616b)
-
-previously allocated by thread T0 here:
-    #0 0xfffc3c1031cb in __interceptor_malloc (/lib64/libasan.so.4+0xd31cb)
-    #1 0xfffc3bbc7163 in g_malloc (/lib64/libglib-2.0.so.0+0x57163)
-    #2 0xaaad745ccb57 in pci_bridge_region_init hw/pci/pci_bridge.c:188
-    #3 0xaaad745cd8cb in pci_bridge_initfn hw/pci/pci_bridge.c:385
-    #4 0xaaad745baaf3 in pci_bridge_dev_realize 
-hw/pci-bridge/pci_bridge_dev.c:64
-    #5 0xaaad745cacd7 in pci_qdev_realize hw/pci/pci.c:2095
-    #6 0xaaad7439d9f7 in device_set_realized hw/core/qdev.c:865
-    #7 0xaaad7485ed23 in property_set_bool qom/object.c:2102
-    #8 0xaaad74868f4b in object_property_set_qobject qom/qom-qobject.c:26
-    #9 0xaaad74863a43 in object_property_set_bool qom/object.c:1360
-    #10 0xaaad742a53b7 in qdev_device_add /home/qemu/qdev-monitor.c:675
-    #11 0xaaad742a9c7b in device_init_func /home/qemu/vl.c:2074
-    #12 0xaaad74ad4d33 in qemu_opts_foreach util/qemu-option.c:1170
-    #13 0xaaad73d60c17 in main /home/qemu/vl.c:4313
-    #14 0xfffc39ff0b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f)
-    #15 0xaaad73d6db33  
-(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33)
-
-Thread T1 created by T0 here:
-    #0 0xfffc3c068f6f in __interceptor_pthread_create 
-(/lib64/libasan.so.4+0x38f6f)
-    #1 0xaaad74ab54ab in qemu_thread_create util/qemu-thread-posix.c:556
-    #2 0xaaad74adc6a7 in rcu_init_complete util/rcu.c:326
-    #3 0xaaad74bab2a7 in __libc_csu_init 
-(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x17cb2a7)
-    #4 0xfffc39ff0b47 in __libc_start_main (/lib64/libc.so.6+0x20b47)
-    #5 0xaaad73d6db33  (/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33)
-
-Thread T37 (CPU 0/KVM) created by T0 here:
-    #0 0xfffc3c068f6f in __interceptor_pthread_create 
-(/lib64/libasan.so.4+0x38f6f)
-    #1 0xaaad74ab54ab in qemu_thread_create util/qemu-thread-posix.c:556
-    #2 0xaaad73e09b0f in qemu_dummy_start_vcpu /home/qemu/cpus.c:2045
-    #3 0xaaad73e09b0f in qemu_init_vcpu /home/qemu/cpus.c:2077
-    #4 0xaaad740d36b7 in arm_cpu_realizefn /home/qemu/target/arm/cpu.c:1712
-    #5 0xaaad7439d9f7 in device_set_realized hw/core/qdev.c:865
-    #6 0xaaad7485ed23 in property_set_bool qom/object.c:2102
-    #7 0xaaad74868f4b in object_property_set_qobject qom/qom-qobject.c:26
-    #8 0xaaad74863a43 in object_property_set_bool qom/object.c:1360
-    #9 0xaaad73fe3e67 in machvirt_init /home/qemu/hw/arm/virt.c:1682
-    #10 0xaaad743acfc7 in machine_run_board_init hw/core/machine.c:1077
-    #11 0xaaad73d60b73 in main /home/qemu/vl.c:4292
-    #12 0xfffc39ff0b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f)
-    #13 0xaaad73d6db33  
-(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33)
-
-SUMMARY: AddressSanitizer: heap-use-after-free /home/qemu/memory.c:1771 in 
-memory_region_unref
-
-Thanks
-use-after-free-qemu.log
-Description:
-Text document
-
-Cc: address@hidden
-
-On 1/17/2020 4:18 PM, Pan Nengyuan wrote:
->
-Hi,
->
->
-We got a use-after-free report in our Euler Robot Test, it is can be
->
-reproduced quite easily,
->
-It can be reproduced by start VM with lots of pci controller and virtio-scsi
->
-devices.
->
-You can find the full qemu log from attachment.
->
-We have analyzed the log and got the rough process how it happened, but don't
->
-know how to fix it.
->
->
-Could anyone help to fix it ?
->
->
-The key message shows bellow:
->
-har device redirected to /dev/pts/1 (label charserial0)
->
-==1517174==WARNING: ASan doesn't fully support makecontext/swapcontext
->
-functions and may produce false positives in some cases!
->
-=================================================================
->
-==1517174==ERROR: AddressSanitizer: heap-use-after-free on address
->
-0xfffc31a002a0 at pc 0xaaad73e1f668 bp 0xfffc319fddb0 sp 0xfffc319fddd0
->
-READ of size 8 at 0xfffc31a002a0 thread T1
->
-#0 0xaaad73e1f667 in memory_region_unref /home/qemu/memory.c:1771
->
-#1 0xaaad73e1f667 in flatview_destroy /home/qemu/memory.c:291
->
-#2 0xaaad74adc85b in call_rcu_thread util/rcu.c:283
->
-#3 0xaaad74ab31db in qemu_thread_start util/qemu-thread-posix.c:519
->
-#4 0xfffc3a1678bb  (/lib64/libpthread.so.0+0x78bb)
->
-#5 0xfffc3a0a616b  (/lib64/libc.so.6+0xd616b)
->
->
-0xfffc31a002a0 is located 544 bytes inside of 1440-byte region
->
-[0xfffc31a00080,0xfffc31a00620)
->
-freed by thread T37 (CPU 0/KVM) here:
->
-#0 0xfffc3c102e23 in free (/lib64/libasan.so.4+0xd2e23)
->
-#1 0xfffc3bbc729f in g_free (/lib64/libglib-2.0.so.0+0x5729f)
->
-#2 0xaaad745cce03 in pci_bridge_update_mappings hw/pci/pci_bridge.c:245
->
-#3 0xaaad745ccf33 in pci_bridge_write_config hw/pci/pci_bridge.c:271
->
-#4 0xaaad745ba867 in pci_bridge_dev_write_config
->
-hw/pci-bridge/pci_bridge_dev.c:153
->
-#5 0xaaad745d6013 in pci_host_config_write_common hw/pci/pci_host.c:81
->
-#6 0xaaad73e2346f in memory_region_write_accessor /home/qemu/memory.c:483
->
-#7 0xaaad73e1d9ff in access_with_adjusted_size /home/qemu/memory.c:544
->
-#8 0xaaad73e28d1f in memory_region_dispatch_write /home/qemu/memory.c:1482
->
-#9 0xaaad73d7274f in flatview_write_continue /home/qemu/exec.c:3167
->
-#10 0xaaad73d72a53 in flatview_write /home/qemu/exec.c:3207
->
-#11 0xaaad73d7c8c3 in address_space_write /home/qemu/exec.c:3297
->
-#12 0xaaad73e5059b in kvm_cpu_exec /home/qemu/accel/kvm/kvm-all.c:2386
->
-#13 0xaaad73e07ac7 in qemu_kvm_cpu_thread_fn /home/qemu/cpus.c:1246
->
-#14 0xaaad74ab31db in qemu_thread_start util/qemu-thread-posix.c:519
->
-#15 0xfffc3a1678bb  (/lib64/libpthread.so.0+0x78bb)
->
-#16 0xfffc3a0a616b  (/lib64/libc.so.6+0xd616b)
->
->
-previously allocated by thread T0 here:
->
-#0 0xfffc3c1031cb in __interceptor_malloc (/lib64/libasan.so.4+0xd31cb)
->
-#1 0xfffc3bbc7163 in g_malloc (/lib64/libglib-2.0.so.0+0x57163)
->
-#2 0xaaad745ccb57 in pci_bridge_region_init hw/pci/pci_bridge.c:188
->
-#3 0xaaad745cd8cb in pci_bridge_initfn hw/pci/pci_bridge.c:385
->
-#4 0xaaad745baaf3 in pci_bridge_dev_realize
->
-hw/pci-bridge/pci_bridge_dev.c:64
->
-#5 0xaaad745cacd7 in pci_qdev_realize hw/pci/pci.c:2095
->
-#6 0xaaad7439d9f7 in device_set_realized hw/core/qdev.c:865
->
-#7 0xaaad7485ed23 in property_set_bool qom/object.c:2102
->
-#8 0xaaad74868f4b in object_property_set_qobject qom/qom-qobject.c:26
->
-#9 0xaaad74863a43 in object_property_set_bool qom/object.c:1360
->
-#10 0xaaad742a53b7 in qdev_device_add /home/qemu/qdev-monitor.c:675
->
-#11 0xaaad742a9c7b in device_init_func /home/qemu/vl.c:2074
->
-#12 0xaaad74ad4d33 in qemu_opts_foreach util/qemu-option.c:1170
->
-#13 0xaaad73d60c17 in main /home/qemu/vl.c:4313
->
-#14 0xfffc39ff0b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f)
->
-#15 0xaaad73d6db33
->
-(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33)
->
->
-Thread T1 created by T0 here:
->
-#0 0xfffc3c068f6f in __interceptor_pthread_create
->
-(/lib64/libasan.so.4+0x38f6f)
->
-#1 0xaaad74ab54ab in qemu_thread_create util/qemu-thread-posix.c:556
->
-#2 0xaaad74adc6a7 in rcu_init_complete util/rcu.c:326
->
-#3 0xaaad74bab2a7 in __libc_csu_init
->
-(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x17cb2a7)
->
-#4 0xfffc39ff0b47 in __libc_start_main (/lib64/libc.so.6+0x20b47)
->
-#5 0xaaad73d6db33
->
-(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33)
->
->
-Thread T37 (CPU 0/KVM) created by T0 here:
->
-#0 0xfffc3c068f6f in __interceptor_pthread_create
->
-(/lib64/libasan.so.4+0x38f6f)
->
-#1 0xaaad74ab54ab in qemu_thread_create util/qemu-thread-posix.c:556
->
-#2 0xaaad73e09b0f in qemu_dummy_start_vcpu /home/qemu/cpus.c:2045
->
-#3 0xaaad73e09b0f in qemu_init_vcpu /home/qemu/cpus.c:2077
->
-#4 0xaaad740d36b7 in arm_cpu_realizefn /home/qemu/target/arm/cpu.c:1712
->
-#5 0xaaad7439d9f7 in device_set_realized hw/core/qdev.c:865
->
-#6 0xaaad7485ed23 in property_set_bool qom/object.c:2102
->
-#7 0xaaad74868f4b in object_property_set_qobject qom/qom-qobject.c:26
->
-#8 0xaaad74863a43 in object_property_set_bool qom/object.c:1360
->
-#9 0xaaad73fe3e67 in machvirt_init /home/qemu/hw/arm/virt.c:1682
->
-#10 0xaaad743acfc7 in machine_run_board_init hw/core/machine.c:1077
->
-#11 0xaaad73d60b73 in main /home/qemu/vl.c:4292
->
-#12 0xfffc39ff0b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f)
->
-#13 0xaaad73d6db33
->
-(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33)
->
->
-SUMMARY: AddressSanitizer: heap-use-after-free /home/qemu/memory.c:1771 in
->
-memory_region_unref
->
->
-Thanks
->
-use-after-free-qemu.log
-Description:
-Text document
-
diff --git a/results/classifier/008/other/57231878 b/results/classifier/008/other/57231878
deleted file mode 100644
index 9c7802717..000000000
--- a/results/classifier/008/other/57231878
+++ /dev/null
@@ -1,252 +0,0 @@
-permissions: 0.856
-device: 0.818
-other: 0.788
-semantic: 0.774
-graphic: 0.751
-debug: 0.732
-KVM: 0.708
-PID: 0.684
-network: 0.659
-performance: 0.644
-vnc: 0.640
-socket: 0.624
-boot: 0.609
-files: 0.587
-
-[Qemu-devel] [BUG] qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed.
-
-Hello all,
-I wanted to submit a bug report in the tracker, but it seem to require
-an Ubuntu One account, which I'm having trouble with, so I'll just
-give it here and hopefully somebody can make use of it.  The issue
-seems to be in an experimental format, so it's likely not very
-consequential anyway.
-
-For the sake of anyone else simply googling for a workaround, I'll
-just paste in the (cleaned up) brief IRC conversation about my issue
-from the official channel:
-<quy> I'm using QEMU version 2.12.0 on an x86_64 host (Arch Linux,
-Kernel v4.17.2), and I'm trying to create an x86_64 virtual machine
-(FreeBSD-11.1).  The VM always aborts at the same point in the
-installation (downloading 'ports.tgz') with the following error
-message:
-"qemu-system-x86_64: /build/qemu/src/qemu-2.12.0/block/qed.c:1197:
-qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed.
-zsh: abort (core dumped)  qemu-system-x86_64 -smp 2 -m 4096
--enable-kvm -hda freebsd/freebsd.qed -devic"
-The commands I ran to create the machine are as follows:
-"qemu-img create -f qed freebsd/freebsd.qed 16G"
-"qemu-system-x86_64 -smp 2 -m 4096 -enable-kvm -hda
-freebsd/freebsd.qed -device e1000,netdev=net0 -netdev user,id=net0
--cdrom FreeBSD-11.1-RELEASE-amd64-bootonly.iso -boot order=d"
-I tried adding logging options with the -d flag, but I didn't get
-anything that seemed relevant, since I'm not sure what to look for.
-<stsquad> ohh what's a qed device?
-<stsquad> quy: it might be a workaround to use a qcow2 image for now
-<stsquad> ahh the wiki has a statement "It is not recommended to use
-QED for any new images. "
-<danpb> 'qed' was an experimental disk image format created by IBM
-before qcow2 v3 came along
-<danpb> honestly nothing should ever use  QED these days
-<danpb> the good ideas from QED became  qcow2v3
-<stsquad> danpb: sounds like we should put a warning on the option to
-remind users of that fact
-<danpb> quy: sounds like qed driver is simply broken - please do file
-a bug against qemu bug tracker
-<danpb> quy: but you should also really switch to qcow2
-<quy> I see; some people need to update their wikis then.  I don't
-remember which guide I read when I first learned what little
-QEMU I know, but I specifically remember it saying QED was
-the newest and most optimal format.
-<stsquad> quy: we can only be responsible for our own wiki I'm afraid...
-<danpb> if you remember where you saw that please let us know so we
-can try to get it fixed
-<quy> Thank you very much for the info; I will switch to QCOW.
-Unfortunately, I'm not sure if I will be able to file any bug reports
-in the tracker as I can't seem to log in to Launchpad, which it seems to
-require.
-<danpb> quy:  an email to the mailing list would suffice too if you
-can't deal with launchpad
-<danpb> kwolf: ^^^ in case you're interested in possible QED
-assertions from 2.12
-
-If any more info is needed, feel free to email me; I'm not actually
-subscribed to this list though.
-Thank you,
-Quytelda Kahja
-
-CC Qemu Block; looks like QED is a bit busted.
-
-On 06/27/2018 10:25 AM, Quytelda Kahja wrote:
->
-Hello all,
->
-I wanted to submit a bug report in the tracker, but it seems to require
->
-an Ubuntu One account, which I'm having trouble with, so I'll just
->
-give it here and hopefully somebody can make use of it.  The issue
->
-seems to be in an experimental format, so it's likely not very
->
-consequential anyway.
->
->
-For the sake of anyone else simply googling for a workaround, I'll
->
-just paste in the (cleaned up) brief IRC conversation about my issue
->
-from the official channel:
->
-<quy> I'm using QEMU version 2.12.0 on an x86_64 host (Arch Linux,
->
-Kernel v4.17.2), and I'm trying to create an x86_64 virtual machine
->
-(FreeBSD-11.1).  The VM always aborts at the same point in the
->
-installation (downloading 'ports.tgz') with the following error
->
-message:
->
-"qemu-system-x86_64: /build/qemu/src/qemu-2.12.0/block/qed.c:1197:
->
-qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed.
->
-zsh: abort (core dumped)  qemu-system-x86_64 -smp 2 -m 4096
->
--enable-kvm -hda freebsd/freebsd.qed -devic"
->
-The commands I ran to create the machine are as follows:
->
-"qemu-img create -f qed freebsd/freebsd.qed 16G"
->
-"qemu-system-x86_64 -smp 2 -m 4096 -enable-kvm -hda
->
-freebsd/freebsd.qed -device e1000,netdev=net0 -netdev user,id=net0
->
--cdrom FreeBSD-11.1-RELEASE-amd64-bootonly.iso -boot order=d"
->
-I tried adding logging options with the -d flag, but I didn't get
->
-anything that seemed relevant, since I'm not sure what to look for.
->
-<stsquad> ohh what's a qed device?
->
-<stsquad> quy: it might be a workaround to use a qcow2 image for now
->
-<stsquad> ahh the wiki has a statement "It is not recommended to use
->
-QED for any new images. "
->
-<danpb> 'qed' was an experimental disk image format created by IBM
->
-before qcow2 v3 came along
->
-<danpb> honestly nothing should ever use  QED these days
->
-<danpb> the good ideas from QED became  qcow2v3
->
-<stsquad> danpb: sounds like we should put a warning on the option to
->
-remind users of that fact
->
-<danpb> quy: sounds like qed driver is simply broken - please do file
->
-a bug against qemu bug tracker
->
-<danpb> quy: but you should also really switch to qcow2
->
-<quy> I see; some people need to update their wikis then.  I don't
->
-remember which guide I read when I first learned what little
->
-QEMU I know, but I specifically remember it saying QED was
->
-the newest and most optimal format.
->
-<stsquad> quy: we can only be responsible for our own wiki I'm afraid...
->
-<danpb> if you remember where you saw that please let us know so we
->
-can try to get it fixed
->
-<quy> Thank you very much for the info; I will switch to QCOW.
->
-Unfortunately, I'm not sure if I will be able to file any bug reports
->
-in the tracker as I can't seem to log in to Launchpad, which it seems to
->
-require.
->
-<danpb> quy:  an email to the mailing list would suffice too if you
->
-can't deal with launchpad
->
-<danpb> kwolf: ^^^ in case you're interested in possible QED
->
-assertions from 2.12
->
->
-If any more info is needed, feel free to email me; I'm not actually
->
-subscribed to this list though.
->
-Thank you,
->
-Quytelda Kahja
->
-
-On 06/29/2018 03:07 PM, John Snow wrote:
-CC Qemu Block; looks like QED is a bit busted.
-
-On 06/27/2018 10:25 AM, Quytelda Kahja wrote:
-Hello all,
-I wanted to submit a bug report in the tracker, but it seems to require
-an Ubuntu One account, which I'm having trouble with, so I'll just
-give it here and hopefully somebody can make use of it.  The issue
-seems to be in an experimental format, so it's likely not very
-consequential anyway.
-Analysis in another thread may be relevant:
-https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08963.html
---
-Eric Blake, Principal Software Engineer
-Red Hat, Inc.           +1-919-301-3266
-Virtualization:  qemu.org | libvirt.org
-
-Am 29.06.2018 um 22:16 hat Eric Blake geschrieben:
->
-On 06/29/2018 03:07 PM, John Snow wrote:
->
-> CC Qemu Block; looks like QED is a bit busted.
->
->
->
-> On 06/27/2018 10:25 AM, Quytelda Kahja wrote:
->
-> > Hello all,
->
-> > I wanted to submit a bug report in the tracker, but it seems to require
->
-> > an Ubuntu One account, which I'm having trouble with, so I'll just
->
-> > give it here and hopefully somebody can make use of it.  The issue
->
-> > seems to be in an experimental format, so it's likely not very
->
-> > consequential anyway.
->
->
-Analysis in another thread may be relevant:
->
->
-https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08963.html
-The assertion there was:
-
-qemu-system-x86_64: block.c:3434: bdrv_replace_node: Assertion 
-`!atomic_read(&to->in_flight)' failed.
-
-Which quite clearly pointed to a drain bug. This one, however, doesn't
-seem to be related to drain, so I think it's probably a different bug.
-
-Kevin
-
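For readers who land on this thread looking for the workaround suggested in the IRC log (switching from QED to qcow2), here is a minimal sketch. It assumes the image path from the report (freebsd/freebsd.qed) and that the existing image can still be read; the helper name qed_to_qcow2_cmd is hypothetical, and it only prints the qemu-img command rather than running it.

```shell
#!/bin/sh
# Sketch only: derive a qcow2 file name from a QED image path and print the
# qemu-img command that would convert it. The helper name is hypothetical.
qed_to_qcow2_cmd() {
    src=$1
    dst="${src%.qed}.qcow2"   # freebsd/freebsd.qed -> freebsd/freebsd.qcow2
    printf 'qemu-img convert -f qed -O qcow2 %s %s\n' "$src" "$dst"
}

# Print the command for the image from the report; review it before running it.
qed_to_qcow2_cmd freebsd/freebsd.qed
```

After a conversion, the -hda argument in the reporter's qemu-system-x86_64 command line would point at freebsd/freebsd.qcow2 instead of the QED file.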
diff --git a/results/classifier/008/other/57756589 b/results/classifier/008/other/57756589
deleted file mode 100644
index 94ca85fb3..000000000
--- a/results/classifier/008/other/57756589
+++ /dev/null
@@ -1,1431 +0,0 @@
-other: 0.899
-device: 0.853
-vnc: 0.851
-permissions: 0.842
-performance: 0.839
-semantic: 0.835
-boot: 0.827
-graphic: 0.824
-network: 0.822
-socket: 0.820
-PID: 0.819
-KVM: 0.817
-files: 0.816
-debug: 0.803
-
-[Qemu-devel] Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang
-
-Almost the same as the wiki, but with a panic on the Primary Node.
-
-
-
-
-Steps:
-
-1: Primary Node:
-
-x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 2 -qmp stdio \
-  -vnc :7 -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci -usb \
-  -usbdevice tablet \
-  -drive if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=qcow2 \
-  -S \
-  -netdev tap,id=hn1,vhost=off,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \
-  -device e1000,id=e1,netdev=hn1,mac=52:a4:00:12:78:67 \
-  -netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-  -device e1000,id=e0,netdev=hn0,mac=52:a4:00:12:78:66 \
-  -chardev socket,id=mirror0,host=9.61.1.8,port=9003,server,nowait \
-  -chardev socket,id=compare1,host=9.61.1.8,port=9004,server,nowait \
-  -chardev socket,id=compare0,host=9.61.1.8,port=9001,server,nowait \
-  -chardev socket,id=compare0-0,host=9.61.1.8,port=9001 \
-  -chardev socket,id=compare_out,host=9.61.1.8,port=9005,server,nowait \
-  -chardev socket,id=compare_out0,host=9.61.1.8,port=9005 \
-  -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
-  -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \
-  -object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \
-  -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0
-
-2: Secondary node:
-
-x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 \
-  -name secondary -enable-kvm -cpu qemu64,+kvmclock -device piix3-usb-uhci -usb \
-  -usbdevice tablet \
-  -drive if=none,id=colo-disk0,file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,driver=qcow2,node-name=node0 \
-  -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,file.driver=qcow2,top-id=active-disk0,file.file.filename=/mnt/ramfstest/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfstest/hidden_disk.img,file.backing.backing=colo-disk0 \
-  -netdev tap,id=hn1,vhost=off,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \
-  -device e1000,id=e1,netdev=hn1,mac=52:a4:00:12:78:67 \
-  -netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-  -device e1000,netdev=hn0,mac=52:a4:00:12:78:66 \
-  -chardev socket,id=red0,host=9.61.1.8,port=9003 \
-  -chardev socket,id=red1,host=9.61.1.8,port=9004 \
-  -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \
-  -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \
-  -object filter-rewriter,id=rew0,netdev=hn0,queue=all -incoming tcp:0:8888
-
-3: Secondary node:
-
-{'execute':'qmp_capabilities'}
-
-{ 'execute': 'nbd-server-start',
-  'arguments': {'addr': {'type': 'inet', 'data': {'host': '9.61.1.7', 'port': '8889'} } }
-}
-
-{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': true } }
-
-4: Primary Node:
-
-{'execute':'qmp_capabilities'}
-
-{ 'execute': 'human-monitor-command',
-  'arguments': {'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=9.61.1.7,file.port=8889,file.export=colo-disk0,node-name=node0'}}
-
-{ 'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': 'node0' } }
-
-{ 'execute': 'migrate-set-capabilities',
-      'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
-
-{ 'execute': 'migrate', 'arguments': {'uri': 'tcp:9.61.1.7:8888' } }
-
-
-
-
-You should then see two running VMs; whenever you make changes to the PVM, the
-SVM will be synced.
-
-5: Primary Node:
-
-echo c > /proc/sysrq-trigger
-
-6: Secondary node:
-
-{ 'execute': 'nbd-server-stop' }
-
-{ "execute": "x-colo-lost-heartbeat" }
-
-The Secondary node then hangs at recvmsg.
-
-
-
-
-
-
-
-
-
-
-
-
-Original Mail
-
-
-
-From: address@hidden
-To: 王广10165992 address@hidden
-Cc: address@hidden address@hidden
-Date: 2017-03-21 16:27
-Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang
-
-
-
-
-
-Hi,
-
-On 2017/3/21 16:10, address@hidden wrote:
-> Thank you.
->
-> I have already tested it.
->
-> When the Primary Node panics, the Secondary Node qemu hangs at the same place.
->
-> According to the wiki (http://wiki.qemu-project.org/Features/COLO), killing
-> the Primary Node qemu will not reproduce the problem, but a Primary Node
-> panic does.
->
-> I think it is because the channel does not support
-> QIO_CHANNEL_FEATURE_SHUTDOWN.
->
->
-
-Yes, you are right. When we do failover for the primary/secondary VM, we shut
-down the related fd in case a thread is stuck reading from or writing to it.
-
-It seems that you didn't follow the instructions above exactly in your test.
-Could you share your test procedure, especially the commands used in the test?
-
-Thanks,
-Hailiang
-
-> During failover, channel_shutdown could not shut down the channel,
-> so colo_process_incoming_thread hangs at recvmsg.
->
-> I tested a patch:
->
-> diff --git a/migration/socket.c b/migration/socket.c
-> index 13966f1..d65a0ea 100644
-> --- a/migration/socket.c
-> +++ b/migration/socket.c
-> @@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
->      }
->
->      trace_migration_socket_incoming_accepted();
->
->      qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
-> +    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
->      migration_channel_process_incoming(migrate_get_current(),
->                                         QIO_CHANNEL(sioc));
->      object_unref(OBJECT(sioc));
->
-> With this patch, my test no longer hangs.
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
-> Original Mail
->
->
->
-> From: address@hidden
-> To: 王广10165992 address@hidden
-> Cc: address@hidden address@hidden
-> Date: 2017-03-21 15:58
-> Subject: Re: [Qemu-devel] Reply: Re: [BUG]COLO failover hang
->
->
->
->
->
-> Hi, Wang.
->
-> You can test this branch:
->
->
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
->
-> and please follow the wiki to make sure your own configuration is correct.
->
->
-http://wiki.qemu-project.org/Features/COLO
->
->
-> Thanks
->
-> Zhang Chen
->
->
-> On 03/21/2017 03:27 PM, address@hidden wrote:
-> >
-> > hi.
-> >
-> > I tested git qemu master and it has the same problem.
-> >
-> > (gdb) bt
-> >
-> > #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
-> > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
-> >
-> > #1  0x00007f658e4aa0c2 in qio_channel_read
-> > (address@hidden, address@hidden "",
-> > address@hidden, address@hidden) at io/channel.c:114
-> >
-> > #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
-> > buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
-> > migration/qemu-file-channel.c:78
-> >
-> > #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
-> > migration/qemu-file.c:295
-> >
-> > #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
-> > address@hidden) at migration/qemu-file.c:555
-> >
-> > #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
-> > migration/qemu-file.c:568
-> >
-> > #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
-> > migration/qemu-file.c:648
-> >
-> > #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
-> > address@hidden) at migration/colo.c:244
-> >
-> > #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
-> > out>, address@hidden,
-> > address@hidden)
-> >
-> >     at migration/colo.c:264
-> >
-> > #9  0x00007f658e3e740e in colo_process_incoming_thread
-> > (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
-> >
-> > #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
-> >
-> > #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
-> >
-> > (gdb) p ioc->name
-> >
-> > $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
-> >
-> > (gdb) p ioc->features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
-> >
-> > $3 = 0
-> >
-> >
-> > (gdb) bt
-> >
-> > #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
-> > condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
-> >
-> > #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
-> > gmain.c:3054
-> >
-> > #2  g_main_context_dispatch (context=<optimized out>,
-> > address@hidden) at gmain.c:3630
-> >
-> > #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
-> >
-> > #4  os_host_main_loop_wait (timeout=<optimized out>) at
-> > util/main-loop.c:258
-> >
-> > #5  main_loop_wait (address@hidden) at
-> > util/main-loop.c:506
-> >
-> > #6  0x00007fdccb526187 in main_loop () at vl.c:1898
-> >
-> > #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
-> > out>) at vl.c:4709
-> >
-> > (gdb) p ioc->features
-> >
-> > $1 = 6
-> >
-> > (gdb) p ioc->name
-> >
-> > $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
-> >
-> >
-> > Maybe socket_accept_incoming_migration should
-> > call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?
-> >
-> >
-> > thank you.
-> >
-> >
-> >
-> >
-> >
-> > Original Mail
-> > address@hidden
-> > address@hidden
-> > address@hidden@huawei.com>
-> > *Date:* 2017-03-16 14:46
-> > *Subject:* *Re: [Qemu-devel] COLO failover hang*
-> >
-> >
-> >
-> >
-> > On 03/15/2017 05:06 PM, wangguang wrote:
-> > > I am testing the QEMU COLO feature described in the [QEMU
-> > > Wiki](http://wiki.qemu-project.org/Features/COLO).
-> > >
-> > > When the Primary Node panics, the Secondary Node qemu hangs
-> > > at recvmsg in qio_channel_socket_readv.
-> > > I then ran { 'execute': 'nbd-server-stop' } and { "execute":
-> > > "x-colo-lost-heartbeat" } in the Secondary VM's
-> > > monitor, but the Secondary Node qemu still hangs at recvmsg.
-> > >
-> > > I found that COLO in qemu is not complete yet.
-> > > Is there a development plan for COLO?
-> >
-> > Yes, we are still developing it. You can see some of the patches we are pushing.
-> >
-> > > Has anyone ever run it successfully? Any help is appreciated!
-> >
-> > It runs successfully in our internal version.
-> > You can ask Zhanghailiang for help with the failover details.
-> > Next time you have a question about COLO,
-> > please cc me and zhanghailiang address@hidden
-> >
-> >
-> > Thanks
-> > Zhang Chen
-> >
-> >
-> > >
-> > >
-> > >
-> > > centos7.2+qemu2.7.50
-> > > (gdb) bt
-> > > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-> > > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
-> > > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at
-> > > io/channel-socket.c:497
-> > > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-> > > address@hidden "", address@hidden,
-> > > address@hidden) at io/channel.c:97
-> > > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
-> > > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
-> > > migration/qemu-file-channel.c:78
-> > > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-> > > migration/qemu-file.c:257
-> > > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-> > > address@hidden) at migration/qemu-file.c:510
-> > > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-> > > migration/qemu-file.c:523
-> > > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-> > > migration/qemu-file.c:603
-> > > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-> > > address@hidden) at migration/colo.c:215
-> > > #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
-> > > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
-> > > migration/colo.c:546
-> > > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-> > > migration/colo.c:649
-> > > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-> > > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
-> > >
-> > >
-> > >
-> > >
-> > >
-> > > --
-> > > View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-> > > Sent from the Developer mailing list archive at Nabble.com.
-> > >
-> > >
-> > >
-> > >
-> >
-> > --
-> > Thanks
-> > Zhang Chen
-> >
-> >
-> >
-> >
-> >
->
-
-diff --git a/migration/socket.c b/migration/socket.c
-index 13966f1..d65a0ea 100644
---- a/migration/socket.c
-+++ b/migration/socket.c
-@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
-     }
- 
-     trace_migration_socket_incoming_accepted();
- 
-     qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
-+    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
-     migration_channel_process_incoming(migrate_get_current(),
-                                        QIO_CHANNEL(sioc));
-     object_unref(OBJECT(sioc));
-
-Is this patch OK?
-
-I have tested it. The test no longer hangs.
-
-
-
-
-
-
-
-
-
-
-
-
-Original Mail
-
-
-
-From: address@hidden
-To: address@hidden address@hidden
-Cc: address@hidden address@hidden address@hidden
-Date: 2017-03-22 09:11
-Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang
-
-
-
-
-
-On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
-> * Hailiang Zhang (address@hidden) wrote:
->> Hi,
->>
->> Thanks for reporting this. I confirmed it in my test, and it is a bug.
->>
->> We call qemu_file_shutdown() to shut down the related fd in case the COLO
->> thread/incoming thread is stuck in read/write() during failover, but it has
->> no effect: every fd used by COLO (and migration) is wrapped by a qio channel,
->> and the channel will not call the shutdown API unless we have called
->> qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN).
->>
->> Cc: Dr. David Alan Gilbert address@hidden
->>
->> I suspect migration cancel has the same problem; it may get stuck in write()
->> if we try to cancel the migration.
->>
->> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
->> {
->>      qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
->>      migration_channel_connect(s, ioc, NULL);
->>      ... ...
->> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc),
->> QIO_CHANNEL_FEATURE_SHUTDOWN) above, and in
->> migrate_fd_cancel()
->> {
->>   ... ...
->>      if (s->state == MIGRATION_STATUS_CANCELLING && f) {
->>          qemu_file_shutdown(f);  --> This will not take effect. No ?
->>      }
->> }
->
-> (cc'd in Daniel Berrange).
-> I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN) 
-at the
-> top of qio_channel_socket_new  so I think that's safe isn't it?
->
-
-Hmm, you are right; this problem only exists for the migration incoming fd,
-thanks.
-
-> Dave
->
->> Thanks,
->> Hailiang
->>
->> On 2017/3/21 16:10, address@hidden wrote:
->>> Thank you.
->>>
->>> I have already tested it.
->>>
->>> When the Primary Node panics, the Secondary Node qemu hangs at the same place.
->>>
->>> According to the wiki (http://wiki.qemu-project.org/Features/COLO), killing
->>> the Primary Node qemu will not reproduce the problem, but a Primary Node
->>> panic does.
->>>
->>> I think it is because the channel does not support
->>> QIO_CHANNEL_FEATURE_SHUTDOWN.
->>>
->>> During failover, channel_shutdown could not shut down the channel,
->>> so colo_process_incoming_thread hangs at recvmsg.
->>>
->>> I tested a patch:
->>>
->>> diff --git a/migration/socket.c b/migration/socket.c
->>> index 13966f1..d65a0ea 100644
->>> --- a/migration/socket.c
->>> +++ b/migration/socket.c
->>> @@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
->>>      }
->>>
->>>      trace_migration_socket_incoming_accepted();
->>>
->>>      qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
->>> +    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
->>>      migration_channel_process_incoming(migrate_get_current(),
->>>                                         QIO_CHANNEL(sioc));
->>>      object_unref(OBJECT(sioc));
->>>
->>> With this patch, my test no longer hangs.
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>> Original Mail
->>>
->>>
->>>
->>> From: address@hidden
->>> To: 王广10165992 address@hidden
->>> Cc: address@hidden address@hidden
->>> Date: 2017-03-21 15:58
->>> Subject: Re: [Qemu-devel] Reply: Re: [BUG]COLO failover hang
->>>
->>>
->>>
->>>
->>>
->>> Hi, Wang.
->>>
->>> You can test this branch:
->>>
->>>
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
->>>
->>> and please follow the wiki to make sure your own configuration is correct.
->>>
->>>
-http://wiki.qemu-project.org/Features/COLO
->>>
->>>
->>> Thanks
->>>
->>> Zhang Chen
->>>
->>>
->>> On 03/21/2017 03:27 PM, address@hidden wrote:
->>> >
->>> > hi.
->>> >
->>> > I tested git qemu master and it has the same problem.
->>> >
->>> > (gdb) bt
->>> >
->>> > #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
->>> > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
->>> >
->>> > #1  0x00007f658e4aa0c2 in qio_channel_read
->>> > (address@hidden, address@hidden "",
->>> > address@hidden, address@hidden) at io/channel.c:114
->>> >
->>> > #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
->>> > buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
->>> > migration/qemu-file-channel.c:78
->>> >
->>> > #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
->>> > migration/qemu-file.c:295
->>> >
->>> > #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
->>> > address@hidden) at migration/qemu-file.c:555
->>> >
->>> > #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
->>> > migration/qemu-file.c:568
->>> >
->>> > #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
->>> > migration/qemu-file.c:648
->>> >
->>> > #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
->>> > address@hidden) at migration/colo.c:244
->>> >
->>> > #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
->>> > out>, address@hidden,
->>> > address@hidden)
->>> >
->>> >     at migration/colo.c:264
->>> >
->>> > #9  0x00007f658e3e740e in colo_process_incoming_thread
->>> > (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
->>> >
->>> > #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
->>> >
->>> > #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
->>> >
->>> > (gdb) p ioc->name
->>> >
->>> > $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
->>> >
->>> > (gdb) p ioc->features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
->>> >
->>> > $3 = 0
->>> >
->>> >
->>> > (gdb) bt
->>> >
->>> > #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
->>> > condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
->>> >
->>> > #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
->>> > gmain.c:3054
->>> >
->>> > #2  g_main_context_dispatch (context=<optimized out>,
->>> > address@hidden) at gmain.c:3630
->>> >
->>> > #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
->>> >
->>> > #4  os_host_main_loop_wait (timeout=<optimized out>) at
->>> > util/main-loop.c:258
->>> >
->>> > #5  main_loop_wait (address@hidden) at
->>> > util/main-loop.c:506
->>> >
->>> > #6  0x00007fdccb526187 in main_loop () at vl.c:1898
->>> >
->>> > #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
->>> > out>) at vl.c:4709
->>> >
->>> > (gdb) p ioc->features
->>> >
->>> > $1 = 6
->>> >
->>> > (gdb) p ioc->name
->>> >
->>> > $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
->>> >
->>> >
->>> > Maybe socket_accept_incoming_migration should
->>> > call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?
->>> >
->>> >
->>> > thank you.
->>> >
->>> >
->>> >
->>> >
->>> >
->>> > Original Mail
->>> > address@hidden
->>> > address@hidden
->>> > address@hidden@huawei.com>
->>> > *Date:* 2017-03-16 14:46
->>> > *Subject:* *Re: [Qemu-devel] COLO failover hang*
->>> >
->>> >
->>> >
->>> >
->>> > On 03/15/2017 05:06 PM, wangguang wrote:
->>> > > I am testing the QEMU COLO feature described in the [QEMU
->>> > > Wiki](http://wiki.qemu-project.org/Features/COLO).
->>> > >
->>> > > When the Primary Node panics, the Secondary Node qemu hangs
->>> > > at recvmsg in qio_channel_socket_readv.
->>> > > I then ran { 'execute': 'nbd-server-stop' } and { "execute":
->>> > > "x-colo-lost-heartbeat" } in the Secondary VM's
->>> > > monitor, but the Secondary Node qemu still hangs at recvmsg.
->>> > >
->>> > > I found that COLO in qemu is not complete yet.
->>> > > Is there a development plan for COLO?
->>> >
->>> > Yes, we are still developing it. You can see some of the patches we are pushing.
->>> >
->>> > > Has anyone ever run it successfully? Any help is appreciated!
->>> >
->>> > It runs successfully in our internal version.
->>> > You can ask Zhanghailiang for help with the failover details.
->>> > Next time you have a question about COLO,
->>> > please cc me and zhanghailiang address@hidden
->>> >
->>> >
->>> > Thanks
->>> > Zhang Chen
->>> >
->>> >
->>> > >
->>> > >
->>> > >
->>> > > centos7.2+qemu2.7.50
->>> > > (gdb) bt
->>> > > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
->>> > > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
->>> > > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) 
-at
->>> > > io/channel-socket.c:497
->>> > > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
->>> > > address@hidden "", address@hidden,
->>> > > address@hidden) at io/channel.c:97
->>> > > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
->>> > > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
->>> > > migration/qemu-file-channel.c:78
->>> > > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
->>> > > migration/qemu-file.c:257
->>> > > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
->>> > > address@hidden) at migration/qemu-file.c:510
->>> > > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
->>> > > migration/qemu-file.c:523
->>> > > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
->>> > > migration/qemu-file.c:603
->>> > > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
->>> > > address@hidden) at migration/colo.c:215
->>> > > #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
->>> > > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
->>> > > migration/colo.c:546
->>> > > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
->>> > > migration/colo.c:649
->>> > > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
->>> > > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
->>> > >
->>> > >
->>> > >
->>> > >
->>> > >
->>> > > --
->>> > > View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
->>> > > Sent from the Developer mailing list archive at Nabble.com.
->>> > >
->>> > >
->>> > >
->>> > >
->>> >
->>> > --
->>> > Thanks
->>> > Zhang Chen
->>> >
->>> >
->>> >
->>> >
->>> >
->>>
->>
-> --
-> Dr. David Alan Gilbert / address@hidden / Manchester, UK
->
-> .
->
-
-Hi,
-
-On 2017/3/22 9:42, address@hidden wrote:
-diff --git a/migration/socket.c b/migration/socket.c
-index 13966f1..d65a0ea 100644
---- a/migration/socket.c
-+++ b/migration/socket.c
-@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
-     }
- 
-     trace_migration_socket_incoming_accepted();
- 
-     qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
-+    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
-     migration_channel_process_incoming(migrate_get_current(),
-                                        QIO_CHANNEL(sioc));
-     object_unref(OBJECT(sioc));
-
-Is this patch ok?
-Yes, I think this works, but a better way may be to call
-qio_channel_set_feature() in qio_channel_socket_accept(), since we don't set
-the SHUTDOWN feature for the accepted socket fd.
-Or fix it like this:
-
-diff --git a/io/channel-socket.c b/io/channel-socket.c
-index f546c68..ce6894c 100644
---- a/io/channel-socket.c
-+++ b/io/channel-socket.c
-@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
-                           Error **errp)
- {
-     QIOChannelSocket *cioc;
--
--    cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
--    cioc->fd = -1;
-+
-+    cioc = qio_channel_socket_new();
-     cioc->remoteAddrLen = sizeof(ioc->remoteAddr);
-     cioc->localAddrLen = sizeof(ioc->localAddr);
-
-
-Thanks,
-Hailiang
-I have tested it. The test does not hang any more.
-
-
-
-
-
-
-
-
-
-
-
-
-Original Message
-
-
-
-From: address@hidden
-To: address@hidden address@hidden
-Cc: address@hidden address@hidden address@hidden
-Date: 2017-03-22 09:11
-Subject: Re: [Qemu-devel]  Reply: Re:  Reply: Re: [BUG]COLO failover hang
-
-
-
-
-
-On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
-> * Hailiang Zhang (address@hidden) wrote:
->> Hi,
->>
->> Thanks for reporting this. I confirmed it in my test, and it is a bug.
->>
->> We do call qemu_file_shutdown() to shut down the related fd in case the
->> COLO thread/incoming thread is stuck in read/write() during failover,
->> but it has no effect, because every fd used by COLO (and by migration)
->> is wrapped by a qio channel, which will not call the shutdown API unless
->> we call qio_channel_set_feature(QIO_CHANNEL(sioc),
-QIO_CHANNEL_FEATURE_SHUTDOWN).
->>
->> Cc: Dr. David Alan Gilbert address@hidden
->>
->> I suspect migration cancel has the same problem: it may get stuck in write()
->> if we try to cancel migration.
->>
->> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, 
-Error **errp)
->> {
->>      qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing")
->>      migration_channel_connect(s, ioc, NULL)
->>      ... ...
->> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN) above,
->> and the
->> migrate_fd_cancel()
->> {
->>   ... ...
->>      if (s->state == MIGRATION_STATUS_CANCELLING && f) {
->>          qemu_file_shutdown(f)  --> This will not take effect. No ?
->>      }
->> }
->
-> (cc'd in Daniel Berrange).
-> I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN) 
-at the
-> top of qio_channel_socket_new  so I think that's safe isn't it?
->
-
-Hmm, you are right, this problem only exists for the migration incoming fd,
-thanks.
-
-> Dave
->
->> Thanks,
->> Hailiang
->>
->> On 2017/3/21 16:10, address@hidden wrote:
->>> Thank you.
->>>
->>> I have already tested it.
->>>
->>> When the Primary Node panics, the Secondary Node qemu hangs at the same place.
->>>
->>> According to
-http://wiki.qemu-project.org/Features/COLO
-, killing the Primary Node
-qemu will not reproduce the problem, but a Primary Node panic can.
->>>
->>> I think this is because the channel does not support
-QIO_CHANNEL_FEATURE_SHUTDOWN.
->>>
->>>
->>> On failover, channel_shutdown cannot shut down the channel,
->>>
->>>
->>> so the colo_process_incoming_thread will hang at recvmsg.
->>>
->>>
->>> I tested a patch:
->>>
->>>
->>> diff --git a/migration/socket.c b/migration/socket.c
->>>
->>>
->>> index 13966f1..d65a0ea 100644
->>>
->>>
->>> --- a/migration/socket.c
->>>
->>>
->>> +++ b/migration/socket.c
->>>
->>>
->>> @@ -147,8 +147,9 @@ static gboolean 
-socket_accept_incoming_migration(QIOChannel *ioc,
->>>
->>>
->>>        }
->>>
->>>
->>>
->>>
->>>
->>>        trace_migration_socket_incoming_accepted()
->>>
->>>
->>>
->>>
->>>
->>>        qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
->>>
->>>
->>> +    qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN)
->>>
->>>
->>>        migration_channel_process_incoming(migrate_get_current(),
->>>
->>>
->>>                                           QIO_CHANNEL(sioc))
->>>
->>>
->>>        object_unref(OBJECT(sioc))
->>>
->>>
->>>
->>>
->>> My test will not hang any more.
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>>
->>> Original Message
->>>
->>>
->>>
->>> From: address@hidden
->>> To: 王广10165992 address@hidden
->>> Cc: address@hidden address@hidden
->>> Date: 2017-03-21 15:58
->>> Subject: Re: [Qemu-devel]  Reply: Re:  [BUG]COLO failover hang
->>>
->>>
->>>
->>>
->>>
->>> Hi, Wang.
->>>
->>> You can test this branch:
->>>
->>>
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
->>>
->>> and please follow the wiki to make sure your own configuration is correct.
->>>
->>>
-http://wiki.qemu-project.org/Features/COLO
->>>
->>>
->>> Thanks
->>>
->>> Zhang Chen
->>>
->>>
->>> On 03/21/2017 03:27 PM, address@hidden wrote:
->>> >
->>> > hi.
->>> >
->>> > I tested git qemu master and it has the same problem.
->>> >
->>> > (gdb) bt
->>> >
->>> > #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
->>> > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
->>> >
->>> > #1  0x00007f658e4aa0c2 in qio_channel_read
->>> > (address@hidden, address@hidden "",
->>> > address@hidden, address@hidden) at io/channel.c:114
->>> >
->>> > #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
->>> > buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
->>> > migration/qemu-file-channel.c:78
->>> >
->>> > #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
->>> > migration/qemu-file.c:295
->>> >
->>> > #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
->>> > address@hidden) at migration/qemu-file.c:555
->>> >
->>> > #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
->>> > migration/qemu-file.c:568
->>> >
->>> > #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
->>> > migration/qemu-file.c:648
->>> >
->>> > #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
->>> > address@hidden) at migration/colo.c:244
->>> >
->>> > #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
->>> > out>, address@hidden,
->>> > address@hidden)
->>> >
->>> >     at migration/colo.c:264
->>> >
->>> > #9  0x00007f658e3e740e in colo_process_incoming_thread
->>> > (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
->>> >
->>> > #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
->>> >
->>> > #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
->>> >
->>> > (gdb) p ioc->name
->>> >
->>> > $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
->>> >
->>> > (gdb) p ioc->features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
->>> >
->>> > $3 = 0
->>> >
->>> >
->>> > (gdb) bt
->>> >
->>> > #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
->>> > condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
->>> >
->>> > #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
->>> > gmain.c:3054
->>> >
->>> > #2  g_main_context_dispatch (context=<optimized out>,
->>> > address@hidden) at gmain.c:3630
->>> >
->>> > #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
->>> >
->>> > #4  os_host_main_loop_wait (timeout=<optimized out>) at
->>> > util/main-loop.c:258
->>> >
->>> > #5  main_loop_wait (address@hidden) at
->>> > util/main-loop.c:506
->>> >
->>> > #6  0x00007fdccb526187 in main_loop () at vl.c:1898
->>> >
->>> > #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
->>> > out>) at vl.c:4709
->>> >
->>> > (gdb) p ioc->features
->>> >
->>> > $1 = 6
->>> >
->>> > (gdb) p ioc->name
->>> >
->>> > $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
->>> >
->>> >
->>> > Maybe socket_accept_incoming_migration should
->>> > call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?
->>> >
->>> >
->>> > thank you.
->>> >
->>> >
->>> >
->>> >
->>> >
->>> > Original Message
->>> > address@hidden
->>> > address@hidden
->>> > address@hidden@huawei.com>
->>> > *Date:* 2017-03-16 14:46
->>> > *Subject:* *Re: [Qemu-devel] COLO failover hang*
->>> >
->>> >
->>> >
->>> >
->>> > On 03/15/2017 05:06 PM, wangguang wrote:
->>> > > I am testing the QEMU COLO feature described here [QEMU
->>> > > Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
->>> > >
->>> > > When the Primary Node panics, the Secondary Node qemu hangs.
->>> > > It hangs at recvmsg in qio_channel_socket_readv.
->>> > > Even after I run { 'execute': 'nbd-server-stop' } and { "execute":
->>> > > "x-colo-lost-heartbeat" } in the Secondary VM's
->>> > > monitor, the Secondary Node qemu still hangs at recvmsg.
->>> > >
->>> > > I found that COLO in qemu is not complete yet.
->>> > > Is there a development plan for COLO?
->>> >
->>> > Yes, we are developing it. You can see some of the patches we are pushing.
->>> >
->>> > > Has anyone ever run it successfully? Any help is appreciated!
->>> >
->>> > Our internal version runs it successfully.
->>> > For failover details, you can ask Zhanghailiang for help.
->>> > Next time, if you have questions about COLO,
->>> > please cc me and zhanghailiang address@hidden
->>> >
->>> >
->>> > Thanks
->>> > Zhang Chen
->>> >
->>> >
->>> > >
->>> > >
->>> > >
->>> > > centos7.2+qemu2.7.50
->>> > > (gdb) bt
->>> > > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
->>> > > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
->>> > > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) 
-at
->>> > > io/channel-socket.c:497
->>> > > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
->>> > > address@hidden "", address@hidden,
->>> > > address@hidden) at io/channel.c:97
->>> > > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
->>> > > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
->>> > > migration/qemu-file-channel.c:78
->>> > > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
->>> > > migration/qemu-file.c:257
->>> > > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
->>> > > address@hidden) at migration/qemu-file.c:510
->>> > > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
->>> > > migration/qemu-file.c:523
->>> > > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
->>> > > migration/qemu-file.c:603
->>> > > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
->>> > > address@hidden) at migration/colo.c:215
->>> > > #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
->>> > > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
->>> > > migration/colo.c:546
->>> > > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
->>> > > migration/colo.c:649
->>> > > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
->>> > > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
->>> > >
->>> > >
->>> > >
->>> > >
->>> > >
->>> > > --
->>> > > View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
->>> > > Sent from the Developer mailing list archive at Nabble.com.
->>> > >
->>> > >
->>> > >
->>> > >
->>> >
->>> > --
->>> > Thanks
->>> > Zhang Chen
->>> >
->>> >
->>> >
->>> >
->>> >
->>>
->>
-> --
-> Dr. David Alan Gilbert / address@hidden / Manchester, UK
->
-> .
->
-
diff --git a/results/classifier/008/other/60339453 b/results/classifier/008/other/60339453
deleted file mode 100644
index b7db4b510..000000000
--- a/results/classifier/008/other/60339453
+++ /dev/null
@@ -1,71 +0,0 @@
-boot: 0.782
-other: 0.776
-performance: 0.764
-permissions: 0.750
-device: 0.706
-PID: 0.685
-network: 0.682
-vnc: 0.680
-debug: 0.672
-graphic: 0.671
-KVM: 0.669
-semantic: 0.662
-files: 0.623
-socket: 0.607
-
-[BUG] scsi: vmw_pvscsi: Boot hangs during scsi under qemu, post commit e662502b3a78
-
-Hi,
-
-Commit e662502b3a78 ("scsi: vmw_pvscsi: Set correct residual data length"),
-and its backports to stable trees, makes the kernel hang during boot when
-run as a VM under qemu with the following parameters:
-
-  -drive file=$DISKFILE,if=none,id=sda
-  -device pvscsi
-  -device scsi-hd,bus=scsi.0,drive=sda
-
-Diving deeper, commit e662502b3a78
-
-  @@ -585,7 +585,13 @@ static void pvscsi_complete_request(struct 
-pvscsi_adapter *adapter,
-                case BTSTAT_SUCCESS:
-  +                     /*
-  +                      * Commands like INQUIRY may transfer less data than
-  +                      * requested by the initiator via bufflen. Set residual
-  +                      * count to make upper layer aware of the actual amount
-  +                      * of data returned.
-  +                      */
-  +                     scsi_set_resid(cmd, scsi_bufflen(cmd) - e->dataLen);
-
-assumes 'e->dataLen' is properly armed with the actual number of bytes
-transferred; alas, qemu's hw/scsi/vmw_pvscsi.c never arms the 'dataLen'
-field of the completion descriptor (it is kept at zero).
-
-As a result, the residual count is set to the *entire* 'scsi_bufflen' of a
-good transfer, which makes the upper scsi layers repeatedly ignore this
-valid transfer.
-
-Not properly arming 'dataLen' seems to be an oversight in qemu that needs
-to be fixed.
-
-However, since kernels with commit e662502b3a78 (and backports) now fail
-to boot under qemu's "-device pvscsi", a suggested workaround is to set
-the residual count *only* if 'e->dataLen' is armed, e.g.:
-
-  @@ -588,7 +588,8 @@ static void pvscsi_complete_request(struct pvscsi_adapter 
-*adapter,
-                           * count to make upper layer aware of the actual 
-amount
-                           * of data returned.
-                           */
-  -                       scsi_set_resid(cmd, scsi_bufflen(cmd) - e->dataLen);
-  +                       if (e->dataLen)
-  +                               scsi_set_resid(cmd, scsi_bufflen(cmd) - 
-e->dataLen);
-
-in order to make kernels boot on old qemu binaries.
-
-Best,
-Shmulik
-
diff --git a/results/classifier/008/other/63565653 b/results/classifier/008/other/63565653
deleted file mode 100644
index ade63b0d0..000000000
--- a/results/classifier/008/other/63565653
+++ /dev/null
@@ -1,59 +0,0 @@
-other: 0.898
-device: 0.889
-boot: 0.889
-PID: 0.887
-network: 0.861
-debug: 0.855
-performance: 0.834
-KVM: 0.827
-semantic: 0.825
-socket: 0.745
-permissions: 0.739
-graphic: 0.734
-files: 0.705
-vnc: 0.588
-
-[Qemu-devel] [BUG]pcibus_reset assertion failure on guest reboot
-
-Qemu-2.6.2
-
-Start a VM with vhost-net, then reboot and hot-unplug the virtio-net NIC in
-quick succession, and we hit the
-pcibus_reset assertion failure.
-
-Here is qemu log:
-22:29:46.359386+08:00  acpi_pm1_cnt_write -> guest do soft power off
-22:29:46.785310+08:00  qemu_devices_reset
-22:29:46.788093+08:00  virtio_pci_device_unplugged -> virtio net unpluged
-22:29:46.803427+08:00  pcibus_reset: Assertion `bus->irq_count[i] == 0' failed.
-
-Here is stack info: 
-(gdb) bt
-#0  0x00007f9a336795d7 in raise () from /usr/lib64/libc.so.6
-#1  0x00007f9a3367acc8 in abort () from /usr/lib64/libc.so.6
-#2  0x00007f9a33672546 in __assert_fail_base () from /usr/lib64/libc.so.6
-#3  0x00007f9a336725f2 in __assert_fail () from /usr/lib64/libc.so.6
-#4  0x0000000000641884 in pcibus_reset (qbus=0x29eee60) at hw/pci/pci.c:283
-#5  0x00000000005bfc30 in qbus_reset_one (bus=0x29eee60, opaque=<optimized 
-out>) at hw/core/qdev.c:319
-#6  0x00000000005c1b19 in qdev_walk_children (dev=0x29ed2b0, pre_devfn=0x0, 
-pre_busfn=0x0, post_devfn=0x5c2440 ...
-#7  0x00000000005c1c59 in qbus_walk_children (bus=0x2736f80, pre_devfn=0x0, 
-pre_busfn=0x0, post_devfn=0x5c2440 ...
-#8  0x00000000005513f5 in qemu_devices_reset () at vl.c:1998
-#9  0x00000000004cab9d in pc_machine_reset () at 
-/home/abuild/rpmbuild/BUILD/qemu-kvm-2.6.0/hw/i386/pc.c:1976
-#10 0x000000000055148b in qemu_system_reset (address@hidden) at vl.c:2011
-#11 0x000000000055164f in main_loop_should_exit () at vl.c:2169
-#12 0x0000000000551719 in main_loop () at vl.c:2212
-#13 0x000000000041c9a8 in main (argc=<optimized out>, argv=<optimized out>, 
-envp=<optimized out>) at vl.c:5130
-(gdb) f 4
-...
-(gdb) p bus->irq_count[0]
-$6 = 1
-
-It seems pci_update_irq_disabled doesn't work correctly.
-
-Can anyone help?
-
diff --git a/results/classifier/008/other/65781993 b/results/classifier/008/other/65781993
deleted file mode 100644
index 4655ac69e..000000000
--- a/results/classifier/008/other/65781993
+++ /dev/null
@@ -1,2803 +0,0 @@
-other: 0.727
-PID: 0.673
-debug: 0.673
-semantic: 0.665
-graphic: 0.664
-socket: 0.660
-permissions: 0.658
-network: 0.657
-files: 0.657
-device: 0.647
-performance: 0.636
-boot: 0.635
-KVM: 0.627
-vnc: 0.590
-
-[Qemu-devel] Reply: Re:   Reply: Re:  [BUG]COLO failover hang
-
-Thank you.
-
-I have already tested it.
-
-When the Primary Node panics, the Secondary Node qemu hangs at the same place.
-
-According to
-http://wiki.qemu-project.org/Features/COLO
-, killing the Primary Node qemu
-will not reproduce the problem, but a Primary Node panic can.
-
-I think this is because the channel does not support
-QIO_CHANNEL_FEATURE_SHUTDOWN.
-
-
-On failover, channel_shutdown cannot shut down the channel,
-
-
-so the colo_process_incoming_thread will hang at recvmsg.
-
-
-I tested a patch:
-
-
-diff --git a/migration/socket.c b/migration/socket.c
-
-
-index 13966f1..d65a0ea 100644
-
-
---- a/migration/socket.c
-
-
-+++ b/migration/socket.c
-
-
-@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel 
-*ioc,
-
-
-     }
-
-
- 
-
-
-     trace_migration_socket_incoming_accepted()
-
-
-    
-
-
-     qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
-
-
-+    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)
-
-
-     migration_channel_process_incoming(migrate_get_current(),
-
-
-                                        QIO_CHANNEL(sioc))
-
-
-     object_unref(OBJECT(sioc))
-
-
-
-
-My test will not hang any more.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Original Message
-
-
-
-From: address@hidden
-To: 王广10165992 address@hidden
-Cc: address@hidden address@hidden
-Date: 2017-03-21 15:58
-Subject: Re: [Qemu-devel]  Reply: Re:  [BUG]COLO failover hang
-
-
-
-
-
-Hi, Wang.
-
-You can test this branch:
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
-and please follow the wiki to make sure your own configuration is correct.
-http://wiki.qemu-project.org/Features/COLO
-Thanks
-
-Zhang Chen
-
-
-On 03/21/2017 03:27 PM, address@hidden wrote:
->
-> hi.
->
-> I tested git qemu master and it has the same problem.
->
-> (gdb) bt
->
-> #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, 
-> niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
->
-> #1  0x00007f658e4aa0c2 in qio_channel_read 
-> (address@hidden, address@hidden "", 
-> address@hidden, address@hidden) at io/channel.c:114
->
-> #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>, 
-> buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at 
-> migration/qemu-file-channel.c:78
->
-> #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at 
-> migration/qemu-file.c:295
->
-> #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, 
-> address@hidden) at migration/qemu-file.c:555
->
-> #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at 
-> migration/qemu-file.c:568
->
-> #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at 
-> migration/qemu-file.c:648
->
-> #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, 
-> address@hidden) at migration/colo.c:244
->
-> #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized 
-> out>, address@hidden, 
-> address@hidden)
->
->     at migration/colo.c:264
->
-> #9  0x00007f658e3e740e in colo_process_incoming_thread 
-> (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
->
-> #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
->
-> #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
->
-> (gdb) p ioc->name
->
-> $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
->
-> (gdb) p ioc->features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
->
-> $3 = 0
->
->
-> (gdb) bt
->
-> #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90, 
-> condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
->
-> #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at 
-> gmain.c:3054
->
-> #2  g_main_context_dispatch (context=<optimized out>, 
-> address@hidden) at gmain.c:3630
->
-> #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
->
-> #4  os_host_main_loop_wait (timeout=<optimized out>) at 
-> util/main-loop.c:258
->
-> #5  main_loop_wait (address@hidden) at 
-> util/main-loop.c:506
->
-> #6  0x00007fdccb526187 in main_loop () at vl.c:1898
->
-> #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized 
-> out>) at vl.c:4709
->
-> (gdb) p ioc->features
->
-> $1 = 6
->
-> (gdb) p ioc->name
->
-> $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
->
->
-> Maybe socket_accept_incoming_migration should
-> call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?
->
->
-> thank you.
->
->
->
->
->
-> Original Message
-> address@hidden
-> address@hidden
-> address@hidden@huawei.com>
-> *Date:* 2017-03-16 14:46
-> *Subject:* *Re: [Qemu-devel] COLO failover hang*
->
->
->
->
-> On 03/15/2017 05:06 PM, wangguang wrote:
-> > I am testing the QEMU COLO feature described here [QEMU
-> > Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
-> >
-> > When the Primary Node panics, the Secondary Node qemu hangs.
-> > It hangs at recvmsg in qio_channel_socket_readv.
-> > Even after I run { 'execute': 'nbd-server-stop' } and { "execute":
-> > "x-colo-lost-heartbeat" } in the Secondary VM's
-> > monitor, the Secondary Node qemu still hangs at recvmsg.
-> >
-> > I found that COLO in qemu is not complete yet.
-> > Is there a development plan for COLO?
->
-> Yes, we are developing it. You can see some of the patches we are pushing.
->
-> > Has anyone ever run it successfully? Any help is appreciated!
->
-> Our internal version runs it successfully.
-> For failover details, you can ask Zhanghailiang for help.
-> Next time, if you have questions about COLO,
-> please cc me and zhanghailiang address@hidden
->
->
-> Thanks
-> Zhang Chen
->
->
-> >
-> >
-> >
-> > centos7.2+qemu2.7.50
-> > (gdb) bt
-> > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-> > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
-> > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at
-> > io/channel-socket.c:497
-> > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-> > address@hidden "", address@hidden,
-> > address@hidden) at io/channel.c:97
-> > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
-> > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
-> > migration/qemu-file-channel.c:78
-> > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-> > migration/qemu-file.c:257
-> > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-> > address@hidden) at migration/qemu-file.c:510
-> > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-> > migration/qemu-file.c:523
-> > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-> > migration/qemu-file.c:603
-> > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-> > address@hidden) at migration/colo.c:215
-> > #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
-> > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
-> > migration/colo.c:546
-> > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-> > migration/colo.c:649
-> > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-> > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
-> >
-> >
-> >
-> >
-> >
-> > --
-> > View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-> > Sent from the Developer mailing list archive at Nabble.com.
-> >
-> >
-> >
-> >
->
-> -- 
-> Thanks
-> Zhang Chen
->
->
->
->
->
-
--- 
-Thanks
-Zhang Chen
-
-Hi,
-
-On 2017/3/21 16:10, address@hidden wrote:
-Thank you.
-
-I have already tested it.
-
-When the Primary Node panics, the Secondary Node qemu hangs at the same place.
-
-According to
-http://wiki.qemu-project.org/Features/COLO
-, killing the Primary Node qemu
-will not reproduce the problem, but a Primary Node panic can.
-
-I think this is because the channel does not support
-QIO_CHANNEL_FEATURE_SHUTDOWN.
-Yes, you are right. When we do failover for the primary/secondary VM, we
-shut down the related
-fd in case a thread is stuck reading from or writing to it.
-
-It seems that you didn't follow the above instructions exactly when testing.
-Could you
-share your test procedure, especially the commands used in the test?
-
-Thanks,
-Hailiang
-On failover, channel_shutdown cannot shut down the channel,
-
-
-so the colo_process_incoming_thread will hang at recvmsg.
-
-
-I tested a patch:
-
-
-diff --git a/migration/socket.c b/migration/socket.c
-
-
-index 13966f1..d65a0ea 100644
-
-
---- a/migration/socket.c
-
-
-+++ b/migration/socket.c
-
-
-@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel 
-*ioc,
-
-
-      }
-
-
-
-
-
-      trace_migration_socket_incoming_accepted()
-
-
-
-
-
-      qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
-
-
-+    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)
-
-
-      migration_channel_process_incoming(migrate_get_current(),
-
-
-                                         QIO_CHANNEL(sioc))
-
-
-      object_unref(OBJECT(sioc))
-
-
-
-
-My test will not hang any more.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Original Message
-
-
-
-From: address@hidden
-To: 王广10165992 address@hidden
-Cc: address@hidden address@hidden
-Date: 2017-03-21 15:58
-Subject: Re: [Qemu-devel]  Reply: Re:  [BUG]COLO failover hang
-
-
-
-
-
-Hi, Wang.
-
-You can test this branch:
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
-and please follow the wiki to make sure your own configuration is correct.
-http://wiki.qemu-project.org/Features/COLO
-Thanks
-
-Zhang Chen
-
-
-On 03/21/2017 03:27 PM, address@hidden wrote:
->
-> hi.
->
-> I tested git qemu master and it has the same problem.
->
-> (gdb) bt
->
-> #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
-> niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
->
-> #1  0x00007f658e4aa0c2 in qio_channel_read
-> (address@hidden, address@hidden "",
-> address@hidden, address@hidden) at io/channel.c:114
->
-> #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
-> buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
-> migration/qemu-file-channel.c:78
->
-> #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
-> migration/qemu-file.c:295
->
-> #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
-> address@hidden) at migration/qemu-file.c:555
->
-> #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
-> migration/qemu-file.c:568
->
-> #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
-> migration/qemu-file.c:648
->
-> #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
-> address@hidden) at migration/colo.c:244
->
-> #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
-> out>, address@hidden,
-> address@hidden)
->
->     at migration/colo.c:264
->
-> #9  0x00007f658e3e740e in colo_process_incoming_thread
-> (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
->
-> #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
->
-> #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
->
-> (gdb) p ioc->name
->
-> $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
->
-> (gdb) p ioc->features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
->
-> $3 = 0
->
->
-> (gdb) bt
->
-> #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
-> condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
->
-> #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
-> gmain.c:3054
->
-> #2  g_main_context_dispatch (context=<optimized out>,
-> address@hidden) at gmain.c:3630
->
-> #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
->
-> #4  os_host_main_loop_wait (timeout=<optimized out>) at
-> util/main-loop.c:258
->
-> #5  main_loop_wait (address@hidden) at
-> util/main-loop.c:506
->
-> #6  0x00007fdccb526187 in main_loop () at vl.c:1898
->
-> #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
-> out>) at vl.c:4709
->
-> (gdb) p ioc->features
->
-> $1 = 6
->
-> (gdb) p ioc->name
->
-> $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
->
->
-> Maybe socket_accept_incoming_migration should
-> call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?
->
->
-> thank you.
->
->
->
->
->
-> Original Message
-> address@hidden
-> address@hidden
-> address@hidden@huawei.com>
-> *Date:* 2017-03-16 14:46
-> *Subject:* *Re: [Qemu-devel] COLO failover hang*
->
->
->
->
-> On 03/15/2017 05:06 PM, wangguang wrote:
-> > I am testing the QEMU COLO feature described here [QEMU
-> > Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
-> >
-> > When the Primary Node panics, the Secondary Node qemu hangs.
-> > It hangs at recvmsg in qio_channel_socket_readv.
-> > Even after I run { 'execute': 'nbd-server-stop' } and { "execute":
-> > "x-colo-lost-heartbeat" } in the Secondary VM's
-> > monitor, the Secondary Node qemu still hangs at recvmsg.
-> >
-> > I found that COLO in qemu is not complete yet.
-> > Is there a development plan for COLO?
->
-> Yes, we are developing it. You can see some of the patches we are pushing.
->
-> > Has anyone ever run it successfully? Any help is appreciated!
->
-> Our internal version runs it successfully.
-> For failover details, you can ask Zhanghailiang for help.
-> Next time, if you have questions about COLO,
-> please cc me and zhanghailiang address@hidden
->
->
-> Thanks
-> Zhang Chen
->
->
-> >
-> >
-> >
-> > centos7.2+qemu2.7.50
-> > (gdb) bt
-> > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-> > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
-> > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at
-> > io/channel-socket.c:497
-> > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-> > address@hidden "", address@hidden,
-> > address@hidden) at io/channel.c:97
-> > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
-> > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
-> > migration/qemu-file-channel.c:78
-> > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-> > migration/qemu-file.c:257
-> > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-> > address@hidden) at migration/qemu-file.c:510
-> > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-> > migration/qemu-file.c:523
-> > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-> > migration/qemu-file.c:603
-> > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-> > address@hidden) at migration/colo.c:215
-> > #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
-> > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
-> > migration/colo.c:546
-> > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-> > migration/colo.c:649
-> > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-> > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
-> >
-> >
-> >
-> >
-> >
-> > --
-> > View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-> > Sent from the Developer mailing list archive at Nabble.com.
-> >
-> >
-> >
-> >
->
-> --
-> Thanks
-> Zhang Chen
->
->
->
->
->
-
-Hi,
-
-Thanks for reporting this. I confirmed it in my test, and it is a bug.
-
-We do call qemu_file_shutdown() to shut down the related fd in case the
-COLO thread/incoming thread is stuck in read/write() during failover,
-but it has no effect, because every fd used by COLO (and by migration)
-is wrapped by a qio channel, which will not call the shutdown API unless
-we call qio_channel_set_feature(QIO_CHANNEL(sioc),
-QIO_CHANNEL_FEATURE_SHUTDOWN).
-
-Cc: Dr. David Alan Gilbert <address@hidden>
-
-I suspect migration cancel has the same problem: it may get stuck in write()
-if we try to cancel migration.
-
-void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
-                                 Error **errp)
-{
-    qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
-    migration_channel_connect(s, ioc, NULL);
-    ... ...
-}
-
-We do not call qio_channel_set_feature(QIO_CHANNEL(sioc),
-QIO_CHANNEL_FEATURE_SHUTDOWN) above, and then in
-
-migrate_fd_cancel()
-{
-    ... ...
-    if (s->state == MIGRATION_STATUS_CANCELLING && f) {
-        qemu_file_shutdown(f);  --> This will not take effect, will it?
-    }
-}
-
-Thanks,
-Hailiang
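The failure mode described in the message above (a shutdown request that is silently skipped when the channel lacks the shutdown feature bit, leaving a reader blocked) can be illustrated with a small standalone C sketch. This is not QEMU code: `demo_channel`, `DEMO_FEATURE_SHUTDOWN`, `demo_channel_shutdown`, `demo_reader` and `demo_run` are hypothetical names invented for the illustration, and the sketch assumes a POSIX system with pthreads. It only shows the general technique of gating shutdown(2) on a feature flag and using it to wake a thread blocked in recv().

```c
#include <assert.h>
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical feature bit, standing in for QIO_CHANNEL_FEATURE_SHUTDOWN. */
#define DEMO_FEATURE_SHUTDOWN (1u << 0)

struct demo_channel {
    int fd;            /* socket the reader blocks on */
    unsigned features; /* bitmask of DEMO_FEATURE_* */
    ssize_t result;    /* what recv() returned in the reader thread */
};

/* Refuse to shut down unless the feature bit is set. */
static int demo_channel_shutdown(struct demo_channel *c)
{
    if (!(c->features & DEMO_FEATURE_SHUTDOWN)) {
        return -1; /* the request is dropped; the reader stays blocked */
    }
    return shutdown(c->fd, SHUT_RDWR);
}

/* Reader thread: blocks in recv() until data arrives or the socket is shut down. */
static void *demo_reader(void *opaque)
{
    struct demo_channel *c = opaque;
    char buf[16];

    assert(c != NULL);
    c->result = recv(c->fd, buf, sizeof(buf), 0);
    return NULL;
}

/*
 * Start a blocked reader, attempt a shutdown, and return what recv() saw:
 * 0 if the shutdown unblocked it, 1 if the shutdown was refused and the
 * demo had to send one byte from the peer to release the reader,
 * -1 on a setup error.
 */
static ssize_t demo_run(unsigned features)
{
    int sv[2];
    pthread_t tid;
    struct demo_channel c;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    c.fd = sv[0];
    c.features = features;
    c.result = -1;

    if (pthread_create(&tid, NULL, demo_reader, &c) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    usleep(100 * 1000); /* give the reader time to block in recv() */

    if (demo_channel_shutdown(&c) < 0) {
        /* A panicked peer would never send this; it only lets the demo exit. */
        ssize_t n = write(sv[1], "x", 1);
        (void)n;
    }
    pthread_join(tid, NULL);
    close(sv[0]);
    close(sv[1]);
    return c.result;
}
```

Calling `demo_run(DEMO_FEATURE_SHUTDOWN)` lets the shutdown through, so the reader's recv() returns 0. Calling `demo_run(0)` models the reported situation: the shutdown is refused, and the reader is only released because the demo writes a byte from the peer end, which a panicked Primary Node would never do.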
-
-On 2017/3/21 16:10, address@hidden wrote:
-Thank you.
-
-I have tested it already.
-
-When the Primary Node panics, the Secondary Node qemu hangs at the same place.
-
-According to http://wiki.qemu-project.org/Features/COLO, killing the Primary
-Node qemu does not reproduce the problem, but a Primary Node panic does.
-
-I think this is because the channel does not have the
-QIO_CHANNEL_FEATURE_SHUTDOWN feature set.
-
-On failover, channel_shutdown() cannot shut down the channel, so
-colo_process_incoming_thread hangs in recvmsg.
-
-
-I tested a patch:
-
-diff --git a/migration/socket.c b/migration/socket.c
-index 13966f1..d65a0ea 100644
---- a/migration/socket.c
-+++ b/migration/socket.c
-@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
-      }
-
-      trace_migration_socket_incoming_accepted();
-
-      qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
-+    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
-      migration_channel_process_incoming(migrate_get_current(),
-                                         QIO_CHANNEL(sioc));
-      object_unref(OBJECT(sioc));
-
-With it, my test no longer hangs.
-
-Original message
-
-From: address@hidden
-To: 王广10165992 address@hidden
-Cc: address@hidden address@hidden
-Date: 2017-03-21 15:58
-Subject: Re: [Qemu-devel]  Reply: Re:  [BUG]COLO failover hang
-
-
-
-
-
-Hi, Wang.
-
-You can test this branch:
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
-and please follow the wiki to make sure your own configuration is correct:
-http://wiki.qemu-project.org/Features/COLO
-Thanks
-
-Zhang Chen
-
-
-On 03/21/2017 03:27 PM, address@hidden wrote:
->
-> hi.
->
-> I test the git qemu master have the same problem.
->
-> (gdb) bt
->
-> #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
-> niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
->
-> #1  0x00007f658e4aa0c2 in qio_channel_read
-> (address@hidden, address@hidden "",
-> address@hidden, address@hidden) at io/channel.c:114
->
-> #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
-> buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
-> migration/qemu-file-channel.c:78
->
-> #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
-> migration/qemu-file.c:295
->
-> #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
-> address@hidden) at migration/qemu-file.c:555
->
-> #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
-> migration/qemu-file.c:568
->
-> #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
-> migration/qemu-file.c:648
->
-> #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
-> address@hidden) at migration/colo.c:244
->
-> #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
-> out>, address@hidden,
-> address@hidden)
->
->     at migration/colo.c:264
->
-> #9  0x00007f658e3e740e in colo_process_incoming_thread
-> (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
->
-> #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
->
-> #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
->
-> (gdb) p ioc->name
->
-> $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
->
-> (gdb) p ioc->features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
->
-> $3 = 0
->
->
-> (gdb) bt
->
-> #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
-> condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
->
-> #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
-> gmain.c:3054
->
-> #2  g_main_context_dispatch (context=<optimized out>,
-> address@hidden) at gmain.c:3630
->
-> #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
->
-> #4  os_host_main_loop_wait (timeout=<optimized out>) at
-> util/main-loop.c:258
->
-> #5  main_loop_wait (address@hidden) at
-> util/main-loop.c:506
->
-> #6  0x00007fdccb526187 in main_loop () at vl.c:1898
->
-> #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
-> out>) at vl.c:4709
->
-> (gdb) p ioc->features
->
-> $1 = 6
->
-> (gdb) p ioc->name
->
-> $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
->
->
-> May be socket_accept_incoming_migration should
-> call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
->
->
-> thank you.
->
->
->
->
->
-> Original message
-> address@hidden
-> address@hidden
-> address@hidden@huawei.com>
-> *Date:* 2017-03-16 14:46
-> *Subject:* *Re: [Qemu-devel] COLO failover hang*
->
->
->
->
-> On 03/15/2017 05:06 PM, wangguang wrote:
-> >   am testing QEMU COLO feature described here [QEMU
-> > Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
-> >
-> > When the Primary Node panic,the Secondary Node qemu hang.
-> > hang at recvmsg in qio_channel_socket_readv.
-> > And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
-> > "x-colo-lost-heartbeat" } in Secondary VM's
-> > monitor,the  Secondary Node qemu still hang at recvmsg .
-> >
-> > I found that the colo in qemu is not complete yet.
-> > Do the colo have any plan for development?
->
-> Yes, We are developing. You can see some of patch we pushing.
->
-> > Has anyone ever run it successfully? Any help is appreciated!
->
-> In our internal version can run it successfully,
-> The failover detail you can ask Zhanghailiang for help.
-> Next time if you have some question about COLO,
-> please cc me and zhanghailiang address@hidden
->
->
-> Thanks
-> Zhang Chen
->
->
-> >
-> >
-> >
-> > centos7.2+qemu2.7.50
-> > (gdb) bt
-> > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-> > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
-> > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at
-> > io/channel-socket.c:497
-> > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-> > address@hidden "", address@hidden,
-> > address@hidden) at io/channel.c:97
-> > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
-> > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
-> > migration/qemu-file-channel.c:78
-> > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-> > migration/qemu-file.c:257
-> > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-> > address@hidden) at migration/qemu-file.c:510
-> > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-> > migration/qemu-file.c:523
-> > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-> > migration/qemu-file.c:603
-> > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-> > address@hidden) at migration/colo.c:215
-> > #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
-> > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
-> > migration/colo.c:546
-> > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-> > migration/colo.c:649
-> > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-> > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
-> >
-> >
-> >
-> >
-> >
-> > --
-> > View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-> > Sent from the Developer mailing list archive at Nabble.com.
-> >
-> >
-> >
-> >
->
-> --
-> Thanks
-> Zhang Chen
->
->
->
->
->
-
-* Hailiang Zhang (address@hidden) wrote:
->
-Hi,
->
->
-Thanks for reporting this, and i confirmed it in my test, and it is a bug.
->
->
-Though we tried to call qemu_file_shutdown() to shutdown the related fd, in
->
-case COLO thread/incoming thread is stuck in read/write() while do failover,
->
-but it didn't take effect, because all the fd used by COLO (also migration)
->
-has been wrapped by qio channel, and it will not call the shutdown API if
->
-we didn't qio_channel_set_feature(QIO_CHANNEL(sioc),
->
-QIO_CHANNEL_FEATURE_SHUTDOWN).
->
->
-Cc: Dr. David Alan Gilbert <address@hidden>
->
->
-I doubted migration cancel has the same problem, it may be stuck in write()
->
-if we tried to cancel migration.
->
->
-void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error
->
-**errp)
->
-{
->
-qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
->
-migration_channel_connect(s, ioc, NULL);
->
-... ...
->
-We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc),
->
-QIO_CHANNEL_FEATURE_SHUTDOWN) above,
->
-and the
->
-migrate_fd_cancel()
->
-{
->
-... ...
->
-if (s->state == MIGRATION_STATUS_CANCELLING && f) {
->
-qemu_file_shutdown(f);  --> This will not take effect. No ?
->
-}
->
-}
-(cc'd in Daniel Berrange).
-I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)
-at the top of qio_channel_socket_new, so I think that's safe, isn't it?
-
-Dave
-
->
-Thanks,
->
-Hailiang
->
->
-On 2017/3/21 16:10, address@hidden wrote:
->
-> Thank you。
->
->
->
-> I have test aready。
->
->
->
-> When the Primary Node panic,the Secondary Node qemu hang at the same place。
->
->
->
-> Incorrding
-http://wiki.qemu-project.org/Features/COLO
-,kill Primary Node
->
-> qemu will not produce the problem,but Primary Node panic can。
->
->
->
-> I think due to the feature of channel does not support
->
-> QIO_CHANNEL_FEATURE_SHUTDOWN.
->
->
->
->
->
-> when failover,channel_shutdown could not shut down the channel.
->
->
->
->
->
-> so the colo_process_incoming_thread will hang at recvmsg.
->
->
->
->
->
-> I test a patch:
->
->
->
->
->
-> diff --git a/migration/socket.c b/migration/socket.c
->
->
->
->
->
-> index 13966f1..d65a0ea 100644
->
->
->
->
->
-> --- a/migration/socket.c
->
->
->
->
->
-> +++ b/migration/socket.c
->
->
->
->
->
-> @@ -147,8 +147,9 @@ static gboolean
->
-> socket_accept_incoming_migration(QIOChannel *ioc,
->
->
->
->
->
->       }
->
->
->
->
->
->
->
->
->
->
->
->       trace_migration_socket_incoming_accepted()
->
->
->
->
->
->
->
->
->
->
->
->       qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
->
->
->
->
->
-> +    qio_channel_set_feature(QIO_CHANNEL(sioc),
->
-> QIO_CHANNEL_FEATURE_SHUTDOWN)
->
->
->
->
->
->       migration_channel_process_incoming(migrate_get_current(),
->
->
->
->
->
->                                          QIO_CHANNEL(sioc))
->
->
->
->
->
->       object_unref(OBJECT(sioc))
->
->
->
->
->
->
->
->
->
-> My test will not hang any more.
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
->
-> Original message
->
-> From: address@hidden
->
-> To: 王广10165992 address@hidden
->
-> Cc: address@hidden address@hidden
->
-> Date: 2017-03-21 15:58
->
-> Subject: Re: [Qemu-devel]  Reply: Re:  [BUG]COLO failover hang
->
->
->
->
->
->
->
->
->
->
->
-> Hi,Wang.
->
->
->
-> You can test this branch:
->
->
->
->
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
->
->
->
-> and please follow wiki ensure your own configuration correctly.
->
->
->
->
-http://wiki.qemu-project.org/Features/COLO
->
->
->
->
->
-> Thanks
->
->
->
-> Zhang Chen
->
->
->
->
->
-> On 03/21/2017 03:27 PM, address@hidden wrote:
->
-> >
->
-> > hi.
->
-> >
->
-> > I test the git qemu master have the same problem.
->
-> >
->
-> > (gdb) bt
->
-> >
->
-> > #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
->
-> > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
->
-> >
->
-> > #1  0x00007f658e4aa0c2 in qio_channel_read
->
-> > (address@hidden, address@hidden "",
->
-> > address@hidden, address@hidden) at io/channel.c:114
->
-> >
->
-> > #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
->
-> > buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
->
-> > migration/qemu-file-channel.c:78
->
-> >
->
-> > #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
->
-> > migration/qemu-file.c:295
->
-> >
->
-> > #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
->
-> > address@hidden) at migration/qemu-file.c:555
->
-> >
->
-> > #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
->
-> > migration/qemu-file.c:568
->
-> >
->
-> > #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
->
-> > migration/qemu-file.c:648
->
-> >
->
-> > #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
->
-> > address@hidden) at migration/colo.c:244
->
-> >
->
-> > #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
->
-> > out>, address@hidden,
->
-> > address@hidden)
->
-> >
->
-> >     at migration/colo.c:264
->
-> >
->
-> > #9  0x00007f658e3e740e in colo_process_incoming_thread
->
-> > (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
->
-> >
->
-> > #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
->
-> >
->
-> > #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
->
-> >
->
-> > (gdb) p ioc->name
->
-> >
->
-> > $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
->
-> >
->
-> > (gdb) p ioc->features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
->
-> >
->
-> > $3 = 0
->
-> >
->
-> >
->
-> > (gdb) bt
->
-> >
->
-> > #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
->
-> > condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
->
-> >
->
-> > #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
->
-> > gmain.c:3054
->
-> >
->
-> > #2  g_main_context_dispatch (context=<optimized out>,
->
-> > address@hidden) at gmain.c:3630
->
-> >
->
-> > #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
->
-> >
->
-> > #4  os_host_main_loop_wait (timeout=<optimized out>) at
->
-> > util/main-loop.c:258
->
-> >
->
-> > #5  main_loop_wait (address@hidden) at
->
-> > util/main-loop.c:506
->
-> >
->
-> > #6  0x00007fdccb526187 in main_loop () at vl.c:1898
->
-> >
->
-> > #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
->
-> > out>) at vl.c:4709
->
-> >
->
-> > (gdb) p ioc->features
->
-> >
->
-> > $1 = 6
->
-> >
->
-> > (gdb) p ioc->name
->
-> >
->
-> > $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
->
-> >
->
-> >
->
-> > May be socket_accept_incoming_migration should
->
-> > call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
->
-> >
->
-> >
->
-> > thank you.
->
-> >
->
-> >
->
-> >
->
-> >
->
-> >
->
-> > Original message
->
-> > address@hidden
->
-> > address@hidden
->
-> > address@hidden@huawei.com>
->
-> > *Date:* 2017-03-16 14:46
->
-> > *Subject:* *Re: [Qemu-devel] COLO failover hang*
->
-> >
->
-> >
->
-> >
->
-> >
->
-> > On 03/15/2017 05:06 PM, wangguang wrote:
->
-> > >   am testing QEMU COLO feature described here [QEMU
->
-> > > Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
->
-> > >
->
-> > > When the Primary Node panic,the Secondary Node qemu hang.
->
-> > > hang at recvmsg in qio_channel_socket_readv.
->
-> > > And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
->
-> > > "x-colo-lost-heartbeat" } in Secondary VM's
->
-> > > monitor,the  Secondary Node qemu still hang at recvmsg .
->
-> > >
->
-> > > I found that the colo in qemu is not complete yet.
->
-> > > Do the colo have any plan for development?
->
-> >
->
-> > Yes, We are developing. You can see some of patch we pushing.
->
-> >
->
-> > > Has anyone ever run it successfully? Any help is appreciated!
->
-> >
->
-> > In our internal version can run it successfully,
->
-> > The failover detail you can ask Zhanghailiang for help.
->
-> > Next time if you have some question about COLO,
->
-> > please cc me and zhanghailiang address@hidden
->
-> >
->
-> >
->
-> > Thanks
->
-> > Zhang Chen
->
-> >
->
-> >
->
-> > >
->
-> > >
->
-> > >
->
-> > > centos7.2+qemu2.7.50
->
-> > > (gdb) bt
->
-> > > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
->
-> > > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
->
-> > > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0)
->
-> at
->
-> > > io/channel-socket.c:497
->
-> > > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
->
-> > > address@hidden "", address@hidden,
->
-> > > address@hidden) at io/channel.c:97
->
-> > > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
->
-> > > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
->
-> > > migration/qemu-file-channel.c:78
->
-> > > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
->
-> > > migration/qemu-file.c:257
->
-> > > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
->
-> > > address@hidden) at migration/qemu-file.c:510
->
-> > > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
->
-> > > migration/qemu-file.c:523
->
-> > > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
->
-> > > migration/qemu-file.c:603
->
-> > > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
->
-> > > address@hidden) at migration/colo.c:215
->
-> > > #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
->
-> > > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
->
-> > > migration/colo.c:546
->
-> > > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
->
-> > > migration/colo.c:649
->
-> > > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
->
-> > > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > > --
->
-> > > View this message in context:
->
->
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
->
-> > > Sent from the Developer mailing list archive at Nabble.com.
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> >
->
-> > --
->
-> > Thanks
->
-> > Zhang Chen
->
-> >
->
-> >
->
-> >
->
-> >
->
-> >
->
->
->
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
-On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
-* Hailiang Zhang (address@hidden) wrote:
-Hi,
-
-Thanks for reporting this, and i confirmed it in my test, and it is a bug.
-
-Though we tried to call qemu_file_shutdown() to shutdown the related fd, in
-case COLO thread/incoming thread is stuck in read/write() while do failover,
-but it didn't take effect, because all the fd used by COLO (also migration)
-has been wrapped by qio channel, and it will not call the shutdown API if
-we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN).
-
-Cc: Dr. David Alan Gilbert <address@hidden>
-
-I doubted migration cancel has the same problem, it may be stuck in write()
-if we tried to cancel migration.
-
-void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error 
-**errp)
-{
-     qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
-     migration_channel_connect(s, ioc, NULL);
-     ... ...
-We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN) above,
-and the
-migrate_fd_cancel()
-{
-  ... ...
-     if (s->state == MIGRATION_STATUS_CANCELLING && f) {
-         qemu_file_shutdown(f);  --> This will not take effect. No ?
-     }
-}
-(cc'd in Daniel Berrange).
-I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN); 
-at the
-top of qio_channel_socket_new;  so I think that's safe isn't it?
-Hmm, you are right: this problem only exists for the migration incoming fd.
-Thanks.
-Dave
-Thanks,
-Hailiang
-
-On 2017/3/21 16:10, address@hidden wrote:
-Thank you.
-
-I have tested it already.
-
-When the Primary Node panics, the Secondary Node qemu hangs at the same place.
-
-According to http://wiki.qemu-project.org/Features/COLO, killing the Primary
-Node qemu does not reproduce the problem, but a Primary Node panic does.
-
-I think this is because the channel does not have the
-QIO_CHANNEL_FEATURE_SHUTDOWN feature set.
-
-On failover, channel_shutdown() cannot shut down the channel, so
-colo_process_incoming_thread hangs in recvmsg.
-
-
-I tested a patch:
-
-diff --git a/migration/socket.c b/migration/socket.c
-index 13966f1..d65a0ea 100644
---- a/migration/socket.c
-+++ b/migration/socket.c
-@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
-       }
-
-       trace_migration_socket_incoming_accepted();
-
-       qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
-+    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
-       migration_channel_process_incoming(migrate_get_current(),
-                                          QIO_CHANNEL(sioc));
-       object_unref(OBJECT(sioc));
-
-With it, my test no longer hangs.
-
-Original message
-
-From: address@hidden
-To: 王广10165992 address@hidden
-Cc: address@hidden address@hidden
-Date: 2017-03-21 15:58
-Subject: Re: [Qemu-devel]  Reply: Re:  [BUG]COLO failover hang
-
-
-
-
-
-Hi, Wang.
-
-You can test this branch:
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
-and please follow the wiki to make sure your own configuration is correct:
-http://wiki.qemu-project.org/Features/COLO
-Thanks
-
-Zhang Chen
-
-
-On 03/21/2017 03:27 PM, address@hidden wrote:
->
-> hi.
->
-> I test the git qemu master have the same problem.
->
-> (gdb) bt
->
-> #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
-> niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
->
-> #1  0x00007f658e4aa0c2 in qio_channel_read
-> (address@hidden, address@hidden "",
-> address@hidden, address@hidden) at io/channel.c:114
->
-> #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
-> buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
-> migration/qemu-file-channel.c:78
->
-> #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
-> migration/qemu-file.c:295
->
-> #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
-> address@hidden) at migration/qemu-file.c:555
->
-> #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
-> migration/qemu-file.c:568
->
-> #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
-> migration/qemu-file.c:648
->
-> #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
-> address@hidden) at migration/colo.c:244
->
-> #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
-> out>, address@hidden,
-> address@hidden)
->
->     at migration/colo.c:264
->
-> #9  0x00007f658e3e740e in colo_process_incoming_thread
-> (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
->
-> #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
->
-> #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
->
-> (gdb) p ioc->name
->
-> $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
->
-> (gdb) p ioc->features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
->
-> $3 = 0
->
->
-> (gdb) bt
->
-> #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
-> condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
->
-> #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
-> gmain.c:3054
->
-> #2  g_main_context_dispatch (context=<optimized out>,
-> address@hidden) at gmain.c:3630
->
-> #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
->
-> #4  os_host_main_loop_wait (timeout=<optimized out>) at
-> util/main-loop.c:258
->
-> #5  main_loop_wait (address@hidden) at
-> util/main-loop.c:506
->
-> #6  0x00007fdccb526187 in main_loop () at vl.c:1898
->
-> #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
-> out>) at vl.c:4709
->
-> (gdb) p ioc->features
->
-> $1 = 6
->
-> (gdb) p ioc->name
->
-> $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
->
->
-> May be socket_accept_incoming_migration should
-> call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
->
->
-> thank you.
->
->
->
->
->
-> Original message
-> address@hidden
-> address@hidden
-> address@hidden@huawei.com>
-> *Date:* 2017-03-16 14:46
-> *Subject:* *Re: [Qemu-devel] COLO failover hang*
->
->
->
->
-> On 03/15/2017 05:06 PM, wangguang wrote:
-> >   am testing QEMU COLO feature described here [QEMU
-> > Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
-> >
-> > When the Primary Node panic,the Secondary Node qemu hang.
-> > hang at recvmsg in qio_channel_socket_readv.
-> > And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
-> > "x-colo-lost-heartbeat" } in Secondary VM's
-> > monitor,the  Secondary Node qemu still hang at recvmsg .
-> >
-> > I found that the colo in qemu is not complete yet.
-> > Do the colo have any plan for development?
->
-> Yes, We are developing. You can see some of patch we pushing.
->
-> > Has anyone ever run it successfully? Any help is appreciated!
->
-> In our internal version can run it successfully,
-> The failover detail you can ask Zhanghailiang for help.
-> Next time if you have some question about COLO,
-> please cc me and zhanghailiang address@hidden
->
->
-> Thanks
-> Zhang Chen
->
->
-> >
-> >
-> >
-> > centos7.2+qemu2.7.50
-> > (gdb) bt
-> > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-> > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
-> > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at
-> > io/channel-socket.c:497
-> > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-> > address@hidden "", address@hidden,
-> > address@hidden) at io/channel.c:97
-> > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
-> > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
-> > migration/qemu-file-channel.c:78
-> > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-> > migration/qemu-file.c:257
-> > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-> > address@hidden) at migration/qemu-file.c:510
-> > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-> > migration/qemu-file.c:523
-> > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-> > migration/qemu-file.c:603
-> > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-> > address@hidden) at migration/colo.c:215
-> > #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
-> > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
-> > migration/colo.c:546
-> > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-> > migration/colo.c:649
-> > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-> > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
-> >
-> >
-> >
-> >
-> >
-> > --
-> > View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-> > Sent from the Developer mailing list archive at Nabble.com.
-> >
-> >
-> >
-> >
->
-> --
-> Thanks
-> Zhang Chen
->
->
->
->
->
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
-.
-
-* Hailiang Zhang (address@hidden) wrote:
->
-On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
->
-> * Hailiang Zhang (address@hidden) wrote:
->
-> > Hi,
->
-> >
->
-> > Thanks for reporting this, and i confirmed it in my test, and it is a bug.
->
-> >
->
-> > Though we tried to call qemu_file_shutdown() to shutdown the related fd,
->
-> > in
->
-> > case COLO thread/incoming thread is stuck in read/write() while do
->
-> > failover,
->
-> > but it didn't take effect, because all the fd used by COLO (also
->
-> > migration)
->
-> > has been wrapped by qio channel, and it will not call the shutdown API if
->
-> > we didn't qio_channel_set_feature(QIO_CHANNEL(sioc),
->
-> > QIO_CHANNEL_FEATURE_SHUTDOWN).
->
-> >
->
-> > Cc: Dr. David Alan Gilbert <address@hidden>
->
-> >
->
-> > I doubted migration cancel has the same problem, it may be stuck in
->
-> > write()
->
-> > if we tried to cancel migration.
->
-> >
->
-> > void fd_start_outgoing_migration(MigrationState *s, const char *fdname,
->
-> > Error **errp)
->
-> > {
->
-> >      qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
->
-> >      migration_channel_connect(s, ioc, NULL);
->
-> >      ... ...
->
-> > We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc),
->
-> > QIO_CHANNEL_FEATURE_SHUTDOWN) above,
->
-> > and the
->
-> > migrate_fd_cancel()
->
-> > {
->
-> >   ... ...
->
-> >      if (s->state == MIGRATION_STATUS_CANCELLING && f) {
->
-> >          qemu_file_shutdown(f);  --> This will not take effect. No ?
->
-> >      }
->
-> > }
->
->
->
-> (cc'd in Daniel Berrange).
->
-> I see that we call qio_channel_set_feature(ioc,
->
-> QIO_CHANNEL_FEATURE_SHUTDOWN); at the
->
-> top of qio_channel_socket_new;  so I think that's safe isn't it?
->
->
->
->
-Hmm, you are right, this problem is only exist for the migration incoming fd,
->
-thanks.
-Yes, and I don't think we normally do a cancel on the incoming side of a 
-migration.
-
-Dave
-
->
-> Dave
->
->
->
-> > Thanks,
->
-> > Hailiang
->
-> >
->
-> > On 2017/3/21 16:10, address@hidden wrote:
->
-> > > Thank you。
->
-> > >
->
-> > > I have test aready。
->
-> > >
->
-> > > When the Primary Node panic,the Secondary Node qemu hang at the same
->
-> > > place。
->
-> > >
->
-> > > Incorrding
-http://wiki.qemu-project.org/Features/COLO
-,kill Primary
->
-> > > Node qemu will not produce the problem,but Primary Node panic can。
->
-> > >
->
-> > > I think due to the feature of channel does not support
->
-> > > QIO_CHANNEL_FEATURE_SHUTDOWN.
->
-> > >
->
-> > >
->
-> > > when failover,channel_shutdown could not shut down the channel.
->
-> > >
->
-> > >
->
-> > > so the colo_process_incoming_thread will hang at recvmsg.
->
-> > >
->
-> > >
->
-> > > I test a patch:
->
-> > >
->
-> > >
->
-> > > diff --git a/migration/socket.c b/migration/socket.c
->
-> > >
->
-> > >
->
-> > > index 13966f1..d65a0ea 100644
->
-> > >
->
-> > >
->
-> > > --- a/migration/socket.c
->
-> > >
->
-> > >
->
-> > > +++ b/migration/socket.c
->
-> > >
->
-> > >
->
-> > > @@ -147,8 +147,9 @@ static gboolean
->
-> > > socket_accept_incoming_migration(QIOChannel *ioc,
->
-> > >
->
-> > >
->
-> > >        }
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >        trace_migration_socket_incoming_accepted()
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >        qio_channel_set_name(QIO_CHANNEL(sioc),
->
-> > > "migration-socket-incoming")
->
-> > >
->
-> > >
->
-> > > +    qio_channel_set_feature(QIO_CHANNEL(sioc),
->
-> > > QIO_CHANNEL_FEATURE_SHUTDOWN)
->
-> > >
->
-> > >
->
-> > >        migration_channel_process_incoming(migrate_get_current(),
->
-> > >
->
-> > >
->
-> > >                                           QIO_CHANNEL(sioc))
->
-> > >
->
-> > >
->
-> > >        object_unref(OBJECT(sioc))
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > > My test will not hang any more.
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > > Original message
->
-> > >
->
-> > > From: address@hidden
->
-> > > To: 王广10165992 address@hidden
->
-> > > Cc: address@hidden address@hidden
->
-> > > Date: 2017-03-21 15:58
->
-> > > Subject: Re: [Qemu-devel]  Reply: Re:  [BUG]COLO failover hang
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > >
->
-> > > Hi,Wang.
->
-> > >
->
-> > > You can test this branch:
->
-> > >
->
-> > >
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
->
-> > >
->
-> > > and please follow wiki ensure your own configuration correctly.
->
-> > >
->
-> > >
-http://wiki.qemu-project.org/Features/COLO
->
-> > >
->
-> > >
->
-> > > Thanks
->
-> > >
->
-> > > Zhang Chen
->
-> > >
->
-> > >
->
-> > > On 03/21/2017 03:27 PM, address@hidden wrote:
->
-> > > >
->
-> > > > hi.
->
-> > > >
->
-> > > > I tested git qemu master and it has the same problem.
->
-> > > >
->
-> > > > (gdb) bt
->
-> > > >
->
-> > > > #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
->
-> > > > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
->
-> > > >
->
-> > > > #1  0x00007f658e4aa0c2 in qio_channel_read
->
-> > > > (address@hidden, address@hidden "",
->
-> > > > address@hidden, address@hidden) at io/channel.c:114
->
-> > > >
->
-> > > > #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
->
-> > > > buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
->
-> > > > migration/qemu-file-channel.c:78
->
-> > > >
->
-> > > > #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
->
-> > > > migration/qemu-file.c:295
->
-> > > >
->
-> > > > #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
->
-> > > > address@hidden) at migration/qemu-file.c:555
->
-> > > >
->
-> > > > #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
->
-> > > > migration/qemu-file.c:568
->
-> > > >
->
-> > > > #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
->
-> > > > migration/qemu-file.c:648
->
-> > > >
->
-> > > > #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
->
-> > > > address@hidden) at migration/colo.c:244
->
-> > > >
->
-> > > > #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
->
-> > > > out>, address@hidden,
->
-> > > > address@hidden)
->
-> > > >
->
-> > > >     at migration/colo.c:264
->
-> > > >
->
-> > > > #9  0x00007f658e3e740e in colo_process_incoming_thread
->
-> > > > (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
->
-> > > >
->
-> > > > #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
->
-> > > >
->
-> > > > #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
->
-> > > >
->
-> > > > (gdb) p ioc->name
->
-> > > >
->
-> > > > $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
->
-> > > >
->
-> > > > (gdb) p ioc->features        Do not support
->
-> > > QIO_CHANNEL_FEATURE_SHUTDOWN
->
-> > > >
->
-> > > > $3 = 0
->
-> > > >
->
-> > > >
->
-> > > > (gdb) bt
->
-> > > >
->
-> > > > #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
->
-> > > > condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
->
-> > > >
->
-> > > > #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
->
-> > > > gmain.c:3054
->
-> > > >
->
-> > > > #2  g_main_context_dispatch (context=<optimized out>,
->
-> > > > address@hidden) at gmain.c:3630
->
-> > > >
->
-> > > > #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
->
-> > > >
->
-> > > > #4  os_host_main_loop_wait (timeout=<optimized out>) at
->
-> > > > util/main-loop.c:258
->
-> > > >
->
-> > > > #5  main_loop_wait (address@hidden) at
->
-> > > > util/main-loop.c:506
->
-> > > >
->
-> > > > #6  0x00007fdccb526187 in main_loop () at vl.c:1898
->
-> > > >
->
-> > > > #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
->
-> > > > out>) at vl.c:4709
->
-> > > >
->
-> > > > (gdb) p ioc->features
->
-> > > >
->
-> > > > $1 = 6
->
-> > > >
->
-> > > > (gdb) p ioc->name
->
-> > > >
->
-> > > > $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
->
-> > > >
->
-> > > >
->
-> > > > Maybe socket_accept_incoming_migration should
->
-> > > > call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)??
->
-> > > >
->
-> > > >
->
-> > > > thank you.
->
-> > > >
->
-> > > >
->
-> > > >
->
-> > > >
->
-> > > >
->
-> > > > Original message
->
-> > > > address@hidden
->
-> > > > address@hidden
->
-> > > > address@hidden@huawei.com>
->
-> > > > *Date:* 2017-03-16 14:46
->
-> > > > *Subject:* *Re: [Qemu-devel] COLO failover hang*
->
-> > > >
->
-> > > >
->
-> > > >
->
-> > > >
->
-> > > > On 03/15/2017 05:06 PM, wangguang wrote:
->
-> > > > > I am testing the QEMU COLO feature described here [QEMU
->
-> > > > > Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
->
-> > > > >
->
-> > > > > When the Primary Node panics, the Secondary Node qemu hangs.
->
-> > > > > It hangs at recvmsg in qio_channel_socket_readv.
->
-> > > > > And  I run  { 'execute': 'nbd-server-stop' } and { "execute":
->
-> > > > > "x-colo-lost-heartbeat" } in Secondary VM's
->
-> > > > > monitor, the Secondary Node qemu still hangs at recvmsg.
->
-> > > > >
->
-> > > > > I found that the colo in qemu is not complete yet.
->
-> > > > > Does COLO have any development plan?
->
-> > > >
->
-> > > > Yes, we are still developing it. You can see some of the patches we are pushing.
->
-> > > >
->
-> > > > > Has anyone ever run it successfully? Any help is appreciated!
->
-> > > >
->
-> > > > Our internal version can run it successfully.
->
-> > > > For the failover details you can ask Zhanghailiang for help.
->
-> > > > Next time if you have some question about COLO,
->
-> > > > please cc me and zhanghailiang address@hidden
->
-> > > >
->
-> > > >
->
-> > > > Thanks
->
-> > > > Zhang Chen
->
-> > > >
->
-> > > >
->
-> > > > >
->
-> > > > >
->
-> > > > >
->
-> > > > > centos7.2+qemu2.7.50
->
-> > > > > (gdb) bt
->
-> > > > > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
->
-> > > > > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized
->
-> > > out>,
->
-> > > > > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0,
->
-> > > errp=0x0) at
->
-> > > > > io/channel-socket.c:497
->
-> > > > > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
->
-> > > > > address@hidden "", address@hidden,
->
-> > > > > address@hidden) at io/channel.c:97
->
-> > > > > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized
->
-> > > out>,
->
-> > > > > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
->
-> > > > > migration/qemu-file-channel.c:78
->
-> > > > > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
->
-> > > > > migration/qemu-file.c:257
->
-> > > > > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
->
-> > > > > address@hidden) at migration/qemu-file.c:510
->
-> > > > > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
->
-> > > > > migration/qemu-file.c:523
->
-> > > > > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
->
-> > > > > migration/qemu-file.c:603
->
-> > > > > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
->
-> > > > > address@hidden) at migration/colo.c:215
->
-> > > > > #9  0x00007f3e0327250d in colo_wait_handle_message
->
-> > > (errp=0x7f3d62bfaa48,
->
-> > > > > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
->
-> > > > > migration/colo.c:546
->
-> > > > > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
->
-> > > > > migration/colo.c:649
->
-> > > > > #11 0x00007f3e00cc1df3 in start_thread () from
->
-> > > /lib64/libpthread.so.0
->
-> > > > > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
->
-> > > > >
->
-> > > > >
->
-> > > > >
->
-> > > > >
->
-> > > > >
->
-> > > > > --
->
-> > > > > View this message in context:
->
-> > >
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
->
-> > > > > Sent from the Developer mailing list archive at Nabble.com.
->
-> > > > >
->
-> > > > >
->
-> > > > >
->
-> > > > >
->
-> > > >
->
-> > > > --
->
-> > > > Thanks
->
-> > > > Zhang Chen
->
-> > > >
->
-> > > >
->
-> > > >
->
-> > > >
->
-> > > >
->
-> > >
->
-> >
->
-> --
->
-> Dr. David Alan Gilbert / address@hidden / Manchester, UK
->
->
->
-> .
->
->
->
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
diff --git a/results/classifier/008/other/66743673 b/results/classifier/008/other/66743673
deleted file mode 100644
index 8919cd22c..000000000
--- a/results/classifier/008/other/66743673
+++ /dev/null
@@ -1,374 +0,0 @@
-other: 0.967
-permissions: 0.963
-semantic: 0.951
-PID: 0.949
-debug: 0.945
-boot: 0.938
-files: 0.937
-network: 0.930
-graphic: 0.927
-socket: 0.926
-vnc: 0.926
-device: 0.926
-performance: 0.905
-KVM: 0.891
-
-[Bug] QEMU TCG warnings after commit c6bd2dd63420 - HTT / CMP_LEG bits
-
-Hi Community,
-
-This email describes 3 bugs that appear to share the same root cause.
-
-[1] We ran into the following warnings when running QEMU v10.0.0 in TCG mode:
-
-qemu-system-x86_64 \
-  -machine q35 \
-  -m 4G -smp 4 \
-  -kernel ./arch/x86/boot/bzImage \
-  -bios /usr/share/ovmf/OVMF.fd \
-  -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \
-  -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \
-  -nographic \
-  -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr'
-qemu-system-x86_64: warning: TCG doesn't support requested feature:
-CPUID.01H:EDX.ht [bit 28]
-qemu-system-x86_64: warning: TCG doesn't support requested feature:
-CPUID.80000001H:ECX.cmp-legacy [bit 1]
-(repeats 4 times, once per vCPU)
-Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up CPUID_HT in
-x86_cpu_expand_features() instead of cpu_x86_cpuid()" is what introduced the
-warnings.
-Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28]) and
-CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT support, these
-bits trigger the warnings above.
-[2] Also, Zhao pointed me to a similar report on GitLab:
-https://gitlab.com/qemu-project/qemu/-/issues/2894
-The symptoms there look identical to what we're seeing.
-By convention we file one issue per email, but these two appear to share the
-same root cause, so I'm describing them together here.
-[3] My colleague Alan noticed what appears to be a related problem: if we launch
-a guest with '-cpu <model>,-ht --enable-kvm', which explicitly removes the ht
-flag, the guest still reports HT as enabled (cat /proc/cpuinfo in a Linux guest).
-In other words, under KVM the ht bit seems to be forced on even when
-the user tries to disable it.
-Best regards,
-Ewan
-
-On 4/29/25 11:02 AM, Ewan Hai wrote:
-Hi Community,
-
-This email describes 3 bugs that appear to share the same root cause.
-
-[1] We ran into the following warnings when running QEMU v10.0.0 in TCG mode:
-
-qemu-system-x86_64 \
-   -machine q35 \
-   -m 4G -smp 4 \
-   -kernel ./arch/x86/boot/bzImage \
-   -bios /usr/share/ovmf/OVMF.fd \
-   -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \
-   -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \
-   -nographic \
-   -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr'
-qemu-system-x86_64: warning: TCG doesn't support requested feature:
-CPUID.01H:EDX.ht [bit 28]
-qemu-system-x86_64: warning: TCG doesn't support requested feature:
-CPUID.80000001H:ECX.cmp-legacy [bit 1]
-(repeats 4 times, once per vCPU)
-Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up CPUID_HT in
-x86_cpu_expand_features() instead of cpu_x86_cpuid()" is what introduced the
-warnings.
-Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28]) and
-CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT support, these
-bits trigger the warnings above.
-[2] Also, Zhao pointed me to a similar report on GitLab:
-https://gitlab.com/qemu-project/qemu/-/issues/2894
-The symptoms there look identical to what we're seeing.
-By convention we file one issue per email, but these two appear to share the
-same root cause, so I'm describing them together here.
-[3] My colleague Alan noticed what appears to be a related problem: if we launch
-a guest with '-cpu <model>,-ht --enable-kvm', which means explicitly removing
-the ht flag, but the guest still reports HT(cat /proc/cpuinfo in linux guest)
-enabled. In other words, under KVM the ht bit seems to be forced on even when
-the user tries to disable it.
-XiaoYao reminded me that issue [3] stems from a different patch. Please ignore
-it for now—I'll start a separate thread to discuss that one independently.
-Best regards,
-Ewan
-
-On 4/29/2025 11:02 AM, Ewan Hai wrote:
-Hi Community,
-
-This email describes 3 bugs that appear to share the same root cause.
-[1] We ran into the following warnings when running QEMU v10.0.0 in TCG
-mode:
-qemu-system-x86_64 \
-   -machine q35 \
-   -m 4G -smp 4 \
-   -kernel ./arch/x86/boot/bzImage \
-   -bios /usr/share/ovmf/OVMF.fd \
-   -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \
-   -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \
-   -nographic \
-   -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr'
-qemu-system-x86_64: warning: TCG doesn't support requested feature:
-CPUID.01H:EDX.ht [bit 28]
-qemu-system-x86_64: warning: TCG doesn't support requested feature:
-CPUID.80000001H:ECX.cmp-legacy [bit 1]
-(repeats 4 times, once per vCPU)
-Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up
-CPUID_HT in x86_cpu_expand_features() instead of cpu_x86_cpuid()" is
-what introduced the warnings.
-Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28])
-and CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT
-support, these bits trigger the warnings above.
-[2] Also, Zhao pointed me to a similar report on GitLab:
-https://gitlab.com/qemu-project/qemu/-/issues/2894
-The symptoms there look identical to what we're seeing.
-By convention we file one issue per email, but these two appear to share
-the same root cause, so I'm describing them together here.
-It was caused by my two patches. I think the fix can be as follows.
-If no objection from the community, I can submit the formal patch.
-
-diff --git a/target/i386/cpu.c b/target/i386/cpu.c
-index 1f970aa4daa6..fb95aadd6161 100644
---- a/target/i386/cpu.c
-+++ b/target/i386/cpu.c
-@@ -776,11 +776,12 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t
-vendor1,
-CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC | CPUID_SEP | \
-           CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV | CPUID_PAT | \
-           CPUID_PSE36 | CPUID_CLFLUSH | CPUID_ACPI | CPUID_MMX | \
--          CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE)
-+          CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE | \
-+          CPUID_HT)
-           /* partly implemented:
-           CPUID_MTRR, CPUID_MCA, CPUID_CLFLUSH (needed for Win64) */
-           /* missing:
--          CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_HT, CPUID_TM, CPUID_PBE */
-+          CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_TM, CPUID_PBE */
-
- /*
-  * Kernel-only features that can be shown to usermode programs even if
-@@ -848,7 +849,8 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t
-vendor1,
-#define TCG_EXT3_FEATURES (CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM | \
-           CPUID_EXT3_CR8LEG | CPUID_EXT3_ABM | CPUID_EXT3_SSE4A | \
--          CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES)
-+          CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES | \
-+          CPUID_EXT3_CMP_LEG)
-
- #define TCG_EXT4_FEATURES 0
-[3] My colleague Alan noticed what appears to be a related problem: if
-we launch a guest with '-cpu <model>,-ht --enable-kvm', which means
-explicitly removing the ht flag, but the guest still reports HT(cat /
-proc/cpuinfo in linux guest) enabled. In other words, under KVM the ht
-bit seems to be forced on even when the user tries to disable it.
-This has been the behavior of QEMU for many years, not some regression
-introduced by my patches. We can discuss how to address it separately.
-Best regards,
-Ewan
-
-On Tue, Apr 29, 2025 at 01:55:59PM +0800, Xiaoyao Li wrote:
->
-Date: Tue, 29 Apr 2025 13:55:59 +0800
->
-From: Xiaoyao Li <xiaoyao.li@intel.com>
->
-Subject: Re: [Bug] QEMU TCG warnings after commit c6bd2dd63420 - HTT /
->
-CMP_LEG bits
->
->
-On 4/29/2025 11:02 AM, Ewan Hai wrote:
->
-> Hi Community,
->
->
->
-> This email describes 3 bugs that appear to share the same root cause.
->
->
->
-> [1] We ran into the following warnings when running QEMU v10.0.0 in TCG
->
-> mode:
->
->
->
-> qemu-system-x86_64 \
->
->    -machine q35 \
->
->    -m 4G -smp 4 \
->
->    -kernel ./arch/x86/boot/bzImage \
->
->    -bios /usr/share/ovmf/OVMF.fd \
->
->    -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \
->
->    -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \
->
->    -nographic \
->
->    -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr'
->
->
->
-> qemu-system-x86_64: warning: TCG doesn't support requested feature:
->
-> CPUID.01H:EDX.ht [bit 28]
->
-> qemu-system-x86_64: warning: TCG doesn't support requested feature:
->
-> CPUID.80000001H:ECX.cmp-legacy [bit 1]
->
-> (repeats 4 times, once per vCPU)
->
->
->
-> Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up
->
-> CPUID_HT in x86_cpu_expand_features() instead of cpu_x86_cpuid()" is
->
-> what introduced the warnings.
->
->
->
-> Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28])
->
-> and CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT
->
-> support, these bits trigger the warnings above.
->
->
->
-> [2] Also, Zhao pointed me to a similar report on GitLab:
->
->
-https://gitlab.com/qemu-project/qemu/-/issues/2894
->
-> The symptoms there look identical to what we're seeing.
->
->
->
-> By convention we file one issue per email, but these two appear to share
->
-> the same root cause, so I'm describing them together here.
->
->
-It was caused by my two patches. I think the fix can be as follows.
->
-If no objection from the community, I can submit the formal patch.
->
->
-diff --git a/target/i386/cpu.c b/target/i386/cpu.c
->
-index 1f970aa4daa6..fb95aadd6161 100644
->
---- a/target/i386/cpu.c
->
-+++ b/target/i386/cpu.c
->
-@@ -776,11 +776,12 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t
->
-vendor1,
->
-CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC | CPUID_SEP | \
->
-CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV | CPUID_PAT | \
->
-CPUID_PSE36 | CPUID_CLFLUSH | CPUID_ACPI | CPUID_MMX | \
->
--          CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE)
->
-+          CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE | \
->
-+          CPUID_HT)
->
-/* partly implemented:
->
-CPUID_MTRR, CPUID_MCA, CPUID_CLFLUSH (needed for Win64) */
->
-/* missing:
->
--          CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_HT, CPUID_TM, CPUID_PBE */
->
-+          CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_TM, CPUID_PBE */
->
->
-/*
->
-* Kernel-only features that can be shown to usermode programs even if
->
-@@ -848,7 +849,8 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t
->
-vendor1,
->
->
-#define TCG_EXT3_FEATURES (CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM | \
->
-CPUID_EXT3_CR8LEG | CPUID_EXT3_ABM | CPUID_EXT3_SSE4A | \
->
--          CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES)
->
-+          CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES | \
->
-+          CPUID_EXT3_CMP_LEG)
->
->
-#define TCG_EXT4_FEATURES 0
-This fix is fine for me...at least from SDM, HTT depends on topology and
-it should exist when user sets "-smp 4".
-
->
-> [3] My colleague Alan noticed what appears to be a related problem: if
->
-> we launch a guest with '-cpu <model>,-ht --enable-kvm', which means
->
-> explicitly removing the ht flag, but the guest still reports HT(cat
->
-> /proc/cpuinfo in linux guest) enabled. In other words, under KVM the ht
->
-> bit seems to be forced on even when the user tries to disable it.
->
->
-XiaoYao reminded me that issue [3] stems from a different patch. Please
->
-ignore it for now—I'll start a separate thread to discuss that one
->
-independently.
-I haven't found any other thread :-).
-
-By the way, just curious, in what cases do you need to disable the HT
-flag? "-smp 4" means 4 cores with 1 thread per core, and is it not
-enough?
-
-As for the “-ht” behavior, I'm also unsure whether this should be fixed
-or not - one possible consideration is whether “-ht” would be useful.
-
-On 5/8/25 5:04 PM, Zhao Liu wrote:
-[3] My colleague Alan noticed what appears to be a related problem: if
-we launch a guest with '-cpu <model>,-ht --enable-kvm', which means
-explicitly removing the ht flag, but the guest still reports HT(cat
-/proc/cpuinfo in linux guest) enabled. In other words, under KVM the ht
-bit seems to be forced on even when the user tries to disable it.
-XiaoYao reminded me that issue [3] stems from a different patch. Please
-ignore it for now—I'll start a separate thread to discuss that one
-independently.
-I haven't found any other thread :-).
-Please refer to
-https://lore.kernel.org/all/db6ae3bb-f4e5-4719-9beb-623fcff56af2@zhaoxin.com/
-.
-By the way, just curious, in what cases do you need to disable the HT
-flag? "-smp 4" means 4 cores with 1 thread per core, and is it not
-enough?
-
-As for the “-ht” behavior, I'm also unsure whether this should be fixed
-or not - one possible consideration is whether “-ht” would be useful.
-I wasn't trying to target any specific use case, using "-ht" was simply a way to
-check how the ht feature behaves under both KVM and TCG. There's no special
-workload behind it; I just wanted to confirm that the flag is respected (or not)
-in each mode.
-
diff --git a/results/classifier/008/other/68897003 b/results/classifier/008/other/68897003
deleted file mode 100644
index 3cae4c9bb..000000000
--- a/results/classifier/008/other/68897003
+++ /dev/null
@@ -1,726 +0,0 @@
-other: 0.714
-graphic: 0.694
-permissions: 0.677
-PID: 0.677
-performance: 0.673
-semantic: 0.671
-debug: 0.663
-device: 0.647
-network: 0.614
-files: 0.608
-KVM: 0.598
-socket: 0.585
-boot: 0.569
-vnc: 0.525
-
-[Qemu-devel] [BUG] VM abort after migration
-
-Hi guys,
-
-We found a qemu core dump in our testing environment: the assertion
-'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
-the bus->irq_count[i] is '-1'.
-
-Through analysis, we found that it happened after VM migration, and we think
-it was caused by the following sequence:
-
-*Migration Source*
-1. save bus pci.0 state, including irq_count[x] ( =0 , old )
-2. save E1000:
-   e1000_pre_save
-    e1000_mit_timer
-     set_interrupt_cause
-      pci_set_irq --> update pci_dev->irq_state to 1 and
-                  update bus->irq_count[x] to 1 ( new )
-    the irq_state sent to dest.
-
-*Migration Dest*
-1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1.
-2. If the e1000 needs to change its irqline, it calls pci_irq_handler();
-  the irq_state may change to 0 and bus->irq_count[x] will become
-  -1 in this situation.
-3. do VM reboot then the assertion will be triggered.
-
-We also found that others have hit a similar problem:
-[1]
-https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
-[2]
-https://bugs.launchpad.net/qemu/+bug/1702621
-Are there any patches to fix this problem?
-Can we save the pcibus state after all the PCI devices are saved?
-
-Thanks,
-Longpeng(Mike)
-
-* longpeng (address@hidden) wrote:
->
-Hi guys,
->
->
-We found a qemu core in our testing environment, the assertion
->
-'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
->
-the bus->irq_count[i] is '-1'.
->
->
-Through analysis, it was happened after VM migration and we think
->
-it was caused by the following sequence:
->
->
-*Migration Source*
->
-1. save bus pci.0 state, including irq_count[x] ( =0 , old )
->
-2. save E1000:
->
-e1000_pre_save
->
-e1000_mit_timer
->
-set_interrupt_cause
->
-pci_set_irq --> update pci_dev->irq_state to 1 and
->
-update bus->irq_count[x] to 1 ( new )
->
-the irq_state sent to dest.
->
->
-*Migration Dest*
->
-1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1.
->
-2. If the e1000 need change irqline , it would call to pci_irq_handler(),
->
-the irq_state maybe change to 0 and bus->irq_count[x] will become
->
--1 in this situation.
->
-3. do VM reboot then the assertion will be triggered.
->
->
-We also found some guys faced the similar problem:
->
-[1]
-https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
->
-[2]
-https://bugs.launchpad.net/qemu/+bug/1702621
->
->
-Is there some patches to fix this problem ?
-I don't remember any.
-
->
-Can we save pcibus state after all the pci devs are saved ?
-Does this problem only happen with e1000? I think so.
-If it's only e1000 I think we should fix it - I think once the VM is
-stopped for doing the device migration it shouldn't be raising
-interrupts.
-
-Dave
-
->
-Thanks,
->
-Longpeng(Mike)
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
-On 2019/7/8 5:47 PM, Dr. David Alan Gilbert wrote:
-* longpeng (address@hidden) wrote:
-Hi guys,
-
-We found a qemu core in our testing environment, the assertion
-'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
-the bus->irq_count[i] is '-1'.
-
-Through analysis, it was happened after VM migration and we think
-it was caused by the following sequence:
-
-*Migration Source*
-1. save bus pci.0 state, including irq_count[x] ( =0 , old )
-2. save E1000:
-    e1000_pre_save
-     e1000_mit_timer
-      set_interrupt_cause
-       pci_set_irq --> update pci_dev->irq_state to 1 and
-                   update bus->irq_count[x] to 1 ( new )
-     the irq_state sent to dest.
-
-*Migration Dest*
-1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1.
-2. If the e1000 need change irqline , it would call to pci_irq_handler(),
-   the irq_state maybe change to 0 and bus->irq_count[x] will become
-   -1 in this situation.
-3. do VM reboot then the assertion will be triggered.
-
-We also found some guys faced the similar problem:
-[1]
-https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
-[2]
-https://bugs.launchpad.net/qemu/+bug/1702621
-Is there some patches to fix this problem ?
-I don't remember any.
-Can we save pcibus state after all the pci devs are saved ?
-Does this problem only happen with e1000? I think so.
-If it's only e1000 I think we should fix it - I think once the VM is
-stopped for doing the device migration it shouldn't be raising
-interrupts.
-I wonder whether we can simply fix this by not setting ICS in pre_save()
-but scheduling mit timer unconditionally in post_load().
-Thanks
-Dave
-Thanks,
-Longpeng(Mike)
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
-On 2019/7/10 11:25, Jason Wang wrote:
->
->
-On 2019/7/8 5:47 PM, Dr. David Alan Gilbert wrote:
->
-> * longpeng (address@hidden) wrote:
->
->> Hi guys,
->
->>
->
->> We found a qemu core in our testing environment, the assertion
->
->> 'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
->
->> the bus->irq_count[i] is '-1'.
->
->>
->
->> Through analysis, it was happened after VM migration and we think
->
->> it was caused by the following sequence:
->
->>
->
->> *Migration Source*
->
->> 1. save bus pci.0 state, including irq_count[x] ( =0 , old )
->
->> 2. save E1000:
->
->>     e1000_pre_save
->
->>      e1000_mit_timer
->
->>       set_interrupt_cause
->
->>        pci_set_irq --> update pci_dev->irq_state to 1 and
->
->>                    update bus->irq_count[x] to 1 ( new )
->
->>      the irq_state sent to dest.
->
->>
->
->> *Migration Dest*
->
->> 1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1.
->
->> 2. If the e1000 need change irqline , it would call to pci_irq_handler(),
->
->>    the irq_state maybe change to 0 and bus->irq_count[x] will become
->
->>    -1 in this situation.
->
->> 3. do VM reboot then the assertion will be triggered.
->
->>
->
->> We also found some guys faced the similar problem:
->
->> [1]
-https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
->
->> [2]
-https://bugs.launchpad.net/qemu/+bug/1702621
->
->>
->
->> Is there some patches to fix this problem ?
->
-> I don't remember any.
->
->
->
->> Can we save pcibus state after all the pci devs are saved ?
->
-> Does this problem only happen with e1000? I think so.
->
-> If it's only e1000 I think we should fix it - I think once the VM is
->
-> stopped for doing the device migration it shouldn't be raising
->
-> interrupts.
->
->
->
-I wonder whether we can simply fix this by not setting ICS in pre_save() but
->
-scheduling mit timer unconditionally in post_load().
->
-I also think this is an e1000 bug because we have found more cores with the same
-frame these days.
-
-I'm not familiar with e1000 so hope someone could fix it, thanks. :)
-
->
-Thanks
->
->
->
->
->
-> Dave
->
->
->
->> Thanks,
->
->> Longpeng(Mike)
->
-> --
->
-> Dr. David Alan Gilbert / address@hidden / Manchester, UK
->
->
-.
->
--- 
-Regards,
-Longpeng(Mike)
-
-On 2019/7/10 11:36 AM, Longpeng (Mike) wrote:
-On 2019/7/10 11:25, Jason Wang wrote:
-On 2019/7/8 5:47 PM, Dr. David Alan Gilbert wrote:
-* longpeng (address@hidden) wrote:
-Hi guys,
-
-We found a qemu core in our testing environment, the assertion
-'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
-the bus->irq_count[i] is '-1'.
-
-Through analysis, it was happened after VM migration and we think
-it was caused by the following sequence:
-
-*Migration Source*
-1. save bus pci.0 state, including irq_count[x] ( =0 , old )
-2. save E1000:
-     e1000_pre_save
-      e1000_mit_timer
-       set_interrupt_cause
-        pci_set_irq --> update pci_dev->irq_state to 1 and
-                    update bus->irq_count[x] to 1 ( new )
-      the irq_state sent to dest.
-
-*Migration Dest*
-1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1.
-2. If the e1000 need change irqline , it would call to pci_irq_handler(),
-    the irq_state maybe change to 0 and bus->irq_count[x] will become
-    -1 in this situation.
-3. do VM reboot then the assertion will be triggered.
-
-We also found some guys faced the similar problem:
-[1]
-https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
-[2]
-https://bugs.launchpad.net/qemu/+bug/1702621
-Is there some patches to fix this problem ?
-I don't remember any.
-Can we save pcibus state after all the pci devs are saved ?
-Does this problem only happen with e1000? I think so.
-If it's only e1000 I think we should fix it - I think once the VM is
-stopped for doing the device migration it shouldn't be raising
-interrupts.
-I wonder whether we can simply fix this by not setting ICS in pre_save() but
-scheduling mit timer unconditionally in post_load().
-I also think this is an e1000 bug because we have found more cores with the same
-frame these days.
-
-I'm not familiar with e1000 so hope someone could fix it, thanks. :)
-I drafted a patch in the attachment; please test.
-
-Thanks
-Thanks
-Dave
-Thanks,
-Longpeng(Mike)
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-.
-0001-e1000-don-t-raise-interrupt-in-pre_save.patch
-Description:
-Text Data
-
-On 2019/7/10 11:57, Jason Wang wrote:
->
->
-On 2019/7/10 11:36 AM, Longpeng (Mike) wrote:
->
-> On 2019/7/10 11:25, Jason Wang wrote:
->
->> On 2019/7/8 5:47 PM, Dr. David Alan Gilbert wrote:
->
->>> * longpeng (address@hidden) wrote:
->
->>>> Hi guys,
->
->>>>
->
->>>> We found a qemu core in our testing environment, the assertion
->
->>>> 'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
->
->>>> the bus->irq_count[i] is '-1'.
->
->>>>
->
->>>> Through analysis, it was happened after VM migration and we think
->
->>>> it was caused by the following sequence:
->
->>>>
->
->>>> *Migration Source*
->
->>>> 1. save bus pci.0 state, including irq_count[x] ( =0 , old )
->
->>>> 2. save E1000:
->
->>>>      e1000_pre_save
->
->>>>       e1000_mit_timer
->
->>>>        set_interrupt_cause
->
->>>>         pci_set_irq --> update pci_dev->irq_state to 1 and
->
->>>>                     update bus->irq_count[x] to 1 ( new )
->
->>>>       the irq_state sent to dest.
->
->>>>
->
->>>> *Migration Dest*
->
->>>> 1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is
->
->>>> 1.
->
->>>> 2. If the e1000 need change irqline , it would call to pci_irq_handler(),
->
->>>>     the irq_state maybe change to 0 and bus->irq_count[x] will become
->
->>>>     -1 in this situation.
->
->>>> 3. do VM reboot then the assertion will be triggered.
->
->>>>
->
->>>> We also found some guys faced the similar problem:
->
->>>> [1]
-https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
->
->>>> [2]
-https://bugs.launchpad.net/qemu/+bug/1702621
->
->>>>
->
->>>> Is there some patches to fix this problem ?
->
->>> I don't remember any.
->
->>>
->
->>>> Can we save pcibus state after all the pci devs are saved ?
->
->>> Does this problem only happen with e1000? I think so.
->
->>> If it's only e1000 I think we should fix it - I think once the VM is
->
->>> stopped for doing the device migration it shouldn't be raising
->
->>> interrupts.
->
->>
->
->> I wonder whether we can simply fix this by not setting ICS in pre_save() but
->
->> scheduling the mit timer unconditionally in post_load().
->
->>
->
-> I also think this is a bug in e1000 because we have found more cores with the same
->
-> frame these days.
->
->
->
-> I'm not familiar with e1000 so hope someone could fix it, thanks. :)
->
->
->
->
-Drafted a patch in the attachment, please test.
->
-Thanks. We'll test it for a few weeks and then give you the feedback. :)
-
->
-Thanks
->
->
->
->> Thanks
->
->>
->
->>
->
->>> Dave
->
->>>
->
->>>> Thanks,
->
->>>> Longpeng(Mike)
->
->>> --
->
->>> Dr. David Alan Gilbert / address@hidden / Manchester, UK
->
->> .
->
->>
--- 
-Regards,
-Longpeng(Mike)
-
-On 2019/7/10 11:57, Jason Wang wrote:
->
->
-On 2019/7/10 11:36 AM, Longpeng (Mike) wrote:
->
-> On 2019/7/10 11:25, Jason Wang wrote:
->
->> On 2019/7/8 5:47 PM, Dr. David Alan Gilbert wrote:
->
->>> * longpeng (address@hidden) wrote:
->
->>>> Hi guys,
->
->>>>
->
->>>> We found a qemu core in our testing environment, the assertion
->
->>>> 'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
->
->>>> the bus->irq_count[i] is '-1'.
->
->>>>
->
->>>> Through analysis, it was happened after VM migration and we think
->
->>>> it was caused by the following sequence:
->
->>>>
->
->>>> *Migration Source*
->
->>>> 1. save bus pci.0 state, including irq_count[x] ( =0 , old )
->
->>>> 2. save E1000:
->
->>>>      e1000_pre_save
->
->>>>       e1000_mit_timer
->
->>>>        set_interrupt_cause
->
->>>>         pci_set_irq --> update pci_dev->irq_state to 1 and
->
->>>>                     update bus->irq_count[x] to 1 ( new )
->
->>>>       the irq_state sent to dest.
->
->>>>
->
->>>> *Migration Dest*
->
->>>> 1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is
->
->>>> 1.
->
->>>> 2. If the e1000 need change irqline , it would call to pci_irq_handler(),
->
->>>>     the irq_state maybe change to 0 and bus->irq_count[x] will become
->
->>>>     -1 in this situation.
->
->>>> 3. do VM reboot then the assertion will be triggered.
->
->>>>
->
->>>> We also found some guys faced the similar problem:
->
->>>> [1]
-https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
->
->>>> [2]
-https://bugs.launchpad.net/qemu/+bug/1702621
->
->>>>
->
->>>> Is there some patches to fix this problem ?
->
->>> I don't remember any.
->
->>>
->
->>>> Can we save pcibus state after all the pci devs are saved ?
->
->>> Does this problem only happen with e1000? I think so.
->
->>> If it's only e1000 I think we should fix it - I think once the VM is
->
->>> stopped for doing the device migration it shouldn't be raising
->
->>> interrupts.
->
->>
->
->> I wonder whether we can simply fix this by not setting ICS in pre_save() but
->
->> scheduling the mit timer unconditionally in post_load().
->
->>
->
-> I also think this is a bug in e1000 because we have found more cores with the same
->
-> frame these days.
->
->
->
-> I'm not familiar with e1000 so hope someone could fix it, thanks. :)
->
->
->
->
-Drafted a patch in the attachment, please test.
->
-Hi Jason,
-
-We've tested the patch for about two weeks, everything went well, thanks!
-
-Feel free to add my:
-Reported-and-tested-by: Longpeng <address@hidden>
-
->
-Thanks
->
->
->
->> Thanks
->
->>
->
->>
->
->>> Dave
->
->>>
->
->>>> Thanks,
->
->>>> Longpeng(Mike)
->
->>> --
->
->>> Dr. David Alan Gilbert / address@hidden / Manchester, UK
->
->> .
->
->>
--- 
-Regards,
-Longpeng(Mike)
-
-On 2019/7/27 2:10 PM, Longpeng (Mike) wrote:
-On 2019/7/10 11:57, Jason Wang wrote:
-On 2019/7/10 11:36 AM, Longpeng (Mike) wrote:
-On 2019/7/10 11:25, Jason Wang wrote:
-On 2019/7/8 5:47 PM, Dr. David Alan Gilbert wrote:
-* longpeng (address@hidden) wrote:
-Hi guys,
-
-We found a qemu core in our testing environment, the assertion
-'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and
-the bus->irq_count[i] is '-1'.
-
-Through analysis, we found that it happened after VM migration, and we think
-it was caused by the following sequence:
-
-*Migration Source*
-1. save bus pci.0 state, including irq_count[x] ( =0 , old )
-2. save E1000:
-      e1000_pre_save
-       e1000_mit_timer
-        set_interrupt_cause
-         pci_set_irq --> update pci_dev->irq_state to 1 and
-                     update bus->irq_count[x] to 1 ( new )
-       the irq_state sent to dest.
-
-*Migration Dest*
-1. Receives irq_count[x] of pci.0 as 0, but the irq_state of e1000 as 1.
-2. If the e1000 needs to change its irq line, it calls pci_irq_handler();
-     the irq_state may change to 0 and bus->irq_count[x] will become
-     -1 in this situation.
-3. Reboot the VM, and the assertion is triggered.
-
-We also found others who faced a similar problem:
-[1]
-https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html
-[2]
-https://bugs.launchpad.net/qemu/+bug/1702621
-Are there any patches to fix this problem?
-I don't remember any.
-Can we save the pcibus state after all the PCI devices are saved?
-Does this problem only happen with e1000? I think so.
-If it's only e1000 I think we should fix it - I think once the VM is
-stopped for doing the device migration it shouldn't be raising
-interrupts.
-I wonder whether we can simply fix this by not setting ICS in pre_save() but
-scheduling the mit timer unconditionally in post_load().
-I also think this is a bug in e1000 because we have found more cores with the same
-frame these days.
-
-I'm not familiar with e1000 so hope someone could fix it, thanks. :)
-Drafted a patch in the attachment, please test.
-Hi Jason,
-
-We've tested the patch for about two weeks, everything went well, thanks!
-
-Feel free to add my:
-Reported-and-tested-by: Longpeng <address@hidden>
-Applied.
-
-Thanks
-Thanks
-Thanks
-Dave
-Thanks,
-Longpeng(Mike)
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-.
-
diff --git a/results/classifier/008/other/70021271 b/results/classifier/008/other/70021271
deleted file mode 100644
index f35111615..000000000
--- a/results/classifier/008/other/70021271
+++ /dev/null
@@ -1,7458 +0,0 @@
-other: 0.963
-graphic: 0.958
-permissions: 0.957
-debug: 0.956
-performance: 0.949
-KVM: 0.949
-semantic: 0.946
-vnc: 0.900
-device: 0.887
-PID: 0.880
-socket: 0.873
-network: 0.873
-boot: 0.872
-files: 0.867
-
-[Qemu-devel] [BUG]Unassigned mem write during pci device hot-plug
-
-Hi all,
-
-In our test, we configured a VM with several pci-bridges and a virtio-net NIC
-attached to bus 4.
-After the VM starts up, we ping this NIC from the host to check that it is working
-normally. Then we hot-add PCI devices to this VM on bus 0.
-We found that the virtio-net NIC on bus 4 occasionally stops working (cannot connect),
-because kicking the virtio backend fails with the error below:
-    Unassigned mem write 00000000fc803004 = 0x1
-
-memory-region: pci_bridge_pci
-  0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci
-    00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
-      00000000fc800000-00000000fc800fff (prio 0, RW): virtio-pci-common
-      00000000fc801000-00000000fc801fff (prio 0, RW): virtio-pci-isr
-      00000000fc802000-00000000fc802fff (prio 0, RW): virtio-pci-device
-      00000000fc803000-00000000fc803fff (prio 0, RW): virtio-pci-notify  <- io 
-mem unassigned
-      …
-
-We caught an abnormal address change when this problem happened, shown as
-follows:
-Before pci_bridge_update_mappings:
-      00000000fc000000-00000000fc1fffff (prio 1, RW): alias pci_bridge_pref_mem 
-@pci_bridge_pci 00000000fc000000-00000000fc1fffff
-      00000000fc200000-00000000fc3fffff (prio 1, RW): alias pci_bridge_pref_mem 
-@pci_bridge_pci 00000000fc200000-00000000fc3fffff
-      00000000fc400000-00000000fc5fffff (prio 1, RW): alias pci_bridge_pref_mem 
-@pci_bridge_pci 00000000fc400000-00000000fc5fffff
-      00000000fc600000-00000000fc7fffff (prio 1, RW): alias pci_bridge_pref_mem 
-@pci_bridge_pci 00000000fc600000-00000000fc7fffff
-      00000000fc800000-00000000fc9fffff (prio 1, RW): alias pci_bridge_pref_mem 
-@pci_bridge_pci 00000000fc800000-00000000fc9fffff <- correct Address Space
-      00000000fca00000-00000000fcbfffff (prio 1, RW): alias pci_bridge_pref_mem 
-@pci_bridge_pci 00000000fca00000-00000000fcbfffff
-      00000000fcc00000-00000000fcdfffff (prio 1, RW): alias pci_bridge_pref_mem 
-@pci_bridge_pci 00000000fcc00000-00000000fcdfffff
-      00000000fce00000-00000000fcffffff (prio 1, RW): alias pci_bridge_pref_mem 
-@pci_bridge_pci 00000000fce00000-00000000fcffffff
-
-After pci_bridge_update_mappings:
-      00000000fda00000-00000000fdbfffff (prio 1, RW): alias pci_bridge_mem 
-@pci_bridge_pci 00000000fda00000-00000000fdbfffff
-      00000000fdc00000-00000000fddfffff (prio 1, RW): alias pci_bridge_mem 
-@pci_bridge_pci 00000000fdc00000-00000000fddfffff
-      00000000fde00000-00000000fdffffff (prio 1, RW): alias pci_bridge_mem 
-@pci_bridge_pci 00000000fde00000-00000000fdffffff
-      00000000fe000000-00000000fe1fffff (prio 1, RW): alias pci_bridge_mem 
-@pci_bridge_pci 00000000fe000000-00000000fe1fffff
-      00000000fe200000-00000000fe3fffff (prio 1, RW): alias pci_bridge_mem 
-@pci_bridge_pci 00000000fe200000-00000000fe3fffff
-      00000000fe400000-00000000fe5fffff (prio 1, RW): alias pci_bridge_mem 
-@pci_bridge_pci 00000000fe400000-00000000fe5fffff
-      00000000fe600000-00000000fe7fffff (prio 1, RW): alias pci_bridge_mem 
-@pci_bridge_pci 00000000fe600000-00000000fe7fffff
-      00000000fe800000-00000000fe9fffff (prio 1, RW): alias pci_bridge_mem 
-@pci_bridge_pci 00000000fe800000-00000000fe9fffff
-      fffffffffc800000-fffffffffc800000 (prio 1, RW): alias pci_bridge_pref_mem 
-@pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional Address Space
-
-We have figured out why the address takes this value. According to the PCI
-spec, a PCI driver can get the BAR size by first writing 0xffffffff to
-the PCI register and then reading the value back from this register.
-We don't handle this value specially when processing the PCI write in QEMU. The
-function call stack is:
-pci_bridge_dev_write_config
--> pci_bridge_write_config
--> pci_default_write_config (we update the config[address] value here to 
-fffffffffc800000, which should be 0xfc800000 )
--> pci_bridge_update_mappings
-                ->pci_bridge_region_del(br, br->windows);
--> pci_bridge_region_init
-                                                                
-->pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong value 
-fffffffffc800000)
-                                                -> 
-memory_region_transaction_commit
-
-So, as we can see, QEMU uses the wrong base address to update the memory
-regions. Although the base address is updated to
-the correct value after the PCI driver in the VM writes the original value back, the
-virtio NIC on bus 4 may still send network packets concurrently while
-the wrong memory region address is in effect.
-
-We have tried skipping the memory region update in QEMU when we detect a PCI
-write of the value 0xffffffff, and it does work, but
-this does not seem like an elegant fix.
-
-diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
-index b2e50c3..84b405d 100644
---- a/hw/pci/pci_bridge.c
-+++ b/hw/pci/pci_bridge.c
-@@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d,
-     pci_default_write_config(d, address, val, len);
--    if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
-+    if ( (val != 0xffffffff) &&
-+        (ranges_overlap(address, len, PCI_COMMAND, 2) ||
-         /* io base/limit */
-         ranges_overlap(address, len, PCI_IO_BASE, 2) ||
-@@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d,
-         ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
-         /* vga enable */
--        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
-+        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) {
-         pci_bridge_update_mappings(s);
-     }
-
-Thanks,
-Xu
-
-On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-Hi all,
->
->
->
->
-In our test, we configured VM with several pci-bridges and a virtio-net nic
->
-been attached with bus 4,
->
->
-After VM is startup, We ping this nic from host to judge if it is working
->
-normally. Then, we hot add pci devices to this VM with bus 0.
->
->
-We  found the virtio-net NIC in bus 4 is not working (can not connect)
->
-occasionally, as it kick virtio backend failure with error below:
->
->
-Unassigned mem write 00000000fc803004 = 0x1
-Thanks for the report. Which guest was used to produce this problem?
-
--- 
-MST
-
-On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-> Hi all,
->
->
->
->
->
->
->
-> In our test, we configured VM with several pci-bridges and a
->
-> virtio-net nic been attached with bus 4,
->
->
->
-> After VM is startup, We ping this nic from host to judge if it is
->
-> working normally. Then, we hot add pci devices to this VM with bus 0.
->
->
->
-> We  found the virtio-net NIC in bus 4 is not working (can not connect)
->
-> occasionally, as it kick virtio backend failure with error below:
->
->
->
->     Unassigned mem write 00000000fc803004 = 0x1
->
->
-Thanks for the report. Which guest was used to produce this problem?
->
->
---
->
-MST
-I saw this problem when I hot-plugged a VFIO device into a CentOS 7.4 guest.
-After that I compiled the latest Linux kernel, and it has the same problem.
-
-Thanks,
-Xu
-
-On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-Hi all,
->
->
->
->
-In our test, we configured VM with several pci-bridges and a virtio-net nic
->
-been attached with bus 4,
->
->
-After VM is startup, We ping this nic from host to judge if it is working
->
-normally. Then, we hot add pci devices to this VM with bus 0.
->
->
-We  found the virtio-net NIC in bus 4 is not working (can not connect)
->
-occasionally, as it kick virtio backend failure with error below:
->
->
-Unassigned mem write 00000000fc803004 = 0x1
->
->
->
->
-memory-region: pci_bridge_pci
->
->
-0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci
->
->
-00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
->
->
-00000000fc800000-00000000fc800fff (prio 0, RW): virtio-pci-common
->
->
-00000000fc801000-00000000fc801fff (prio 0, RW): virtio-pci-isr
->
->
-00000000fc802000-00000000fc802fff (prio 0, RW): virtio-pci-device
->
->
-00000000fc803000-00000000fc803fff (prio 0, RW): virtio-pci-notify  <- io
->
-mem unassigned
->
->
-…
->
->
->
->
-We caught an exceptional address changing while this problem happened, show as
->
-follow:
->
->
-Before pci_bridge_update_mappings:
->
->
-00000000fc000000-00000000fc1fffff (prio 1, RW): alias
->
-pci_bridge_pref_mem
->
-@pci_bridge_pci 00000000fc000000-00000000fc1fffff
->
->
-00000000fc200000-00000000fc3fffff (prio 1, RW): alias
->
-pci_bridge_pref_mem
->
-@pci_bridge_pci 00000000fc200000-00000000fc3fffff
->
->
-00000000fc400000-00000000fc5fffff (prio 1, RW): alias
->
-pci_bridge_pref_mem
->
-@pci_bridge_pci 00000000fc400000-00000000fc5fffff
->
->
-00000000fc600000-00000000fc7fffff (prio 1, RW): alias
->
-pci_bridge_pref_mem
->
-@pci_bridge_pci 00000000fc600000-00000000fc7fffff
->
->
-00000000fc800000-00000000fc9fffff (prio 1, RW): alias
->
-pci_bridge_pref_mem
->
-@pci_bridge_pci 00000000fc800000-00000000fc9fffff <- correct Address Space
->
->
-00000000fca00000-00000000fcbfffff (prio 1, RW): alias
->
-pci_bridge_pref_mem
->
-@pci_bridge_pci 00000000fca00000-00000000fcbfffff
->
->
-00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
->
-pci_bridge_pref_mem
->
-@pci_bridge_pci 00000000fcc00000-00000000fcdfffff
->
->
-00000000fce00000-00000000fcffffff (prio 1, RW): alias
->
-pci_bridge_pref_mem
->
-@pci_bridge_pci 00000000fce00000-00000000fcffffff
->
->
->
->
-After pci_bridge_update_mappings:
->
->
-00000000fda00000-00000000fdbfffff (prio 1, RW): alias pci_bridge_mem
->
-@pci_bridge_pci 00000000fda00000-00000000fdbfffff
->
->
-00000000fdc00000-00000000fddfffff (prio 1, RW): alias pci_bridge_mem
->
-@pci_bridge_pci 00000000fdc00000-00000000fddfffff
->
->
-00000000fde00000-00000000fdffffff (prio 1, RW): alias pci_bridge_mem
->
-@pci_bridge_pci 00000000fde00000-00000000fdffffff
->
->
-00000000fe000000-00000000fe1fffff (prio 1, RW): alias pci_bridge_mem
->
-@pci_bridge_pci 00000000fe000000-00000000fe1fffff
->
->
-00000000fe200000-00000000fe3fffff (prio 1, RW): alias pci_bridge_mem
->
-@pci_bridge_pci 00000000fe200000-00000000fe3fffff
->
->
-00000000fe400000-00000000fe5fffff (prio 1, RW): alias pci_bridge_mem
->
-@pci_bridge_pci 00000000fe400000-00000000fe5fffff
->
->
-00000000fe600000-00000000fe7fffff (prio 1, RW): alias pci_bridge_mem
->
-@pci_bridge_pci 00000000fe600000-00000000fe7fffff
->
->
-00000000fe800000-00000000fe9fffff (prio 1, RW): alias pci_bridge_mem
->
-@pci_bridge_pci 00000000fe800000-00000000fe9fffff
->
->
-fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
->
-pci_bridge_pref_mem
->
-@pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional Address
->
-Space
-This one is empty though right?
-
->
->
->
-We have figured out why this address becomes this value,  according to pci
->
-spec,  pci driver can get BAR address size by writing 0xffffffff to
->
->
-the pci register firstly, and then read back the value from this register.
-OK, however, as you show below, the BAR being sized is the BAR
-of a bridge. Are you then adding a bridge device by hotplug?
-
-
-
->
-We didn't handle this value  specially while process pci write in qemu, the
->
-function call stack is:
->
->
-Pci_bridge_dev_write_config
->
->
--> pci_bridge_write_config
->
->
--> pci_default_write_config (we update the config[address] value here to
->
-fffffffffc800000, which should be 0xfc800000 )
->
->
--> pci_bridge_update_mappings
->
->
-->pci_bridge_region_del(br, br->windows);
->
->
--> pci_bridge_region_init
->
->
-->
->
-pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong value
->
-fffffffffc800000)
->
->
-->
->
-memory_region_transaction_commit
->
->
->
->
-So, as we can see, we use the wrong base address in qemu to update the memory
->
-regions, though, we update the base address to
->
->
-The correct value after pci driver in VM write the original value back, the
->
-virtio NIC in bus 4 may still sends net packets concurrently with
->
->
-The wrong memory region address.
->
->
->
->
-We have tried to skip the memory region update action in qemu while detect pci
->
-write with 0xffffffff value, and it does work, but
->
->
-This seems to be not gently.
-For sure. But I'm still puzzled as to why Linux tries to
-size the BAR of the bridge while a device behind it is
-used.
-
-Can you please post your QEMU command line?
-
-
-
->
->
->
-diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
->
->
-index b2e50c3..84b405d 100644
->
->
---- a/hw/pci/pci_bridge.c
->
->
-+++ b/hw/pci/pci_bridge.c
->
->
-@@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d,
->
->
-pci_default_write_config(d, address, val, len);
->
->
--    if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
->
-+    if ( (val != 0xffffffff) &&
->
->
-+        (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
->
-/* io base/limit */
->
->
-ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
->
-@@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d,
->
->
-ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
->
->
-/* vga enable */
->
->
--        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
->
->
-+        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) {
->
->
-pci_bridge_update_mappings(s);
->
->
-}
->
->
->
->
-Thanks,
->
->
-Xu
->
-
-On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-> Hi all,
->
->
->
->
->
->
->
-> In our test, we configured VM with several pci-bridges and a
->
-> virtio-net nic been attached with bus 4,
->
->
->
-> After VM is startup, We ping this nic from host to judge if it is
->
-> working normally. Then, we hot add pci devices to this VM with bus 0.
->
->
->
-> We  found the virtio-net NIC in bus 4 is not working (can not connect)
->
-> occasionally, as it kick virtio backend failure with error below:
->
->
->
->     Unassigned mem write 00000000fc803004 = 0x1
->
->
->
->
->
->
->
-> memory-region: pci_bridge_pci
->
->
->
->   0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci
->
->
->
->     00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
->
->
->
->       00000000fc800000-00000000fc800fff (prio 0, RW):
->
-> virtio-pci-common
->
->
->
->       00000000fc801000-00000000fc801fff (prio 0, RW): virtio-pci-isr
->
->
->
->       00000000fc802000-00000000fc802fff (prio 0, RW):
->
-> virtio-pci-device
->
->
->
->       00000000fc803000-00000000fc803fff (prio 0, RW):
->
-> virtio-pci-notify  <- io mem unassigned
->
->
->
->       …
->
->
->
->
->
->
->
-> We caught an exceptional address changing while this problem happened,
->
-> show as
->
-> follow:
->
->
->
-> Before pci_bridge_update_mappings:
->
->
->
->       00000000fc000000-00000000fc1fffff (prio 1, RW): alias
->
-> pci_bridge_pref_mem @pci_bridge_pci 00000000fc000000-00000000fc1fffff
->
->
->
->       00000000fc200000-00000000fc3fffff (prio 1, RW): alias
->
-> pci_bridge_pref_mem @pci_bridge_pci 00000000fc200000-00000000fc3fffff
->
->
->
->       00000000fc400000-00000000fc5fffff (prio 1, RW): alias
->
-> pci_bridge_pref_mem @pci_bridge_pci 00000000fc400000-00000000fc5fffff
->
->
->
->       00000000fc600000-00000000fc7fffff (prio 1, RW): alias
->
-> pci_bridge_pref_mem @pci_bridge_pci 00000000fc600000-00000000fc7fffff
->
->
->
->       00000000fc800000-00000000fc9fffff (prio 1, RW): alias
->
-> pci_bridge_pref_mem @pci_bridge_pci 00000000fc800000-00000000fc9fffff
->
-> <- correct Address Space
->
->
->
->       00000000fca00000-00000000fcbfffff (prio 1, RW): alias
->
-> pci_bridge_pref_mem @pci_bridge_pci 00000000fca00000-00000000fcbfffff
->
->
->
->       00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
->
-> pci_bridge_pref_mem @pci_bridge_pci 00000000fcc00000-00000000fcdfffff
->
->
->
->       00000000fce00000-00000000fcffffff (prio 1, RW): alias
->
-> pci_bridge_pref_mem @pci_bridge_pci 00000000fce00000-00000000fcffffff
->
->
->
->
->
->
->
-> After pci_bridge_update_mappings:
->
->
->
->       00000000fda00000-00000000fdbfffff (prio 1, RW): alias
->
-> pci_bridge_mem @pci_bridge_pci 00000000fda00000-00000000fdbfffff
->
->
->
->       00000000fdc00000-00000000fddfffff (prio 1, RW): alias
->
-> pci_bridge_mem @pci_bridge_pci 00000000fdc00000-00000000fddfffff
->
->
->
->       00000000fde00000-00000000fdffffff (prio 1, RW): alias
->
-> pci_bridge_mem @pci_bridge_pci 00000000fde00000-00000000fdffffff
->
->
->
->       00000000fe000000-00000000fe1fffff (prio 1, RW): alias
->
-> pci_bridge_mem @pci_bridge_pci 00000000fe000000-00000000fe1fffff
->
->
->
->       00000000fe200000-00000000fe3fffff (prio 1, RW): alias
->
-> pci_bridge_mem @pci_bridge_pci 00000000fe200000-00000000fe3fffff
->
->
->
->       00000000fe400000-00000000fe5fffff (prio 1, RW): alias
->
-> pci_bridge_mem @pci_bridge_pci 00000000fe400000-00000000fe5fffff
->
->
->
->       00000000fe600000-00000000fe7fffff (prio 1, RW): alias
->
-> pci_bridge_mem @pci_bridge_pci 00000000fe600000-00000000fe7fffff
->
->
->
->       00000000fe800000-00000000fe9fffff (prio 1, RW): alias
->
-> pci_bridge_mem @pci_bridge_pci 00000000fe800000-00000000fe9fffff
->
->
->
->       fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
->
-> pci_bridge_pref_mem
->
-> @pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional Address
->
-Space
->
->
-This one is empty though right?
->
->
->
->
->
->
-> We have figured out why this address becomes this value,  according to
->
-> pci spec,  pci driver can get BAR address size by writing 0xffffffff
->
-> to
->
->
->
-> the pci register firstly, and then read back the value from this register.
->
->
->
-OK, however, as you show below, the BAR being sized is the BAR of a bridge. Are
->
-you then adding a bridge device by hotplug?
-No, I simply hot-plugged a VFIO device into bus 0. Another interesting
-phenomenon is that
-if I hot-plug the device into another bus, this doesn't happen.
- 
->
->
->
-> We didn't handle this value  specially while process pci write in
->
-> qemu, the function call stack is:
->
->
->
-> Pci_bridge_dev_write_config
->
->
->
-> -> pci_bridge_write_config
->
->
->
-> -> pci_default_write_config (we update the config[address] value here
->
-> -> to
->
-> fffffffffc800000, which should be 0xfc800000 )
->
->
->
-> -> pci_bridge_update_mappings
->
->
->
->                 ->pci_bridge_region_del(br, br->windows);
->
->
->
-> -> pci_bridge_region_init
->
->
->
->                                                                 ->
->
-> pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong
->
-> value
->
-> fffffffffc800000)
->
->
->
->                                                 ->
->
-> memory_region_transaction_commit
->
->
->
->
->
->
->
-> So, as we can see, we use the wrong base address in qemu to update the
->
-> memory regions, though, we update the base address to
->
->
->
-> The correct value after pci driver in VM write the original value
->
-> back, the virtio NIC in bus 4 may still sends net packets concurrently
->
-> with
->
->
->
-> The wrong memory region address.
->
->
->
->
->
->
->
-> We have tried to skip the memory region update action in qemu while
->
-> detect pci write with 0xffffffff value, and it does work, but
->
->
->
-> This seems to be not gently.
->
->
-For sure. But I'm still puzzled as to why does Linux try to size the BAR of
->
-the
->
-bridge while a device behind it is used.
->
->
-Can you pls post your QEMU command line?
-My QEMU command line:
-/root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S -object 
-secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-194-Linux/master-key.aes
- -machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu 
-host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m 
-size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp 
-20,sockets=20,cores=1,threads=1 -numa node,nodeid=0,cpus=0-4,mem=1024 -numa 
-node,nodeid=1,cpus=5-9,mem=1024 -numa node,nodeid=2,cpus=10-14,mem=1024 -numa 
-node,nodeid=3,cpus=15-19,mem=1024 -uuid 34a588c7-b0f2-4952-b39c-47fae3411439 
--no-user-config -nodefaults -chardev 
-socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/monitor.sock,server,nowait
- -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet 
--global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on -device 
-pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device 
-pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device 
-pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device 
-pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device 
-pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device 
-piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device 
-usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device 
-nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device 
-virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device 
-virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device 
-virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device 
-virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device 
-virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive 
-file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=drive-virtio-disk0,cache=none
- -device 
-virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
- -drive if=none,id=drive-ide0-1-1,readonly=on,cache=none -device 
-ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev 
-tap,fd=35,id=hostnet0 -device 
-virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=pci.4,addr=0x1 
--chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
--device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device 
-cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device 
-virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg timestamp=on
-
-I am also very curious about this issue. In the Linux kernel code, maybe the double
-check in the function pci_bridge_check_ranges triggers this problem.
-
-
->
->
->
->
->
->
->
->
-> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
->
->
->
-> index b2e50c3..84b405d 100644
->
->
->
-> --- a/hw/pci/pci_bridge.c
->
->
->
-> +++ b/hw/pci/pci_bridge.c
->
->
->
-> @@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d,
->
->
->
->      pci_default_write_config(d, address, val, len);
->
->
->
-> -    if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
->
->
-> +    if ( (val != 0xffffffff) &&
->
->
->
-> +        (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
->
->
->          /* io base/limit */
->
->
->
->          ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
->
->
-> @@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d,
->
->
->
->          ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
->
->
->
->          /* vga enable */
->
->
->
-> -        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
->
->
->
-> +        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) {
->
->
->
->          pci_bridge_update_mappings(s);
->
->
->
->      }
->
->
->
->
->
->
->
-> Thanks,
->
->
->
-> Xu
->
->
-
-On Mon, Dec 10, 2018 at 03:12:53AM +0000, xuyandong wrote:
->
-On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-> > Hi all,
->
-> >
->
-> >
->
-> >
->
-> > In our test, we configured VM with several pci-bridges and a
->
-> > virtio-net nic been attached with bus 4,
->
-> >
->
-> > After VM is startup, We ping this nic from host to judge if it is
->
-> > working normally. Then, we hot add pci devices to this VM with bus 0.
->
-> >
->
-> > We  found the virtio-net NIC in bus 4 is not working (can not connect)
->
-> > occasionally, as it kick virtio backend failure with error below:
->
-> >
->
-> >     Unassigned mem write 00000000fc803004 = 0x1
->
-> >
->
-> >
->
-> >
->
-> > memory-region: pci_bridge_pci
->
-> >
->
-> >   0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci
->
-> >
->
-> >     00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
->
-> >
->
-> >       00000000fc800000-00000000fc800fff (prio 0, RW):
->
-> > virtio-pci-common
->
-> >
->
-> >       00000000fc801000-00000000fc801fff (prio 0, RW): virtio-pci-isr
->
-> >
->
-> >       00000000fc802000-00000000fc802fff (prio 0, RW):
->
-> > virtio-pci-device
->
-> >
->
-> >       00000000fc803000-00000000fc803fff (prio 0, RW):
->
-> > virtio-pci-notify  <- io mem unassigned
->
-> >
->
-> >       …
->
-> >
->
-> >
->
-> >
->
-> > We caught an exceptional address changing while this problem happened,
->
-> > show as
->
-> > follow:
->
-> >
->
-> > Before pci_bridge_update_mappings:
->
-> >
->
-> >       00000000fc000000-00000000fc1fffff (prio 1, RW): alias
->
-> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc000000-00000000fc1fffff
->
-> >
->
-> >       00000000fc200000-00000000fc3fffff (prio 1, RW): alias
->
-> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc200000-00000000fc3fffff
->
-> >
->
-> >       00000000fc400000-00000000fc5fffff (prio 1, RW): alias
->
-> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc400000-00000000fc5fffff
->
-> >
->
-> >       00000000fc600000-00000000fc7fffff (prio 1, RW): alias
->
-> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc600000-00000000fc7fffff
->
-> >
->
-> >       00000000fc800000-00000000fc9fffff (prio 1, RW): alias
->
-> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc800000-00000000fc9fffff
->
-> > <- correct Address Space
->
-> >
->
-> >       00000000fca00000-00000000fcbfffff (prio 1, RW): alias
->
-> > pci_bridge_pref_mem @pci_bridge_pci 00000000fca00000-00000000fcbfffff
->
-> >
->
-> >       00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
->
-> > pci_bridge_pref_mem @pci_bridge_pci 00000000fcc00000-00000000fcdfffff
->
-> >
->
-> >       00000000fce00000-00000000fcffffff (prio 1, RW): alias
->
-> > pci_bridge_pref_mem @pci_bridge_pci 00000000fce00000-00000000fcffffff
->
-> >
->
-> >
->
-> >
->
-> > After pci_bridge_update_mappings:
->
-> >
->
-> >       00000000fda00000-00000000fdbfffff (prio 1, RW): alias
->
-> > pci_bridge_mem @pci_bridge_pci 00000000fda00000-00000000fdbfffff
->
-> >
->
-> >       00000000fdc00000-00000000fddfffff (prio 1, RW): alias
->
-> > pci_bridge_mem @pci_bridge_pci 00000000fdc00000-00000000fddfffff
->
-> >
->
-> >       00000000fde00000-00000000fdffffff (prio 1, RW): alias
->
-> > pci_bridge_mem @pci_bridge_pci 00000000fde00000-00000000fdffffff
->
-> >
->
-> >       00000000fe000000-00000000fe1fffff (prio 1, RW): alias
->
-> > pci_bridge_mem @pci_bridge_pci 00000000fe000000-00000000fe1fffff
->
-> >
->
-> >       00000000fe200000-00000000fe3fffff (prio 1, RW): alias
->
-> > pci_bridge_mem @pci_bridge_pci 00000000fe200000-00000000fe3fffff
->
-> >
->
-> >       00000000fe400000-00000000fe5fffff (prio 1, RW): alias
->
-> > pci_bridge_mem @pci_bridge_pci 00000000fe400000-00000000fe5fffff
->
-> >
->
-> >       00000000fe600000-00000000fe7fffff (prio 1, RW): alias
->
-> > pci_bridge_mem @pci_bridge_pci 00000000fe600000-00000000fe7fffff
->
-> >
->
-> >       00000000fe800000-00000000fe9fffff (prio 1, RW): alias
->
-> > pci_bridge_mem @pci_bridge_pci 00000000fe800000-00000000fe9fffff
->
-> >
->
-> >       fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
->
-> > pci_bridge_pref_mem
->
-> > @pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional Address Space
->
->
->
-> This one is empty though right?
->
->
->
-> >
->
-> >
->
-> > We have figured out why the address takes this value. According to the
->
-> > PCI spec, a PCI driver can determine a BAR's size by first writing 0xffffffff
->
-> > to
->
-> >
->
-> > the PCI register and then reading the value back from this register.
->
->
->
->
->
-> OK, however, as you show below, the BAR being sized is the BAR of a bridge. Are
->
-> you then adding a bridge device by hotplug?
->
->
-No, I simply hot-plugged a VFIO device to bus 0. Another interesting
->
-phenomenon is that
->
-if I hot-plug the device to another bus, this doesn't happen.
->
->
->
->
->
->
-> > We don't handle this value specially when processing the PCI write in
->
-> > QEMU. The function call stack is:
->
-> >
->
-> > pci_bridge_dev_write_config
->
-> >   -> pci_bridge_write_config
->
-> >     -> pci_default_write_config (here we update the config[address] value to
->
-> >        fffffffffc800000, which should be 0xfc800000)
->
-> >     -> pci_bridge_update_mappings
->
-> >       -> pci_bridge_region_del(br, br->windows);
->
-> >       -> pci_bridge_region_init
->
-> >         -> pci_bridge_init_alias (here, via pci_bridge_get_base, we use the
->
-> >            wrong value fffffffffc800000)
->
-> >       -> memory_region_transaction_commit
->
-> >
->
-> >
->
-> >
->
-> > So, as we can see, QEMU uses the wrong base address to update the
->
-> > memory regions. Although the base address is updated to
->
-> >
->
-> > the correct value after the PCI driver in the VM writes the original value
->
-> > back, the virtio NIC on bus 4 may still send network packets concurrently,
->
-> > while
->
-> >
->
-> > the wrong memory region address is in effect.
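The arithmetic behind the fffffffffc800000 value can be sketched as below. This is a minimal model, not QEMU's code: the helper names are hypothetical, and the 16-bit register values 0xfc81 and 0xfc91 are assumptions inferred from the fc800000-fc9fffff window shown above. It only illustrates how a 64-bit prefetchable window base is assembled from the 16-bit base register and its UPPER32 partner, and why the window becomes empty while UPPER32 transiently holds 0xffffffff.

```c
#include <assert.h>
#include <stdint.h>

/* The low 4 bits of the 16-bit prefetchable base/limit registers encode the
 * window type; 0x1 marks a 64-bit window whose address bits 63:32 come from
 * the matching UPPER32 register. */
#define PREF_TYPE_MASK 0x000fu
#define PREF_TYPE_64   0x0001u

/* Hypothetical helper: bits 15:4 of the register are address bits 31:20. */
static inline uint64_t pref_addr(uint16_t reg16, uint32_t upper32)
{
    uint64_t addr = (uint64_t)(reg16 & 0xfff0u) << 16;

    if ((reg16 & PREF_TYPE_MASK) == PREF_TYPE_64) {
        addr |= (uint64_t)upper32 << 32;
    }
    return addr;
}

/* Window base: the low 20 address bits are implicitly zero. */
static inline uint64_t pref_base(uint16_t base16, uint32_t base_upper32)
{
    return pref_addr(base16, base_upper32);
}

/* Window limit: the low 20 address bits are implicitly all ones. */
static inline uint64_t pref_limit(uint16_t limit16, uint32_t limit_upper32)
{
    return pref_addr(limit16, limit_upper32) | 0xfffffu;
}

/* A window whose base lies above its limit forwards nothing. */
static inline int pref_window_is_empty(uint64_t base, uint64_t limit)
{
    return base > limit;
}

/* Self-check using the addresses quoted in the report; the register values
 * 0xfc81 and 0xfc91 are assumed, derived from the fc800000-fc9fffff window. */
static inline void pref_window_selfcheck(void)
{
    uint64_t limit = pref_limit(0xfc91, 0x00000000u);

    /* Normal state: the window covers the virtio-net BAR. */
    assert(pref_base(0xfc81, 0x00000000u) == 0x00000000fc800000ULL);
    assert(limit == 0x00000000fc9fffffULL);

    /* While UPPER32 transiently holds 0xffffffff, the base jumps upward. */
    assert(pref_base(0xfc81, 0xffffffffu) == 0xfffffffffc800000ULL);
    assert(pref_window_is_empty(pref_base(0xfc81, 0xffffffffu), limit));
}
```

In this model the limit registers are untouched during the probe, so the base ends up above the limit. That would be consistent with the alias at fffffffffc800000 being empty and with the guest's notify write to fc803004 landing on unassigned memory.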
->
-> >
->
-> >
->
-> >
->
-> > We have tried skipping the memory region update in QEMU when a
->
-> > PCI write with the value 0xffffffff is detected, and it does work, but
->
-> >
->
-> > this does not seem to be an elegant solution.
->
->
->
-> For sure. But I'm still puzzled as to why Linux tries to size the BAR of
->
-> the
->
-> bridge while a device behind it is in use.
->
->
->
-> Can you please post your QEMU command line?
->
->
-My QEMU command line:
->
-/root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S -object
->
-secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-194-Linux/master-key.aes
->
--machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
->
-host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
->
-size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp
->
-20,sockets=20,cores=1,threads=1 -numa node,nodeid=0,cpus=0-4,mem=1024 -numa
->
-node,nodeid=1,cpus=5-9,mem=1024 -numa node,nodeid=2,cpus=10-14,mem=1024 -numa
->
-node,nodeid=3,cpus=15-19,mem=1024 -uuid 34a588c7-b0f2-4952-b39c-47fae3411439
->
--no-user-config -nodefaults -chardev
->
-socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/monitor.sock,server,nowait
->
--mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet
->
--global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on -device
->
-pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
->
-pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
->
-pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
->
-pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
->
-pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
->
-piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
->
-usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
->
-nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
->
-virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
->
-virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
->
-virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
->
-virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
->
-virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
->
-file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=drive-virtio-disk0,cache=none
->
--device
->
-virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
->
--drive if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
->
-ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev
->
-tap,fd=35,id=hostnet0 -device
->
-virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=pci.4,addr=0x1
->
--chardev pty,id=charserial0 -device
->
-isa-serial,chardev=charserial0,id=serial0 -device
->
-usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
->
-cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
->
-virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg timestamp=on
->
->
-I am also very curious about this issue. In the Linux kernel code, maybe the
->
-double check in function pci_bridge_check_ranges triggered this problem.
-If you can get the stacktrace in Linux when it tries to write this
-fffff value, that would be quite helpful.
-
-
->
->
->
->
->
->
->
->
-> >
->
-> >
->
-> > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
->
-> >
->
-> > index b2e50c3..84b405d 100644
->
-> >
->
-> > --- a/hw/pci/pci_bridge.c
->
-> >
->
-> > +++ b/hw/pci/pci_bridge.c
->
-> >
->
-> > @@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d,
->
-> >
->
-> >      pci_default_write_config(d, address, val, len);
->
-> >
->
-> > -    if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
-> >
->
-> > +    if ( (val != 0xffffffff) &&
->
-> >
->
-> > +        (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
-> >
->
-> >          /* io base/limit */
->
-> >
->
-> >          ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
-> >
->
-> > @@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d,
->
-> >
->
-> >          ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
->
-> >
->
-> >          /* vga enable */
->
-> >
->
-> > -        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
->
-> >
->
-> > +        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) {
->
-> >
->
-> >          pci_bridge_update_mappings(s);
->
-> >
->
-> >      }
->
-> >
->
-> >
->
-> >
->
-> > Thanks,
->
-> >
->
-> > Xu
->
-> >
-
-On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-> > > Hi all,
->
-> > >
->
-> > >
->
-> > >
->
-> > > In our test, we configured a VM with several pci-bridges and a
->
-> > > virtio-net NIC attached to bus 4.
->
-> > >
->
-> > > After the VM starts up, we ping this NIC from the host to check whether it is
->
-> > > working normally. Then we hot-add PCI devices to this VM on bus 0.
->
-> > >
->
-> > > We found that the virtio-net NIC on bus 4 occasionally stops working (cannot
->
-> > > connect), because kicking the virtio backend fails with the error
->
-> > > below:
->
-> > >
->
-> > >     Unassigned mem write 00000000fc803004 = 0x1
->
-> > >
->
-> > >
->
-> > >
->
-> > > memory-region: pci_bridge_pci
->
-> > >
->
-> > >   0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci
->
-> > >
->
-> > >     00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
->
-> > >
->
-> > >       00000000fc800000-00000000fc800fff (prio 0, RW):
->
-> > > virtio-pci-common
->
-> > >
->
-> > >       00000000fc801000-00000000fc801fff (prio 0, RW):
->
-> > > virtio-pci-isr
->
-> > >
->
-> > >       00000000fc802000-00000000fc802fff (prio 0, RW):
->
-> > > virtio-pci-device
->
-> > >
->
-> > >       00000000fc803000-00000000fc803fff (prio 0, RW):
->
-> > > virtio-pci-notify  <- io mem unassigned
->
-> > >
->
-> > >       …
->
-> > >
->
-> > >
->
-> > >
->
-> > > We caught an abnormal address change when this problem
->
-> > > happened, shown as
->
-> > > follows:
->
-> > >
->
-> > > Before pci_bridge_update_mappings:
->
-> > >
->
-> > >       00000000fc000000-00000000fc1fffff (prio 1, RW): alias
->
-> > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > 00000000fc000000-00000000fc1fffff
->
-> > >
->
-> > >       00000000fc200000-00000000fc3fffff (prio 1, RW): alias
->
-> > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > 00000000fc200000-00000000fc3fffff
->
-> > >
->
-> > >       00000000fc400000-00000000fc5fffff (prio 1, RW): alias
->
-> > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > 00000000fc400000-00000000fc5fffff
->
-> > >
->
-> > >       00000000fc600000-00000000fc7fffff (prio 1, RW): alias
->
-> > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > 00000000fc600000-00000000fc7fffff
->
-> > >
->
-> > >       00000000fc800000-00000000fc9fffff (prio 1, RW): alias
->
-> > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > 00000000fc800000-00000000fc9fffff
->
-> > > <- correct Address Space
->
-> > >
->
-> > >       00000000fca00000-00000000fcbfffff (prio 1, RW): alias
->
-> > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > 00000000fca00000-00000000fcbfffff
->
-> > >
->
-> > >       00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
->
-> > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > 00000000fcc00000-00000000fcdfffff
->
-> > >
->
-> > >       00000000fce00000-00000000fcffffff (prio 1, RW): alias
->
-> > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > 00000000fce00000-00000000fcffffff
->
-> > >
->
-> > >
->
-> > >
->
-> > > After pci_bridge_update_mappings:
->
-> > >
->
-> > >       00000000fda00000-00000000fdbfffff (prio 1, RW): alias
->
-> > > pci_bridge_mem @pci_bridge_pci 00000000fda00000-00000000fdbfffff
->
-> > >
->
-> > >       00000000fdc00000-00000000fddfffff (prio 1, RW): alias
->
-> > > pci_bridge_mem @pci_bridge_pci 00000000fdc00000-00000000fddfffff
->
-> > >
->
-> > >       00000000fde00000-00000000fdffffff (prio 1, RW): alias
->
-> > > pci_bridge_mem @pci_bridge_pci 00000000fde00000-00000000fdffffff
->
-> > >
->
-> > >       00000000fe000000-00000000fe1fffff (prio 1, RW): alias
->
-> > > pci_bridge_mem @pci_bridge_pci 00000000fe000000-00000000fe1fffff
->
-> > >
->
-> > >       00000000fe200000-00000000fe3fffff (prio 1, RW): alias
->
-> > > pci_bridge_mem @pci_bridge_pci 00000000fe200000-00000000fe3fffff
->
-> > >
->
-> > >       00000000fe400000-00000000fe5fffff (prio 1, RW): alias
->
-> > > pci_bridge_mem @pci_bridge_pci 00000000fe400000-00000000fe5fffff
->
-> > >
->
-> > >       00000000fe600000-00000000fe7fffff (prio 1, RW): alias
->
-> > > pci_bridge_mem @pci_bridge_pci 00000000fe600000-00000000fe7fffff
->
-> > >
->
-> > >       00000000fe800000-00000000fe9fffff (prio 1, RW): alias
->
-> > > pci_bridge_mem @pci_bridge_pci 00000000fe800000-00000000fe9fffff
->
-> > >
->
-> > >       fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
->
-> > > pci_bridge_pref_mem
->
-> > > @pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional
->
-> > > Address Space
->
-> >
->
-> > This one is empty though right?
->
-> >
->
-> > >
->
-> > >
->
-> > > We have figured out why the address takes this value.
->
-> > > According to the PCI spec, a PCI driver can determine a BAR's size by
->
-> > > first writing 0xffffffff to
->
-> > >
->
-> > > the PCI register and then reading the value back from this
->
-> > > register.
->
-> >
->
-> >
->
-> > OK, however, as you show below, the BAR being sized is the BAR of a
->
-> > bridge. Are you then adding a bridge device by hotplug?
->
->
->
-> No, I simply hot-plugged a VFIO device to bus 0. Another
->
-> interesting phenomenon is that if I hot-plug the device to another bus, this
->
-> doesn't happen.
->
->
->
-> >
->
-> >
->
-> > > We don't handle this value specially when processing the PCI write in
->
-> > > QEMU. The function call stack is:
->
-> > >
->
-> > > pci_bridge_dev_write_config
->
-> > >   -> pci_bridge_write_config
->
-> > >     -> pci_default_write_config (here we update the config[address] value to
->
-> > >        fffffffffc800000, which should be 0xfc800000)
->
-> > >     -> pci_bridge_update_mappings
->
-> > >       -> pci_bridge_region_del(br, br->windows);
->
-> > >       -> pci_bridge_region_init
->
-> > >         -> pci_bridge_init_alias (here, via pci_bridge_get_base, we use the
->
-> > >            wrong value fffffffffc800000)
->
-> > >       -> memory_region_transaction_commit
->
-> > >
->
-> > >
->
-> > >
->
-> > > So, as we can see, QEMU uses the wrong base address to update
->
-> > > the memory regions. Although the base address is updated to
->
-> > >
->
-> > > the correct value after the PCI driver in the VM writes the original value
->
-> > > back, the virtio NIC on bus 4 may still send network packets
->
-> > > concurrently, while
->
-> > >
->
-> > > the wrong memory region address is in effect.
->
-> > >
->
-> > >
->
-> > >
->
-> > > We have tried skipping the memory region update in QEMU
->
-> > > when a PCI write with the value 0xffffffff is detected, and it does work,
->
-> > > but
->
-> > >
->
-> > > this does not seem to be an elegant solution.
->
-> >
->
-> > For sure. But I'm still puzzled as to why Linux tries to size the
->
-> > BAR of the bridge while a device behind it is in use.
->
-> >
->
-> > Can you please post your QEMU command line?
->
->
->
-> My QEMU command line:
->
-> /root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S
->
-> -object
->
-> secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-194-
->
-> Linux/master-key.aes -machine
->
-> pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
->
-> host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
->
-> size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp
->
-> 20,sockets=20,cores=1,threads=1 -numa node,nodeid=0,cpus=0-4,mem=1024
->
-> -numa node,nodeid=1,cpus=5-9,mem=1024 -numa
->
-> node,nodeid=2,cpus=10-14,mem=1024 -numa
->
-> node,nodeid=3,cpus=15-19,mem=1024 -uuid
->
-> 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -nodefaults
->
-> -chardev
->
-> socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/moni
->
-> tor.sock,server,nowait -mon
->
-> chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet
->
-> -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on
->
-> -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
->
-> pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
->
-> pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
->
-> pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
->
-> pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
->
-> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
->
-> usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
->
-> nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
->
-> virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
->
-> virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
->
-> virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
->
-> virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
->
-> virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
->
-> file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=drive-v
->
-> irtio-disk0,cache=none -device
->
-> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id
->
-> =virtio-disk0,bootindex=1 -drive
->
-> if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
->
-> ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev
->
-> tap,fd=35,id=hostnet0 -device
->
-> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=pci.4
->
-> ,addr=0x1 -chardev pty,id=charserial0 -device
->
-> isa-serial,chardev=charserial0,id=serial0 -device
->
-> usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
->
-> cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
->
-> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg timestamp=on
->
->
->
-> I am also very curious about this issue. In the Linux kernel code, maybe the
->
-> double check in function pci_bridge_check_ranges triggered this problem.
->
->
-If you can get the stacktrace in Linux when it tries to write this fffff
->
-value, that
->
-would be quite helpful.
->
-After I added mdelay(100) to the function pci_bridge_check_ranges, the problem
-became easier to reproduce. Below is my modification to the kernel:
-diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
-index cb389277..86e232d 100644
---- a/drivers/pci/setup-bus.c
-+++ b/drivers/pci/setup-bus.c
-@@ -27,7 +27,7 @@
- #include <linux/slab.h>
- #include <linux/acpi.h>
- #include "pci.h"
--
-+#include <linux/delay.h>
- unsigned int pci_flags;
- 
- struct pci_dev_resource {
-@@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
-                pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
-                                               0xffffffff);
-                pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, &tmp);
-+               mdelay(100);
-+               printk(KERN_ERR "sleep\n");
-+                dump_stack();
-                if (!tmp)
-                        b_res[2].flags &= ~IORESOURCE_MEM_64;
-                pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
-
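The probe that the patch above instruments can be modelled in a few lines. This is a toy sketch under stated assumptions, not kernel or QEMU code: `toy_bridge`, `toy_write_upper32` and `toy_probe_64bit` are hypothetical names, and the model assumes the emulator rebuilds the bridge window on every config write, as the call stack earlier in the thread describes.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one bridge: only the fields relevant to the probe. */
struct toy_bridge {
    uint32_t pref_base_upper32; /* models PCI_PREF_BASE_UPPER32 */
    unsigned remap_count;       /* how often the window was rebuilt */
    unsigned bad_remaps;        /* rebuilds done with the transient value */
};

/* Models the emulator side: each config write rebuilds the window at once. */
static inline void toy_write_upper32(struct toy_bridge *b, uint32_t val)
{
    b->pref_base_upper32 = val;
    b->remap_count++;
    if (val == 0xffffffffu) {
        b->bad_remaps++;
    }
}

/* Models the guest side: save, write all ones, read back, restore.
 * Returns nonzero if the register is writable, i.e. 64-bit capable. */
static inline int toy_probe_64bit(struct toy_bridge *b)
{
    uint32_t saved = b->pref_base_upper32;
    uint32_t tmp;

    toy_write_upper32(b, 0xffffffffu);
    tmp = b->pref_base_upper32;
    /* The patch above adds mdelay(100) at this point, widening the
     * interval in which the transient value is live. */
    toy_write_upper32(b, saved);
    return tmp != 0;
}
```

In this model one probe causes two window rebuilds, the first of them with the all-ones value. Any access a device behind the bridge makes between the two writes sees the transient window, which is why lengthening that interval makes the failure easier to hit.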
-After hot plugging, we get the following log:
-
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:14.0: BAR 0: assigned [mem 
-0xc2360000-0xc237ffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:14.0: BAR 3: assigned [mem 
-0xc2328000-0xc232bfff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:18 uefi-linux kernel: sleep
-Dec 11 09:28:18 uefi-linux kernel: CPU: 16 PID: 502 Comm: kworker/u40:1 Not 
-tainted 4.11.0-rc3+ #11
-Dec 11 09:28:18 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + 
-PIIX, 1996), BIOS 0.0.0 02/06/2015
-Dec 11 09:28:18 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn
-Dec 11 09:28:18 uefi-linux kernel: Call Trace:
-Dec 11 09:28:18 uefi-linux kernel: dump_stack+0x63/0x87
-Dec 11 09:28:18 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960
-Dec 11 09:28:18 uefi-linux kernel: ? dev_printk+0x4d/0x50
-Dec 11 09:28:18 uefi-linux kernel: enable_slot+0x140/0x2f0
-Dec 11 09:28:18 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80
-Dec 11 09:28:18 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
-Dec 11 09:28:18 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120
-Dec 11 09:28:18 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0
-Dec 11 09:28:18 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0
-Dec 11 09:28:18 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3
-Dec 11 09:28:18 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29
-Dec 11 09:28:18 uefi-linux kernel: process_one_work+0x165/0x410
-Dec 11 09:28:18 uefi-linux kernel: worker_thread+0x137/0x4c0
-Dec 11 09:28:18 uefi-linux kernel: kthread+0x101/0x140
-Dec 11 09:28:18 uefi-linux kernel: ? rescuer_thread+0x380/0x380
-Dec 11 09:28:18 uefi-linux kernel: ? kthread_park+0x90/0x90
-Dec 11 09:28:18 uefi-linux kernel: ret_from_fork+0x2c/0x40
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:18 uefi-linux kernel: sleep
-Dec 11 09:28:18 uefi-linux kernel: CPU: 16 PID: 502 Comm: kworker/u40:1 Not 
-tainted 4.11.0-rc3+ #11
-Dec 11 09:28:18 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + 
-PIIX, 1996), BIOS 0.0.0 02/06/2015
-Dec 11 09:28:18 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn
-Dec 11 09:28:18 uefi-linux kernel: Call Trace:
-Dec 11 09:28:18 uefi-linux kernel: dump_stack+0x63/0x87
-Dec 11 09:28:18 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960
-Dec 11 09:28:18 uefi-linux kernel: ? dev_printk+0x4d/0x50
-Dec 11 09:28:18 uefi-linux kernel: enable_slot+0x140/0x2f0
-Dec 11 09:28:18 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80
-Dec 11 09:28:18 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
-Dec 11 09:28:18 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120
-Dec 11 09:28:18 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0
-Dec 11 09:28:18 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0
-Dec 11 09:28:18 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3
-Dec 11 09:28:18 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29
-Dec 11 09:28:18 uefi-linux kernel: process_one_work+0x165/0x410
-Dec 11 09:28:18 uefi-linux kernel: worker_thread+0x137/0x4c0
-Dec 11 09:28:18 uefi-linux kernel: kthread+0x101/0x140
-Dec 11 09:28:18 uefi-linux kernel: ? rescuer_thread+0x380/0x380
-Dec 11 09:28:18 uefi-linux kernel: ? kthread_park+0x90/0x90
-Dec 11 09:28:18 uefi-linux kernel: ret_from_fork+0x2c/0x40
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:19 uefi-linux kernel: sleep
-Dec 11 09:28:19 uefi-linux kernel: CPU: 17 PID: 502 Comm: kworker/u40:1 Not 
-tainted 4.11.0-rc3+ #11
-Dec 11 09:28:19 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + 
-PIIX, 1996), BIOS 0.0.0 02/06/2015
-Dec 11 09:28:19 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn
-Dec 11 09:28:19 uefi-linux kernel: Call Trace:
-Dec 11 09:28:19 uefi-linux kernel: dump_stack+0x63/0x87
-Dec 11 09:28:19 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960
-Dec 11 09:28:19 uefi-linux kernel: ? dev_printk+0x4d/0x50
-Dec 11 09:28:19 uefi-linux kernel: enable_slot+0x140/0x2f0
-Dec 11 09:28:19 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80
-Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
-Dec 11 09:28:19 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120
-Dec 11 09:28:19 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0
-Dec 11 09:28:19 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0
-Dec 11 09:28:19 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3
-Dec 11 09:28:19 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29
-Dec 11 09:28:19 uefi-linux kernel: process_one_work+0x165/0x410
-Dec 11 09:28:19 uefi-linux kernel: worker_thread+0x137/0x4c0
-Dec 11 09:28:19 uefi-linux kernel: kthread+0x101/0x140
-Dec 11 09:28:19 uefi-linux kernel: ? rescuer_thread+0x380/0x380
-Dec 11 09:28:19 uefi-linux kernel: ? kthread_park+0x90/0x90
-Dec 11 09:28:19 uefi-linux kernel: ret_from_fork+0x2c/0x40
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:19 uefi-linux kernel: sleep
-Dec 11 09:28:19 uefi-linux kernel: CPU: 17 PID: 502 Comm: kworker/u40:1 Not 
-tainted 4.11.0-rc3+ #11
-Dec 11 09:28:19 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + 
-PIIX, 1996), BIOS 0.0.0 02/06/2015
-Dec 11 09:28:19 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn
-Dec 11 09:28:19 uefi-linux kernel: Call Trace:
-Dec 11 09:28:19 uefi-linux kernel: dump_stack+0x63/0x87
-Dec 11 09:28:19 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960
-Dec 11 09:28:19 uefi-linux kernel: ? pci_conf1_read+0xba/0x100
-Dec 11 09:28:19 uefi-linux kernel: __pci_bus_size_bridges+0xe9/0x960
-Dec 11 09:28:19 uefi-linux kernel: ? dev_printk+0x4d/0x50
-Dec 11 09:28:19 uefi-linux kernel: ? pcibios_allocate_rom_resources+0x45/0x80
-Dec 11 09:28:19 uefi-linux kernel: enable_slot+0x140/0x2f0
-Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
-Dec 11 09:28:19 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80
-Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
-Dec 11 09:28:19 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120
-Dec 11 09:28:19 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0
-Dec 11 09:28:19 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0
-Dec 11 09:28:19 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3
-Dec 11 09:28:19 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29
-Dec 11 09:28:19 uefi-linux kernel: process_one_work+0x165/0x410
-Dec 11 09:28:19 uefi-linux kernel: worker_thread+0x137/0x4c0
-Dec 11 09:28:19 uefi-linux kernel: kthread+0x101/0x140
-Dec 11 09:28:19 uefi-linux kernel: ? rescuer_thread+0x380/0x380
-Dec 11 09:28:19 uefi-linux kernel: ? kthread_park+0x90/0x90
-Dec 11 09:28:19 uefi-linux kernel: ret_from_fork+0x2c/0x40
-Dec 11 09:28:19 uefi-linux kernel: sleep
-Dec 11 09:28:19 uefi-linux kernel: CPU: 17 PID: 502 Comm: kworker/u40:1 Not 
-tainted 4.11.0-rc3+ #11
-Dec 11 09:28:19 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + 
-PIIX, 1996), BIOS 0.0.0 02/06/2015
-Dec 11 09:28:19 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn
-Dec 11 09:28:19 uefi-linux kernel: Call Trace:
-Dec 11 09:28:19 uefi-linux kernel: dump_stack+0x63/0x87
-Dec 11 09:28:19 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960
-Dec 11 09:28:19 uefi-linux kernel: ? dev_printk+0x4d/0x50
-Dec 11 09:28:19 uefi-linux kernel: enable_slot+0x140/0x2f0
-Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
-Dec 11 09:28:19 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80
-Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120
-Dec 11 09:28:19 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120
-Dec 11 09:28:19 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0
-Dec 11 09:28:19 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0
-Dec 11 09:28:19 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3
-Dec 11 09:28:19 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29
-Dec 11 09:28:19 uefi-linux kernel: process_one_work+0x165/0x410
-Dec 11 09:28:19 uefi-linux kernel: worker_thread+0x137/0x4c0
-Dec 11 09:28:19 uefi-linux kernel: kthread+0x101/0x140
-Dec 11 09:28:19 uefi-linux kernel: ? rescuer_thread+0x380/0x380
-Dec 11 09:28:19 uefi-linux kernel: ? kthread_park+0x90/0x90
-Dec 11 09:28:19 uefi-linux kernel: ret_from_fork+0x2c/0x40
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 lost sync at byte 1
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 - driver resynced.
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 lost sync at byte 1
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 - driver resynced.
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 lost sync at byte 1
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 - driver resynced.
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 lost sync at byte 1
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 - driver resynced.
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 lost sync at byte 1
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 - driver resynced.
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 lost sync at byte 1
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 - driver resynced.
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 lost sync at byte 1
-Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 - driver resynced.
-Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 lost sync at byte 1
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 - driver resynced.
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 lost sync at byte 1
-Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 lost sync at byte 1
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at 
-isa0060/serio1/input0 - driver resynced.
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:08.0:   bridge window [io  
-0xf000-0xffff]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2800000-0xc29fffff]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:08.0:   bridge window [mem 
-0xc2b00000-0xc2cfffff 64bit pref]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:09.0:   bridge window [io  
-0xe000-0xefff]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2600000-0xc27fffff]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:09.0:   bridge window [mem 
-0xc2d00000-0xc2efffff 64bit pref]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [io  
-0xd000-0xdfff]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2400000-0xc25fffff]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:0a.0:   bridge window [mem 
-0xc2f00000-0xc30fffff 64bit pref]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [io  
-0xc000-0xcfff]
-Dec 11 09:28:22 uefi-linux kernel: pci 0000:04:0c.0:   bridge window [mem 
-0xc2000000-0xc21fffff]
-
->
->
->
-> >
->
-> >
->
-> >
->
-> > >
->
-> > >
->
-> > > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
->
-> > >
->
-> > > index b2e50c3..84b405d 100644
->
-> > >
->
-> > > --- a/hw/pci/pci_bridge.c
->
-> > >
->
-> > > +++ b/hw/pci/pci_bridge.c
->
-> > >
->
-> > > @@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d,
->
-> > >
->
-> > >      pci_default_write_config(d, address, val, len);
->
-> > >
->
-> > > -    if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
-> > >
->
-> > > +    if ( (val != 0xffffffff) &&
->
-> > >
->
-> > > +        (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
-> > >
->
-> > >          /* io base/limit */
->
-> > >
->
-> > >          ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
-> > >
->
-> > > @@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d,
->
-> > >
->
-> > >          ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
->
-> > >
->
-> > >          /* vga enable */
->
-> > >
->
-> > > -        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
->
-> > >
->
-> > > +        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) {
->
-> > >
->
-> > >          pci_bridge_update_mappings(s);
->
-> > >
->
-> > >      }
->
-> > >
->
-> > >
->
-> > >
->
-> > > Thanks,
->
-> > >
->
-> > > Xu
->
-> > >
-
-On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote:
->
-On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-> > > > Hi all,
->
-> > > >
->
-> > > >
->
-> > > >
->
-> > > > In our test, we configured a VM with several pci-bridges and a
->
-> > > > virtio-net NIC attached to bus 4.
->
-> > > >
->
-> > > > After the VM starts up, we ping this NIC from the host to check that it is
->
-> > > > working normally. Then we hot-add PCI devices to this VM on bus 0.
->
-> > > >
->
-> > > > We found that the virtio-net NIC on bus 4 occasionally stops working (cannot
->
-> > > > connect), because kicking the virtio backend fails with the error
->
-> > > > below:
->
-> > > >
->
-> > > >     Unassigned mem write 00000000fc803004 = 0x1
->
-> > > >
->
-> > > >
->
-> > > >
->
-> > > > memory-region: pci_bridge_pci
->
-> > > >
->
-> > > >   0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci
->
-> > > >
->
-> > > >     00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
->
-> > > >
->
-> > > >       00000000fc800000-00000000fc800fff (prio 0, RW):
->
-> > > > virtio-pci-common
->
-> > > >
->
-> > > >       00000000fc801000-00000000fc801fff (prio 0, RW):
->
-> > > > virtio-pci-isr
->
-> > > >
->
-> > > >       00000000fc802000-00000000fc802fff (prio 0, RW):
->
-> > > > virtio-pci-device
->
-> > > >
->
-> > > >       00000000fc803000-00000000fc803fff (prio 0, RW):
->
-> > > > virtio-pci-notify  <- io mem unassigned
->
-> > > >
->
-> > > >       …
->
-> > > >
->
-> > > >
->
-> > > >
->
-> > > > We caught an unexpected address change when this problem
->
-> > > > happened, shown as
->
-> > > > follows:
->
-> > > >
->
-> > > > Before pci_bridge_update_mappings:
->
-> > > >
->
-> > > >       00000000fc000000-00000000fc1fffff (prio 1, RW): alias
->
-> > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > 00000000fc000000-00000000fc1fffff
->
-> > > >
->
-> > > >       00000000fc200000-00000000fc3fffff (prio 1, RW): alias
->
-> > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > 00000000fc200000-00000000fc3fffff
->
-> > > >
->
-> > > >       00000000fc400000-00000000fc5fffff (prio 1, RW): alias
->
-> > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > 00000000fc400000-00000000fc5fffff
->
-> > > >
->
-> > > >       00000000fc600000-00000000fc7fffff (prio 1, RW): alias
->
-> > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > 00000000fc600000-00000000fc7fffff
->
-> > > >
->
-> > > >       00000000fc800000-00000000fc9fffff (prio 1, RW): alias
->
-> > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > 00000000fc800000-00000000fc9fffff
->
-> > > > <- correct Address Space
->
-> > > >
->
-> > > >       00000000fca00000-00000000fcbfffff (prio 1, RW): alias
->
-> > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > 00000000fca00000-00000000fcbfffff
->
-> > > >
->
-> > > >       00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
->
-> > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > 00000000fcc00000-00000000fcdfffff
->
-> > > >
->
-> > > >       00000000fce00000-00000000fcffffff (prio 1, RW): alias
->
-> > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > 00000000fce00000-00000000fcffffff
->
-> > > >
->
-> > > >
->
-> > > >
->
-> > > > After pci_bridge_update_mappings:
->
-> > > >
->
-> > > >       00000000fda00000-00000000fdbfffff (prio 1, RW): alias
->
-> > > > pci_bridge_mem @pci_bridge_pci 00000000fda00000-00000000fdbfffff
->
-> > > >
->
-> > > >       00000000fdc00000-00000000fddfffff (prio 1, RW): alias
->
-> > > > pci_bridge_mem @pci_bridge_pci 00000000fdc00000-00000000fddfffff
->
-> > > >
->
-> > > >       00000000fde00000-00000000fdffffff (prio 1, RW): alias
->
-> > > > pci_bridge_mem @pci_bridge_pci 00000000fde00000-00000000fdffffff
->
-> > > >
->
-> > > >       00000000fe000000-00000000fe1fffff (prio 1, RW): alias
->
-> > > > pci_bridge_mem @pci_bridge_pci 00000000fe000000-00000000fe1fffff
->
-> > > >
->
-> > > >       00000000fe200000-00000000fe3fffff (prio 1, RW): alias
->
-> > > > pci_bridge_mem @pci_bridge_pci 00000000fe200000-00000000fe3fffff
->
-> > > >
->
-> > > >       00000000fe400000-00000000fe5fffff (prio 1, RW): alias
->
-> > > > pci_bridge_mem @pci_bridge_pci 00000000fe400000-00000000fe5fffff
->
-> > > >
->
-> > > >       00000000fe600000-00000000fe7fffff (prio 1, RW): alias
->
-> > > > pci_bridge_mem @pci_bridge_pci 00000000fe600000-00000000fe7fffff
->
-> > > >
->
-> > > >       00000000fe800000-00000000fe9fffff (prio 1, RW): alias
->
-> > > > pci_bridge_mem @pci_bridge_pci 00000000fe800000-00000000fe9fffff
->
-> > > >
->
-> > > >       fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
->
-> pci_bridge_pref_mem
->
-> > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional
->
-> > > > Adress
->
-> > > Space
->
-> > >
->
-> > > This one is empty though right?
->
-> > >
->
-> > > >
->
-> > > >
->
-> > > > We have figured out why this address becomes this value,
->
-> > > > according to pci spec,  pci driver can get BAR address size by
->
-> > > > writing 0xffffffff to
->
-> > > >
->
-> > > > the pci register firstly, and then read back the value from this
->
-> > > > register.
->
-> > >
->
-> > >
->
-> > > OK however as you show below the BAR being sized is the BAR if a
->
-> > > bridge. Are you then adding a bridge device by hotplug?
->
-> >
->
-> > No, I just simply hot plugged a VFIO device to Bus 0, another
->
-> > interesting phenomenon is If I hot plug the device to other bus, this
->
-> > doesn't
->
-> happened.
->
-> >
->
-> > >
->
-> > >
->
-> > > > We didn't handle this value  specially while process pci write in
->
-> > > > qemu, the function call stack is:
->
-> > > >
->
-> > > > Pci_bridge_dev_write_config
->
-> > > >
->
-> > > > -> pci_bridge_write_config
->
-> > > >
->
-> > > > -> pci_default_write_config (we update the config[address] value
->
-> > > > -> here to
->
-> > > > fffffffffc800000, which should be 0xfc800000 )
->
-> > > >
->
-> > > > -> pci_bridge_update_mappings
->
-> > > >
->
-> > > >                 ->pci_bridge_region_del(br, br->windows);
->
-> > > >
->
-> > > > -> pci_bridge_region_init
->
-> > > >
->
-> > > >                                                                 ->
->
-> > > > pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong
->
-> > > > value
->
-> > > > fffffffffc800000)
->
-> > > >
->
-> > > >                                                 ->
->
-> > > > memory_region_transaction_commit
->
-> > > >
->
-> > > >
->
-> > > >
->
-> > > > So, as we can see, we use the wrong base address in qemu to update
->
-> > > > the memory regions, though, we update the base address to
->
-> > > >
->
-> > > > The correct value after pci driver in VM write the original value
->
-> > > > back, the virtio NIC in bus 4 may still sends net packets
->
-> > > > concurrently with
->
-> > > >
->
-> > > > The wrong memory region address.
->
-> > > >
->
-> > > >
->
-> > > >
->
-> > > > We have tried to skip the memory region update action in qemu
->
-> > > > while detect pci write with 0xffffffff value, and it does work,
->
-> > > > but
->
-> > > >
->
-> > > > This seems to be not gently.
->
-> > >
->
-> > > For sure. But I'm still puzzled as to why does Linux try to size the
->
-> > > BAR of the bridge while a device behind it is used.
->
-> > >
->
-> > > Can you pls post your QEMU command line?
->
-> >
->
-> > My QEMU command line:
->
-> > /root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S
->
-> > -object
->
-> > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-194-
->
-> > Linux/master-key.aes -machine
->
-> > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
->
-> > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
->
-> > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp
->
-> > 20,sockets=20,cores=1,threads=1 -numa node,nodeid=0,cpus=0-4,mem=1024
->
-> > -numa node,nodeid=1,cpus=5-9,mem=1024 -numa
->
-> > node,nodeid=2,cpus=10-14,mem=1024 -numa
->
-> > node,nodeid=3,cpus=15-19,mem=1024 -uuid
->
-> > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -nodefaults
->
-> > -chardev
->
-> > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/moni
->
-> > tor.sock,server,nowait -mon
->
-> > chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet
->
-> > -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on
->
-> > -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
->
-> > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
->
-> > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
->
-> > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
->
-> > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
->
-> > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
->
-> > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
->
-> > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
->
-> > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
->
-> > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
->
-> > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
->
-> > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
->
-> > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
->
-> > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=drive-v
->
-> > irtio-disk0,cache=none -device
->
-> > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id
->
-> > =virtio-disk0,bootindex=1 -drive
->
-> > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
->
-> > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev
->
-> > tap,fd=35,id=hostnet0 -device
->
-> > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=pci.4
->
-> > ,addr=0x1 -chardev pty,id=charserial0 -device
->
-> > isa-serial,chardev=charserial0,id=serial0 -device
->
-> > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
->
-> > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
->
-> > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg timestamp=on
->
-> >
->
-> > I am also very curious about this issue, in the linux kernel code, maybe
->
-> > double
->
-> check in function pci_bridge_check_ranges triggered this problem.
->
->
->
-> If you can get the stacktrace in Linux when it tries to write this fffff
->
-> value, that
->
-> would be quite helpful.
->
->
->
->
-After I added mdelay(100) to pci_bridge_check_ranges(), the problem became
-easier to reproduce. Below is my kernel modification:
->
-diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
-index cb389277..86e232d 100644
---- a/drivers/pci/setup-bus.c
-+++ b/drivers/pci/setup-bus.c
-@@ -27,7 +27,7 @@
- #include <linux/slab.h>
- #include <linux/acpi.h>
- #include "pci.h"
--
-+#include <linux/delay.h>
- unsigned int pci_flags;
-
- struct pci_dev_resource {
-@@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
-                pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
-                                               0xffffffff);
-                pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, &tmp);
-+               mdelay(100);
-+               printk(KERN_ERR "sleep\n");
-+                dump_stack();
-                if (!tmp)
-                        b_res[2].flags &= ~IORESOURCE_MEM_64;
-                pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
->
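The odd address fffffffffc800000 reported in this thread is what results when only the upper 32 bits of the bridge's prefetchable base are overwritten with 0xffffffff while the low base register still holds the original window. The sketch below illustrates that arithmetic; it is not QEMU or kernel code. The helper names and the register values 0xfc81/0xfc91 are assumptions chosen to match the fc800000-fc9fffff window in the log, and the bit layout is the one used by the prefetchable base/limit registers of a PCI-to-PCI bridge header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The low 4 bits of the 16-bit base/limit registers encode the window
 * type (32-bit or 64-bit); the upper 12 bits are address bits 31:20. */
#define PREF_RANGE_MASK 0xfff0u

/* Combine the 16-bit base register and the upper-32 register. */
static uint64_t pref_window_base(uint16_t base_reg, uint32_t base_upper32)
{
    return ((uint64_t)base_upper32 << 32) |
           ((uint64_t)(base_reg & PREF_RANGE_MASK) << 16);
}

/* Same for the limit; the low 20 bits of a limit are implicitly all ones. */
static uint64_t pref_window_limit(uint16_t limit_reg, uint32_t limit_upper32)
{
    return ((uint64_t)limit_upper32 << 32) |
           ((uint64_t)(limit_reg & PREF_RANGE_MASK) << 16) | 0xfffffu;
}

/* A window whose base lies above its limit forwards nothing. */
static bool pref_window_is_empty(uint64_t base, uint64_t limit)
{
    return base > limit;
}

/* Walk through the states the guest's probe puts the bridge in.
 * Register values are hypothetical, picked to match the log. */
static void pref_window_selfcheck(void)
{
    const uint16_t base_reg = 0xfc81, limit_reg = 0xfc91;
    const uint64_t limit = pref_window_limit(limit_reg, 0);

    /* Before the probe: the window seen in the "Before" listing. */
    assert(pref_window_base(base_reg, 0) == 0x00000000fc800000ULL);
    assert(limit == 0x00000000fc9fffffULL);
    assert(!pref_window_is_empty(pref_window_base(base_reg, 0), limit));

    /* After writing 0xffffffff to the upper-32 base register only. */
    assert(pref_window_base(base_reg, 0xffffffffu) == 0xfffffffffc800000ULL);
    assert(pref_window_is_empty(pref_window_base(base_reg, 0xffffffffu), limit));
}
```

Under these assumptions the window is empty between the probe write and the restore, which would be consistent with the "Unassigned mem write 00000000fc803004" seen when the virtio-net device behind the bridge is accessed during that interval.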
-OK!
-I just sent a Linux patch that should help.
-I would appreciate it if you would give it a try and,
-if it helps, reply to it with a Tested-by: tag.
-
--- 
-MST
-
-On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote:
->
-> On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-> > > > > Hi all,
->
-> > > > >
->
-> > > > >
->
-> > > > >
->
-> > > > > In our test, we configured VM with several pci-bridges and a
->
-> > > > > virtio-net nic been attached with bus 4,
->
-> > > > >
->
-> > > > > After VM is startup, We ping this nic from host to judge if it
->
-> > > > > is working normally. Then, we hot add pci devices to this VM with
->
-> > > > > bus
->
-0.
->
-> > > > >
->
-> > > > > We  found the virtio-net NIC in bus 4 is not working (can not
->
-> > > > > connect) occasionally, as it kick virtio backend failure with error
->
-> > > > > below:
->
-> > > > >
->
-> > > > >     Unassigned mem write 00000000fc803004 = 0x1
->
-> > > > >
->
-> > > > >
->
-> > > > >
->
-> > > > > memory-region: pci_bridge_pci
->
-> > > > >
->
-> > > > >   0000000000000000-ffffffffffffffff (prio 0, RW):
->
-> > > > > pci_bridge_pci
->
-> > > > >
->
-> > > > >     00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
->
-> > > > >
->
-> > > > >       00000000fc800000-00000000fc800fff (prio 0, RW):
->
-> > > > > virtio-pci-common
->
-> > > > >
->
-> > > > >       00000000fc801000-00000000fc801fff (prio 0, RW):
->
-> > > > > virtio-pci-isr
->
-> > > > >
->
-> > > > >       00000000fc802000-00000000fc802fff (prio 0, RW):
->
-> > > > > virtio-pci-device
->
-> > > > >
->
-> > > > >       00000000fc803000-00000000fc803fff (prio 0, RW):
->
-> > > > > virtio-pci-notify  <- io mem unassigned
->
-> > > > >
->
-> > > > >       …
->
-> > > > >
->
-> > > > >
->
-> > > > >
->
-> > > > > We caught an exceptional address changing while this problem
->
-> > > > > happened, show as
->
-> > > > > follow:
->
-> > > > >
->
-> > > > > Before pci_bridge_update_mappings:
->
-> > > > >
->
-> > > > >       00000000fc000000-00000000fc1fffff (prio 1, RW): alias
->
-> > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > 00000000fc000000-00000000fc1fffff
->
-> > > > >
->
-> > > > >       00000000fc200000-00000000fc3fffff (prio 1, RW): alias
->
-> > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > 00000000fc200000-00000000fc3fffff
->
-> > > > >
->
-> > > > >       00000000fc400000-00000000fc5fffff (prio 1, RW): alias
->
-> > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > 00000000fc400000-00000000fc5fffff
->
-> > > > >
->
-> > > > >       00000000fc600000-00000000fc7fffff (prio 1, RW): alias
->
-> > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > 00000000fc600000-00000000fc7fffff
->
-> > > > >
->
-> > > > >       00000000fc800000-00000000fc9fffff (prio 1, RW): alias
->
-> > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > 00000000fc800000-00000000fc9fffff
->
-> > > > > <- correct Adress Spce
->
-> > > > >
->
-> > > > >       00000000fca00000-00000000fcbfffff (prio 1, RW): alias
->
-> > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > 00000000fca00000-00000000fcbfffff
->
-> > > > >
->
-> > > > >       00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
->
-> > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > 00000000fcc00000-00000000fcdfffff
->
-> > > > >
->
-> > > > >       00000000fce00000-00000000fcffffff (prio 1, RW): alias
->
-> > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > 00000000fce00000-00000000fcffffff
->
-> > > > >
->
-> > > > >
->
-> > > > >
->
-> > > > > After pci_bridge_update_mappings:
->
-> > > > >
->
-> > > > >       00000000fda00000-00000000fdbfffff (prio 1, RW): alias
->
-> > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > 00000000fda00000-00000000fdbfffff
->
-> > > > >
->
-> > > > >       00000000fdc00000-00000000fddfffff (prio 1, RW): alias
->
-> > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > 00000000fdc00000-00000000fddfffff
->
-> > > > >
->
-> > > > >       00000000fde00000-00000000fdffffff (prio 1, RW): alias
->
-> > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > 00000000fde00000-00000000fdffffff
->
-> > > > >
->
-> > > > >       00000000fe000000-00000000fe1fffff (prio 1, RW): alias
->
-> > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > 00000000fe000000-00000000fe1fffff
->
-> > > > >
->
-> > > > >       00000000fe200000-00000000fe3fffff (prio 1, RW): alias
->
-> > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > 00000000fe200000-00000000fe3fffff
->
-> > > > >
->
-> > > > >       00000000fe400000-00000000fe5fffff (prio 1, RW): alias
->
-> > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > 00000000fe400000-00000000fe5fffff
->
-> > > > >
->
-> > > > >       00000000fe600000-00000000fe7fffff (prio 1, RW): alias
->
-> > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > 00000000fe600000-00000000fe7fffff
->
-> > > > >
->
-> > > > >       00000000fe800000-00000000fe9fffff (prio 1, RW): alias
->
-> > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > 00000000fe800000-00000000fe9fffff
->
-> > > > >
->
-> > > > >       fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
->
-> > pci_bridge_pref_mem
->
-> > > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional
->
-Adress
->
-> > > > Space
->
-> > > >
->
-> > > > This one is empty though right?
->
-> > > >
->
-> > > > >
->
-> > > > >
->
-> > > > > We have figured out why this address becomes this value,
->
-> > > > > according to pci spec,  pci driver can get BAR address size by
->
-> > > > > writing 0xffffffff to
->
-> > > > >
->
-> > > > > the pci register firstly, and then read back the value from this
->
-> > > > > register.
->
-> > > >
->
-> > > >
->
-> > > > OK however as you show below the BAR being sized is the BAR if a
->
-> > > > bridge. Are you then adding a bridge device by hotplug?
->
-> > >
->
-> > > No, I just simply hot plugged a VFIO device to Bus 0, another
->
-> > > interesting phenomenon is If I hot plug the device to other bus,
->
-> > > this doesn't
->
-> > happened.
->
-> > >
->
-> > > >
->
-> > > >
->
-> > > > > We didn't handle this value  specially while process pci write
->
-> > > > > in qemu, the function call stack is:
->
-> > > > >
->
-> > > > > Pci_bridge_dev_write_config
->
-> > > > >
->
-> > > > > -> pci_bridge_write_config
->
-> > > > >
->
-> > > > > -> pci_default_write_config (we update the config[address]
->
-> > > > > -> value here to
->
-> > > > > fffffffffc800000, which should be 0xfc800000 )
->
-> > > > >
->
-> > > > > -> pci_bridge_update_mappings
->
-> > > > >
->
-> > > > >                 ->pci_bridge_region_del(br, br->windows);
->
-> > > > >
->
-> > > > > -> pci_bridge_region_init
->
-> > > > >
->
-> > > > >
->
-> > > > > -> pci_bridge_init_alias (here pci_bridge_get_base, we use the
->
-> > > > > wrong value
->
-> > > > > fffffffffc800000)
->
-> > > > >
->
-> > > > >                                                 ->
->
-> > > > > memory_region_transaction_commit
->
-> > > > >
->
-> > > > >
->
-> > > > >
->
-> > > > > So, as we can see, we use the wrong base address in qemu to
->
-> > > > > update the memory regions, though, we update the base address
->
-> > > > > to
->
-> > > > >
->
-> > > > > The correct value after pci driver in VM write the original
->
-> > > > > value back, the virtio NIC in bus 4 may still sends net
->
-> > > > > packets concurrently with
->
-> > > > >
->
-> > > > > The wrong memory region address.
->
-> > > > >
->
-> > > > >
->
-> > > > >
->
-> > > > > We have tried to skip the memory region update action in qemu
->
-> > > > > while detect pci write with 0xffffffff value, and it does
->
-> > > > > work, but
->
-> > > > >
->
-> > > > > This seems to be not gently.
->
-> > > >
->
-> > > > For sure. But I'm still puzzled as to why does Linux try to size
->
-> > > > the BAR of the bridge while a device behind it is used.
->
-> > > >
->
-> > > > Can you pls post your QEMU command line?
->
-> > >
->
-> > > My QEMU command line:
->
-> > > /root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S
->
-> > > -object
->
-> > > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-
->
-> > > 194-
->
-> > > Linux/master-key.aes -machine
->
-> > > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
->
-> > > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
->
-> > > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp
->
-> > > 20,sockets=20,cores=1,threads=1 -numa
->
-> > > node,nodeid=0,cpus=0-4,mem=1024 -numa
->
-> > > node,nodeid=1,cpus=5-9,mem=1024 -numa
->
-> > > node,nodeid=2,cpus=10-14,mem=1024 -numa
->
-> > > node,nodeid=3,cpus=15-19,mem=1024 -uuid
->
-> > > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -nodefaults
->
-> > > -chardev
->
-> > > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/
->
-> > > moni
->
-> > > tor.sock,server,nowait -mon
->
-> > > chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet
->
-> > > -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot
->
-> > > strict=on -device
->
-> > > pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
->
-> > > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
->
-> > > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
->
-> > > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
->
-> > > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
->
-> > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
->
-> > > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
->
-> > > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
->
-> > > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
->
-> > > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
->
-> > > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
->
-> > > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
->
-> > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
->
-> > > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=dri
->
-> > > ve-v
->
-> > > irtio-disk0,cache=none -device
->
-> > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk
->
-> > > 0,id
->
-> > > =virtio-disk0,bootindex=1 -drive
->
-> > > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
->
-> > > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev
->
-> > > tap,fd=35,id=hostnet0 -device
->
-> > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=p
->
-> > > ci.4
->
-> > > ,addr=0x1 -chardev pty,id=charserial0 -device
->
-> > > isa-serial,chardev=charserial0,id=serial0 -device
->
-> > > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
->
-> > > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
->
-> > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg
->
-> > > timestamp=on
->
-> > >
->
-> > > I am also very curious about this issue, in the linux kernel code,
->
-> > > maybe double
->
-> > check in function pci_bridge_check_ranges triggered this problem.
->
-> >
->
-> > If you can get the stacktrace in Linux when it tries to write this
->
-> > fffff value, that would be quite helpful.
->
-> >
->
->
->
-> After I add mdelay(100) in function pci_bridge_check_ranges, this
->
-> phenomenon is easier to reproduce, below is my modify in kernel:
->
-> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index
->
-> cb389277..86e232d 100644
->
-> --- a/drivers/pci/setup-bus.c
->
-> +++ b/drivers/pci/setup-bus.c
->
-> @@ -27,7 +27,7 @@
->
->  #include <linux/slab.h>
->
->  #include <linux/acpi.h>
->
->  #include "pci.h"
->
-> -
->
-> +#include <linux/delay.h>
->
->  unsigned int pci_flags;
->
->
->
->  struct pci_dev_resource {
->
-> @@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct pci_bus
->
-*bus)
->
->                 pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
->
->                                                0xffffffff);
->
->                 pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32,
->
-> &tmp);
->
-> +               mdelay(100);
->
-> +               printk(KERN_ERR "sleep\n");
->
-> +                dump_stack();
->
->                 if (!tmp)
->
->                         b_res[2].flags &= ~IORESOURCE_MEM_64;
->
->                 pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
->
->
->
->
-OK!
->
-I just sent a Linux patch that should help.
->
-I would appreciate it if you will give it a try and if that helps reply to it
->
-with a
->
-Tested-by: tag.
->
-I tested this patch and it works fine on my machine.
-
-But I have another question: if we fix this problem only in the kernel,
-Linux versions that have already been released will still not work well
-on the virtualization platform.
-Is there a way to fix this problem in the backend?
-
->
---
->
-MST
-
-On Tue, Dec 11, 2018 at 02:55:43AM +0000, xuyandong wrote:
->
-On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote:
->
-> > On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-> > > > > > Hi all,
->
-> > > > > >
->
-> > > > > >
->
-> > > > > >
->
-> > > > > > In our test, we configured VM with several pci-bridges and a
->
-> > > > > > virtio-net nic been attached with bus 4,
->
-> > > > > >
->
-> > > > > > After VM is startup, We ping this nic from host to judge if it
->
-> > > > > > is working normally. Then, we hot add pci devices to this VM with
->
-> > > > > > bus
->
-> 0.
->
-> > > > > >
->
-> > > > > > We  found the virtio-net NIC in bus 4 is not working (can not
->
-> > > > > > connect) occasionally, as it kick virtio backend failure with
->
-> > > > > > error below:
->
-> > > > > >
->
-> > > > > >     Unassigned mem write 00000000fc803004 = 0x1
->
-> > > > > >
->
-> > > > > >
->
-> > > > > >
->
-> > > > > > memory-region: pci_bridge_pci
->
-> > > > > >
->
-> > > > > >   0000000000000000-ffffffffffffffff (prio 0, RW):
->
-> > > > > > pci_bridge_pci
->
-> > > > > >
->
-> > > > > >     00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci
->
-> > > > > >
->
-> > > > > >       00000000fc800000-00000000fc800fff (prio 0, RW):
->
-> > > > > > virtio-pci-common
->
-> > > > > >
->
-> > > > > >       00000000fc801000-00000000fc801fff (prio 0, RW):
->
-> > > > > > virtio-pci-isr
->
-> > > > > >
->
-> > > > > >       00000000fc802000-00000000fc802fff (prio 0, RW):
->
-> > > > > > virtio-pci-device
->
-> > > > > >
->
-> > > > > >       00000000fc803000-00000000fc803fff (prio 0, RW):
->
-> > > > > > virtio-pci-notify  <- io mem unassigned
->
-> > > > > >
->
-> > > > > >       …
->
-> > > > > >
->
-> > > > > >
->
-> > > > > >
->
-> > > > > > We caught an exceptional address changing while this problem
->
-> > > > > > happened, show as
->
-> > > > > > follow:
->
-> > > > > >
->
-> > > > > > Before pci_bridge_update_mappings:
->
-> > > > > >
->
-> > > > > >       00000000fc000000-00000000fc1fffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > 00000000fc000000-00000000fc1fffff
->
-> > > > > >
->
-> > > > > >       00000000fc200000-00000000fc3fffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > 00000000fc200000-00000000fc3fffff
->
-> > > > > >
->
-> > > > > >       00000000fc400000-00000000fc5fffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > 00000000fc400000-00000000fc5fffff
->
-> > > > > >
->
-> > > > > >       00000000fc600000-00000000fc7fffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > 00000000fc600000-00000000fc7fffff
->
-> > > > > >
->
-> > > > > >       00000000fc800000-00000000fc9fffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > 00000000fc800000-00000000fc9fffff
->
-> > > > > > <- correct Adress Spce
->
-> > > > > >
->
-> > > > > >       00000000fca00000-00000000fcbfffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > 00000000fca00000-00000000fcbfffff
->
-> > > > > >
->
-> > > > > >       00000000fcc00000-00000000fcdfffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > 00000000fcc00000-00000000fcdfffff
->
-> > > > > >
->
-> > > > > >       00000000fce00000-00000000fcffffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > 00000000fce00000-00000000fcffffff
->
-> > > > > >
->
-> > > > > >
->
-> > > > > >
->
-> > > > > > After pci_bridge_update_mappings:
->
-> > > > > >
->
-> > > > > >       00000000fda00000-00000000fdbfffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > > 00000000fda00000-00000000fdbfffff
->
-> > > > > >
->
-> > > > > >       00000000fdc00000-00000000fddfffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > > 00000000fdc00000-00000000fddfffff
->
-> > > > > >
->
-> > > > > >       00000000fde00000-00000000fdffffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > > 00000000fde00000-00000000fdffffff
->
-> > > > > >
->
-> > > > > >       00000000fe000000-00000000fe1fffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > > 00000000fe000000-00000000fe1fffff
->
-> > > > > >
->
-> > > > > >       00000000fe200000-00000000fe3fffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > > 00000000fe200000-00000000fe3fffff
->
-> > > > > >
->
-> > > > > >       00000000fe400000-00000000fe5fffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > > 00000000fe400000-00000000fe5fffff
->
-> > > > > >
->
-> > > > > >       00000000fe600000-00000000fe7fffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > > 00000000fe600000-00000000fe7fffff
->
-> > > > > >
->
-> > > > > >       00000000fe800000-00000000fe9fffff (prio 1, RW): alias
->
-> > > > > > pci_bridge_mem @pci_bridge_pci
->
-> > > > > > 00000000fe800000-00000000fe9fffff
->
-> > > > > >
->
-> > > > > >       fffffffffc800000-fffffffffc800000 (prio 1, RW): alias
->
-> > > pci_bridge_pref_mem
->
-> > > > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000   <- Exceptional
->
-> Adress
->
-> > > > > Space
->
-> > > > >
->
-> > > > > This one is empty though right?
->
-> > > > >
->
-> > > > > >
->
-> > > > > >
->
-> > > > > > We have figured out why this address becomes this value,
->
-> > > > > > according to pci spec,  pci driver can get BAR address size by
->
-> > > > > > writing 0xffffffff to
->
-> > > > > >
->
-> > > > > > the pci register firstly, and then read back the value from this
->
-> > > > > > register.
->
-> > > > >
->
-> > > > >
->
-> > > > > OK however as you show below the BAR being sized is the BAR if a
->
-> > > > > bridge. Are you then adding a bridge device by hotplug?
->
-> > > >
->
-> > > > No, I just simply hot plugged a VFIO device to Bus 0, another
->
-> > > > interesting phenomenon is If I hot plug the device to other bus,
->
-> > > > this doesn't
->
-> > > happened.
->
-> > > >
->
-> > > > >
->
-> > > > >
->
-> > > > > > We didn't handle this value  specially while process pci write
->
-> > > > > > in qemu, the function call stack is:
->
-> > > > > >
->
-> > > > > > Pci_bridge_dev_write_config
->
-> > > > > >
->
-> > > > > > -> pci_bridge_write_config
->
-> > > > > >
->
-> > > > > > -> pci_default_write_config (we update the config[address]
->
-> > > > > > -> value here to
->
-> > > > > > fffffffffc800000, which should be 0xfc800000 )
->
-> > > > > >
->
-> > > > > > -> pci_bridge_update_mappings
->
-> > > > > >
->
-> > > > > >                 ->pci_bridge_region_del(br, br->windows);
->
-> > > > > >
->
-> > > > > > -> pci_bridge_region_init
->
-> > > > > >
->
-> > > > > >
->
-> > > > > > -> pci_bridge_init_alias (here pci_bridge_get_base, we use the
->
-> > > > > > wrong value
->
-> > > > > > fffffffffc800000)
->
-> > > > > >
->
-> > > > > >                                                 ->
->
-> > > > > > memory_region_transaction_commit
->
-> > > > > >
->
-> > > > > >
->
-> > > > > >
->
-> > > > > > So, as we can see, we use the wrong base address in qemu to
->
-> > > > > > update the memory regions, though, we update the base address
->
-> > > > > > to
->
-> > > > > >
->
-> > > > > > The correct value after pci driver in VM write the original
->
-> > > > > > value back, the virtio NIC in bus 4 may still sends net
->
-> > > > > > packets concurrently with
->
-> > > > > >
->
-> > > > > > The wrong memory region address.
->
-> > > > > >
->
-> > > > > >
->
-> > > > > >
->
-> > > > > > We have tried to skip the memory region update action in qemu
->
-> > > > > > while detect pci write with 0xffffffff value, and it does
->
-> > > > > > work, but
->
-> > > > > >
->
-> > > > > > This seems to be not gently.
->
-> > > > >
->
-> > > > > For sure. But I'm still puzzled as to why does Linux try to size
->
-> > > > > the BAR of the bridge while a device behind it is used.
->
-> > > > >
->
-> > > > > Can you pls post your QEMU command line?
->
-> > > >
->
-> > > > My QEMU command line:
->
-> > > > /root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S
->
-> > > > -object
->
-> > > > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-
->
-> > > > 194-
->
-> > > > Linux/master-key.aes -machine
->
-> > > > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
->
-> > > > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
->
-> > > > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp
->
-> > > > 20,sockets=20,cores=1,threads=1 -numa
->
-> > > > node,nodeid=0,cpus=0-4,mem=1024 -numa
->
-> > > > node,nodeid=1,cpus=5-9,mem=1024 -numa
->
-> > > > node,nodeid=2,cpus=10-14,mem=1024 -numa
->
-> > > > node,nodeid=3,cpus=15-19,mem=1024 -uuid
->
-> > > > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -nodefaults
->
-> > > > -chardev
->
-> > > > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/
->
-> > > > moni
->
-> > > > tor.sock,server,nowait -mon
->
-> > > > chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet
->
-> > > > -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot
->
-> > > > strict=on -device
->
-> > > > pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
->
-> > > > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
->
-> > > > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
->
-> > > > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
->
-> > > > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
->
-> > > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
->
-> > > > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
->
-> > > > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
->
-> > > > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
->
-> > > > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
->
-> > > > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
->
-> > > > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
->
-> > > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
->
-> > > > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=dri
->
-> > > > ve-v
->
-> > > > irtio-disk0,cache=none -device
->
-> > > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk
->
-> > > > 0,id
->
-> > > > =virtio-disk0,bootindex=1 -drive
->
-> > > > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
->
-> > > > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev
->
-> > > > tap,fd=35,id=hostnet0 -device
->
-> > > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=p
->
-> > > > ci.4
->
-> > > > ,addr=0x1 -chardev pty,id=charserial0 -device
->
-> > > > isa-serial,chardev=charserial0,id=serial0 -device
->
-> > > > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
->
-> > > > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
->
-> > > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg
->
-> > > > timestamp=on
->
-> > > >
->
-> > > > I am also very curious about this issue, in the linux kernel code,
->
-> > > > maybe double
->
-> > > check in function pci_bridge_check_ranges triggered this problem.
->
-> > >
->
-> > > If you can get the stacktrace in Linux when it tries to write this
->
-> > > fffff value, that would be quite helpful.
->
-> > >
->
-> >
->
-> > After I add mdelay(100) in function pci_bridge_check_ranges, this
->
-> > phenomenon is easier to reproduce, below is my modify in kernel:
->
-> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index
->
-> > cb389277..86e232d 100644
->
-> > --- a/drivers/pci/setup-bus.c
->
-> > +++ b/drivers/pci/setup-bus.c
->
-> > @@ -27,7 +27,7 @@
->
-> >  #include <linux/slab.h>
->
-> >  #include <linux/acpi.h>
->
-> >  #include "pci.h"
->
-> > -
->
-> > +#include <linux/delay.h>
->
-> >  unsigned int pci_flags;
->
-> >
->
-> >  struct pci_dev_resource {
->
-> > @@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct pci_bus
->
-> *bus)
->
-> >                 pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
->
-> >                                                0xffffffff);
->
-> >                 pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32,
->
-> > &tmp);
->
-> > +               mdelay(100);
->
-> > +               printk(KERN_ERR "sleep\n");
->
-> > +                dump_stack();
->
-> >                 if (!tmp)
->
-> >                         b_res[2].flags &= ~IORESOURCE_MEM_64;
->
-> >                 pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
->
-> >
->
->
->
-> OK!
->
-> I just sent a Linux patch that should help.
->
-> I would appreciate it if you will give it a try and if that helps reply to
->
-> it with a
->
-> Tested-by: tag.
->
->
->
->
-I tested this patch and it works fine on my machine.
-
-But I have another question: if we only fix this problem in the kernel, the
-Linux versions that have already been released will still not work well on
-the virtualization platform.
-Is there a way to fix this problem in the backend?
-There could be a way to work around this.
-Does below help?
-
-diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
-index 236a20eaa8..7834cac4b0 100644
---- a/hw/i386/acpi-build.c
-+++ b/hw/i386/acpi-build.c
-@@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml *parent_scope, PCIBus *bus,
- 
-         aml_append(method, aml_store(aml_int(bsel_val), aml_name("BNUM")));
-         aml_append(method,
--            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device Check */)
-+            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* Device Check Light */)
-         );
-         aml_append(method,
-             aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject Request */)
-
->
-On Tue, Dec 11, 2018 at 02:55:43AM +0000, xuyandong wrote:
->
-> On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote:
->
-> > > On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-> > > > > > > Hi all,
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > > In our test, we configured VM with several pci-bridges and
->
-> > > > > > > a virtio-net nic been attached with bus 4,
->
-> > > > > > >
->
-> > > > > > > After VM is startup, We ping this nic from host to judge
->
-> > > > > > > if it is working normally. Then, we hot add pci devices to
->
-> > > > > > > this VM with bus
->
-> > 0.
->
-> > > > > > >
->
-> > > > > > > We  found the virtio-net NIC in bus 4 is not working (can
->
-> > > > > > > not
->
-> > > > > > > connect) occasionally, as it kick virtio backend failure with
->
-> > > > > > > error
->
-below:
->
-> > > > > > >
->
-> > > > > > >     Unassigned mem write 00000000fc803004 = 0x1
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > > memory-region: pci_bridge_pci
->
-> > > > > > >
->
-> > > > > > >   0000000000000000-ffffffffffffffff (prio 0, RW):
->
-> > > > > > > pci_bridge_pci
->
-> > > > > > >
->
-> > > > > > >     00000000fc800000-00000000fc803fff (prio 1, RW):
->
-> > > > > > > virtio-pci
->
-> > > > > > >
->
-> > > > > > >       00000000fc800000-00000000fc800fff (prio 0, RW):
->
-> > > > > > > virtio-pci-common
->
-> > > > > > >
->
-> > > > > > >       00000000fc801000-00000000fc801fff (prio 0, RW):
->
-> > > > > > > virtio-pci-isr
->
-> > > > > > >
->
-> > > > > > >       00000000fc802000-00000000fc802fff (prio 0, RW):
->
-> > > > > > > virtio-pci-device
->
-> > > > > > >
->
-> > > > > > >       00000000fc803000-00000000fc803fff (prio 0, RW):
->
-> > > > > > > virtio-pci-notify  <- io mem unassigned
->
-> > > > > > >
->
-> > > > > > >       …
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > > We caught an exceptional address changing while this
->
-> > > > > > > problem happened, show as
->
-> > > > > > > follow:
->
-> > > > > > >
->
-> > > > > > > Before pci_bridge_update_mappings:
->
-> > > > > > >
->
-> > > > > > >       00000000fc000000-00000000fc1fffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > 00000000fc000000-00000000fc1fffff
->
-> > > > > > >
->
-> > > > > > >       00000000fc200000-00000000fc3fffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > 00000000fc200000-00000000fc3fffff
->
-> > > > > > >
->
-> > > > > > >       00000000fc400000-00000000fc5fffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > 00000000fc400000-00000000fc5fffff
->
-> > > > > > >
->
-> > > > > > >       00000000fc600000-00000000fc7fffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > 00000000fc600000-00000000fc7fffff
->
-> > > > > > >
->
-> > > > > > >       00000000fc800000-00000000fc9fffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > 00000000fc800000-00000000fc9fffff
->
-> > > > > > > <- correct Adress Spce
->
-> > > > > > >
->
-> > > > > > >       00000000fca00000-00000000fcbfffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > 00000000fca00000-00000000fcbfffff
->
-> > > > > > >
->
-> > > > > > >       00000000fcc00000-00000000fcdfffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > 00000000fcc00000-00000000fcdfffff
->
-> > > > > > >
->
-> > > > > > >       00000000fce00000-00000000fcffffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > 00000000fce00000-00000000fcffffff
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > > After pci_bridge_update_mappings:
->
-> > > > > > >
->
-> > > > > > >       00000000fda00000-00000000fdbfffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > 00000000fda00000-00000000fdbfffff
->
-> > > > > > >
->
-> > > > > > >       00000000fdc00000-00000000fddfffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > 00000000fdc00000-00000000fddfffff
->
-> > > > > > >
->
-> > > > > > >       00000000fde00000-00000000fdffffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > 00000000fde00000-00000000fdffffff
->
-> > > > > > >
->
-> > > > > > >       00000000fe000000-00000000fe1fffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > 00000000fe000000-00000000fe1fffff
->
-> > > > > > >
->
-> > > > > > >       00000000fe200000-00000000fe3fffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > 00000000fe200000-00000000fe3fffff
->
-> > > > > > >
->
-> > > > > > >       00000000fe400000-00000000fe5fffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > 00000000fe400000-00000000fe5fffff
->
-> > > > > > >
->
-> > > > > > >       00000000fe600000-00000000fe7fffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > 00000000fe600000-00000000fe7fffff
->
-> > > > > > >
->
-> > > > > > >       00000000fe800000-00000000fe9fffff (prio 1, RW):
->
-> > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > 00000000fe800000-00000000fe9fffff
->
-> > > > > > >
->
-> > > > > > >       fffffffffc800000-fffffffffc800000 (prio 1, RW):
->
-> > > > > > > alias
->
-> > > > pci_bridge_pref_mem
->
-> > > > > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000   <-
->
-> > > > > > > Exceptional
->
-> > Adress
->
-> > > > > > Space
->
-> > > > > >
->
-> > > > > > This one is empty though right?
->
-> > > > > >
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > > We have figured out why this address becomes this value,
->
-> > > > > > > according to pci spec,  pci driver can get BAR address
->
-> > > > > > > size by writing 0xffffffff to
->
-> > > > > > >
->
-> > > > > > > the pci register firstly, and then read back the value from this
->
-register.
->
-> > > > > >
->
-> > > > > >
->
-> > > > > > OK however as you show below the BAR being sized is the BAR
->
-> > > > > > if a bridge. Are you then adding a bridge device by hotplug?
->
-> > > > >
->
-> > > > > No, I just simply hot plugged a VFIO device to Bus 0, another
->
-> > > > > interesting phenomenon is If I hot plug the device to other
->
-> > > > > bus, this doesn't
->
-> > > > happened.
->
-> > > > >
->
-> > > > > >
->
-> > > > > >
->
-> > > > > > > We didn't handle this value  specially while process pci
->
-> > > > > > > write in qemu, the function call stack is:
->
-> > > > > > >
->
-> > > > > > > Pci_bridge_dev_write_config
->
-> > > > > > >
->
-> > > > > > > -> pci_bridge_write_config
->
-> > > > > > >
->
-> > > > > > > -> pci_default_write_config (we update the config[address]
->
-> > > > > > > -> value here to
->
-> > > > > > > fffffffffc800000, which should be 0xfc800000 )
->
-> > > > > > >
->
-> > > > > > > -> pci_bridge_update_mappings
->
-> > > > > > >
->
-> > > > > > >                 ->pci_bridge_region_del(br, br->windows);
->
-> > > > > > >
->
-> > > > > > > -> pci_bridge_region_init
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > > -> pci_bridge_init_alias (here pci_bridge_get_base, we use
->
-> > > > > > > -> the
->
-> > > > > > > wrong value
->
-> > > > > > > fffffffffc800000)
->
-> > > > > > >
->
-> > > > > > >                                                 ->
->
-> > > > > > > memory_region_transaction_commit
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > > So, as we can see, we use the wrong base address in qemu
->
-> > > > > > > to update the memory regions, though, we update the base
->
-> > > > > > > address to
->
-> > > > > > >
->
-> > > > > > > The correct value after pci driver in VM write the
->
-> > > > > > > original value back, the virtio NIC in bus 4 may still
->
-> > > > > > > sends net packets concurrently with
->
-> > > > > > >
->
-> > > > > > > The wrong memory region address.
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > > We have tried to skip the memory region update action in
->
-> > > > > > > qemu while detect pci write with 0xffffffff value, and it
->
-> > > > > > > does work, but
->
-> > > > > > >
->
-> > > > > > > This seems to be not gently.
->
-> > > > > >
->
-> > > > > > For sure. But I'm still puzzled as to why does Linux try to
->
-> > > > > > size the BAR of the bridge while a device behind it is used.
->
-> > > > > >
->
-> > > > > > Can you pls post your QEMU command line?
->
-> > > > >
->
-> > > > > My QEMU command line:
->
-> > > > > /root/xyd/qemu-system-x86_64 -name
->
-> > > > > guest=Linux,debug-threads=on -S -object
->
-> > > > > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/dom
->
-> > > > > ain-
->
-> > > > > 194-
->
-> > > > > Linux/master-key.aes -machine
->
-> > > > > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
->
-> > > > > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
->
-> > > > > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off
->
-> > > > > -smp
->
-> > > > > 20,sockets=20,cores=1,threads=1 -numa
->
-> > > > > node,nodeid=0,cpus=0-4,mem=1024 -numa
->
-> > > > > node,nodeid=1,cpus=5-9,mem=1024 -numa
->
-> > > > > node,nodeid=2,cpus=10-14,mem=1024 -numa
->
-> > > > > node,nodeid=3,cpus=15-19,mem=1024 -uuid
->
-> > > > > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config
->
-> > > > > -nodefaults -chardev
->
-> > > > > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Li
->
-> > > > > nux/
->
-> > > > > moni
->
-> > > > > tor.sock,server,nowait -mon
->
-> > > > > chardev=charmonitor,id=monitor,mode=control -rtc base=utc
->
-> > > > > -no-hpet -global kvm-pit.lost_tick_policy=delay -no-shutdown
->
-> > > > > -boot strict=on -device
->
-> > > > > pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
->
-> > > > > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
->
-> > > > > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
->
-> > > > > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
->
-> > > > > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
->
-> > > > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
->
-> > > > > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
->
-> > > > > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
->
-> > > > > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
->
-> > > > > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
->
-> > > > > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
->
-> > > > > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
->
-> > > > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
->
-> > > > > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id
->
-> > > > > =dri
->
-> > > > > ve-v
->
-> > > > > irtio-disk0,cache=none -device
->
-> > > > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-
->
-> > > > > disk
->
-> > > > > 0,id
->
-> > > > > =virtio-disk0,bootindex=1 -drive
->
-> > > > > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
->
-> > > > > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1
->
-> > > > > -netdev
->
-> > > > > tap,fd=35,id=hostnet0 -device
->
-> > > > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,b
->
-> > > > > us=p
->
-> > > > > ci.4
->
-> > > > > ,addr=0x1 -chardev pty,id=charserial0 -device
->
-> > > > > isa-serial,chardev=charserial0,id=serial0 -device
->
-> > > > > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
->
-> > > > > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
->
-> > > > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg
->
-> > > > > timestamp=on
->
-> > > > >
->
-> > > > > I am also very curious about this issue, in the linux kernel
->
-> > > > > code, maybe double
->
-> > > > check in function pci_bridge_check_ranges triggered this problem.
->
-> > > >
->
-> > > > If you can get the stacktrace in Linux when it tries to write
->
-> > > > this fffff value, that would be quite helpful.
->
-> > > >
->
-> > >
->
-> > > After I add mdelay(100) in function pci_bridge_check_ranges, this
->
-> > > phenomenon is easier to reproduce, below is my modify in kernel:
->
-> > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
->
-> > > index cb389277..86e232d 100644
->
-> > > --- a/drivers/pci/setup-bus.c
->
-> > > +++ b/drivers/pci/setup-bus.c
->
-> > > @@ -27,7 +27,7 @@
->
-> > >  #include <linux/slab.h>
->
-> > >  #include <linux/acpi.h>
->
-> > >  #include "pci.h"
->
-> > > -
->
-> > > +#include <linux/delay.h>
->
-> > >  unsigned int pci_flags;
->
-> > >
->
-> > >  struct pci_dev_resource {
->
-> > > @@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct
->
-> > > pci_bus
->
-> > *bus)
->
-> > >                 pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
->
-> > >                                                0xffffffff);
->
-> > >                 pci_read_config_dword(bridge,
->
-> > > PCI_PREF_BASE_UPPER32, &tmp);
->
-> > > +               mdelay(100);
->
-> > > +               printk(KERN_ERR "sleep\n");
->
-> > > +                dump_stack();
->
-> > >                 if (!tmp)
->
-> > >                         b_res[2].flags &= ~IORESOURCE_MEM_64;
->
-> > >                 pci_write_config_dword(bridge,
->
-> > > PCI_PREF_BASE_UPPER32,
->
-> > >
->
-> >
->
-> > OK!
->
-> > I just sent a Linux patch that should help.
->
-> > I would appreciate it if you will give it a try and if that helps
->
-> > reply to it with a
->
-> > Tested-by: tag.
->
-> >
->
->
->
-> I tested this patch and it works fine on my machine.
->
->
->
-> But I have another question, if we only fix this problem in the
->
-> kernel, the Linux version that has been released does not work well on the
->
-virtualization platform.
->
-> Is there a way to fix this problem in the backend?
->
->
-There could be a way to work around this.
->
-Does below help?
-I am sorry to tell you, I tested this patch and it doesn't work fine.
-
->
->
-diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index
->
-236a20eaa8..7834cac4b0 100644
->
---- a/hw/i386/acpi-build.c
->
-+++ b/hw/i386/acpi-build.c
->
-@@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml
->
-*parent_scope, PCIBus *bus,
->
->
-aml_append(method, aml_store(aml_int(bsel_val), aml_name("BNUM")));
->
-aml_append(method,
->
--            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device Check
->
-*/)
->
-+            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* Device
->
-+ Check Light */)
->
-);
->
-aml_append(method,
->
-aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject Request
->
-*/)
-
-On Tue, Dec 11, 2018 at 03:51:09AM +0000, xuyandong wrote:
->
-> There could be a way to work around this.
->
-> Does below help?
->
->
-I am sorry to tell you, I tested this patch and it doesn't work fine.
-What happens?
-
->
->
->
-> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index
->
-> 236a20eaa8..7834cac4b0 100644
->
-> --- a/hw/i386/acpi-build.c
->
-> +++ b/hw/i386/acpi-build.c
->
-> @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml
->
-> *parent_scope, PCIBus *bus,
->
->
->
->          aml_append(method, aml_store(aml_int(bsel_val), aml_name("BNUM")));
->
->          aml_append(method,
->
-> -            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device Check
->
-> */)
->
-> +            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* Device
->
-> + Check Light */)
->
->          );
->
->          aml_append(method,
->
->              aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject Request
->
-> */)
-
-On Tue, Dec 11, 2018 at 03:51:09AM +0000, xuyandong wrote:
->
-> On Tue, Dec 11, 2018 at 02:55:43AM +0000, xuyandong wrote:
->
-> > On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote:
->
-> > > > On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote:
->
-> > > > > > > > Hi all,
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > > In our test, we configured VM with several pci-bridges and
->
-> > > > > > > > a virtio-net nic been attached with bus 4,
->
-> > > > > > > >
->
-> > > > > > > > After VM is startup, We ping this nic from host to judge
->
-> > > > > > > > if it is working normally. Then, we hot add pci devices to
->
-> > > > > > > > this VM with bus
->
-> > > 0.
->
-> > > > > > > >
->
-> > > > > > > > We  found the virtio-net NIC in bus 4 is not working (can
->
-> > > > > > > > not
->
-> > > > > > > > connect) occasionally, as it kick virtio backend failure with
->
-> > > > > > > > error
->
-> below:
->
-> > > > > > > >
->
-> > > > > > > >     Unassigned mem write 00000000fc803004 = 0x1
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > > memory-region: pci_bridge_pci
->
-> > > > > > > >
->
-> > > > > > > >   0000000000000000-ffffffffffffffff (prio 0, RW):
->
-> > > > > > > > pci_bridge_pci
->
-> > > > > > > >
->
-> > > > > > > >     00000000fc800000-00000000fc803fff (prio 1, RW):
->
-> > > > > > > > virtio-pci
->
-> > > > > > > >
->
-> > > > > > > >       00000000fc800000-00000000fc800fff (prio 0, RW):
->
-> > > > > > > > virtio-pci-common
->
-> > > > > > > >
->
-> > > > > > > >       00000000fc801000-00000000fc801fff (prio 0, RW):
->
-> > > > > > > > virtio-pci-isr
->
-> > > > > > > >
->
-> > > > > > > >       00000000fc802000-00000000fc802fff (prio 0, RW):
->
-> > > > > > > > virtio-pci-device
->
-> > > > > > > >
->
-> > > > > > > >       00000000fc803000-00000000fc803fff (prio 0, RW):
->
-> > > > > > > > virtio-pci-notify  <- io mem unassigned
->
-> > > > > > > >
->
-> > > > > > > >       …
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > > We caught an exceptional address changing while this
->
-> > > > > > > > problem happened, show as
->
-> > > > > > > > follow:
->
-> > > > > > > >
->
-> > > > > > > > Before pci_bridge_update_mappings:
->
-> > > > > > > >
->
-> > > > > > > >       00000000fc000000-00000000fc1fffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fc000000-00000000fc1fffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fc200000-00000000fc3fffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fc200000-00000000fc3fffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fc400000-00000000fc5fffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fc400000-00000000fc5fffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fc600000-00000000fc7fffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fc600000-00000000fc7fffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fc800000-00000000fc9fffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fc800000-00000000fc9fffff
->
-> > > > > > > > <- correct Adress Spce
->
-> > > > > > > >
->
-> > > > > > > >       00000000fca00000-00000000fcbfffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fca00000-00000000fcbfffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fcc00000-00000000fcdfffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fcc00000-00000000fcdfffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fce00000-00000000fcffffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fce00000-00000000fcffffff
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > > After pci_bridge_update_mappings:
->
-> > > > > > > >
->
-> > > > > > > >       00000000fda00000-00000000fdbfffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fda00000-00000000fdbfffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fdc00000-00000000fddfffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fdc00000-00000000fddfffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fde00000-00000000fdffffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fde00000-00000000fdffffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fe000000-00000000fe1fffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fe000000-00000000fe1fffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fe200000-00000000fe3fffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fe200000-00000000fe3fffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fe400000-00000000fe5fffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fe400000-00000000fe5fffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fe600000-00000000fe7fffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fe600000-00000000fe7fffff
->
-> > > > > > > >
->
-> > > > > > > >       00000000fe800000-00000000fe9fffff (prio 1, RW):
->
-> > > > > > > > alias pci_bridge_mem @pci_bridge_pci
->
-> > > > > > > > 00000000fe800000-00000000fe9fffff
->
-> > > > > > > >
->
-> > > > > > > >       fffffffffc800000-fffffffffc800000 (prio 1, RW):
->
-> > > > > > > > alias
->
-> > > > > pci_bridge_pref_mem
->
-> > > > > > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000   <-
->
-> > > > > > > > Exceptional
->
-> > > Adress
->
-> > > > > > > Space
->
-> > > > > > >
->
-> > > > > > > This one is empty though right?
->
-> > > > > > >
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > > We have figured out why this address becomes this value,
->
-> > > > > > > > according to pci spec,  pci driver can get BAR address
->
-> > > > > > > > size by writing 0xffffffff to
->
-> > > > > > > >
->
-> > > > > > > > the pci register firstly, and then read back the value from
->
-> > > > > > > > this
->
-> register.
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > > OK however as you show below the BAR being sized is the BAR
->
-> > > > > > > if a bridge. Are you then adding a bridge device by hotplug?
->
-> > > > > >
->
-> > > > > > No, I just simply hot plugged a VFIO device to Bus 0, another
->
-> > > > > > interesting phenomenon is If I hot plug the device to other
->
-> > > > > > bus, this doesn't
->
-> > > > > happened.
->
-> > > > > >
->
-> > > > > > >
->
-> > > > > > >
->
-> > > > > > > > We didn't handle this value  specially while process pci
->
-> > > > > > > > write in qemu, the function call stack is:
->
-> > > > > > > >
->
-> > > > > > > > Pci_bridge_dev_write_config
->
-> > > > > > > >
->
-> > > > > > > > -> pci_bridge_write_config
->
-> > > > > > > >
->
-> > > > > > > > -> pci_default_write_config (we update the config[address]
->
-> > > > > > > > -> value here to
->
-> > > > > > > > fffffffffc800000, which should be 0xfc800000 )
->
-> > > > > > > >
->
-> > > > > > > > -> pci_bridge_update_mappings
->
-> > > > > > > >
->
-> > > > > > > >                 ->pci_bridge_region_del(br, br->windows);
->
-> > > > > > > >
->
-> > > > > > > > -> pci_bridge_region_init
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > > -> pci_bridge_init_alias (here pci_bridge_get_base, we use
->
-> > > > > > > > -> the
->
-> > > > > > > > wrong value
->
-> > > > > > > > fffffffffc800000)
->
-> > > > > > > >
->
-> > > > > > > >                                                 ->
->
-> > > > > > > > memory_region_transaction_commit
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > > So, as we can see, we use the wrong base address in qemu
->
-> > > > > > > > to update the memory regions, though, we update the base
->
-> > > > > > > > address to
->
-> > > > > > > >
->
-> > > > > > > > The correct value after pci driver in VM write the
->
-> > > > > > > > original value back, the virtio NIC in bus 4 may still
->
-> > > > > > > > sends net packets concurrently with
->
-> > > > > > > >
->
-> > > > > > > > The wrong memory region address.
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > >
->
-> > > > > > > > We have tried to skip the memory region update action in
->
-> > > > > > > > qemu while detect pci write with 0xffffffff value, and it
->
-> > > > > > > > does work, but
->
-> > > > > > > >
->
-> > > > > > > > This seems to be not gently.
->
-> > > > > > >
->
-> > > > > > > For sure. But I'm still puzzled as to why does Linux try to
->
-> > > > > > > size the BAR of the bridge while a device behind it is used.
->
-> > > > > > >
->
-> > > > > > > Can you pls post your QEMU command line?
->
-> > > > > >
->
-> > > > > > My QEMU command line:
->
-> > > > > > /root/xyd/qemu-system-x86_64 -name
->
-> > > > > > guest=Linux,debug-threads=on -S -object
->
-> > > > > > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/dom
->
-> > > > > > ain-
->
-> > > > > > 194-
->
-> > > > > > Linux/master-key.aes -machine
->
-> > > > > > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
->
-> > > > > > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m
->
-> > > > > > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off
->
-> > > > > > -smp
->
-> > > > > > 20,sockets=20,cores=1,threads=1 -numa
->
-> > > > > > node,nodeid=0,cpus=0-4,mem=1024 -numa
->
-> > > > > > node,nodeid=1,cpus=5-9,mem=1024 -numa
->
-> > > > > > node,nodeid=2,cpus=10-14,mem=1024 -numa
->
-> > > > > > node,nodeid=3,cpus=15-19,mem=1024 -uuid
->
-> > > > > > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config
->
-> > > > > > -nodefaults -chardev
->
-> > > > > > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Li
->
-> > > > > > nux/
->
-> > > > > > moni
->
-> > > > > > tor.sock,server,nowait -mon
->
-> > > > > > chardev=charmonitor,id=monitor,mode=control -rtc base=utc
->
-> > > > > > -no-hpet -global kvm-pit.lost_tick_policy=delay -no-shutdown
->
-> > > > > > -boot strict=on -device
->
-> > > > > > pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device
->
-> > > > > > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device
->
-> > > > > > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device
->
-> > > > > > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device
->
-> > > > > > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device
->
-> > > > > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
->
-> > > > > > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device
->
-> > > > > > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device
->
-> > > > > > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
->
-> > > > > > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device
->
-> > > > > > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device
->
-> > > > > > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device
->
-> > > > > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
->
-> > > > > > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id
->
-> > > > > > =dri
->
-> > > > > > ve-v
->
-> > > > > > irtio-disk0,cache=none -device
->
-> > > > > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-
->
-> > > > > > disk
->
-> > > > > > 0,id
->
-> > > > > > =virtio-disk0,bootindex=1 -drive
->
-> > > > > > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device
->
-> > > > > > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1
->
-> > > > > > -netdev
->
-> > > > > > tap,fd=35,id=hostnet0 -device
->
-> > > > > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,b
->
-> > > > > > us=p
->
-> > > > > > ci.4
->
-> > > > > > ,addr=0x1 -chardev pty,id=charserial0 -device
->
-> > > > > > isa-serial,chardev=charserial0,id=serial0 -device
->
-> > > > > > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device
->
-> > > > > > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device
->
-> > > > > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg
->
-> > > > > > timestamp=on
->
-> > > > > >
->
-> > > > > > I am also very curious about this issue, in the linux kernel
->
-> > > > > > code, maybe double
->
-> > > > > check in function pci_bridge_check_ranges triggered this problem.
->
-> > > > >
->
-> > > > > If you can get the stacktrace in Linux when it tries to write
->
-> > > > > this fffff value, that would be quite helpful.
->
-> > > > >
->
-> > > >
->
-> > > > After I add mdelay(100) in function pci_bridge_check_ranges, this
->
-> > > > phenomenon is easier to reproduce, below is my modify in kernel:
->
-> > > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
->
-> > > > index cb389277..86e232d 100644
->
-> > > > --- a/drivers/pci/setup-bus.c
->
-> > > > +++ b/drivers/pci/setup-bus.c
->
-> > > > @@ -27,7 +27,7 @@
->
-> > > >  #include <linux/slab.h>
->
-> > > >  #include <linux/acpi.h>
->
-> > > >  #include "pci.h"
->
-> > > > -
->
-> > > > +#include <linux/delay.h>
->
-> > > >  unsigned int pci_flags;
->
-> > > >
->
-> > > >  struct pci_dev_resource {
->
-> > > > @@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct
->
-> > > > pci_bus
->
-> > > *bus)
->
-> > > >                 pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32,
->
-> > > >                                                0xffffffff);
->
-> > > >                 pci_read_config_dword(bridge,
->
-> > > > PCI_PREF_BASE_UPPER32, &tmp);
->
-> > > > +               mdelay(100);
->
-> > > > +               printk(KERN_ERR "sleep\n");
->
-> > > > +                dump_stack();
->
-> > > >                 if (!tmp)
->
-> > > >                         b_res[2].flags &= ~IORESOURCE_MEM_64;
->
-> > > >                 pci_write_config_dword(bridge,
->
-> > > > PCI_PREF_BASE_UPPER32,
->
-> > > >
->
-> > >
->
-> > > OK!
->
-> > > I just sent a Linux patch that should help.
->
-> > > I would appreciate it if you will give it a try and if that helps
->
-> > > reply to it with a
->
-> > > Tested-by: tag.
->
-> > >
->
-> >
->
-> > I tested this patch and it works fine on my machine.
->
-> >
->
-> > But I have another question, if we only fix this problem in the
->
-> > kernel, the Linux version that has been released does not work well on the
->
-> virtualization platform.
->
-> > Is there a way to fix this problem in the backend?
->
->
->
-> There could be a way to work around this.
->
-> Does below help?
->
->
-I am sorry to tell you that I tested this patch and it does not fix the problem.
->
->
->
->
-> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index
->
-> 236a20eaa8..7834cac4b0 100644
->
-> --- a/hw/i386/acpi-build.c
->
-> +++ b/hw/i386/acpi-build.c
->
-> @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml
->
-> *parent_scope, PCIBus *bus,
->
->
->
->          aml_append(method, aml_store(aml_int(bsel_val), aml_name("BNUM")));
->
->          aml_append(method,
->
-> -            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device Check
->
-> */)
->
-> +            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* Device
->
-> + Check Light */)
->
->          );
->
->          aml_append(method,
->
->              aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject Request
->
-> */)
-Oh I see, another bug:
-
-        case ACPI_NOTIFY_DEVICE_CHECK_LIGHT:
-                acpi_handle_debug(handle, "ACPI_NOTIFY_DEVICE_CHECK_LIGHT event\n");
-                /* TBD: Exactly what does 'light' mean? */
-                break;
-
-And then e.g. acpi_generic_hotplug_event(struct acpi_device *adev, u32 type)
-and friends all just ignore this event type.
-
-
-
--- 
-MST
-
->
-> > > > > > > > > Hi all,
->
-> > > > > > > > >
->
-> > > > > > > > >
->
-> > > > > > > > >
->
-> > > > > > > > > In our test, we configured VM with several pci-bridges
->
-> > > > > > > > > and a virtio-net nic been attached with bus 4,
->
-> > > > > > > > >
->
-> > > > > > > > > After VM is startup, We ping this nic from host to
->
-> > > > > > > > > judge if it is working normally. Then, we hot add pci
->
-> > > > > > > > > devices to this VM with bus
->
-> > > > 0.
->
-> > > > > > > > >
->
-> > > > > > > > > We  found the virtio-net NIC in bus 4 is not working
->
-> > > > > > > > > (can not
->
-> > > > > > > > > connect) occasionally, as it kick virtio backend
->
-> > > > > > > > > failure with error
->
-> > > But I have another question, if we only fix this problem in the
->
-> > > kernel, the Linux version that has been released does not work
->
-> > > well on the
->
-> > virtualization platform.
->
-> > > Is there a way to fix this problem in the backend?
->
-> >
->
-> > There could be a way to work around this.
->
-> > Does below help?
->
->
->
-> I am sorry to tell you, I tested this patch and it doesn't work fine.
->
->
->
-> >
->
-> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index
->
-> > 236a20eaa8..7834cac4b0 100644
->
-> > --- a/hw/i386/acpi-build.c
->
-> > +++ b/hw/i386/acpi-build.c
->
-> > @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml
->
-> > *parent_scope, PCIBus *bus,
->
-> >
->
-> >          aml_append(method, aml_store(aml_int(bsel_val),
->
-aml_name("BNUM")));
->
-> >          aml_append(method,
->
-> > -            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device
->
-> > Check
->
-*/)
->
-> > +            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /*
->
-> > + Device Check Light */)
->
-> >          );
->
-> >          aml_append(method,
->
-> >              aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject
->
-> > Request */)
->
->
->
-Oh I see, another bug:
->
->
-case ACPI_NOTIFY_DEVICE_CHECK_LIGHT:
->
-acpi_handle_debug(handle, "ACPI_NOTIFY_DEVICE_CHECK_LIGHT
->
-event\n");
->
-/* TBD: Exactly what does 'light' mean? */
->
-break;
->
->
-And then e.g. acpi_generic_hotplug_event(struct acpi_device *adev, u32 type)
->
-and friends all just ignore this event type.
->
->
->
->
---
->
-MST
-Hi Michael,
-
-If we want to fix this problem in the backend, it is not enough to consider
-only PCI device hot-plugging: I found that running a command like
-"echo 1 > /sys/bus/pci/rescan" in the guest reproduces the problem very easily.
-
-From the perspective of device emulation, when the guest writes 0xffffffff to a
-BAR, it only wants to read back the size of the region, not to actually update
-the address space.
-So I made the following patch to avoid updating the PCI mappings in that case.
-
-Do you think this makes sense?
-
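As a rough illustration of the heuristic described in this message (treating an all-ones config write as a sizing probe), here is a minimal C sketch. The helper names `cfg_write_all_ones` and `looks_like_sizing_probe` are hypothetical and are not QEMU functions; the sketch assumes config writes are 1, 2 or 4 bytes wide. It only shows how the comparison value can be computed safely, and it does not settle whether skipping the mapping update is correct, since an all-ones value can also be a legitimate address setting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the all-ones value for a config write of
 * len bytes. Assumes len is 1, 2 or 4. */
static uint64_t cfg_write_all_ones(int len)
{
    assert(len == 1 || len == 2 || len == 4);
    /* Widen to 64 bits before shifting: shifting a plain int by 32
     * (the len == 4 case) would be undefined behaviour in C. */
    return ((uint64_t)1 << (len * 8)) - 1;
}

/* Hypothetical predicate: true when the written value is all ones for
 * the given width, which is the pattern a guest uses to size a region. */
static bool looks_like_sizing_probe(uint32_t val, int len)
{
    return (uint64_t)val == cfg_write_all_ones(len);
}
```

For a 4-byte write the mask is 0xffffffff, so only a write of exactly that value would be classified as a probe; any other value would still be treated as a real address update.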
-[PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR
-
-When the guest writes 0xffffffff to a BAR, it only wants to get the size of the
-region, not to actually update the address space.
-So when the guest writes 0xffffffff to a BAR, we need to avoid calling
-pci_update_mappings or pci_bridge_update_mappings.
-
-Signed-off-by: xuyandong <address@hidden>
----
- hw/pci/pci.c        | 6 ++++--
- hw/pci/pci_bridge.c | 8 +++++---
- 2 files changed, 9 insertions(+), 5 deletions(-)
-
-diff --git a/hw/pci/pci.c b/hw/pci/pci.c
-index 56b13b3..ef368e1 100644
---- a/hw/pci/pci.c
-+++ b/hw/pci/pci.c
-@@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, int
- {
-     int i, was_irq_disabled = pci_irq_disabled(d);
-     uint32_t val = val_in;
-+    uint64_t barmask = (1 << l*8) - 1;
- 
-     for (i = 0; i < l; val >>= 8, ++i) {
-         uint8_t wmask = d->wmask[addr + i];
-@@ -1369,9 +1370,10 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, int
-         d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
-         d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
-     }
--    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
-+    if ((val_in != barmask &&
-+       (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
-         ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
--        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
-+        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) ||
-         range_covers_byte(addr, l, PCI_COMMAND))
-         pci_update_mappings(d);
- 
-diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
-index ee9dff2..f2bad79 100644
---- a/hw/pci/pci_bridge.c
-+++ b/hw/pci/pci_bridge.c
-@@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d,
-     PCIBridge *s = PCI_BRIDGE(d);
-     uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
-     uint16_t newctl;
-+    uint64_t barmask = (1 << len * 8) - 1;
- 
-     pci_default_write_config(d, address, val, len);
- 
-     if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
- 
--        /* io base/limit */
--        ranges_overlap(address, len, PCI_IO_BASE, 2) ||
-+        (val != barmask &&
-+       /* io base/limit */
-+        (ranges_overlap(address, len, PCI_IO_BASE, 2) ||
- 
-         /* memory base/limit, prefetchable base/limit and
-            io base/limit upper 16 */
--        ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
-+        ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) ||
- 
-         /* vga enable */
-         ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
--- 
-1.8.3.1
-
-On Mon, Jan 07, 2019 at 02:37:17PM +0000, xuyandong wrote:
->
-> > > > > > > > > > Hi all,
->
-> > > > > > > > > >
->
-> > > > > > > > > >
->
-> > > > > > > > > >
->
-> > > > > > > > > > In our test, we configured VM with several pci-bridges
->
-> > > > > > > > > > and a virtio-net nic been attached with bus 4,
->
-> > > > > > > > > >
->
-> > > > > > > > > > After VM is startup, We ping this nic from host to
->
-> > > > > > > > > > judge if it is working normally. Then, we hot add pci
->
-> > > > > > > > > > devices to this VM with bus
->
-> > > > > 0.
->
-> > > > > > > > > >
->
-> > > > > > > > > > We  found the virtio-net NIC in bus 4 is not working
->
-> > > > > > > > > > (can not
->
-> > > > > > > > > > connect) occasionally, as it kick virtio backend
->
-> > > > > > > > > > failure with error
->
->
-> > > > But I have another question, if we only fix this problem in the
->
-> > > > kernel, the Linux version that has been released does not work
->
-> > > > well on the
->
-> > > virtualization platform.
->
-> > > > Is there a way to fix this problem in the backend?
->
-> > >
->
-> > > There could be a way to work around this.
->
-> > > Does below help?
->
-> >
->
-> > I am sorry to tell you, I tested this patch and it doesn't work fine.
->
-> >
->
-> > >
->
-> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index
->
-> > > 236a20eaa8..7834cac4b0 100644
->
-> > > --- a/hw/i386/acpi-build.c
->
-> > > +++ b/hw/i386/acpi-build.c
->
-> > > @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml
->
-> > > *parent_scope, PCIBus *bus,
->
-> > >
->
-> > >          aml_append(method, aml_store(aml_int(bsel_val),
->
-> aml_name("BNUM")));
->
-> > >          aml_append(method,
->
-> > > -            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device
->
-> > > Check
->
-> */)
->
-> > > +            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /*
->
-> > > + Device Check Light */)
->
-> > >          );
->
-> > >          aml_append(method,
->
-> > >              aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject
->
-> > > Request */)
->
->
->
->
->
-> Oh I see, another bug:
->
->
->
->         case ACPI_NOTIFY_DEVICE_CHECK_LIGHT:
->
->                 acpi_handle_debug(handle, "ACPI_NOTIFY_DEVICE_CHECK_LIGHT
->
-> event\n");
->
->                 /* TBD: Exactly what does 'light' mean? */
->
->                 break;
->
->
->
-> And then e.g. acpi_generic_hotplug_event(struct acpi_device *adev, u32 type)
->
-> and friends all just ignore this event type.
->
->
->
->
->
->
->
-> --
->
-> MST
->
->
-Hi Michael,
->
->
-If we want to fix this problem on the backend, it is not enough to consider
->
-only PCI
->
-device hot plugging, because I found that if we use a command like
->
-"echo 1 > /sys/bus/pci/rescan" in guest, this problem is very easy to
->
-reproduce.
->
->
-From the perspective of device emulation, when guest writes 0xffffffff to the
->
-BAR,
->
-guest just want to get the size of the region but not really updating the
->
-address space.
->
-So I made the following patch to avoid  update pci mapping.
->
->
-Do you think this make sense?
->
->
-[PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR
->
->
-When guest writes 0xffffffff to the BAR, guest just want to get the size of
->
-the region
->
-but not really updating the address space.
->
-So when guest writes 0xffffffff to BAR, we need avoid pci_update_mappings
->
-or pci_bridge_update_mappings.
->
->
-Signed-off-by: xuyandong <address@hidden>
-I see how that will address the common case; however, there are a bunch of
-issues here.  First of all, it's easy to trigger the update by some other
-action like VM migration.  More importantly, it's just possible that the
-guest actually does want to set the low 32 bits of the address to all
-ones.  For example, that is clearly listed as a way to disable all
-devices behind the bridge in the PCI-to-PCI bridge spec.
-
-Given upstream is dragging its feet, I'm open to adding a flag
-that will help keep guests going as a temporary measure.
-We will need to think about ways to restrict this as much as
-we can.
-
-
->
----
->
-hw/pci/pci.c        | 6 ++++--
->
-hw/pci/pci_bridge.c | 8 +++++---
->
-2 files changed, 9 insertions(+), 5 deletions(-)
->
->
-diff --git a/hw/pci/pci.c b/hw/pci/pci.c
->
-index 56b13b3..ef368e1 100644
->
---- a/hw/pci/pci.c
->
-+++ b/hw/pci/pci.c
->
-@@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t
->
-addr, uint32_t val_in, int
->
-{
->
-int i, was_irq_disabled = pci_irq_disabled(d);
->
-uint32_t val = val_in;
->
-+    uint64_t barmask = (1 << l*8) - 1;
->
->
-for (i = 0; i < l; val >>= 8, ++i) {
->
-uint8_t wmask = d->wmask[addr + i];
->
-@@ -1369,9 +1370,10 @@ void pci_default_write_config(PCIDevice *d, uint32_t
->
-addr, uint32_t val_in, int
->
-d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
->
-d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
->
-}
->
--    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
->
-+    if ((val_in != barmask &&
->
-+     (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
->
-ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
->
--        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
->
-+        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) ||
->
-range_covers_byte(addr, l, PCI_COMMAND))
->
-pci_update_mappings(d);
->
->
-diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
->
-index ee9dff2..f2bad79 100644
->
---- a/hw/pci/pci_bridge.c
->
-+++ b/hw/pci/pci_bridge.c
->
-@@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d,
->
-PCIBridge *s = PCI_BRIDGE(d);
->
-uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
->
-uint16_t newctl;
->
-+    uint64_t barmask = (1 << len * 8) - 1;
->
->
-pci_default_write_config(d, address, val, len);
->
->
-if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
->
--        /* io base/limit */
->
--        ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
-+        (val != barmask &&
->
-+     /* io base/limit */
->
-+        (ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
->
-/* memory base/limit, prefetchable base/limit and
->
-io base/limit upper 16 */
->
--        ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
->
-+        ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) ||
->
->
-/* vga enable */
->
-ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
->
---
->
-1.8.3.1
->
->
->
-
->
------Original Message-----
->
-From: Michael S. Tsirkin [
-mailto:address@hidden
->
-Sent: Monday, January 07, 2019 11:06 PM
->
-To: xuyandong <address@hidden>
->
-Cc: address@hidden; Paolo Bonzini <address@hidden>; qemu-
->
-address@hidden; Zhanghailiang <address@hidden>;
->
-wangxin (U) <address@hidden>; Huangweidong (C)
->
-<address@hidden>
->
-Subject: Re: [BUG]Unassigned mem write during pci device hot-plug
->
->
-On Mon, Jan 07, 2019 at 02:37:17PM +0000, xuyandong wrote:
->
-> > > > > > > > > > > Hi all,
->
-> > > > > > > > > > >
->
-> > > > > > > > > > >
->
-> > > > > > > > > > >
->
-> > > > > > > > > > > In our test, we configured VM with several
->
-> > > > > > > > > > > pci-bridges and a virtio-net nic been attached
->
-> > > > > > > > > > > with bus 4,
->
-> > > > > > > > > > >
->
-> > > > > > > > > > > After VM is startup, We ping this nic from host to
->
-> > > > > > > > > > > judge if it is working normally. Then, we hot add
->
-> > > > > > > > > > > pci devices to this VM with bus
->
-> > > > > > 0.
->
-> > > > > > > > > > >
->
-> > > > > > > > > > > We  found the virtio-net NIC in bus 4 is not
->
-> > > > > > > > > > > working (can not
->
-> > > > > > > > > > > connect) occasionally, as it kick virtio backend
->
-> > > > > > > > > > > failure with error
->
->
->
-> > > > > But I have another question, if we only fix this problem in
->
-> > > > > the kernel, the Linux version that has been released does not
->
-> > > > > work well on the
->
-> > > > virtualization platform.
->
-> > > > > Is there a way to fix this problem in the backend?
->
->
->
-> Hi Michael,
->
->
->
-> If we want to fix this problem on the backend, it is not enough to
->
-> consider only PCI device hot plugging, because I found that if we use
->
-> a command like "echo 1 > /sys/bus/pci/rescan" in guest, this problem is very
->
-easy to reproduce.
->
->
->
-> From the perspective of device emulation, when guest writes 0xffffffff
->
-> to the BAR, guest just want to get the size of the region but not really
->
-updating the address space.
->
-> So I made the following patch to avoid  update pci mapping.
->
->
->
-> Do you think this make sense?
->
->
->
-> [PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR
->
->
->
-> When guest writes 0xffffffff to the BAR, guest just want to get the
->
-> size of the region but not really updating the address space.
->
-> So when guest writes 0xffffffff to BAR, we need avoid
->
-> pci_update_mappings or pci_bridge_update_mappings.
->
->
->
-> Signed-off-by: xuyandong <address@hidden>
->
->
-I see how that will address the common case however there are a bunch of
->
-issues here.  First of all it's easy to trigger the update by some other
->
-action like
->
-VM migration.  More importantly it's just possible that guest actually does
->
-want
->
-to set the low 32 bit of the address to all ones.  For example, that is
->
-clearly
->
-listed as a way to disable all devices behind the bridge in the pci to pci
->
-bridge
->
-spec.
-OK, I see. What if I skip the update only when the guest writes 0xFFFFFFFF to
-Prefetchable Base Upper 32 Bits, to cover the kernel double-check problem?
-Do you think there is still a risk?
-
->
->
-Given upstream is dragging its feet I'm open to adding a flag that will help
->
-keep guests going as a temporary measure.
->
-We will need to think about ways to restrict this as much as we can.
->
->
->
-> ---
->
->  hw/pci/pci.c        | 6 ++++--
->
->  hw/pci/pci_bridge.c | 8 +++++---
->
->  2 files changed, 9 insertions(+), 5 deletions(-)
->
->
->
-> diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 56b13b3..ef368e1 100644
->
-> --- a/hw/pci/pci.c
->
-> +++ b/hw/pci/pci.c
->
-> @@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d,
->
-> uint32_t addr, uint32_t val_in, int  {
->
->      int i, was_irq_disabled = pci_irq_disabled(d);
->
->      uint32_t val = val_in;
->
-> +    uint64_t barmask = (1 << l*8) - 1;
->
->
->
->      for (i = 0; i < l; val >>= 8, ++i) {
->
->          uint8_t wmask = d->wmask[addr + i]; @@ -1369,9 +1370,10 @@
->
-> void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in,
->
-int
->
->          d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val &
->
-> wmask);
->
->          d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear
->
-> */
->
->      }
->
-> -    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
->
-> +    if ((val_in != barmask &&
->
-> +   (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
->
->          ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
->
-> -        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
->
-> +        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) ||
->
->          range_covers_byte(addr, l, PCI_COMMAND))
->
->          pci_update_mappings(d);
->
->
->
-> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c index
->
-> ee9dff2..f2bad79 100644
->
-> --- a/hw/pci/pci_bridge.c
->
-> +++ b/hw/pci/pci_bridge.c
->
-> @@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d,
->
->      PCIBridge *s = PCI_BRIDGE(d);
->
->      uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
->
->      uint16_t newctl;
->
-> +    uint64_t barmask = (1 << len * 8) - 1;
->
->
->
->      pci_default_write_config(d, address, val, len);
->
->
->
->      if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
->
->
-> -        /* io base/limit */
->
-> -        ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
-> +        (val != barmask &&
->
-> +   /* io base/limit */
->
-> +        (ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
->
->
->          /* memory base/limit, prefetchable base/limit and
->
->             io base/limit upper 16 */
->
-> -        ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
->
-> +        ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) ||
->
->
->
->          /* vga enable */
->
->          ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
->
-> --
->
-> 1.8.3.1
->
->
->
->
->
->
-
-On Mon, Jan 07, 2019 at 03:28:36PM +0000, xuyandong wrote:
->
->
->
-> -----Original Message-----
->
-> From: Michael S. Tsirkin [
-mailto:address@hidden
->
-> Sent: Monday, January 07, 2019 11:06 PM
->
-> To: xuyandong <address@hidden>
->
-> Cc: address@hidden; Paolo Bonzini <address@hidden>; qemu-
->
-> address@hidden; Zhanghailiang <address@hidden>;
->
-> wangxin (U) <address@hidden>; Huangweidong (C)
->
-> <address@hidden>
->
-> Subject: Re: [BUG]Unassigned mem write during pci device hot-plug
->
->
->
-> On Mon, Jan 07, 2019 at 02:37:17PM +0000, xuyandong wrote:
->
-> > > > > > > > > > > > Hi all,
->
-> > > > > > > > > > > >
->
-> > > > > > > > > > > >
->
-> > > > > > > > > > > >
->
-> > > > > > > > > > > > In our test, we configured VM with several
->
-> > > > > > > > > > > > pci-bridges and a virtio-net nic been attached
->
-> > > > > > > > > > > > with bus 4,
->
-> > > > > > > > > > > >
->
-> > > > > > > > > > > > After VM is startup, We ping this nic from host to
->
-> > > > > > > > > > > > judge if it is working normally. Then, we hot add
->
-> > > > > > > > > > > > pci devices to this VM with bus
->
-> > > > > > > 0.
->
-> > > > > > > > > > > >
->
-> > > > > > > > > > > > We  found the virtio-net NIC in bus 4 is not
->
-> > > > > > > > > > > > working (can not
->
-> > > > > > > > > > > > connect) occasionally, as it kick virtio backend
->
-> > > > > > > > > > > > failure with error
->
-> >
->
-> > > > > > But I have another question, if we only fix this problem in
->
-> > > > > > the kernel, the Linux version that has been released does not
->
-> > > > > > work well on the
->
-> > > > > virtualization platform.
->
-> > > > > > Is there a way to fix this problem in the backend?
->
-> >
->
-> > Hi Michael,
->
-> >
->
-> > If we want to fix this problem on the backend, it is not enough to
->
-> > consider only PCI device hot plugging, because I found that if we use
->
-> > a command like "echo 1 > /sys/bus/pci/rescan" in guest, this problem is
->
-> > very
->
-> easy to reproduce.
->
-> >
->
-> > From the perspective of device emulation, when guest writes 0xffffffff
->
-> > to the BAR, guest just want to get the size of the region but not really
->
-> updating the address space.
->
-> > So I made the following patch to avoid  update pci mapping.
->
-> >
->
-> > Do you think this make sense?
->
-> >
->
-> > [PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR
->
-> >
->
-> > When guest writes 0xffffffff to the BAR, guest just want to get the
->
-> > size of the region but not really updating the address space.
->
-> > So when guest writes 0xffffffff to BAR, we need avoid
->
-> > pci_update_mappings or pci_bridge_update_mappings.
->
-> >
->
-> > Signed-off-by: xuyandong <address@hidden>
->
->
->
-> I see how that will address the common case however there are a bunch of
->
-> issues here.  First of all it's easy to trigger the update by some other
->
-> action like
->
-> VM migration.  More importantly it's just possible that guest actually does
->
-> want
->
-> to set the low 32 bit of the address to all ones.  For example, that is
->
-> clearly
->
-> listed as a way to disable all devices behind the bridge in the pci to pci
->
-> bridge
->
-> spec.
->
->
-Ok, I see. If I only skip update when guest writing 0xFFFFFFFF to Prefetchable
->
-Base Upper 32 Bits
->
-to meet the kernel double check problem.
->
-Do you think there is still risk?
-Well, the risk is non-zero, since the spec says such a write should disable all
-accesses. Just an idea: why not add an option to disable the upper 32 bits?
-That is ugly and limits the address space, but it is spec compliant.
-
->
->
->
-> Given upstream is dragging its feet I'm open to adding a flag that will
->
-> help
->
-> keep guests going as a temporary measure.
->
-> We will need to think about ways to restrict this as much as we can.
->
->
->
->
->
-> > ---
->
-> >  hw/pci/pci.c        | 6 ++++--
->
-> >  hw/pci/pci_bridge.c | 8 +++++---
->
-> >  2 files changed, 9 insertions(+), 5 deletions(-)
->
-> >
->
-> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 56b13b3..ef368e1 100644
->
-> > --- a/hw/pci/pci.c
->
-> > +++ b/hw/pci/pci.c
->
-> > @@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d,
->
-> > uint32_t addr, uint32_t val_in, int  {
->
-> >      int i, was_irq_disabled = pci_irq_disabled(d);
->
-> >      uint32_t val = val_in;
->
-> > +    uint64_t barmask = (1 << l*8) - 1;
->
-> >
->
-> >      for (i = 0; i < l; val >>= 8, ++i) {
->
-> >          uint8_t wmask = d->wmask[addr + i]; @@ -1369,9 +1370,10 @@
->
-> > void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t
->
-> > val_in,
->
-> int
->
-> >          d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val &
->
-> > wmask);
->
-> >          d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to
->
-> > Clear */
->
-> >      }
->
-> > -    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
->
-> > +    if ((val_in != barmask &&
->
-> > + (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
->
-> >          ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
->
-> > -        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
->
-> > +        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) ||
->
-> >          range_covers_byte(addr, l, PCI_COMMAND))
->
-> >          pci_update_mappings(d);
->
-> >
->
-> > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c index
->
-> > ee9dff2..f2bad79 100644
->
-> > --- a/hw/pci/pci_bridge.c
->
-> > +++ b/hw/pci/pci_bridge.c
->
-> > @@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d,
->
-> >      PCIBridge *s = PCI_BRIDGE(d);
->
-> >      uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
->
-> >      uint16_t newctl;
->
-> > +    uint64_t barmask = (1 << len * 8) - 1;
->
-> >
->
-> >      pci_default_write_config(d, address, val, len);
->
-> >
->
-> >      if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
-> >
->
-> > -        /* io base/limit */
->
-> > -        ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
-> > +        (val != barmask &&
->
-> > + /* io base/limit */
->
-> > +        (ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
-> >
->
-> >          /* memory base/limit, prefetchable base/limit and
->
-> >             io base/limit upper 16 */
->
-> > -        ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
->
-> > +        ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) ||
->
-> >
->
-> >          /* vga enable */
->
-> >          ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
->
-> > --
->
-> > 1.8.3.1
->
-> >
->
-> >
->
-> >
-
->
------Original Message-----
->
-From: xuyandong
->
-Sent: Monday, January 07, 2019 10:37 PM
->
-To: 'Michael S. Tsirkin' <address@hidden>
->
-Cc: address@hidden; Paolo Bonzini <address@hidden>; qemu-
->
-address@hidden; Zhanghailiang <address@hidden>;
->
-wangxin (U) <address@hidden>; Huangweidong (C)
->
-<address@hidden>
->
-Subject: RE: [BUG]Unassigned mem write during pci device hot-plug
->
->
-> > > > > > > > > > Hi all,
->
-> > > > > > > > > >
->
-> > > > > > > > > >
->
-> > > > > > > > > >
->
-> > > > > > > > > > In our test, we configured VM with several
->
-> > > > > > > > > > pci-bridges and a virtio-net nic been attached with
->
-> > > > > > > > > > bus 4,
->
-> > > > > > > > > >
->
-> > > > > > > > > > After VM is startup, We ping this nic from host to
->
-> > > > > > > > > > judge if it is working normally. Then, we hot add
->
-> > > > > > > > > > pci devices to this VM with bus
->
-> > > > > 0.
->
-> > > > > > > > > >
->
-> > > > > > > > > > We  found the virtio-net NIC in bus 4 is not working
->
-> > > > > > > > > > (can not
->
-> > > > > > > > > > connect) occasionally, as it kick virtio backend
->
-> > > > > > > > > > failure with error
->
->
-> > > > But I have another question, if we only fix this problem in the
->
-> > > > kernel, the Linux version that has been released does not work
->
-> > > > well on the
->
-> > > virtualization platform.
->
-> > > > Is there a way to fix this problem in the backend?
->
-> > >
->
-> > > There could be a way to work around this.
->
-> > > Does below help?
->
-> >
->
-> > I am sorry to tell you, I tested this patch and it doesn't work fine.
->
-> >
->
-> > >
->
-> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index
->
-> > > 236a20eaa8..7834cac4b0 100644
->
-> > > --- a/hw/i386/acpi-build.c
->
-> > > +++ b/hw/i386/acpi-build.c
->
-> > > @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml
->
-> > > *parent_scope, PCIBus *bus,
->
-> > >
->
-> > >          aml_append(method, aml_store(aml_int(bsel_val),
->
-> aml_name("BNUM")));
->
-> > >          aml_append(method,
->
-> > > -            aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device
->
-Check
->
-> */)
->
-> > > +            aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /*
->
-> > > + Device Check Light */)
->
-> > >          );
->
-> > >          aml_append(method,
->
-> > >              aml_call2("DVNT", aml_name("PCID"), aml_int(3)/*
->
-> > > Eject Request */)
->
->
->
->
->
-> Oh I see, another bug:
->
->
->
->         case ACPI_NOTIFY_DEVICE_CHECK_LIGHT:
->
->                 acpi_handle_debug(handle,
->
-> "ACPI_NOTIFY_DEVICE_CHECK_LIGHT event\n");
->
->                 /* TBD: Exactly what does 'light' mean? */
->
->                 break;
->
->
->
-> And then e.g. acpi_generic_hotplug_event(struct acpi_device *adev, u32
->
-> type) and friends all just ignore this event type.
->
->
->
->
->
->
->
-> --
->
-> MST
->
->
-Hi Michael,
->
->
-If we want to fix this problem on the backend, it is not enough to consider
->
-only
->
-PCI device hot plugging, because I found that if we use a command like "echo
->
-1 >
->
-/sys/bus/pci/rescan" in guest, this problem is very easy to reproduce.
->
->
-From the perspective of device emulation, when guest writes 0xffffffff to the
->
-BAR, guest just want to get the size of the region but not really updating the
->
-address space.
->
-So I made the following patch to avoid  update pci mapping.
->
->
-Do you think this make sense?
->
->
-[PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR
->
->
-When guest writes 0xffffffff to the BAR, guest just want to get the size of
->
-the
->
-region but not really updating the address space.
->
-So when guest writes 0xffffffff to BAR, we need avoid pci_update_mappings or
->
-pci_bridge_update_mappings.
->
->
-Signed-off-by: xuyandong <address@hidden>
->
----
->
-hw/pci/pci.c        | 6 ++++--
->
-hw/pci/pci_bridge.c | 8 +++++---
->
-2 files changed, 9 insertions(+), 5 deletions(-)
->
->
-diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 56b13b3..ef368e1 100644
->
---- a/hw/pci/pci.c
->
-+++ b/hw/pci/pci.c
->
-@@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t
->
-addr, uint32_t val_in, int  {
->
-int i, was_irq_disabled = pci_irq_disabled(d);
->
-uint32_t val = val_in;
->
-+    uint64_t barmask = (1 << l*8) - 1;
->
->
-for (i = 0; i < l; val >>= 8, ++i) {
->
-uint8_t wmask = d->wmask[addr + i]; @@ -1369,9 +1370,10 @@ void
->
-pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, int
->
-d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
->
-d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
->
-}
->
--    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
->
-+    if ((val_in != barmask &&
->
-+     (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
->
-ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
->
--        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
->
-+        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) ||
->
-range_covers_byte(addr, l, PCI_COMMAND))
->
-pci_update_mappings(d);
->
->
-diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c index ee9dff2..f2bad79
->
-100644
->
---- a/hw/pci/pci_bridge.c
->
-+++ b/hw/pci/pci_bridge.c
->
-@@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d,
->
-PCIBridge *s = PCI_BRIDGE(d);
->
-uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
->
-uint16_t newctl;
->
-+    uint64_t barmask = (1 << len * 8) - 1;
->
->
-pci_default_write_config(d, address, val, len);
->
->
-if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
->
->
--        /* io base/limit */
->
--        ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
-+        (val != barmask &&
->
-+     /* io base/limit */
->
-+        (ranges_overlap(address, len, PCI_IO_BASE, 2) ||
->
->
-/* memory base/limit, prefetchable base/limit and
->
-io base/limit upper 16 */
->
--        ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
->
-+        ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) ||
->
->
-/* vga enable */
->
-ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
->
---
->
-1.8.3.1
->
->
-Sorry, please ignore the patch above.
-
-Here is the patch I want to post:
-
-diff --git a/hw/pci/pci.c b/hw/pci/pci.c
-index 56b13b3..38a300f 100644
---- a/hw/pci/pci.c
-+++ b/hw/pci/pci.c
-@@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t 
-addr, uint32_t val_in, int
- {
-     int i, was_irq_disabled = pci_irq_disabled(d);
-     uint32_t val = val_in;
-+    uint64_t barmask = ((uint64_t)1 << l*8) - 1;
- 
-     for (i = 0; i < l; val >>= 8, ++i) {
-         uint8_t wmask = d->wmask[addr + i];
-@@ -1369,9 +1370,10 @@ void pci_default_write_config(PCIDevice *d, uint32_t 
-addr, uint32_t val_in, int
-         d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
-         d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
-     }
--    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
-+    if ((val_in != barmask &&
-+       (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
-         ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
--        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
-+        ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) ||
-         range_covers_byte(addr, l, PCI_COMMAND))
-         pci_update_mappings(d);
- 
-diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
-index ee9dff2..b8f7d48 100644
---- a/hw/pci/pci_bridge.c
-+++ b/hw/pci/pci_bridge.c
-@@ -253,20 +253,22 @@ void pci_bridge_write_config(PCIDevice *d,
-     PCIBridge *s = PCI_BRIDGE(d);
-     uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
-     uint16_t newctl;
-+    uint64_t barmask = ((uint64_t)1 << len * 8) - 1;
- 
-     pci_default_write_config(d, address, val, len);
- 
-     if (ranges_overlap(address, len, PCI_COMMAND, 2) ||
- 
--        /* io base/limit */
--        ranges_overlap(address, len, PCI_IO_BASE, 2) ||
-+        /* vga enable */
-+        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2) ||
- 
--        /* memory base/limit, prefetchable base/limit and
--           io base/limit upper 16 */
--        ranges_overlap(address, len, PCI_MEMORY_BASE, 20) ||
-+        (val != barmask &&
-+        /* io base/limit */
-+         (ranges_overlap(address, len, PCI_IO_BASE, 2) ||
- 
--        /* vga enable */
--        ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) {
-+         /* memory base/limit, prefetchable base/limit and
-+            io base/limit upper 16 */
-+         ranges_overlap(address, len, PCI_MEMORY_BASE, 20)))) {
-         pci_bridge_update_mappings(s);
-     }
- 
--- 
-1.8.3.1
-
diff --git a/results/classifier/008/other/70294255 b/results/classifier/008/other/70294255
deleted file mode 100644
index 04cebede4..000000000
--- a/results/classifier/008/other/70294255
+++ /dev/null
@@ -1,1071 +0,0 @@
-PID: 0.859
-semantic: 0.858
-socket: 0.858
-device: 0.857
-graphic: 0.857
-debug: 0.854
-permissions: 0.854
-other: 0.852
-performance: 0.850
-network: 0.846
-vnc: 0.837
-files: 0.832
-boot: 0.811
-KVM: 0.806
-
-[Qemu-devel] Reply: Re: Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang
-
-Hi,
-
-Yes, it is better.
-
-And should we delete
-
-
-
-
-#ifdef WIN32
-
-    QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL)
-
-#endif
-
-
-
-
-in qio_channel_socket_accept?
-
-qio_channel_socket_new already has it.
-
-
-
-
-
-
-
-
-
-
-
-
-Original message
-
-
-
-From: address@hidden
-To: 王广10165992
-Cc: address@hidden address@hidden address@hidden address@hidden
-Date: 2017-03-22 15:03
-Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang
-
-
-
-
-
-Hi,
-
-On 2017/3/22 9:42, address@hidden wrote:
-> diff --git a/migration/socket.c b/migration/socket.c
->
->
-> index 13966f1..d65a0ea 100644
->
->
-> --- a/migration/socket.c
->
->
-> +++ b/migration/socket.c
->
->
-> @@ -147,8 +147,9 @@ static gboolean 
-socket_accept_incoming_migration(QIOChannel *ioc,
->
->
->       }
->
->
->
->
->
->       trace_migration_socket_incoming_accepted()
->
->
->
->
->
->       qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
->
->
-> +    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)
->
->
->       migration_channel_process_incoming(migrate_get_current(),
->
->
->                                          QIO_CHANNEL(sioc))
->
->
->       object_unref(OBJECT(sioc))
->
->
->
->
-> Is this patch ok?
->
-
-Yes, I think this works, but a better way may be to call
-qio_channel_set_feature()
-in qio_channel_socket_accept(), since we didn't set the SHUTDOWN feature for the
-socket accept fd.
-Or fix it like this:
-
-diff --git a/io/channel-socket.c b/io/channel-socket.c
-index f546c68..ce6894c 100644
---- a/io/channel-socket.c
-+++ b/io/channel-socket.c
-@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
-                            Error **errp)
-  {
-      QIOChannelSocket *cioc
--
--    cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET))
--    cioc->fd = -1
-+
-+    cioc = qio_channel_socket_new()
-      cioc->remoteAddrLen = sizeof(ioc->remoteAddr)
-      cioc->localAddrLen = sizeof(ioc->localAddr)
-
-
-Thanks,
-Hailiang
-
-> I have tested it. The test does not hang any more.
->
->
->
->
->
->
->
->
->
->
->
->
-> Original message
->
->
->
-> From: address@hidden
-> To: address@hidden address@hidden
-> Cc: address@hidden address@hidden address@hidden
-> Date: 2017-03-22 09:11
-> Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang
->
->
->
->
->
-> On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
-> > * Hailiang Zhang (address@hidden) wrote:
-> >> Hi,
-> >>
-> >> Thanks for reporting this. I confirmed it in my test, and it is a bug.
-> >>
-> >> We tried to call qemu_file_shutdown() to shut down the related fd in
-> >> case the COLO thread/incoming thread is stuck in read/write() during
-failover,
-> >> but it didn't take effect, because every fd used by COLO (and migration)
-> >> has been wrapped by a qio channel, which will not call the shutdown API if
-> >> we didn't call qio_channel_set_feature(QIO_CHANNEL(sioc),
-QIO_CHANNEL_FEATURE_SHUTDOWN).
-> >>
-> >> Cc: Dr. David Alan Gilbert address@hidden
-> >>
-> >> I suspect migration cancel has the same problem: it may be stuck in write()
-> >> if we try to cancel migration.
-> >>
-> >> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, 
-Error **errp)
-> >> {
-> >>      qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing")
-> >>      migration_channel_connect(s, ioc, NULL)
-> >>      ... ...
-> >> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN) above,
-> >> and the
-> >> migrate_fd_cancel()
-> >> {
-> >>   ... ...
-> >>      if (s->state == MIGRATION_STATUS_CANCELLING && f) {
-> >>          qemu_file_shutdown(f)  --> This will not take effect. No ?
-> >>      }
-> >> }
-> >
-> > (cc'd in Daniel Berrange).
-> > I see that we call qio_channel_set_feature(ioc, 
-QIO_CHANNEL_FEATURE_SHUTDOWN) at the
-> > top of qio_channel_socket_new  so I think that's safe isn't it?
-> >
->
-> Hmm, you are right, this problem only exists for the migration incoming fd,
-thanks.
->
-> > Dave
-> >
-> >> Thanks,
-> >> Hailiang
-> >>
-> >> On 2017/3/21 16:10, address@hidden wrote:
-> >>> Thank you.
-> >>>
-> >>> I have already tested it.
-> >>>
-> >>> When the Primary Node panics, the Secondary Node qemu hangs at the same
-place.
-> >>>
-> >>> According to
-http://wiki.qemu-project.org/Features/COLO
-, killing the Primary Node
-qemu will not reproduce the problem, but a Primary Node panic can.
-> >>>
-> >>> I think this is because the channel does not support
-QIO_CHANNEL_FEATURE_SHUTDOWN.
-> >>>
-> >>>
-> >>> On failover, channel_shutdown could not shut down the channel,
-> >>>
-> >>>
-> >>> so the colo_process_incoming_thread will hang at recvmsg.
-> >>>
-> >>>
-> >>> I tested a patch:
-> >>>
-> >>>
-> >>> diff --git a/migration/socket.c b/migration/socket.c
-> >>>
-> >>>
-> >>> index 13966f1..d65a0ea 100644
-> >>>
-> >>>
-> >>> --- a/migration/socket.c
-> >>>
-> >>>
-> >>> +++ b/migration/socket.c
-> >>>
-> >>>
-> >>> @@ -147,8 +147,9 @@ static gboolean 
-socket_accept_incoming_migration(QIOChannel *ioc,
-> >>>
-> >>>
-> >>>        }
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>        trace_migration_socket_incoming_accepted()
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>        qio_channel_set_name(QIO_CHANNEL(sioc), 
-"migration-socket-incoming")
-> >>>
-> >>>
-> >>> +    qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN)
-> >>>
-> >>>
-> >>>        migration_channel_process_incoming(migrate_get_current(),
-> >>>
-> >>>
-> >>>                                           QIO_CHANNEL(sioc))
-> >>>
-> >>>
-> >>>        object_unref(OBJECT(sioc))
-> >>>
-> >>>
-> >>>
-> >>>
-> >>> My test does not hang any more.
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>> Original message
-> >>>
-> >>>
-> >>>
-> >>> From: address@hidden
-> >>> To: 王广10165992 address@hidden
-> >>> Cc: address@hidden address@hidden
-> >>> Date: 2017-03-21 15:58
-> >>> Subject: Re: [Qemu-devel] Reply: Re: [BUG]COLO failover hang
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>> Hi,Wang.
-> >>>
-> >>> You can test this branch:
-> >>>
-> >>>
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
-> >>>
-> >>> and please follow the wiki to ensure your own configuration is correct.
-> >>>
-> >>>
-http://wiki.qemu-project.org/Features/COLO
-> >>>
-> >>>
-> >>> Thanks
-> >>>
-> >>> Zhang Chen
-> >>>
-> >>>
-> >>> On 03/21/2017 03:27 PM, address@hidden wrote:
-> >>> >
-> >>> > hi.
-> >>> >
-> >>> > I tested git qemu master and it has the same problem.
-> >>> >
-> >>> > (gdb) bt
-> >>> >
-> >>> > #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
-> >>> > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
-> >>> >
-> >>> > #1  0x00007f658e4aa0c2 in qio_channel_read
-> >>> > (address@hidden, address@hidden "",
-> >>> > address@hidden, address@hidden) at io/channel.c:114
-> >>> >
-> >>> > #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
-> >>> > buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
-> >>> > migration/qemu-file-channel.c:78
-> >>> >
-> >>> > #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
-> >>> > migration/qemu-file.c:295
-> >>> >
-> >>> > #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
-> >>> > address@hidden) at migration/qemu-file.c:555
-> >>> >
-> >>> > #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
-> >>> > migration/qemu-file.c:568
-> >>> >
-> >>> > #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
-> >>> > migration/qemu-file.c:648
-> >>> >
-> >>> > #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
-> >>> > address@hidden) at migration/colo.c:244
-> >>> >
-> >>> > #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
-> >>> > out>, address@hidden,
-> >>> > address@hidden)
-> >>> >
-> >>> >     at migration/colo.c:264
-> >>> >
-> >>> > #9  0x00007f658e3e740e in colo_process_incoming_thread
-> >>> > (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
-> >>> >
-> >>> > #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
-> >>> >
-> >>> > #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
-> >>> >
-> >>> > (gdb) p ioc->name
-> >>> >
-> >>> > $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
-> >>> >
-> >>> > (gdb) p ioc->features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
-> >>> >
-> >>> > $3 = 0
-> >>> >
-> >>> >
-> >>> > (gdb) bt
-> >>> >
-> >>> > #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
-> >>> > condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
-> >>> >
-> >>> > #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
-> >>> > gmain.c:3054
-> >>> >
-> >>> > #2  g_main_context_dispatch (context=<optimized out>,
-> >>> > address@hidden) at gmain.c:3630
-> >>> >
-> >>> > #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
-> >>> >
-> >>> > #4  os_host_main_loop_wait (timeout=<optimized out>) at
-> >>> > util/main-loop.c:258
-> >>> >
-> >>> > #5  main_loop_wait (address@hidden) at
-> >>> > util/main-loop.c:506
-> >>> >
-> >>> > #6  0x00007fdccb526187 in main_loop () at vl.c:1898
-> >>> >
-> >>> > #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
-> >>> > out>) at vl.c:4709
-> >>> >
-> >>> > (gdb) p ioc->features
-> >>> >
-> >>> > $1 = 6
-> >>> >
-> >>> > (gdb) p ioc->name
-> >>> >
-> >>> > $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
-> >>> >
-> >>> >
-> >>> > Maybe socket_accept_incoming_migration should
-> >>> > call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?
-> >>> >
-> >>> >
-> >>> > thank you.
-> >>> >
-> >>> >
-> >>> >
-> >>> >
-> >>> >
-> >>> > Original message
-> >>> > address@hidden
-> >>> > address@hidden
-> >>> > address@hidden@huawei.com>
-> >>> > *Date:* 2017-03-16 14:46
-> >>> > *Subject:* *Re: [Qemu-devel] COLO failover hang*
-> >>> >
-> >>> >
-> >>> >
-> >>> >
-> >>> > On 03/15/2017 05:06 PM, wangguang wrote:
-> >>> > > I am testing the QEMU COLO feature described here [QEMU
-> >>> > > Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
-> >>> > >
-> >>> > > When the Primary Node panics, the Secondary Node qemu hangs.
-> >>> > > It hangs at recvmsg in qio_channel_socket_readv.
-> >>> > > And when I run { 'execute': 'nbd-server-stop' } and { "execute":
-> >>> > > "x-colo-lost-heartbeat" } in the Secondary VM's
-> >>> > > monitor, the Secondary Node qemu still hangs at recvmsg.
-> >>> > >
-> >>> > > I found that COLO in qemu is not complete yet.
-> >>> > > Is there any development plan for COLO?
-> >>> >
-> >>> > Yes, we are developing it. You can see some of the patches we are pushing.
-> >>> >
-> >>> > > Has anyone ever run it successfully? Any help is appreciated!
-> >>> >
-> >>> > Our internal version can run it successfully.
-> >>> > For failover details, you can ask Zhanghailiang for help.
-> >>> > Next time, if you have questions about COLO,
-> >>> > please cc me and zhanghailiang address@hidden
-> >>> >
-> >>> >
-> >>> > Thanks
-> >>> > Zhang Chen
-> >>> >
-> >>> >
-> >>> > >
-> >>> > >
-> >>> > >
-> >>> > > centos7.2+qemu2.7.50
-> >>> > > (gdb) bt
-> >>> > > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-> >>> > > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized 
-out>,
-> >>> > > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, 
-errp=0x0) at
-> >>> > > io/channel-socket.c:497
-> >>> > > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-> >>> > > address@hidden "", address@hidden,
-> >>> > > address@hidden) at io/channel.c:97
-> >>> > > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
-> >>> > > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
-> >>> > > migration/qemu-file-channel.c:78
-> >>> > > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-> >>> > > migration/qemu-file.c:257
-> >>> > > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-> >>> > > address@hidden) at migration/qemu-file.c:510
-> >>> > > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-> >>> > > migration/qemu-file.c:523
-> >>> > > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-> >>> > > migration/qemu-file.c:603
-> >>> > > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-> >>> > > address@hidden) at migration/colo.c:215
-> >>> > > #9  0x00007f3e0327250d in colo_wait_handle_message 
-(errp=0x7f3d62bfaa48,
-> >>> > > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
-> >>> > > migration/colo.c:546
-> >>> > > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-> >>> > > migration/colo.c:649
-> >>> > > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-> >>> > > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
-> >>> > >
-> >>> > >
-> >>> > >
-> >>> > >
-> >>> > >
-> >>> > > --
-> >>> > > View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-> >>> > > Sent from the Developer mailing list archive at Nabble.com.
-> >>> > >
-> >>> > >
-> >>> > >
-> >>> > >
-> >>> >
-> >>> > --
-> >>> > Thanks
-> >>> > Zhang Chen
-> >>> >
-> >>> >
-> >>> >
-> >>> >
-> >>> >
-> >>>
-> >>
-> > --
-> > Dr. David Alan Gilbert / address@hidden / Manchester, UK
-> >
-> > .
-> >
->
-
-On 2017/3/22 16:09, address@hidden wrote:
-Hi,
-
-Yes, it is better.
-
-And should we delete
-Yes, you are right.
-#ifdef WIN32
-
-     QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL)
-
-#endif
-
-
-
-
-in qio_channel_socket_accept?
-
-qio_channel_socket_new already has it.
-
-
-
-
-
-
-
-
-
-
-
-
-Original message
-
-
-
-From: address@hidden
-To: 王广10165992
-Cc: address@hidden address@hidden address@hidden address@hidden
-Date: 2017-03-22 15:03
-Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang
-
-
-
-
-
-Hi,
-
-On 2017/3/22 9:42, address@hidden wrote:
-> diff --git a/migration/socket.c b/migration/socket.c
->
->
-> index 13966f1..d65a0ea 100644
->
->
-> --- a/migration/socket.c
->
->
-> +++ b/migration/socket.c
->
->
-> @@ -147,8 +147,9 @@ static gboolean 
-socket_accept_incoming_migration(QIOChannel *ioc,
->
->
->       }
->
->
->
->
->
->       trace_migration_socket_incoming_accepted()
->
->
->
->
->
->       qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming")
->
->
-> +    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN)
->
->
->       migration_channel_process_incoming(migrate_get_current(),
->
->
->                                          QIO_CHANNEL(sioc))
->
->
->       object_unref(OBJECT(sioc))
->
->
->
->
-> Is this patch ok?
->
-
-Yes, I think this works, but a better way may be to call
-qio_channel_set_feature()
-in qio_channel_socket_accept(), since we didn't set the SHUTDOWN feature for the
-socket accept fd.
-Or fix it like this:
-
-diff --git a/io/channel-socket.c b/io/channel-socket.c
-index f546c68..ce6894c 100644
---- a/io/channel-socket.c
-+++ b/io/channel-socket.c
-@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
-                             Error **errp)
-   {
-       QIOChannelSocket *cioc
--
--    cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET))
--    cioc->fd = -1
-+
-+    cioc = qio_channel_socket_new()
-       cioc->remoteAddrLen = sizeof(ioc->remoteAddr)
-       cioc->localAddrLen = sizeof(ioc->localAddr)
-
-
-Thanks,
-Hailiang
-
-> I have tested it. The test does not hang any more.
->
->
->
->
->
->
->
->
->
->
->
->
-> Original message
->
->
->
-> From: address@hidden
-> To: address@hidden address@hidden
-> Cc: address@hidden address@hidden address@hidden
-> Date: 2017-03-22 09:11
-> Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang
->
->
->
->
->
-> On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
-> > * Hailiang Zhang (address@hidden) wrote:
-> >> Hi,
-> >>
-> >> Thanks for reporting this. I confirmed it in my test, and it is a bug.
-> >>
-> >> We tried to call qemu_file_shutdown() to shut down the related fd in
-> >> case the COLO thread/incoming thread is stuck in read/write() during
-failover,
-> >> but it didn't take effect, because every fd used by COLO (and migration)
-> >> has been wrapped by a qio channel, which will not call the shutdown API if
-> >> we didn't call qio_channel_set_feature(QIO_CHANNEL(sioc),
-QIO_CHANNEL_FEATURE_SHUTDOWN).
-> >>
-> >> Cc: Dr. David Alan Gilbert address@hidden
-> >>
-> >> I suspect migration cancel has the same problem: it may be stuck in write()
-> >> if we try to cancel migration.
-> >>
-> >> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, 
-Error **errp)
-> >> {
-> >>      qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing")
-> >>      migration_channel_connect(s, ioc, NULL)
-> >>      ... ...
-> >> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN) above,
-> >> and the
-> >> migrate_fd_cancel()
-> >> {
-> >>   ... ...
-> >>      if (s->state == MIGRATION_STATUS_CANCELLING && f) {
-> >>          qemu_file_shutdown(f)  --> This will not take effect. No ?
-> >>      }
-> >> }
-> >
-> > (cc'd in Daniel Berrange).
-> > I see that we call qio_channel_set_feature(ioc, 
-QIO_CHANNEL_FEATURE_SHUTDOWN) at the
-> > top of qio_channel_socket_new  so I think that's safe isn't it?
-> >
->
-> Hmm, you are right, this problem only exists for the migration incoming fd,
-thanks.
->
-> > Dave
-> >
-> >> Thanks,
-> >> Hailiang
-> >>
-> >> On 2017/3/21 16:10, address@hidden wrote:
-> >>> Thank you.
-> >>>
-> >>> I have already tested it.
-> >>>
-> >>> When the Primary Node panics, the Secondary Node qemu hangs at the same
-place.
-> >>>
-> >>> According to
-http://wiki.qemu-project.org/Features/COLO
-, killing the Primary Node
-qemu will not reproduce the problem, but a Primary Node panic can.
-> >>>
-> >>> I think this is because the channel does not support
-QIO_CHANNEL_FEATURE_SHUTDOWN.
-> >>>
-> >>>
-> >>> On failover, channel_shutdown could not shut down the channel,
-> >>>
-> >>>
-> >>> so the colo_process_incoming_thread will hang at recvmsg.
-> >>>
-> >>>
-> >>> I tested a patch:
-> >>>
-> >>>
-> >>> diff --git a/migration/socket.c b/migration/socket.c
-> >>>
-> >>>
-> >>> index 13966f1..d65a0ea 100644
-> >>>
-> >>>
-> >>> --- a/migration/socket.c
-> >>>
-> >>>
-> >>> +++ b/migration/socket.c
-> >>>
-> >>>
-> >>> @@ -147,8 +147,9 @@ static gboolean 
-socket_accept_incoming_migration(QIOChannel *ioc,
-> >>>
-> >>>
-> >>>        }
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>        trace_migration_socket_incoming_accepted()
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>        qio_channel_set_name(QIO_CHANNEL(sioc), 
-"migration-socket-incoming")
-> >>>
-> >>>
-> >>> +    qio_channel_set_feature(QIO_CHANNEL(sioc), 
-QIO_CHANNEL_FEATURE_SHUTDOWN)
-> >>>
-> >>>
-> >>>        migration_channel_process_incoming(migrate_get_current(),
-> >>>
-> >>>
-> >>>                                           QIO_CHANNEL(sioc))
-> >>>
-> >>>
-> >>>        object_unref(OBJECT(sioc))
-> >>>
-> >>>
-> >>>
-> >>>
-> >>> My test does not hang any more.
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>> Original message
-> >>>
-> >>>
-> >>>
-> >>> From: address@hidden
-> >>> To: 王广10165992 address@hidden
-> >>> Cc: address@hidden address@hidden
-> >>> Date: 2017-03-21 15:58
-> >>> Subject: Re: [Qemu-devel] Reply: Re: [BUG]COLO failover hang
-> >>>
-> >>>
-> >>>
-> >>>
-> >>>
-> >>> Hi,Wang.
-> >>>
-> >>> You can test this branch:
-> >>>
-> >>>
-https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
-> >>>
-> >>> and please follow the wiki to ensure your own configuration is correct.
-> >>>
-> >>>
-http://wiki.qemu-project.org/Features/COLO
-> >>>
-> >>>
-> >>> Thanks
-> >>>
-> >>> Zhang Chen
-> >>>
-> >>>
-> >>> On 03/21/2017 03:27 PM, address@hidden wrote:
-> >>> >
-> >>> > hi.
-> >>> >
-> >>> > I tested git qemu master and it has the same problem.
-> >>> >
-> >>> > (gdb) bt
-> >>> >
-> >>> > #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
-> >>> > niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
-> >>> >
-> >>> > #1  0x00007f658e4aa0c2 in qio_channel_read
-> >>> > (address@hidden, address@hidden "",
-> >>> > address@hidden, address@hidden) at io/channel.c:114
-> >>> >
-> >>> > #2  0x00007f658e3ea990 in channel_get_buffer (opaque=<optimized out>,
-> >>> > buf=0x7f65907cb838 "", pos=<optimized out>, size=32768) at
-> >>> > migration/qemu-file-channel.c:78
-> >>> >
-> >>> > #3  0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at
-> >>> > migration/qemu-file.c:295
-> >>> >
-> >>> > #4  0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden,
-> >>> > address@hidden) at migration/qemu-file.c:555
-> >>> >
-> >>> > #5  0x00007f658e3ea34b in qemu_get_byte (address@hidden) at
-> >>> > migration/qemu-file.c:568
-> >>> >
-> >>> > #6  0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at
-> >>> > migration/qemu-file.c:648
-> >>> >
-> >>> > #7  0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800,
-> >>> > address@hidden) at migration/colo.c:244
-> >>> >
-> >>> > #8  0x00007f658e3e681e in colo_receive_check_message (f=<optimized
-> >>> > out>, address@hidden,
-> >>> > address@hidden)
-> >>> >
-> >>> >     at migration/colo.c:264
-> >>> >
-> >>> > #9  0x00007f658e3e740e in colo_process_incoming_thread
-> >>> > (opaque=0x7f658eb30360 <mis_current.31286>) at migration/colo.c:577
-> >>> >
-> >>> > #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0
-> >>> >
-> >>> > #11 0x00007f65881983ed in clone () from /lib64/libc.so.6
-> >>> >
-> >>> > (gdb) p ioc->name
-> >>> >
-> >>> > $2 = 0x7f658ff7d5c0 "migration-socket-incoming"
-> >>> >
-> >>> > (gdb) p ioc->features        Do not support QIO_CHANNEL_FEATURE_SHUTDOWN
-> >>> >
-> >>> > $3 = 0
-> >>> >
-> >>> >
-> >>> > (gdb) bt
-> >>> >
-> >>> > #0  socket_accept_incoming_migration (ioc=0x7fdcceeafa90,
-> >>> > condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137
-> >>> >
-> >>> > #1  0x00007fdcc6966350 in g_main_dispatch (context=<optimized out>) at
-> >>> > gmain.c:3054
-> >>> >
-> >>> > #2  g_main_context_dispatch (context=<optimized out>,
-> >>> > address@hidden) at gmain.c:3630
-> >>> >
-> >>> > #3  0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213
-> >>> >
-> >>> > #4  os_host_main_loop_wait (timeout=<optimized out>) at
-> >>> > util/main-loop.c:258
-> >>> >
-> >>> > #5  main_loop_wait (address@hidden) at
-> >>> > util/main-loop.c:506
-> >>> >
-> >>> > #6  0x00007fdccb526187 in main_loop () at vl.c:1898
-> >>> >
-> >>> > #7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
-> >>> > out>) at vl.c:4709
-> >>> >
-> >>> > (gdb) p ioc->features
-> >>> >
-> >>> > $1 = 6
-> >>> >
-> >>> > (gdb) p ioc->name
-> >>> >
-> >>> > $2 = 0x7fdcce1b1ab0 "migration-socket-listener"
-> >>> >
-> >>> >
-> >>> > Maybe socket_accept_incoming_migration should
-> >>> > call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?
-> >>> >
-> >>> >
-> >>> > thank you.
-> >>> >
-> >>> >
-> >>> >
-> >>> >
-> >>> >
-> >>> > Original message
-> >>> > address@hidden
-> >>> > address@hidden
-> >>> > address@hidden@huawei.com>
-> >>> > *Date:* 2017-03-16 14:46
-> >>> > *Subject:* *Re: [Qemu-devel] COLO failover hang*
-> >>> >
-> >>> >
-> >>> >
-> >>> >
-> >>> > On 03/15/2017 05:06 PM, wangguang wrote:
-> >>> > > I am testing the QEMU COLO feature described here [QEMU
-> >>> > > Wiki](
-http://wiki.qemu-project.org/Features/COLO
-).
-> >>> > >
-> >>> > > When the Primary Node panics, the Secondary Node qemu hangs.
-> >>> > > It hangs at recvmsg in qio_channel_socket_readv.
-> >>> > > And when I run { 'execute': 'nbd-server-stop' } and { "execute":
-> >>> > > "x-colo-lost-heartbeat" } in the Secondary VM's
-> >>> > > monitor, the Secondary Node qemu still hangs at recvmsg.
-> >>> > >
-> >>> > > I found that COLO in qemu is not complete yet.
-> >>> > > Is there any development plan for COLO?
-> >>> >
-> >>> > Yes, we are developing it. You can see some of the patches we are pushing.
-> >>> >
-> >>> > > Has anyone ever run it successfully? Any help is appreciated!
-> >>> >
-> >>> > Our internal version can run it successfully.
-> >>> > For failover details, you can ask Zhanghailiang for help.
-> >>> > Next time, if you have questions about COLO,
-> >>> > please cc me and zhanghailiang address@hidden
-> >>> >
-> >>> >
-> >>> > Thanks
-> >>> > Zhang Chen
-> >>> >
-> >>> >
-> >>> > >
-> >>> > >
-> >>> > >
-> >>> > > centos7.2+qemu2.7.50
-> >>> > > (gdb) bt
-> >>> > > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
-> >>> > > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized 
-out>,
-> >>> > > iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, 
-errp=0x0) at
-> >>> > > io/channel-socket.c:497
-> >>> > > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
-> >>> > > address@hidden "", address@hidden,
-> >>> > > address@hidden) at io/channel.c:97
-> >>> > > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
-> >>> > > buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
-> >>> > > migration/qemu-file-channel.c:78
-> >>> > > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
-> >>> > > migration/qemu-file.c:257
-> >>> > > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
-> >>> > > address@hidden) at migration/qemu-file.c:510
-> >>> > > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
-> >>> > > migration/qemu-file.c:523
-> >>> > > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
-> >>> > > migration/qemu-file.c:603
-> >>> > > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
-> >>> > > address@hidden) at migration/colo.c:215
-> >>> > > #9  0x00007f3e0327250d in colo_wait_handle_message 
-(errp=0x7f3d62bfaa48,
-> >>> > > checkpoint_request=<synthetic pointer>, f=<optimized out>) at
-> >>> > > migration/colo.c:546
-> >>> > > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
-> >>> > > migration/colo.c:649
-> >>> > > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
-> >>> > > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
-> >>> > >
-> >>> > >
-> >>> > >
-> >>> > >
-> >>> > >
-> >>> > > --
-> >>> > > View this message in context:
-http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
-> >>> > > Sent from the Developer mailing list archive at Nabble.com.
-> >>> > >
-> >>> > >
-> >>> > >
-> >>> > >
-> >>> >
-> >>> > --
-> >>> > Thanks
-> >>> > Zhang Chen
-> >>> >
-> >>> >
-> >>> >
-> >>> >
-> >>> >
-> >>>
-> >>
-> > --
-> > Dr. David Alan Gilbert / address@hidden / Manchester, UK
-> >
-> > .
-> >
->
-
diff --git a/results/classifier/008/other/70416488 b/results/classifier/008/other/70416488
deleted file mode 100644
index 5f39bbe71..000000000
--- a/results/classifier/008/other/70416488
+++ /dev/null
@@ -1,1189 +0,0 @@
-other: 0.980
-semantic: 0.975
-graphic: 0.972
-debug: 0.967
-device: 0.945
-performance: 0.939
-permissions: 0.922
-PID: 0.916
-vnc: 0.910
-KVM: 0.908
-boot: 0.897
-network: 0.881
-socket: 0.870
-files: 0.858
-
-[Bug Report] smmuv3 event 0x10 report when running virtio-blk-pci
-
-Hi All,
-
-When I tested mainline qemu (commit 7b87a25f49), it reported smmuv3 event 0x10
-during kernel boot.
-
-The qemu command I use is as follows:
-
-qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
--kernel Image -initrd minifs.cpio.gz \
--enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
--append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
--device 
-pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 
-\
--device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
--device 
-virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
--drive file=/home/boot.img,if=none,id=drive0,format=raw
-
-smmuv3 event 0x10 log:
-[...]
-[    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
-[    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
-[    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
-[    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks (1.07 
-GB/1.00 GiB)
-[    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
-[    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
-[    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
-[    1.968381] clk: Disabling unused clocks
-[    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
-[    1.968990] PM: genpd: Disabling unused power domains
-[    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
-[    1.969814] ALSA device list:
-[    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
-[    1.970471]   No soundcards found.
-[    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
-[    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
-[    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
-[    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
-[    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
-[    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
-[    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
-[    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
-[    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
-[    1.975005] Freeing unused kernel memory: 10112K
-[    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
-[    1.975442] Run init as init process
-
-One more piece of information: if "maxcpus=3" is removed from the kernel
-command line,
-the problem does not occur.
-
-I am not sure whether this is a bug in the vsmmu. I would appreciate it if
-anyone who knows about this issue
-could take a look at it.
-
-Thanks,
-Zhou
-
-On Mon, 9 Sept 2024 at 15:22, Zhou Wang via <qemu-devel@nongnu.org> wrote:
->
->
-Hi All,
->
->
-When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
->
-during kernel booting up.
-Does it still do this if you either:
- (1) use the v9.1.0 release (commit fd1952d814da)
- (2) use "-machine virt-9.1" instead of "-machine virt"
-
-?
-
-My suspicion is that this will have started happening now that
-we expose an SMMU with two-stage translation support to the guest
-in the "virt" machine type (which we do not if you either
-use virt-9.1 or in the v9.1.0 release).
-
-I've cc'd Eric (smmuv3 maintainer) and Mostafa (author of
-the two-stage support).
-
->
-qemu command which I use is as below:
->
->
-qemu-system-aarch64 -machine
->
-virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
->
--kernel Image -initrd minifs.cpio.gz \
->
--enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
->
--append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
->
--device
->
-pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2
->
-\
->
--device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
->
--device
->
-virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
->
--drive file=/home/boot.img,if=none,id=drive0,format=raw
->
->
-smmuv3 event 0x10 log:
->
-[...]
->
-[    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
->
-[    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
->
-[    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
->
-[    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks
->
-(1.07 GB/1.00 GiB)
->
-[    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-[    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
->
-[    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-[    1.968381] clk: Disabling unused clocks
->
-[    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-[    1.968990] PM: genpd: Disabling unused power domains
->
-[    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.969814] ALSA device list:
->
-[    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.970471]   No soundcards found.
->
-[    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-[    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-[    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-[    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-[    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-[    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-[    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.975005] Freeing unused kernel memory: 10112K
->
-[    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.975442] Run init as init process
->
->
-Another information is that if "maxcpus=3" is removed from the kernel command
->
-line,
->
-it will be OK.
->
->
-I am not sure if there is a bug about vsmmu. It will be very appreciated if
->
-anyone
->
-know this issue or can take a look at it.
-thanks
--- PMM
-
-On 2024/9/9 22:31, Peter Maydell wrote:
->
-On Mon, 9 Sept 2024 at 15:22, Zhou Wang via <qemu-devel@nongnu.org> wrote:
->
->
->
-> Hi All,
->
->
->
-> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
->
-> during kernel booting up.
->
->
-Does it still do this if you either:
->
-(1) use the v9.1.0 release (commit fd1952d814da)
->
-(2) use "-machine virt-9.1" instead of "-machine virt"
-I tested both of the above cases; the problem is still there.
-
->
->
-?
->
->
-My suspicion is that this will have started happening now that
->
-we expose an SMMU with two-stage translation support to the guest
->
-in the "virt" machine type (which we do not if you either
->
-use virt-9.1 or in the v9.1.0 release).
->
->
-I've cc'd Eric (smmuv3 maintainer) and Mostafa (author of
->
-the two-stage support).
->
->
-> qemu command which I use is as below:
->
->
->
-> qemu-system-aarch64 -machine
->
-> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
->
-> -kernel Image -initrd minifs.cpio.gz \
->
-> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
->
-> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
->
-> -device
->
-> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2
->
->  \
->
-> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
->
-> -device
->
-> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
->
-> -drive file=/home/boot.img,if=none,id=drive0,format=raw
->
->
->
-> smmuv3 event 0x10 log:
->
-> [...]
->
-> [    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
->
-> [    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
->
-> [    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
->
-> [    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks
->
-> (1.07 GB/1.00 GiB)
->
-> [    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-> [    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
->
-> [    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-> [    1.968381] clk: Disabling unused clocks
->
-> [    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-> [    1.968990] PM: genpd: Disabling unused power domains
->
-> [    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.969814] ALSA device list:
->
-> [    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.970471]   No soundcards found.
->
-> [    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-> [    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-> [    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-> [    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-> [    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.975005] Freeing unused kernel memory: 10112K
->
-> [    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.975442] Run init as init process
->
->
->
-> Another information is that if "maxcpus=3" is removed from the kernel
->
-> command line,
->
-> it will be OK.
->
->
->
-> I am not sure if there is a bug about vsmmu. It will be very appreciated if
->
-> anyone
->
-> know this issue or can take a look at it.
->
->
-thanks
->
--- PMM
->
-.
-
-Hi Zhou,
-On 9/10/24 03:24, Zhou Wang via wrote:
->
-On 2024/9/9 22:31, Peter Maydell wrote:
->
-> On Mon, 9 Sept 2024 at 15:22, Zhou Wang via <qemu-devel@nongnu.org> wrote:
->
->> Hi All,
->
->>
->
->> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
->
->> during kernel booting up.
->
-> Does it still do this if you either:
->
->  (1) use the v9.1.0 release (commit fd1952d814da)
->
->  (2) use "-machine virt-9.1" instead of "-machine virt"
->
-I tested above two cases, the problem is still there.
-Thank you for reporting. I am able to reproduce it, and indeed the
-maxcpus kernel option triggers the issue. It works without it. I will
-come back to you asap.
-
-Eric
->
->
-> ?
->
->
->
-> My suspicion is that this will have started happening now that
->
-> we expose an SMMU with two-stage translation support to the guest
->
-> in the "virt" machine type (which we do not if you either
->
-> use virt-9.1 or in the v9.1.0 release).
->
->
->
-> I've cc'd Eric (smmuv3 maintainer) and Mostafa (author of
->
-> the two-stage support).
->
->
->
->> qemu command which I use is as below:
->
->>
->
->> qemu-system-aarch64 -machine
->
->> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
->
->> -kernel Image -initrd minifs.cpio.gz \
->
->> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
->
->> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
->
->> -device
->
->> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2
->
->>  \
->
->> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
->
->> -device
->
->> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
->
->> -drive file=/home/boot.img,if=none,id=drive0,format=raw
->
->>
->
->> smmuv3 event 0x10 log:
->
->> [...]
->
->> [    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
->
->> [    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
->
->> [    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
->
->> [    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks
->
->> (1.07 GB/1.00 GiB)
->
->> [    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
->> [    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
->
->> [    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
->> [    1.968381] clk: Disabling unused clocks
->
->> [    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
->> [    1.968990] PM: genpd: Disabling unused power domains
->
->> [    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.969814] ALSA device list:
->
->> [    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.970471]   No soundcards found.
->
->> [    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
->> [    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
->> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
->> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
->> [    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
->> [    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
->> [    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.975005] Freeing unused kernel memory: 10112K
->
->> [    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.975442] Run init as init process
->
->>
->
->> Another information is that if "maxcpus=3" is removed from the kernel
->
->> command line,
->
->> it will be OK.
->
->>
->
->> I am not sure if there is a bug about vsmmu. It will be very appreciated if
->
->> anyone
->
->> know this issue or can take a look at it.
->
-> thanks
->
-> -- PMM
->
-> .
-
-Hi,
-
-On 9/10/24 03:24, Zhou Wang via wrote:
->
-On 2024/9/9 22:31, Peter Maydell wrote:
->
-> On Mon, 9 Sept 2024 at 15:22, Zhou Wang via <qemu-devel@nongnu.org> wrote:
->
->> Hi All,
->
->>
->
->> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
->
->> during kernel booting up.
->
-> Does it still do this if you either:
->
->  (1) use the v9.1.0 release (commit fd1952d814da)
->
->  (2) use "-machine virt-9.1" instead of "-machine virt"
->
-I tested above two cases, the problem is still there.
-I have not made much progress yet, but I see it comes with the following
-qemu traces:
-
-smmuv3-iommu-memory-region-0-0 translation failed for iova=0x0
-(SMMU_EVT_F_TRANSLATION)
-../..
-qemu-system-aarch64: virtio-blk failed to set guest notifier (-22),
-ensure -accel kvm is set.
-qemu-system-aarch64: virtio_bus_start_ioeventfd: failed. Fallback to
-userspace (slower).
-
-The PCIe host bridge seems to cause that translation failure at iova=0.
-
-Also virtio-iommu has the same issue:
-qemu-system-aarch64: virtio_iommu_translate no mapping for 0x0 for sid=1024
-qemu-system-aarch64: virtio-blk failed to set guest notifier (-22),
-ensure -accel kvm is set.
-qemu-system-aarch64: virtio_bus_start_ioeventfd: failed. Fallback to
-userspace (slower).
-
-This only happens with maxcpus=3. Note that the virtio-blk-pci device is not
-protected by the vIOMMU in your case.
-
-Thanks
-
-Eric
-
->
->
-> ?
->
->
->
-> My suspicion is that this will have started happening now that
->
-> we expose an SMMU with two-stage translation support to the guest
->
-> in the "virt" machine type (which we do not if you either
->
-> use virt-9.1 or in the v9.1.0 release).
->
->
->
-> I've cc'd Eric (smmuv3 maintainer) and Mostafa (author of
->
-> the two-stage support).
->
->
->
->> qemu command which I use is as below:
->
->>
->
->> qemu-system-aarch64 -machine
->
->> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
->
->> -kernel Image -initrd minifs.cpio.gz \
->
->> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
->
->> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
->
->> -device
->
->> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2
->
->>  \
->
->> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
->
->> -device
->
->> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
->
->> -drive file=/home/boot.img,if=none,id=drive0,format=raw
->
->>
->
->> smmuv3 event 0x10 log:
->
->> [...]
->
->> [    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
->
->> [    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
->
->> [    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
->
->> [    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks
->
->> (1.07 GB/1.00 GiB)
->
->> [    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
->> [    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
->
->> [    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
->> [    1.968381] clk: Disabling unused clocks
->
->> [    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
->> [    1.968990] PM: genpd: Disabling unused power domains
->
->> [    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.969814] ALSA device list:
->
->> [    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.970471]   No soundcards found.
->
->> [    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
->> [    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
->> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
->> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
->> [    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
->> [    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
->> [    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.975005] Freeing unused kernel memory: 10112K
->
->> [    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.975442] Run init as init process
->
->>
->
->> Another information is that if "maxcpus=3" is removed from the kernel
->
->> command line,
->
->> it will be OK.
->
->>
->
->> I am not sure if there is a bug about vsmmu. It will be very appreciated if
->
->> anyone
->
->> know this issue or can take a look at it.
->
-> thanks
->
-> -- PMM
->
-> .
-
-Hi Zhou,
-
-On Mon, Sep 9, 2024 at 3:22 PM Zhou Wang via <qemu-devel@nongnu.org> wrote:
->
->
-Hi All,
->
->
-When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
->
-during kernel booting up.
->
->
-qemu command which I use is as below:
->
->
-qemu-system-aarch64 -machine
->
-virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
->
--kernel Image -initrd minifs.cpio.gz \
->
--enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
->
--append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
->
--device
->
-pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2
->
-\
->
--device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
->
--device
->
-virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
->
--drive file=/home/boot.img,if=none,id=drive0,format=raw
->
->
-smmuv3 event 0x10 log:
->
-[...]
->
-[    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
->
-[    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
->
-[    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
->
-[    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks
->
-(1.07 GB/1.00 GiB)
->
-[    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-[    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
->
-[    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-[    1.968381] clk: Disabling unused clocks
->
-[    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-[    1.968990] PM: genpd: Disabling unused power domains
->
-[    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.969814] ALSA device list:
->
-[    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.970471]   No soundcards found.
->
-[    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-[    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-[    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-[    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-[    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-[    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-[    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.975005] Freeing unused kernel memory: 10112K
->
-[    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-[    1.975442] Run init as init process
->
->
-Another information is that if "maxcpus=3" is removed from the kernel command
->
-line,
->
-it will be OK.
->
-That's interesting, not sure how that would be related.
-
->
-I am not sure if there is a bug about vsmmu. It will be very appreciated if
->
-anyone
->
-know this issue or can take a look at it.
->
-Can you please provide logs after adding "-d trace:smmu*" to the qemu invocation?
-
-Also, if possible, can you please tell me which Linux kernel version
-you are using? I will see if I can repro.
-
-Thanks,
-Mostafa
-
->
-Thanks,
->
-Zhou
->
->
->
-
-On 2024/9/9 22:47, Mostafa Saleh wrote:
->
-Hi Zhou,
->
->
-On Mon, Sep 9, 2024 at 3:22 PM Zhou Wang via <qemu-devel@nongnu.org> wrote:
->
->
->
-> Hi All,
->
->
->
-> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10
->
-> during kernel booting up.
->
->
->
-> qemu command which I use is as below:
->
->
->
-> qemu-system-aarch64 -machine
->
-> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
->
-> -kernel Image -initrd minifs.cpio.gz \
->
-> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
->
-> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
->
-> -device
->
-> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2
->
->  \
->
-> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \
->
-> -device
->
-> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
->
-> -drive file=/home/boot.img,if=none,id=drive0,format=raw
->
->
->
-> smmuv3 event 0x10 log:
->
-> [...]
->
-> [    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
->
-> [    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
->
-> [    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
->
-> [    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks
->
-> (1.07 GB/1.00 GiB)
->
-> [    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-> [    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
->
-> [    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-> [    1.968381] clk: Disabling unused clocks
->
-> [    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-> [    1.968990] PM: genpd: Disabling unused power domains
->
-> [    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.969814] ALSA device list:
->
-> [    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.970471]   No soundcards found.
->
-> [    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-> [    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
-> [    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
-> [    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
-> [    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.975005] Freeing unused kernel memory: 10112K
->
-> [    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
-> [    1.975442] Run init as init process
->
->
->
-> Another information is that if "maxcpus=3" is removed from the kernel
->
-> command line,
->
-> it will be OK.
->
->
->
->
-That's interesting, not sure how that would be related.
->
->
-> I am not sure if there is a bug about vsmmu. It will be very appreciated if
->
-> anyone
->
-> know this issue or can take a look at it.
->
->
->
->
-Can you please provide logs with adding "-d trace:smmu*" to qemu invocation.
-Sure. Please see the attached log (using the above qemu commit and command).
-
->
->
-Also if possible, can you please provide which Linux kernel version
->
-you are using, I will see if I can repro.
-I am just using the latest mainline kernel (commit b831f83e40a2) with defconfig.
-
-Thanks,
-Zhou
-
->
->
-Thanks,
->
-Mostafa
->
->
-> Thanks,
->
-> Zhou
->
->
->
->
->
->
->
->
-.
-qemu_boot_log.txt
-Description:
-Text document
-
-On Tue, Sep 10, 2024 at 2:51 AM Zhou Wang <wangzhou1@hisilicon.com> wrote:
->
->
-On 2024/9/9 22:47, Mostafa Saleh wrote:
->
-> Hi Zhou,
->
->
->
-> On Mon, Sep 9, 2024 at 3:22 PM Zhou Wang via <qemu-devel@nongnu.org> wrote:
->
->>
->
->> Hi All,
->
->>
->
->> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event
->
->> 0x10
->
->> during kernel booting up.
->
->>
->
->> qemu command which I use is as below:
->
->>
->
->> qemu-system-aarch64 -machine
->
->> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \
->
->> -kernel Image -initrd minifs.cpio.gz \
->
->> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \
->
->> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \
->
->> -device
->
->> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2
->
->>  \
->
->> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1
->
->> \
->
->> -device
->
->> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \
->
->> -drive file=/home/boot.img,if=none,id=drive0,format=raw
->
->>
->
->> smmuv3 event 0x10 log:
->
->> [...]
->
->> [    1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0
->
->> [    1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002)
->
->> [    1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues
->
->> [    1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks
->
->> (1.07 GB/1.00 GiB)
->
->> [    1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
->> [    1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0
->
->> [    1.967478] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
->> [    1.968381] clk: Disabling unused clocks
->
->> [    1.968677] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
->> [    1.968990] PM: genpd: Disabling unused power domains
->
->> [    1.969424] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.969814] ALSA device list:
->
->> [    1.970240] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.970471]   No soundcards found.
->
->> [    1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
->> [    1.971600] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
->> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
->> [    1.971601] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.971602] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
->
->> [    1.971607] arm-smmu-v3 9050000.smmuv3:      0x0000020000000010
->
->> [    1.974202] arm-smmu-v3 9050000.smmuv3:      0x0000020000000000
->
->> [    1.974634] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.975005] Freeing unused kernel memory: 10112K
->
->> [    1.975062] arm-smmu-v3 9050000.smmuv3:      0x0000000000000000
->
->> [    1.975442] Run init as init process
->
->>
->
->> Another information is that if "maxcpus=3" is removed from the kernel
->
->> command line,
->
->> it will be OK.
->
->>
->
->
->
-> That's interesting, not sure how that would be related.
->
->
->
->> I am not sure if there is a bug about vsmmu. It will be very appreciated
->
->> if anyone
->
->> know this issue or can take a look at it.
->
->>
->
->
->
-> Can you please provide logs with adding "-d trace:smmu*" to qemu invocation.
->
->
-Sure. Please see the attached log(using above qemu commit and command).
->
-Thanks a lot. It seems the SMMUv3 does indeed receive a translation
-request with addr 0x0, which causes this event.
-I don't see any kind of modification (alignment) of the address in this path.
-So my hunch is that it's not related to the SMMUv3 and that the initiator is
-issuing bogus addresses.
-
->
->
->
-> Also if possible, can you please provide which Linux kernel version
->
-> you are using, I will see if I can repro.
->
->
-I just use the latest mainline kernel(commit b831f83e40a2) with defconfig.
->
-I see. I can't repro in my setup, which has no "--enable-kvm" and uses
-"-cpu max" instead of host.
-I will try other options and see if I can repro.
-
-Thanks,
-Mostafa
->
-Thanks,
->
-Zhou
->
->
->
->
-> Thanks,
->
-> Mostafa
->
->
->
->> Thanks,
->
->> Zhou
->
->>
->
->>
->
->>
->
->
->
-> .
-
diff --git a/results/classifier/008/other/70868267 b/results/classifier/008/other/70868267
deleted file mode 100644
index d6c291f50..000000000
--- a/results/classifier/008/other/70868267
+++ /dev/null
@@ -1,50 +0,0 @@
-graphic: 0.706
-device: 0.643
-semantic: 0.635
-files: 0.552
-performance: 0.525
-debug: 0.521
-PID: 0.420
-socket: 0.418
-network: 0.411
-permissions: 0.265
-other: 0.236
-vnc: 0.227
-boot: 0.197
-KVM: 0.167
-
-[Qemu-devel] [BUG] Failed to compile using gcc7.1
-
-Hi all,
-
-After upgrading gcc from 6.3.1 to 7.1.1, qemu can't be compiled with gcc.
-
-The error is:
-
-------
-  CC      block/blkdebug.o
-block/blkdebug.c: In function 'blkdebug_refresh_filename':
-block/blkdebug.c:693:31: error: '%s' directive output may be truncated
-writing up to 4095 bytes into a region of size 4086
-[-Werror=format-truncation=]
-"blkdebug:%s:%s", s->config_file ?: "",
-                               ^~
-In file included from /usr/include/stdio.h:939:0,
-                 from /home/adam/qemu/include/qemu/osdep.h:68,
-                 from block/blkdebug.c:25:
-/usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk'
-output 11 or more bytes (assuming 4106) into a destination of size 4096
-return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
-          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-        __bos (__s), __fmt, __va_arg_pack ());
-        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-cc1: all warnings being treated as errors
-make: *** [/home/adam/qemu/rules.mak:69: block/blkdebug.o] Error 1
-------
-
-It seems that gcc 7 introduces stricter checks for printf.
-With clang, although there are some extra warnings, the build at least
-completes.
-Thanks,
-Qu
-
diff --git a/results/classifier/008/other/71456293 b/results/classifier/008/other/71456293
deleted file mode 100644
index 17efa0d19..000000000
--- a/results/classifier/008/other/71456293
+++ /dev/null
@@ -1,1496 +0,0 @@
-KVM: 0.691
-vnc: 0.625
-debug: 0.620
-PID: 0.614
-permissions: 0.613
-graphic: 0.603
-device: 0.601
-semantic: 0.600
-other: 0.598
-boot: 0.598
-socket: 0.596
-performance: 0.594
-files: 0.592
-network: 0.491
-
-[Qemu-devel][bug] qemu crash when migrate vm and vm's disks
-
-When migrating a VM together with its disks, qemu on the target host crashes due to an invalid free.
-#0  object_unref (obj=0x1000) at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/qom/object.c:920
-#1  0x0000560434d79e79 in memory_region_unref (mr=<optimized out>)
-at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:1730
-#2  flatview_destroy (view=0x560439653880) at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:292
-#3  0x000056043514dfbe in call_rcu_thread (opaque=<optimized out>)
-at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/util/rcu.c:284
-#4  0x00007fbc2b36fe25 in start_thread () from /lib64/libpthread.so.0
-#5  0x00007fbc2b099bad in clone () from /lib64/libc.so.6
-The test was based on qemu-2.12.0,
-but the crash also reproduces
-with the latest qemu (v6.0.0-rc2).
-The following patch resolves this problem:
-https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02272.html
-Steps to reproduce:
-(1) Create VM (virsh define)
-(2) Add 64 virtio scsi disks
-(3) Migrate the VM and the VM's disks
--------------------------------------------------------------------------------------------------------------------------------------
-本邮件及其附件含有新华三集团的保密信息,仅限于发送给上面地址中列出
-的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、
-或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本
-邮件!
-This e-mail and its attachments contain confidential information from New H3C, which is
-intended only for the person or entity whose address is listed above. Any use of the
-information contained herein in any way (including, but not limited to, total or partial
-disclosure, reproduction, or dissemination) by persons other than the intended
-recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender
-by phone or email immediately and delete it!
-
-* Yuchen (yu.chen@h3c.com) wrote:
-> When migrating a VM together with its disks, QEMU on the target host crashes due to an invalid free.
->
-> #0  object_unref (obj=0x1000) at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/qom/object.c:920
-> #1  0x0000560434d79e79 in memory_region_unref (mr=<optimized out>)
->     at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:1730
-> #2  flatview_destroy (view=0x560439653880) at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:292
-> #3  0x000056043514dfbe in call_rcu_thread (opaque=<optimized out>)
->     at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/util/rcu.c:284
-> #4  0x00007fbc2b36fe25 in start_thread () from /lib64/libpthread.so.0
-> #5  0x00007fbc2b099bad in clone () from /lib64/libc.so.6
->
-> Tested against qemu-2.12.0, but the latest QEMU (v6.0.0-rc2) reproduces the crash as well.
-
-Interesting.
-
-> The following patch resolves the problem:
-> https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02272.html
-
-That's a pci/rcu change; ccing Paolo and Michael.
-
-> Steps to reproduce:
-> (1) Create VM (virsh define)
-> (2) Add 64 virtio scsi disks
-
-Is that hot adding the disks later, or are they included in the VM at
-creation?
-Can you provide a libvirt XML example?
-
-> (3) Migrate the VM and the VM's disks
-
-What do you mean by 'and vm disks' - are you doing a block migration?
-
-Dave
-
--- 
-Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
-
-> -----Original Message-----
-> From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com]
-> Sent: April 8, 2021 19:27
-> To: yuchen (Cloud) <yu.chen@h3c.com>; pbonzini@redhat.com; mst@redhat.com
-> Cc: qemu-devel@nongnu.org
-> Subject: Re: [Qemu-devel][bug] qemu crash when migrate vm and vm's disks
->
-> * Yuchen (yu.chen@h3c.com) wrote:
-> > When migrating a VM together with its disks, QEMU on the target host crashes due to an invalid free.
-> >
-> > #0  object_unref (obj=0x1000) at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/qom/object.c:920
-> > #1  0x0000560434d79e79 in memory_region_unref (mr=<optimized out>)
-> >     at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:1730
-> > #2  flatview_destroy (view=0x560439653880) at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:292
-> > #3  0x000056043514dfbe in call_rcu_thread (opaque=<optimized out>)
-> >     at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/util/rcu.c:284
-> > #4  0x00007fbc2b36fe25 in start_thread () from /lib64/libpthread.so.0
-> > #5  0x00007fbc2b099bad in clone () from /lib64/libc.so.6
-> >
-> > Tested against qemu-2.12.0, but the latest QEMU (v6.0.0-rc2) reproduces the crash as well.
->
-> Interesting.
->
-> > The following patch resolves the problem:
-> > https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02272.html
->
-> That's a pci/rcu change; ccing Paolo and Michael.
->
-> > Steps to reproduce:
-> > (1) Create VM (virsh define)
-> > (2) Add 64 virtio scsi disks
->
-> Is that hot adding the disks later, or are they included in the VM at
-> creation?
-> Can you provide a libvirt XML example?
->
-The disks are included in the VM at creation.
-
-VM disk XML (only virtio scsi disks):
-  <devices>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native'/>
-      <source file='/vms/tempp/vm-os'/>
-      <target dev='vda' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data1'/>
-      <target dev='sda' bus='scsi'/>
-      <address type='drive' controller='2' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data2'/>
-      <target dev='sdb' bus='scsi'/>
-      <address type='drive' controller='3' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data3'/>
-      <target dev='sdc' bus='scsi'/>
-      <address type='drive' controller='4' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data4'/>
-      <target dev='sdd' bus='scsi'/>
-      <address type='drive' controller='5' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data5'/>
-      <target dev='sde' bus='scsi'/>
-      <address type='drive' controller='6' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data6'/>
-      <target dev='sdf' bus='scsi'/>
-      <address type='drive' controller='7' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data7'/>
-      <target dev='sdg' bus='scsi'/>
-      <address type='drive' controller='8' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data8'/>
-      <target dev='sdh' bus='scsi'/>
-      <address type='drive' controller='9' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data9'/>
-      <target dev='sdi' bus='scsi'/>
-      <address type='drive' controller='10' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data10'/>
-      <target dev='sdj' bus='scsi'/>
-      <address type='drive' controller='11' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data11'/>
-      <target dev='sdk' bus='scsi'/>
-      <address type='drive' controller='12' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data12'/>
-      <target dev='sdl' bus='scsi'/>
-      <address type='drive' controller='13' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data13'/>
-      <target dev='sdm' bus='scsi'/>
-      <address type='drive' controller='14' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data14'/>
-      <target dev='sdn' bus='scsi'/>
-      <address type='drive' controller='15' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data15'/>
-      <target dev='sdo' bus='scsi'/>
-      <address type='drive' controller='16' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data16'/>
-      <target dev='sdp' bus='scsi'/>
-      <address type='drive' controller='17' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data17'/>
-      <target dev='sdq' bus='scsi'/>
-      <address type='drive' controller='18' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data18'/>
-      <target dev='sdr' bus='scsi'/>
-      <address type='drive' controller='19' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data19'/>
-      <target dev='sds' bus='scsi'/>
-      <address type='drive' controller='20' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data20'/>
-      <target dev='sdt' bus='scsi'/>
-      <address type='drive' controller='21' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data21'/>
-      <target dev='sdu' bus='scsi'/>
-      <address type='drive' controller='22' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data22'/>
-      <target dev='sdv' bus='scsi'/>
-      <address type='drive' controller='23' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data23'/>
-      <target dev='sdw' bus='scsi'/>
-      <address type='drive' controller='24' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data24'/>
-      <target dev='sdx' bus='scsi'/>
-      <address type='drive' controller='25' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data25'/>
-      <target dev='sdy' bus='scsi'/>
-      <address type='drive' controller='26' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data26'/>
-      <target dev='sdz' bus='scsi'/>
-      <address type='drive' controller='27' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data27'/>
-      <target dev='sdaa' bus='scsi'/>
-      <address type='drive' controller='28' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data28'/>
-      <target dev='sdab' bus='scsi'/>
-      <address type='drive' controller='29' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data29'/>
-      <target dev='sdac' bus='scsi'/>
-      <address type='drive' controller='30' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data30'/>
-      <target dev='sdad' bus='scsi'/>
-      <address type='drive' controller='31' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data31'/>
-      <target dev='sdae' bus='scsi'/>
-      <address type='drive' controller='32' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data32'/>
-      <target dev='sdaf' bus='scsi'/>
-      <address type='drive' controller='33' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data33'/>
-      <target dev='sdag' bus='scsi'/>
-      <address type='drive' controller='34' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data34'/>
-      <target dev='sdah' bus='scsi'/>
-      <address type='drive' controller='35' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data35'/>
-      <target dev='sdai' bus='scsi'/>
-      <address type='drive' controller='36' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data36'/>
-      <target dev='sdaj' bus='scsi'/>
-      <address type='drive' controller='37' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data37'/>
-      <target dev='sdak' bus='scsi'/>
-      <address type='drive' controller='38' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data38'/>
-      <target dev='sdal' bus='scsi'/>
-      <address type='drive' controller='39' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data39'/>
-      <target dev='sdam' bus='scsi'/>
-      <address type='drive' controller='40' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data40'/>
-      <target dev='sdan' bus='scsi'/>
-      <address type='drive' controller='41' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data41'/>
-      <target dev='sdao' bus='scsi'/>
-      <address type='drive' controller='42' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data42'/>
-      <target dev='sdap' bus='scsi'/>
-      <address type='drive' controller='43' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data43'/>
-      <target dev='sdaq' bus='scsi'/>
-      <address type='drive' controller='44' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data44'/>
-      <target dev='sdar' bus='scsi'/>
-      <address type='drive' controller='45' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data45'/>
-      <target dev='sdas' bus='scsi'/>
-      <address type='drive' controller='46' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data46'/>
-      <target dev='sdat' bus='scsi'/>
-      <address type='drive' controller='47' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data47'/>
-      <target dev='sdau' bus='scsi'/>
-      <address type='drive' controller='48' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data48'/>
-      <target dev='sdav' bus='scsi'/>
-      <address type='drive' controller='49' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data49'/>
-      <target dev='sdaw' bus='scsi'/>
-      <address type='drive' controller='50' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data50'/>
-      <target dev='sdax' bus='scsi'/>
-      <address type='drive' controller='51' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data51'/>
-      <target dev='sday' bus='scsi'/>
-      <address type='drive' controller='52' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data52'/>
-      <target dev='sdaz' bus='scsi'/>
-      <address type='drive' controller='53' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data53'/>
-      <target dev='sdba' bus='scsi'/>
-      <address type='drive' controller='54' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data54'/>
-      <target dev='sdbb' bus='scsi'/>
-      <address type='drive' controller='55' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data55'/>
-      <target dev='sdbc' bus='scsi'/>
-      <address type='drive' controller='56' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data56'/>
-      <target dev='sdbd' bus='scsi'/>
-      <address type='drive' controller='57' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data57'/>
-      <target dev='sdbe' bus='scsi'/>
-      <address type='drive' controller='58' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data58'/>
-      <target dev='sdbf' bus='scsi'/>
-      <address type='drive' controller='59' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data59'/>
-      <target dev='sdbg' bus='scsi'/>
-      <address type='drive' controller='60' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data60'/>
-      <target dev='sdbh' bus='scsi'/>
-      <address type='drive' controller='61' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data61'/>
-      <target dev='sdbi' bus='scsi'/>
-      <address type='drive' controller='62' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data62'/>
-      <target dev='sdbj' bus='scsi'/>
-      <address type='drive' controller='63' bus='0' target='0' unit='0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data63'/>
-      <target dev='sdbk' bus='scsi'/>
-      <address type='drive' controller='64' bus='0' target='0' unit='0'/>
-    </disk>
-    <controller type='scsi' index='0'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x02' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='1' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='2' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='3' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x03' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='4' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x04' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='5' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x05' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='6' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x06' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='7' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x07' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='8' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x08' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='9' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x09' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='10' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x0a' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='11' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x0b' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='12' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x0c' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='13' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x0d' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='14' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x0e' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='15' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x0f' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='16' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x10' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='17' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x11' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='18' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x12' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='19' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x13' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='20' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x14' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='21' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x15' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='22' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x16' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='23' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x17' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='24' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x18' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='25' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x19' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='26' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x1a' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='27' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x1b' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='28' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x1c' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='29' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x1d' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='30' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x1e' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='31' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x01' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='32' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x02' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='33' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x03' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='34' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x04' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='35' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x05' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='36' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x06' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='37' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x07' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='38' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x08' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='39' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x09' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='40' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x0a' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='41' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x0b' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='42' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x0c' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='43' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x0d' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='44' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='45' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='46' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='47' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x0c' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='48' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x0d' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='49' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x0e' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='50' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x0f' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='51' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x10' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='52' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x11' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='53' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x12' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='54' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x13' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='55' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x14' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='56' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x15' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='57' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x16' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='58' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x17' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='59' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x18' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='60' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x19' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='61' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x1a' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='62' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x1b' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='63' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x1c' 
-function='0x0'/>
-    </controller>
-    <controller type='scsi' index='64' model='virtio-scsi'>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' 
-function='0x0'/>
-    </controller>
-    <controller type='pci' index='0' model='pci-root'/>
-    <controller type='pci' index='1' model='pci-bridge'>
-      <model name='pci-bridge'/>
-      <target chassisNr='1'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' 
-function='0x0'/>
-    </controller>
-    <controller type='pci' index='2' model='pci-bridge'>
-      <model name='pci-bridge'/>
-      <target chassisNr='2'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x1f' 
-function='0x0'/>
-    </controller>
-  </devices>
-
-vm disks xml (only virtio disks):
-  <devices>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native'/>
-      <source file='/vms/tempp/vm-os'/>
-      <target dev='vda' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data2'/>
-      <target dev='vdb' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data3'/>
-      <target dev='vdc' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data4'/>
-      <target dev='vdd' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data5'/>
-      <target dev='vde' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x0c' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data6'/>
-      <target dev='vdf' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x0d' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data7'/>
-      <target dev='vdg' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x0e' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data8'/>
-      <target dev='vdh' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x0f' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data9'/>
-      <target dev='vdi' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x10' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data10'/>
-      <target dev='vdj' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x11' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data11'/>
-      <target dev='vdk' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x12' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data12'/>
-      <target dev='vdl' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x13' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data13'/>
-      <target dev='vdm' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x14' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data14'/>
-      <target dev='vdn' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x15' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data15'/>
-      <target dev='vdo' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x16' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data16'/>
-      <target dev='vdp' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x17' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data17'/>
-      <target dev='vdq' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x18' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data18'/>
-      <target dev='vdr' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x19' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data19'/>
-      <target dev='vds' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x1a' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data20'/>
-      <target dev='vdt' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x1b' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data21'/>
-      <target dev='vdu' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x1c' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data22'/>
-      <target dev='vdv' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data23'/>
-      <target dev='vdw' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data24'/>
-      <target dev='vdx' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data25'/>
-      <target dev='vdy' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x03' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data26'/>
-      <target dev='vdz' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x04' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data27'/>
-      <target dev='vdaa' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x05' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data28'/>
-      <target dev='vdab' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x06' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data29'/>
-      <target dev='vdac' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x07' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data30'/>
-      <target dev='vdad' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x08' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data31'/>
-      <target dev='vdae' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x09' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data32'/>
-      <target dev='vdaf' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x0a' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data33'/>
-      <target dev='vdag' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x0b' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data34'/>
-      <target dev='vdah' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x0c' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data35'/>
-      <target dev='vdai' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x0d' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data36'/>
-      <target dev='vdaj' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x0e' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data37'/>
-      <target dev='vdak' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x0f' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data38'/>
-      <target dev='vdal' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x10' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data39'/>
-      <target dev='vdam' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x11' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data40'/>
-      <target dev='vdan' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x12' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data41'/>
-      <target dev='vdao' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x13' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data42'/>
-      <target dev='vdap' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x14' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data43'/>
-      <target dev='vdaq' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x15' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data44'/>
-      <target dev='vdar' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x16' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data45'/>
-      <target dev='vdas' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x17' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data46'/>
-      <target dev='vdat' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x18' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data47'/>
-      <target dev='vdau' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x19' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data48'/>
-      <target dev='vdav' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x1a' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data49'/>
-      <target dev='vdaw' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x1b' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data50'/>
-      <target dev='vdax' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x1c' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data51'/>
-      <target dev='vday' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x1d' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data52'/>
-      <target dev='vdaz' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x1e' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data53'/>
-      <target dev='vdba' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x01' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data54'/>
-      <target dev='vdbb' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x02' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data55'/>
-      <target dev='vdbc' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x03' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data56'/>
-      <target dev='vdbd' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x04' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data57'/>
-      <target dev='vdbe' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x05' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data58'/>
-      <target dev='vdbf' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x06' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data59'/>
-      <target dev='vdbg' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x07' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data60'/>
-      <target dev='vdbh' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x08' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data61'/>
-      <target dev='vdbi' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x09' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data62'/>
-      <target dev='vdbj' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x0a' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data63'/>
-      <target dev='vdbk' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x02' slot='0x0b' 
-function='0x0'/>
-    </disk>
-    <disk type='file' device='disk'>
-      <driver name='qemu' type='qcow2' cache='directsync' io='native' 
-discard='unmap'/>
-      <source file='/vms/tempp/vm-data1'/>
-      <target dev='vdbl' bus='virtio'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' 
-function='0x0'/>
-    </disk>
-    <controller type='pci' index='0' model='pci-root'/>
-    <controller type='pci' index='1' model='pci-bridge'>
-      <model name='pci-bridge'/>
-      <target chassisNr='1'/>
-      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' 
-function='0x0'/>
-    </controller>
-    <controller type='pci' index='2' model='pci-bridge'>
-      <model name='pci-bridge'/>
-      <target chassisNr='2'/>
-      <address type='pci' domain='0x0000' bus='0x01' slot='0x1f' 
-function='0x0'/>
-    </controller>
-  </devices>
-
->
-> (3) migrate vm and vm's disks
->
->
-What do you mean by 'and vm disks' - are you doing a block migration?
->
-Yes, block migration.
-In fact, the problem also reproduces when migrating only the domain.
-
->
-Dave
->
->
-> ----------------------------------------------------------------------
->
-> ---------------------------------------------------------------
->
-Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
-
diff --git a/results/classifier/008/other/73660729 b/results/classifier/008/other/73660729
deleted file mode 100644
index 28b0bbc21..000000000
--- a/results/classifier/008/other/73660729
+++ /dev/null
@@ -1,41 +0,0 @@
-graphic: 0.858
-performance: 0.786
-semantic: 0.698
-device: 0.697
-files: 0.640
-other: 0.620
-network: 0.598
-debug: 0.580
-socket: 0.556
-PID: 0.549
-vnc: 0.467
-permissions: 0.403
-boot: 0.367
-KVM: 0.272
-
-[BUG] The latest QEMU crashed when I tested CXL
-
-I tested CXL with the patch [v11,0/2] arm/virt:
- CXL support via pxb_cxl.
-https://patchwork.kernel.org/project/cxl/cover/20220616141950.23374-1-Jonathan.Cameron@huawei.com/
-But QEMU crashed, showing this error:
-qemu-system-aarch64: ../hw/arm/virt.c:1735: virt_get_high_memmap_enabled:
- Assertion `ARRAY_SIZE(extended_memmap) - VIRT_LOWMEMMAP_LAST == ARRAY_SIZE(enabled_array)' failed.
-I then modified the patch to fix the bug:
-diff --git a/hw/arm/virt.c b/hw/arm/virt.c
-index ea2413a0ba..3d4cee3491 100644
---- a/hw/arm/virt.c
-+++ b/hw/arm/virt.c
-@@ -1710,6 +1730,7 @@ static inline bool *virt_get_high_memmap_enabled(VirtMachineState
- *vms,
-&vms->highmem_redists,
-&vms->highmem_ecam,
-&vms->highmem_mmio,
-+ &vms->cxl_devices_state.is_enabled,
-};
-Now QEMU works well.
-Could you tell me when the patch
-"arm/virt:
- CXL support via pxb_cxl"
-will be merged upstream?
-
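The assertion quoted in the report above guards an invariant between two arrays that have to be extended together. The following is a minimal sketch of that invariant, not the actual hw/arm/virt.c code: `struct machine`, the `REGION_*` names, `LOWMEMMAP_LAST`, and the sizes are hypothetical stand-ins chosen for illustration. It shows why adding a new high region to the table without adding a matching enable flag trips the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical region indices: low regions first, high regions after. */
enum {
    REGION_LOW_FLASH,
    REGION_LOW_RAM,
    LOWMEMMAP_LAST,                       /* index of the first high region */
    REGION_HIGH_REDIST = LOWMEMMAP_LAST,
    REGION_HIGH_ECAM,
    REGION_HIGH_MMIO,
    REGION_HIGH_CXL,                      /* the newly added high region */
};

/* Table indexed by region; only the high entries are filled in here. */
static const unsigned long long extended_sizes[] = {
    [REGION_HIGH_REDIST] = 64ULL << 20,
    [REGION_HIGH_ECAM]   = 256ULL << 20,
    [REGION_HIGH_MMIO]   = 512ULL << 30,
    [REGION_HIGH_CXL]    = 1ULL << 20,
};

/* Hypothetical machine state holding one enable flag per high region. */
struct machine {
    bool redist, ecam, mmio, cxl;
};

/* Return the enable flag that belongs to the high region `index`. */
static bool *high_region_enabled(struct machine *m, int index)
{
    bool *enabled_array[] = {
        &m->redist,
        &m->ecam,
        &m->mmio,
        &m->cxl,    /* without this entry the assertion below fails */
    };

    /* One flag per high region: both arrays must grow in lockstep. */
    assert(ARRAY_SIZE(extended_sizes) - LOWMEMMAP_LAST ==
           ARRAY_SIZE(enabled_array));

    return enabled_array[index - LOWMEMMAP_LAST];
}
```

Removing the `&m->cxl` entry while keeping `REGION_HIGH_CXL` in the table makes the assertion fail, which mirrors the crash described in the report; the one-line change the reporter shows restores the one-to-one pairing.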
diff --git a/results/classifier/008/other/74466963 b/results/classifier/008/other/74466963
deleted file mode 100644
index 55d41733b..000000000
--- a/results/classifier/008/other/74466963
+++ /dev/null
@@ -1,1888 +0,0 @@
-device: 0.909
-permissions: 0.907
-KVM: 0.903
-debug: 0.897
-files: 0.896
-graphic: 0.895
-boot: 0.894
-performance: 0.892
-semantic: 0.891
-PID: 0.886
-socket: 0.879
-vnc: 0.878
-other: 0.877
-network: 0.871
-
-[Qemu-devel] [TCG only][Migration Bug? ] Occasionally, the content of VM's memory is inconsistent between Source and Destination of migration
-
-Hi all,
-
-Does anybody remember the similar issue posted by hailiang months ago?
-http://patchwork.ozlabs.org/patch/454322/
-At least two migration bugs have been fixed since then.
-Now we have found the same issue on a TCG VM (KVM is fine): after
-migration, the content of the VM's memory is inconsistent.
-We added a patch to check the memory content; you can find it appended below.
-
-Steps to reproduce:
-1) apply the patch and re-build qemu
-2) prepare the ubuntu guest and run memtest in grub.
-source side:
-x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
-e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
-if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
-virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
--vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
-tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
-pc-i440fx-2.3,accel=tcg,usb=off
-destination side:
-x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
-e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
-if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
-virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
--vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
-tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
-pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881
-3) start migration
-with 1000M NIC, migration will finish within 3 min.
-
-at source:
-(qemu) migrate tcp:192.168.2.66:8881
-after saving ram complete
-e9e725df678d392b1a83b3a917f332bb
-qemu-system-x86_64: end ram md5
-(qemu)
-
-at destination:
-...skip...
-Completed load of VM with exit code 0 seq iteration 1264
-Completed load of VM with exit code 0 seq iteration 1265
-Completed load of VM with exit code 0 seq iteration 1266
-qemu-system-x86_64: after loading state section id 2(ram)
-49c2dac7bde0e5e22db7280dcb3824f9
-qemu-system-x86_64: end ram md5
-qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init
-
-49c2dac7bde0e5e22db7280dcb3824f9
-qemu-system-x86_64: end ram md5
-
-This occurs occasionally and only on a TCG machine. It seems that
-some pages dirtied on the source side are not transferred to the destination.
-This problem can be reproduced even if we disable virtio.
-Is it OK for some pages not to be transferred to the destination during
-migration? Or is it a bug?
-Any ideas?
-
-=================md5 check patch=============================
-
-diff --git a/Makefile.target b/Makefile.target
-index 962d004..e2cb8e9 100644
---- a/Makefile.target
-+++ b/Makefile.target
-@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o
- obj-y += memory_mapping.o
- obj-y += dump.o
- obj-y += migration/ram.o migration/savevm.o
--LIBS := $(libs_softmmu) $(LIBS)
-+LIBS := $(libs_softmmu) $(LIBS) -lplumb
-
- # xen support
- obj-$(CONFIG_XEN) += xen-common.o
-diff --git a/migration/ram.c b/migration/ram.c
-index 1eb155a..3b7a09d 100644
---- a/migration/ram.c
-+++ b/migration/ram.c
-@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int
-version_id)
-}
-
-     rcu_read_unlock();
--    DPRINTF("Completed load of VM with exit code %d seq iteration "
-+    fprintf(stderr, "Completed load of VM with exit code %d seq iteration "
-             "%" PRIu64 "\n", ret, seq_iter);
-     return ret;
- }
-diff --git a/migration/savevm.c b/migration/savevm.c
-index 0ad1b93..3feaa61 100644
---- a/migration/savevm.c
-+++ b/migration/savevm.c
-@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f)
-
- }
-
-+#include "exec/ram_addr.h"
-+#include "qemu/rcu_queue.h"
-+#include <clplumbing/md5.h>
-+#ifndef MD5_DIGEST_LENGTH
-+#define MD5_DIGEST_LENGTH 16
-+#endif
-+
-+static void check_host_md5(void)
-+{
-+    int i;
-+    unsigned char md[MD5_DIGEST_LENGTH];
-+    rcu_read_lock();
-+    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check
-'pc.ram' block */
-+    rcu_read_unlock();
-+
-+    MD5(block->host, block->used_length, md);
-+    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
-+        fprintf(stderr, "%02x", md[i]);
-+    }
-+    fprintf(stderr, "\n");
-+    error_report("end ram md5");
-+}
-+
- void qemu_savevm_state_begin(QEMUFile *f,
-                              const MigrationParams *params)
- {
-@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile
-*f, bool iterable_only)
-save_section_header(f, se, QEMU_VM_SECTION_END);
-
-         ret = se->ops->save_live_complete_precopy(f, se->opaque);
-+
-+        fprintf(stderr, "after saving %s complete\n", se->idstr);
-+        check_host_md5();
-+
-         trace_savevm_section_end(se->idstr, se->section_id, ret);
-         save_section_footer(f, se);
-         if (ret < 0) {
-@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f,
-MigrationIncomingState *mis)
-section_id, le->se->idstr);
-                 return ret;
-             }
-+            if (section_type == QEMU_VM_SECTION_END) {
-+                error_report("after loading state section id %d(%s)",
-+                             section_id, le->se->idstr);
-+                check_host_md5();
-+            }
-             if (!check_section_footer(f, le)) {
-                 return -EINVAL;
-             }
-@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f)
-     }
-
-     cpu_synchronize_all_post_init();
-+    error_report("%s: after cpu_synchronize_all_post_init\n", __func__);
-+    check_host_md5();
-
-     return ret;
- }
-
-* Li Zhijian (address@hidden) wrote:
->
-Hi all,
->
->
-Does anybody remember the similar issue posted by hailiang months ago?
->
-http://patchwork.ozlabs.org/patch/454322/
->
-At least two migration bugs have been fixed since then.
-Yes, I wondered what happened to that.
-
->
-Now we have found the same issue on a TCG VM (KVM is fine): after migration,
->
-the content of the VM's memory is inconsistent.
-Hmm, TCG only - I don't know much about that; but I guess something must
-be accessing memory without using the proper macros/functions so
-it doesn't mark it as dirty.
-
->
-We added a patch to check the memory content; you can find it appended below.
->
->
-Steps to reproduce:
->
-1) apply the patch and re-build qemu
->
-2) prepare the ubuntu guest and run memtest in grub.
->
-source side:
->
-x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
->
-e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
->
-if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
->
-virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
->
--vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
->
-tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
->
-pc-i440fx-2.3,accel=tcg,usb=off
->
->
-destination side:
->
-x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
->
-e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
->
-if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
->
-virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
->
--vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
->
-tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
->
-pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881
->
->
-3) start migration
->
-with 1000M NIC, migration will finish within 3 min.
->
->
-at source:
->
-(qemu) migrate tcp:192.168.2.66:8881
->
-after saving ram complete
->
-e9e725df678d392b1a83b3a917f332bb
->
-qemu-system-x86_64: end ram md5
->
-(qemu)
->
->
-at destination:
->
-...skip...
->
-Completed load of VM with exit code 0 seq iteration 1264
->
-Completed load of VM with exit code 0 seq iteration 1265
->
-Completed load of VM with exit code 0 seq iteration 1266
->
-qemu-system-x86_64: after loading state section id 2(ram)
->
-49c2dac7bde0e5e22db7280dcb3824f9
->
-qemu-system-x86_64: end ram md5
->
-qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init
->
->
-49c2dac7bde0e5e22db7280dcb3824f9
->
-qemu-system-x86_64: end ram md5
->
->
-This occurs occasionally and only at tcg machine. It seems that
->
-some pages dirtied in source side don't transferred to destination.
->
-This problem can be reproduced even if we disable virtio.
->
->
-Is it OK for some pages that not transferred to destination when do
->
-migration ? Or is it a bug?
-I'm pretty sure that means it's a bug.  Hard to find though, I guess
-at least memtest is smaller than a big OS.  I think I'd dump the whole
-of memory on both sides, hexdump and diff them  - I'd guess it would
-just be one byte/word different, maybe that would offer some idea what
-wrote it.
-
-Dave
-
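As one way to act on the suggestion above of dumping memory on both sides and diffing it, here is a minimal sketch in C that compares two equal-sized dumps page by page and records which pages differ. It is not part of the posted patch: the function name `diff_pages`, the constant `DUMP_PAGE_SIZE`, and the 4 KiB page size are assumptions made for illustration, and producing the dumps and loading them into buffers is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: 4 KiB pages, as on an x86 guest. */
#define DUMP_PAGE_SIZE 4096u

/*
 * Compare two RAM dumps of the same length page by page.
 * Up to max_out indices of differing pages are stored in out[];
 * the return value is the total number of differing pages, which
 * may be larger than max_out.
 */
static size_t diff_pages(const unsigned char *src, const unsigned char *dst,
                         size_t len, size_t *out, size_t max_out)
{
    size_t ndiff = 0;

    assert(src != NULL && dst != NULL);
    for (size_t off = 0; off < len; off += DUMP_PAGE_SIZE) {
        /* The last page may be short if len is not page aligned. */
        size_t chunk = len - off < DUMP_PAGE_SIZE ? len - off : DUMP_PAGE_SIZE;

        if (memcmp(src + off, dst + off, chunk) != 0) {
            if (ndiff < max_out) {
                out[ndiff] = off / DUMP_PAGE_SIZE;
            }
            ndiff++;
        }
    }
    return ndiff;
}
```

A caller would read the source and destination dumps into two buffers, call `diff_pages()`, and then hexdump only the reported pages rather than diffing the full 128 MB.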
->
-Any idea...
->
->
-=================md5 check patch=============================
->
->
-diff --git a/Makefile.target b/Makefile.target
->
-index 962d004..e2cb8e9 100644
->
---- a/Makefile.target
->
-+++ b/Makefile.target
->
-@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o
->
-obj-y += memory_mapping.o
->
-obj-y += dump.o
->
-obj-y += migration/ram.o migration/savevm.o
->
--LIBS := $(libs_softmmu) $(LIBS)
->
-+LIBS := $(libs_softmmu) $(LIBS) -lplumb
->
->
-# xen support
->
-obj-$(CONFIG_XEN) += xen-common.o
->
-diff --git a/migration/ram.c b/migration/ram.c
->
-index 1eb155a..3b7a09d 100644
->
---- a/migration/ram.c
->
-+++ b/migration/ram.c
->
-@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int
->
-version_id)
->
-}
->
->
-rcu_read_unlock();
->
--    DPRINTF("Completed load of VM with exit code %d seq iteration "
->
-+    fprintf(stderr, "Completed load of VM with exit code %d seq iteration "
->
-"%" PRIu64 "\n", ret, seq_iter);
->
-return ret;
->
-}
->
-diff --git a/migration/savevm.c b/migration/savevm.c
->
-index 0ad1b93..3feaa61 100644
->
---- a/migration/savevm.c
->
-+++ b/migration/savevm.c
->
-@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f)
->
->
-}
->
->
-+#include "exec/ram_addr.h"
->
-+#include "qemu/rcu_queue.h"
->
-+#include <clplumbing/md5.h>
->
-+#ifndef MD5_DIGEST_LENGTH
->
-+#define MD5_DIGEST_LENGTH 16
->
-+#endif
->
-+
->
-+static void check_host_md5(void)
->
-+{
->
-+    int i;
->
-+    unsigned char md[MD5_DIGEST_LENGTH];
->
-+    rcu_read_lock();
->
-+    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check
->
-'pc.ram' block */
->
-+    rcu_read_unlock();
->
-+
->
-+    MD5(block->host, block->used_length, md);
->
-+    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
->
-+        fprintf(stderr, "%02x", md[i]);
->
-+    }
->
-+    fprintf(stderr, "\n");
->
-+    error_report("end ram md5");
->
-+}
->
-+
->
-void qemu_savevm_state_begin(QEMUFile *f,
->
-const MigrationParams *params)
->
-{
->
-@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f,
->
-bool iterable_only)
->
-save_section_header(f, se, QEMU_VM_SECTION_END);
->
->
-ret = se->ops->save_live_complete_precopy(f, se->opaque);
->
-+
->
-+        fprintf(stderr, "after saving %s complete\n", se->idstr);
->
-+        check_host_md5();
->
-+
->
-trace_savevm_section_end(se->idstr, se->section_id, ret);
->
-save_section_footer(f, se);
->
-if (ret < 0) {
->
-@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f,
->
-MigrationIncomingState *mis)
->
-section_id, le->se->idstr);
->
-return ret;
->
-}
->
-+            if (section_type == QEMU_VM_SECTION_END) {
->
-+                error_report("after loading state section id %d(%s)",
->
-+                             section_id, le->se->idstr);
->
-+                check_host_md5();
->
-+            }
->
-if (!check_section_footer(f, le)) {
->
-return -EINVAL;
->
-}
->
-@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f)
->
-}
->
->
-cpu_synchronize_all_post_init();
->
-+    error_report("%s: after cpu_synchronize_all_post_init\n", __func__);
->
-+    check_host_md5();
->
->
-return ret;
->
-}
->
->
->
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
-On 2015/12/3 17:24, Dr. David Alan Gilbert wrote:
-* Li Zhijian (address@hidden) wrote:
-Hi all,
-
-Does anybody remember the similar issue posted by hailiang months ago?
-http://patchwork.ozlabs.org/patch/454322/
-At least two bugs about migration have been fixed since then.
-Yes, I wondered what happened to that.
-And now we have found the same issue on a TCG VM (KVM is fine): after migration,
-the content of the VM's memory is inconsistent.
-Hmm, TCG only - I don't know much about that; but I guess something must
-be accessing memory without using the proper macros/functions so
-it doesn't mark it as dirty.
-We added a patch to check the memory content; you can find it appended below.
-
-Steps to reproduce:
-1) apply the patch and re-build qemu
-2) prepare the ubuntu guest and run memtest in grub.
-source side:
-x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
-e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
-if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
-virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
--vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
-tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
-pc-i440fx-2.3,accel=tcg,usb=off
-
-destination side:
-x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
-e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
-if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
-virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
--vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
-tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
-pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881
-
-3) start migration
-with 1000M NIC, migration will finish within 3 min.
-
-at source:
-(qemu) migrate tcp:192.168.2.66:8881
-after saving ram complete
-e9e725df678d392b1a83b3a917f332bb
-qemu-system-x86_64: end ram md5
-(qemu)
-
-at destination:
-...skip...
-Completed load of VM with exit code 0 seq iteration 1264
-Completed load of VM with exit code 0 seq iteration 1265
-Completed load of VM with exit code 0 seq iteration 1266
-qemu-system-x86_64: after loading state section id 2(ram)
-49c2dac7bde0e5e22db7280dcb3824f9
-qemu-system-x86_64: end ram md5
-qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init
-
-49c2dac7bde0e5e22db7280dcb3824f9
-qemu-system-x86_64: end ram md5
-
-This occurs occasionally and only on a TCG machine. It seems that
-some pages dirtied on the source side are not transferred to the destination.
-This problem can be reproduced even if we disable virtio.
-
-Is it OK for some pages not to be transferred to the destination during
-migration? Or is it a bug?
-I'm pretty sure that means it's a bug.  Hard to find though, I guess
-at least memtest is smaller than a big OS.  I think I'd dump the whole
-of memory on both sides, hexdump and diff them  - I'd guess it would
-just be one byte/word different, maybe that would offer some idea what
-wrote it.
-Maybe a better way to do that is with the help of userfaultfd's write-protect
-capability. It is still under development by Andrea Arcangeli, but there
-is an RFC version available; please refer to
-http://www.spinics.net/lists/linux-mm/msg97422.html
-(I'm developing a live memory snapshot feature based on it; maybe this is
-another scenario where we can use userfaultfd's WP ;) ).
-Dave
-Any idea...
-
-=================md5 check patch=============================
-
-diff --git a/Makefile.target b/Makefile.target
-index 962d004..e2cb8e9 100644
---- a/Makefile.target
-+++ b/Makefile.target
-@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o
-  obj-y += memory_mapping.o
-  obj-y += dump.o
-  obj-y += migration/ram.o migration/savevm.o
--LIBS := $(libs_softmmu) $(LIBS)
-+LIBS := $(libs_softmmu) $(LIBS) -lplumb
-
-  # xen support
-  obj-$(CONFIG_XEN) += xen-common.o
-diff --git a/migration/ram.c b/migration/ram.c
-index 1eb155a..3b7a09d 100644
---- a/migration/ram.c
-+++ b/migration/ram.c
-@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int
-version_id)
-      }
-
-      rcu_read_unlock();
--    DPRINTF("Completed load of VM with exit code %d seq iteration "
-+    fprintf(stderr, "Completed load of VM with exit code %d seq iteration "
-              "%" PRIu64 "\n", ret, seq_iter);
-      return ret;
-  }
-diff --git a/migration/savevm.c b/migration/savevm.c
-index 0ad1b93..3feaa61 100644
---- a/migration/savevm.c
-+++ b/migration/savevm.c
-@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f)
-
-  }
-
-+#include "exec/ram_addr.h"
-+#include "qemu/rcu_queue.h"
-+#include <clplumbing/md5.h>
-+#ifndef MD5_DIGEST_LENGTH
-+#define MD5_DIGEST_LENGTH 16
-+#endif
-+
-+static void check_host_md5(void)
-+{
-+    int i;
-+    unsigned char md[MD5_DIGEST_LENGTH];
-+    rcu_read_lock();
-+    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check
-'pc.ram' block */
-+    rcu_read_unlock();
-+
-+    MD5(block->host, block->used_length, md);
-+    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
-+        fprintf(stderr, "%02x", md[i]);
-+    }
-+    fprintf(stderr, "\n");
-+    error_report("end ram md5");
-+}
-+
-  void qemu_savevm_state_begin(QEMUFile *f,
-                               const MigrationParams *params)
-  {
-@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f,
-bool iterable_only)
-          save_section_header(f, se, QEMU_VM_SECTION_END);
-
-          ret = se->ops->save_live_complete_precopy(f, se->opaque);
-+
-+        fprintf(stderr, "after saving %s complete\n", se->idstr);
-+        check_host_md5();
-+
-          trace_savevm_section_end(se->idstr, se->section_id, ret);
-          save_section_footer(f, se);
-          if (ret < 0) {
-@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f,
-MigrationIncomingState *mis)
-                               section_id, le->se->idstr);
-                  return ret;
-              }
-+            if (section_type == QEMU_VM_SECTION_END) {
-+                error_report("after loading state section id %d(%s)",
-+                             section_id, le->se->idstr);
-+                check_host_md5();
-+            }
-              if (!check_section_footer(f, le)) {
-                  return -EINVAL;
-              }
-@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f)
-      }
-
-      cpu_synchronize_all_post_init();
-+    error_report("%s: after cpu_synchronize_all_post_init\n", __func__);
-+    check_host_md5();
-
-      return ret;
-  }
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
-.
-
-On 12/03/2015 05:37 PM, Hailiang Zhang wrote:
-On 2015/12/3 17:24, Dr. David Alan Gilbert wrote:
-* Li Zhijian (address@hidden) wrote:
-Hi all,
-
-Does anybody remember the similar issue posted by hailiang months ago?
-http://patchwork.ozlabs.org/patch/454322/
-At least two bugs about migration have been fixed since then.
-Yes, I wondered what happened to that.
-And now we have found the same issue on a TCG VM (KVM is fine): after migration,
-the content of the VM's memory is inconsistent.
-Hmm, TCG only - I don't know much about that; but I guess something must
-be accessing memory without using the proper macros/functions so
-it doesn't mark it as dirty.
-We added a patch to check the memory content; you can find it appended below.
-
-Steps to reproduce:
-1) apply the patch and re-build qemu
-2) prepare the ubuntu guest and run memtest in grub.
-source side:
-x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
-e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
-if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
-virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
-
--vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
-tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
-pc-i440fx-2.3,accel=tcg,usb=off
-
-destination side:
-x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
-e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
-if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
-virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
-
--vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
-tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
-pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881
-
-3) start migration
-with 1000M NIC, migration will finish within 3 min.
-
-at source:
-(qemu) migrate tcp:192.168.2.66:8881
-after saving ram complete
-e9e725df678d392b1a83b3a917f332bb
-qemu-system-x86_64: end ram md5
-(qemu)
-
-at destination:
-...skip...
-Completed load of VM with exit code 0 seq iteration 1264
-Completed load of VM with exit code 0 seq iteration 1265
-Completed load of VM with exit code 0 seq iteration 1266
-qemu-system-x86_64: after loading state section id 2(ram)
-49c2dac7bde0e5e22db7280dcb3824f9
-qemu-system-x86_64: end ram md5
-qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init
-
-49c2dac7bde0e5e22db7280dcb3824f9
-qemu-system-x86_64: end ram md5
-
-This occurs occasionally and only on a TCG machine. It seems that
-some pages dirtied on the source side are not transferred to the destination.
-This problem can be reproduced even if we disable virtio.
-
-Is it OK for some pages not to be transferred to the destination during
-migration? Or is it a bug?
-I'm pretty sure that means it's a bug.  Hard to find though, I guess
-at least memtest is smaller than a big OS.  I think I'd dump the whole
-of memory on both sides, hexdump and diff them  - I'd guess it would
-just be one byte/word different, maybe that would offer some idea what
-wrote it.
-Maybe a better way to do that is with the help of userfaultfd's write-protect
-capability. It is still under development by Andrea Arcangeli, but there
-is an RFC version available; please refer to
-http://www.spinics.net/lists/linux-mm/msg97422.html
-(I'm developing a live memory snapshot feature based on it; maybe this is
-another scenario where we can use userfaultfd's WP ;) ).
-sounds good.
-
-thanks
-Li
-Dave
-Any idea...
-
-=================md5 check patch=============================
-
-diff --git a/Makefile.target b/Makefile.target
-index 962d004..e2cb8e9 100644
---- a/Makefile.target
-+++ b/Makefile.target
-@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o
-  obj-y += memory_mapping.o
-  obj-y += dump.o
-  obj-y += migration/ram.o migration/savevm.o
--LIBS := $(libs_softmmu) $(LIBS)
-+LIBS := $(libs_softmmu) $(LIBS) -lplumb
-
-  # xen support
-  obj-$(CONFIG_XEN) += xen-common.o
-diff --git a/migration/ram.c b/migration/ram.c
-index 1eb155a..3b7a09d 100644
---- a/migration/ram.c
-+++ b/migration/ram.c
-@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int
-version_id)
-      }
-
-      rcu_read_unlock();
--    DPRINTF("Completed load of VM with exit code %d seq iteration "
-+    fprintf(stderr, "Completed load of VM with exit code %d seq
-iteration "
-              "%" PRIu64 "\n", ret, seq_iter);
-      return ret;
-  }
-diff --git a/migration/savevm.c b/migration/savevm.c
-index 0ad1b93..3feaa61 100644
---- a/migration/savevm.c
-+++ b/migration/savevm.c
-@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f)
-
-  }
-
-+#include "exec/ram_addr.h"
-+#include "qemu/rcu_queue.h"
-+#include <clplumbing/md5.h>
-+#ifndef MD5_DIGEST_LENGTH
-+#define MD5_DIGEST_LENGTH 16
-+#endif
-+
-+static void check_host_md5(void)
-+{
-+    int i;
-+    unsigned char md[MD5_DIGEST_LENGTH];
-+    rcu_read_lock();
-+    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check
-'pc.ram' block */
-+    rcu_read_unlock();
-+
-+    MD5(block->host, block->used_length, md);
-+    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
-+        fprintf(stderr, "%02x", md[i]);
-+    }
-+    fprintf(stderr, "\n");
-+    error_report("end ram md5");
-+}
-+
-  void qemu_savevm_state_begin(QEMUFile *f,
-                               const MigrationParams *params)
-  {
-@@ -1056,6 +1079,10 @@ void
-qemu_savevm_state_complete_precopy(QEMUFile *f,
-bool iterable_only)
-          save_section_header(f, se, QEMU_VM_SECTION_END);
-
-          ret = se->ops->save_live_complete_precopy(f, se->opaque);
-+
-+        fprintf(stderr, "after saving %s complete\n", se->idstr);
-+        check_host_md5();
-+
-          trace_savevm_section_end(se->idstr, se->section_id, ret);
-          save_section_footer(f, se);
-          if (ret < 0) {
-@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f,
-MigrationIncomingState *mis)
-                               section_id, le->se->idstr);
-                  return ret;
-              }
-+            if (section_type == QEMU_VM_SECTION_END) {
-+                error_report("after loading state section id %d(%s)",
-+                             section_id, le->se->idstr);
-+                check_host_md5();
-+            }
-              if (!check_section_footer(f, le)) {
-                  return -EINVAL;
-              }
-@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f)
-      }
-
-      cpu_synchronize_all_post_init();
-+    error_report("%s: after cpu_synchronize_all_post_init\n",
-__func__);
-+    check_host_md5();
-
-      return ret;
-  }
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
-.
-.
---
-Best regards.
-Li Zhijian (8555)
-
-On 12/03/2015 05:24 PM, Dr. David Alan Gilbert wrote:
-* Li Zhijian (address@hidden) wrote:
-Hi all,
-
-Does anybody remember the similar issue posted by hailiang months ago?
-http://patchwork.ozlabs.org/patch/454322/
-At least two bugs about migration have been fixed since then.
-Yes, I wondered what happened to that.
-And now we have found the same issue on a TCG VM (KVM is fine): after migration,
-the content of the VM's memory is inconsistent.
-Hmm, TCG only - I don't know much about that; but I guess something must
-be accessing memory without using the proper macros/functions so
-it doesn't mark it as dirty.
-We added a patch to check the memory content; you can find it appended below.
-
-Steps to reproduce:
-1) apply the patch and re-build qemu
-2) prepare the ubuntu guest and run memtest in grub.
-source side:
-x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
-e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
-if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
-virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
--vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
-tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
-pc-i440fx-2.3,accel=tcg,usb=off
-
-destination side:
-x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
-e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
-if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
-virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
--vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
-tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
-pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881
-
-3) start migration
-with 1000M NIC, migration will finish within 3 min.
-
-at source:
-(qemu) migrate tcp:192.168.2.66:8881
-after saving ram complete
-e9e725df678d392b1a83b3a917f332bb
-qemu-system-x86_64: end ram md5
-(qemu)
-
-at destination:
-...skip...
-Completed load of VM with exit code 0 seq iteration 1264
-Completed load of VM with exit code 0 seq iteration 1265
-Completed load of VM with exit code 0 seq iteration 1266
-qemu-system-x86_64: after loading state section id 2(ram)
-49c2dac7bde0e5e22db7280dcb3824f9
-qemu-system-x86_64: end ram md5
-qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init
-
-49c2dac7bde0e5e22db7280dcb3824f9
-qemu-system-x86_64: end ram md5
-
-This occurs occasionally and only on a TCG machine. It seems that
-some pages dirtied on the source side are not transferred to the destination.
-This problem can be reproduced even if we disable virtio.
-
-Is it OK for some pages not to be transferred to the destination during
-migration? Or is it a bug?
-I'm pretty sure that means it's a bug.  Hard to find though, I guess
-at least memtest is smaller than a big OS.  I think I'd dump the whole
-of memory on both sides, hexdump and diff them  - I'd guess it would
-just be one byte/word different, maybe that would offer some idea what
-wrote it.
-I tried dumping and comparing them: more than 10 pages are different.
-On the source side they hold random values, whereas on the destination
-they are always 'FF' 'FB' 'EF' 'BF'...
-Also, not all of the differing pages are contiguous.
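To make the observation about non-contiguous differing pages concrete, a list of differing page indices can be collapsed into contiguous runs. The sketch below shows one way to do that in C; `struct page_run` and `group_runs` are hypothetical names introduced here, and the input is assumed to be a sorted, duplicate-free array of page indices obtained from comparing the two dumps.

```c
#include <assert.h>
#include <stddef.h>

/* A contiguous run of differing pages: [first, first + count). */
struct page_run {
    size_t first;
    size_t count;
};

/*
 * Collapse a sorted array of page indices into contiguous runs.
 * At most max_runs runs are written to runs[]; the number of runs
 * actually written is returned.
 */
static size_t group_runs(const size_t *pages, size_t n,
                         struct page_run *runs, size_t max_runs)
{
    size_t nruns = 0;

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (nruns > 0 &&
            runs[nruns - 1].first + runs[nruns - 1].count == pages[i]) {
            /* Page directly follows the current run: extend it. */
            runs[nruns - 1].count++;
        } else {
            if (nruns == max_runs) {
                break;          /* output array is full */
            }
            runs[nruns].first = pages[i];
            runs[nruns].count = 1;
            nruns++;
        }
    }
    return nruns;
}
```

Several short, scattered runs would match what is reported above, while a single long run would point more toward one bulk write that was missed.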
-
-thanks
-Li
-Dave
-Any idea...
-
-=================md5 check patch=============================
-
-diff --git a/Makefile.target b/Makefile.target
-index 962d004..e2cb8e9 100644
---- a/Makefile.target
-+++ b/Makefile.target
-@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o
-  obj-y += memory_mapping.o
-  obj-y += dump.o
-  obj-y += migration/ram.o migration/savevm.o
--LIBS := $(libs_softmmu) $(LIBS)
-+LIBS := $(libs_softmmu) $(LIBS) -lplumb
-
-  # xen support
-  obj-$(CONFIG_XEN) += xen-common.o
-diff --git a/migration/ram.c b/migration/ram.c
-index 1eb155a..3b7a09d 100644
---- a/migration/ram.c
-+++ b/migration/ram.c
-@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int
-version_id)
-      }
-
-      rcu_read_unlock();
--    DPRINTF("Completed load of VM with exit code %d seq iteration "
-+    fprintf(stderr, "Completed load of VM with exit code %d seq iteration "
-              "%" PRIu64 "\n", ret, seq_iter);
-      return ret;
-  }
-diff --git a/migration/savevm.c b/migration/savevm.c
-index 0ad1b93..3feaa61 100644
---- a/migration/savevm.c
-+++ b/migration/savevm.c
-@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f)
-
-  }
-
-+#include "exec/ram_addr.h"
-+#include "qemu/rcu_queue.h"
-+#include <clplumbing/md5.h>
-+#ifndef MD5_DIGEST_LENGTH
-+#define MD5_DIGEST_LENGTH 16
-+#endif
-+
-+static void check_host_md5(void)
-+{
-+    int i;
-+    unsigned char md[MD5_DIGEST_LENGTH];
-+    rcu_read_lock();
-+    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check
-'pc.ram' block */
-+    rcu_read_unlock();
-+
-+    MD5(block->host, block->used_length, md);
-+    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
-+        fprintf(stderr, "%02x", md[i]);
-+    }
-+    fprintf(stderr, "\n");
-+    error_report("end ram md5");
-+}
-+
-  void qemu_savevm_state_begin(QEMUFile *f,
-                               const MigrationParams *params)
-  {
-@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f,
-bool iterable_only)
-          save_section_header(f, se, QEMU_VM_SECTION_END);
-
-          ret = se->ops->save_live_complete_precopy(f, se->opaque);
-+
-+        fprintf(stderr, "after saving %s complete\n", se->idstr);
-+        check_host_md5();
-+
-          trace_savevm_section_end(se->idstr, se->section_id, ret);
-          save_section_footer(f, se);
-          if (ret < 0) {
-@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f,
-MigrationIncomingState *mis)
-                               section_id, le->se->idstr);
-                  return ret;
-              }
-+            if (section_type == QEMU_VM_SECTION_END) {
-+                error_report("after loading state section id %d(%s)",
-+                             section_id, le->se->idstr);
-+                check_host_md5();
-+            }
-              if (!check_section_footer(f, le)) {
-                  return -EINVAL;
-              }
-@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f)
-      }
-
-      cpu_synchronize_all_post_init();
-+    error_report("%s: after cpu_synchronize_all_post_init\n", __func__);
-+    check_host_md5();
-
-      return ret;
-  }
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
-
-.
---
-Best regards.
-Li Zhijian (8555)
-
-* Li Zhijian (address@hidden) wrote:
->
->
->
-On 12/03/2015 05:24 PM, Dr. David Alan Gilbert wrote:
->
->* Li Zhijian (address@hidden) wrote:
->
->>Hi all,
->
->>
->
->>Does anybody remember the similar issue posted by hailiang months ago?
->
->>
-http://patchwork.ozlabs.org/patch/454322/
->
->>At least two bugs about migration have been fixed since then.
->
->
->
->Yes, I wondered what happened to that.
->
->
->
->>And now we have found the same issue on a TCG VM (KVM is fine): after migration,
->
->>the content of the VM's memory is inconsistent.
->
->
->
->Hmm, TCG only - I don't know much about that; but I guess something must
->
->be accessing memory without using the proper macros/functions so
->
->it doesn't mark it as dirty.
->
->
->
->>We added a patch to check the memory content; you can find it appended below.
->
->>
->
->>Steps to reproduce:
->
->>1) apply the patch and re-build qemu
->
->>2) prepare the ubuntu guest and run memtest in grub.
->
->>source side:
->
->>x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
->
->>e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
->
->>if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
->
->>virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
->
->>-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
->
->>tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
->
->>pc-i440fx-2.3,accel=tcg,usb=off
->
->>
->
->>destination side:
->
->>x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
->
->>e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
->
->>if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
->
->>virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
->
->>-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
->
->>tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
->
->>pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881
->
->>
->
->>3) start migration
->
->>with 1000M NIC, migration will finish within 3 min.
->
->>
->
->>at source:
->
->>(qemu) migrate tcp:192.168.2.66:8881
->
->>after saving ram complete
->
->>e9e725df678d392b1a83b3a917f332bb
->
->>qemu-system-x86_64: end ram md5
->
->>(qemu)
->
->>
->
->>at destination:
->
->>...skip...
->
->>Completed load of VM with exit code 0 seq iteration 1264
->
->>Completed load of VM with exit code 0 seq iteration 1265
->
->>Completed load of VM with exit code 0 seq iteration 1266
->
->>qemu-system-x86_64: after loading state section id 2(ram)
->
->>49c2dac7bde0e5e22db7280dcb3824f9
->
->>qemu-system-x86_64: end ram md5
->
->>qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init
->
->>
->
->>49c2dac7bde0e5e22db7280dcb3824f9
->
->>qemu-system-x86_64: end ram md5
->
->>
->
->>This occurs occasionally and only on a TCG machine. It seems that
->
->>some pages dirtied on the source side are not transferred to the destination.
->
->>This problem can be reproduced even if we disable virtio.
->
->>
->
->>Is it OK for some pages not to be transferred to the destination during
->
->>migration? Or is it a bug?
->
->
->
->I'm pretty sure that means it's a bug.  Hard to find though, I guess
->
->at least memtest is smaller than a big OS.  I think I'd dump the whole
->
->of memory on both sides, hexdump and diff them  - I'd guess it would
->
->just be one byte/word different, maybe that would offer some idea what
->
->wrote it.
->
->
-I tried dumping and comparing them: more than 10 pages are different.
->
-On the source side they hold random values, whereas on the destination
->
-they are always 'FF' 'FB' 'EF' 'BF'...
->
->
-Also, not all of the differing pages are contiguous.
-I wonder if it happens on all of memtest's different test patterns,
-perhaps it might be possible to narrow it down if you tell memtest
-to only run one test at a time.
-
-Dave
-
->
->
-thanks
->
-Li
->
->
->
->
->
->Dave
->
->
->
->>Any idea...
->
->>
->
->>=================md5 check patch=============================
->
->>
->
->>diff --git a/Makefile.target b/Makefile.target
->
->>index 962d004..e2cb8e9 100644
->
->>--- a/Makefile.target
->
->>+++ b/Makefile.target
->
->>@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o
->
->>  obj-y += memory_mapping.o
->
->>  obj-y += dump.o
->
->>  obj-y += migration/ram.o migration/savevm.o
->
->>-LIBS := $(libs_softmmu) $(LIBS)
->
->>+LIBS := $(libs_softmmu) $(LIBS) -lplumb
->
->>
->
->>  # xen support
->
->>  obj-$(CONFIG_XEN) += xen-common.o
->
->>diff --git a/migration/ram.c b/migration/ram.c
->
->>index 1eb155a..3b7a09d 100644
->
->>--- a/migration/ram.c
->
->>+++ b/migration/ram.c
->
->>@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int
->
->>version_id)
->
->>      }
->
->>
->
->>      rcu_read_unlock();
->
->>-    DPRINTF("Completed load of VM with exit code %d seq iteration "
->
->>+    fprintf(stderr, "Completed load of VM with exit code %d seq iteration "
->
->>              "%" PRIu64 "\n", ret, seq_iter);
->
->>      return ret;
->
->>  }
->
->>diff --git a/migration/savevm.c b/migration/savevm.c
->
->>index 0ad1b93..3feaa61 100644
->
->>--- a/migration/savevm.c
->
->>+++ b/migration/savevm.c
->
->>@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f)
->
->>
->
->>  }
->
->>
->
->>+#include "exec/ram_addr.h"
->
->>+#include "qemu/rcu_queue.h"
->
->>+#include <clplumbing/md5.h>
->
->>+#ifndef MD5_DIGEST_LENGTH
->
->>+#define MD5_DIGEST_LENGTH 16
->
->>+#endif
->
->>+
->
->>+static void check_host_md5(void)
->
->>+{
->
->>+    int i;
->
->>+    unsigned char md[MD5_DIGEST_LENGTH];
->
->>+    rcu_read_lock();
->
->>+    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check
->
->>'pc.ram' block */
->
->>+    rcu_read_unlock();
->
->>+
->
->>+    MD5(block->host, block->used_length, md);
->
->>+    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
->
->>+        fprintf(stderr, "%02x", md[i]);
->
->>+    }
->
->>+    fprintf(stderr, "\n");
->
->>+    error_report("end ram md5");
->
->>+}
->
->>+
->
->>  void qemu_savevm_state_begin(QEMUFile *f,
->
->>                               const MigrationParams *params)
->
->>  {
->
->>@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f,
->
->>bool iterable_only)
->
->>          save_section_header(f, se, QEMU_VM_SECTION_END);
->
->>
->
->>          ret = se->ops->save_live_complete_precopy(f, se->opaque);
->
->>+
->
->>+        fprintf(stderr, "after saving %s complete\n", se->idstr);
->
->>+        check_host_md5();
->
->>+
->
->>          trace_savevm_section_end(se->idstr, se->section_id, ret);
->
->>          save_section_footer(f, se);
->
->>          if (ret < 0) {
->
->>@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f,
->
->>MigrationIncomingState *mis)
->
->>                               section_id, le->se->idstr);
->
->>                  return ret;
->
->>              }
->
->>+            if (section_type == QEMU_VM_SECTION_END) {
->
->>+                error_report("after loading state section id %d(%s)",
->
->>+                             section_id, le->se->idstr);
->
->>+                check_host_md5();
->
->>+            }
->
->>              if (!check_section_footer(f, le)) {
->
->>                  return -EINVAL;
->
->>              }
->
->>@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f)
->
->>      }
->
->>
->
->>      cpu_synchronize_all_post_init();
->
->>+    error_report("%s: after cpu_synchronize_all_post_init\n", __func__);
->
->>+    check_host_md5();
->
->>
->
->>      return ret;
->
->>  }
->
->>
->
->>
->
->>
->
->--
->
->Dr. David Alan Gilbert / address@hidden / Manchester, UK
->
->
->
->
->
->.
->
->
->
->
---
->
-Best regards.
->
-Li Zhijian (8555)
->
->
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
-Li Zhijian <address@hidden> wrote:
->
-Hi all,
->
->
-Does anybody remember the similar issue posted by hailiang months ago
->
-http://patchwork.ozlabs.org/patch/454322/
->
-At least two migration bugs have been fixed since then.
->
->
-And now we have found the same issue with a TCG VM (KVM is fine): after
->
-migration, the content of the VM's memory is inconsistent.
->
->
-We added a patch to check the memory content; you can find it appended below.
->
->
-Steps to reproduce:
->
-1) apply the patch and re-build qemu
->
-2) prepare the ubuntu guest and run memtest in grub.
->
-source side:
->
-x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
->
-e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
->
-if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
->
-virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
->
--vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
->
-tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
->
-pc-i440fx-2.3,accel=tcg,usb=off
->
->
-destination side:
->
-x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
->
-e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
->
-if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
->
-virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
->
--vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
->
-tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
->
-pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881
->
->
-3) start migration
->
-with 1000M NIC, migration will finish within 3 min.
->
->
-at source:
->
-(qemu) migrate tcp:192.168.2.66:8881
->
-after saving ram complete
->
-e9e725df678d392b1a83b3a917f332bb
->
-qemu-system-x86_64: end ram md5
->
-(qemu)
->
->
-at destination:
->
-...skip...
->
-Completed load of VM with exit code 0 seq iteration 1264
->
-Completed load of VM with exit code 0 seq iteration 1265
->
-Completed load of VM with exit code 0 seq iteration 1266
->
-qemu-system-x86_64: after loading state section id 2(ram)
->
-49c2dac7bde0e5e22db7280dcb3824f9
->
-qemu-system-x86_64: end ram md5
->
-qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init
->
->
-49c2dac7bde0e5e22db7280dcb3824f9
->
-qemu-system-x86_64: end ram md5
->
->
-This occurs occasionally and only on a TCG machine. It seems that
->
-some pages dirtied on the source side are not transferred to the destination.
->
-This problem can be reproduced even if we disable virtio.
->
->
-Is it OK for some pages not to be transferred to the destination during
->
-migration? Or is it a bug?
->
->
-Any idea...
-Thanks for describing how to reproduce the bug.
-If some pages are not transferred to the destination then it is a bug, so we
-need to know what the problem is. Note that the problem could be that
-TCG is not marking some page dirty, that the migration code "forgets" about
-that page, or anything else altogether; that is what we need to find.
-
-There are more possibilities: I am not sure that memtest is in 32-bit
-mode, and it is possible that we are missing some state when we
-are in real mode.
-
-Will try to take a look at this.
-
-Thanks, again.
-
-
->
->
-=================md5 check patch=============================
->
->
-diff --git a/Makefile.target b/Makefile.target
->
-index 962d004..e2cb8e9 100644
->
---- a/Makefile.target
->
-+++ b/Makefile.target
->
-@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o
->
-obj-y += memory_mapping.o
->
-obj-y += dump.o
->
-obj-y += migration/ram.o migration/savevm.o
->
--LIBS := $(libs_softmmu) $(LIBS)
->
-+LIBS := $(libs_softmmu) $(LIBS) -lplumb
->
->
-# xen support
->
-obj-$(CONFIG_XEN) += xen-common.o
->
-diff --git a/migration/ram.c b/migration/ram.c
->
-index 1eb155a..3b7a09d 100644
->
---- a/migration/ram.c
->
-+++ b/migration/ram.c
->
-@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque,
->
-int version_id)
->
-}
->
->
-rcu_read_unlock();
->
--    DPRINTF("Completed load of VM with exit code %d seq iteration "
->
-+    fprintf(stderr, "Completed load of VM with exit code %d seq iteration "
->
-"%" PRIu64 "\n", ret, seq_iter);
->
-return ret;
->
-}
->
-diff --git a/migration/savevm.c b/migration/savevm.c
->
-index 0ad1b93..3feaa61 100644
->
---- a/migration/savevm.c
->
-+++ b/migration/savevm.c
->
-@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f)
->
->
-}
->
->
-+#include "exec/ram_addr.h"
->
-+#include "qemu/rcu_queue.h"
->
-+#include <clplumbing/md5.h>
->
-+#ifndef MD5_DIGEST_LENGTH
->
-+#define MD5_DIGEST_LENGTH 16
->
-+#endif
->
-+
->
-+static void check_host_md5(void)
->
-+{
->
-+    int i;
->
-+    unsigned char md[MD5_DIGEST_LENGTH];
->
-+    rcu_read_lock();
->
-+    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check
->
-'pc.ram' block */
->
-+    rcu_read_unlock();
->
-+
->
-+    MD5(block->host, block->used_length, md);
->
-+    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
->
-+        fprintf(stderr, "%02x", md[i]);
->
-+    }
->
-+    fprintf(stderr, "\n");
->
-+    error_report("end ram md5");
->
-+}
->
-+
->
-void qemu_savevm_state_begin(QEMUFile *f,
->
-const MigrationParams *params)
->
-{
->
-@@ -1056,6 +1079,10 @@ void
->
-qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
->
-save_section_header(f, se, QEMU_VM_SECTION_END);
->
->
-ret = se->ops->save_live_complete_precopy(f, se->opaque);
->
-+
->
-+        fprintf(stderr, "after saving %s complete\n", se->idstr);
->
-+        check_host_md5();
->
-+
->
-trace_savevm_section_end(se->idstr, se->section_id, ret);
->
-save_section_footer(f, se);
->
-if (ret < 0) {
->
-@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f,
->
-MigrationIncomingState *mis)
->
-section_id, le->se->idstr);
->
-return ret;
->
-}
->
-+            if (section_type == QEMU_VM_SECTION_END) {
->
-+                error_report("after loading state section id %d(%s)",
->
-+                             section_id, le->se->idstr);
->
-+                check_host_md5();
->
-+            }
->
-if (!check_section_footer(f, le)) {
->
-return -EINVAL;
->
-}
->
-@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f)
->
-}
->
->
-cpu_synchronize_all_post_init();
->
-+    error_report("%s: after cpu_synchronize_all_post_init\n", __func__);
->
-+    check_host_md5();
->
->
-return ret;
->
-}
-
->
->
-Thanks for describing how to reproduce the bug.
->
-If some pages are not transferred to destination then it is a bug, so we need
->
-to know what the problem is, notice that the problem can be that TCG is not
->
-marking dirty some page, that Migration code "forgets" about that page, or
->
-anything else altogether, that is what we need to find.
->
->
-There are more possibilities, I am not sure that memtest is in 32-bit mode, and
->
-it is possible that we are missing some state when we are in real
->
-mode.
->
->
-Will try to take a look at this.
->
->
-Thanks, again.
->
-Hi Juan & Amit
-
-Do you think we should add a mechanism to check data integrity during live
-migration (LM), like Zhijian's patch did? It may be very helpful for developers.
-Actually, I did a similar thing before, to make sure that I did the
-right thing when I changed the code related to LM.
-
-Liang
-
-On (Fri) 04 Dec 2015 [01:43:07], Li, Liang Z wrote:
->
->
->
-> Thanks for describing how to reproduce the bug.
->
-> If some pages are not transferred to destination then it is a bug, so we
->
-> need
->
-> to know what the problem is, notice that the problem can be that TCG is not
->
-> marking dirty some page, that Migration code "forgets" about that page, or
->
-> anything else altogether, that is what we need to find.
->
->
->
-> There are more possibilities, I am not sure that memtest is in 32-bit mode,
->
-> and
->
-> it is possible that we are missing some state when we are in real
->
-> mode.
->
->
->
-> Will try to take a look at this.
->
->
->
-> Thanks, again.
->
->
->
->
-Hi Juan & Amit
->
->
-Do you think we should add a mechanism to check the data integrity during LM
->
-like Zhijian's patch did? It may be very helpful for developers.
->
-Actually, I did a similar thing before in order to make sure that I did
->
-the right thing when I changed the code related to LM.
-If you mean for debugging, something that's not always on, then I'm
-fine with it.
-
-A script to go along with it that shows the result of the comparison
-would be helpful too: something that shows how many pages are
-different, how many bytes differ per page on average, and so on.
-
-                Amit
-
diff --git a/results/classifier/008/other/74545755 b/results/classifier/008/other/74545755
deleted file mode 100644
index 85e70c209..000000000
--- a/results/classifier/008/other/74545755
+++ /dev/null
@@ -1,354 +0,0 @@
-permissions: 0.770
-debug: 0.740
-performance: 0.721
-device: 0.720
-other: 0.683
-semantic: 0.669
-KVM: 0.661
-graphic: 0.660
-vnc: 0.650
-boot: 0.607
-files: 0.577
-network: 0.550
-socket: 0.549
-PID: 0.479
-
-[Bug Report][RFC PATCH 0/1] block: fix failing assert on paused VM migration
-
-There's a bug (failing assert) which is reproduced during migration of
-a paused VM.  I am able to reproduce it on a stand with 2 nodes and a common
-NFS share, with VM's disk on that share.
-
-root@fedora40-1-vm:~# virsh domblklist alma8-vm
- Target   Source
-------------------------------------------
- sda      /mnt/shared/images/alma8.qcow2
-
-root@fedora40-1-vm:~# df -Th /mnt/shared
-Filesystem          Type  Size  Used Avail Use% Mounted on
-127.0.0.1:/srv/nfsd nfs4   63G   16G   48G  25% /mnt/shared
-
-On the 1st node:
-
-root@fedora40-1-vm:~# virsh start alma8-vm ; virsh suspend alma8-vm
-root@fedora40-1-vm:~# virsh migrate --compressed --p2p --persistent 
---undefinesource --live alma8-vm qemu+ssh://fedora40-2-vm/system
-
-Then on the 2nd node:
-
-root@fedora40-2-vm:~# virsh migrate --compressed --p2p --persistent 
---undefinesource --live alma8-vm qemu+ssh://fedora40-1-vm/system
-error: operation failed: domain is not running
-
-root@fedora40-2-vm:~# tail -3 /var/log/libvirt/qemu/alma8-vm.log
-2024-09-19 13:53:33.336+0000: initiating migration
-qemu-system-x86_64: ../block.c:6976: int 
-bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & 
-BDRV_O_INACTIVE)' failed.
-2024-09-19 13:53:42.991+0000: shutting down, reason=crashed
-
-Backtrace:
-
-(gdb) bt
-#0  0x00007f7eaa2f1664 in __pthread_kill_implementation () at /lib64/libc.so.6
-#1  0x00007f7eaa298c4e in raise () at /lib64/libc.so.6
-#2  0x00007f7eaa280902 in abort () at /lib64/libc.so.6
-#3  0x00007f7eaa28081e in __assert_fail_base.cold () at /lib64/libc.so.6
-#4  0x00007f7eaa290d87 in __assert_fail () at /lib64/libc.so.6
-#5  0x0000563c38b95eb8 in bdrv_inactivate_recurse (bs=0x563c3b6c60c0) at 
-../block.c:6976
-#6  0x0000563c38b95aeb in bdrv_inactivate_all () at ../block.c:7038
-#7  0x0000563c3884d354 in qemu_savevm_state_complete_precopy_non_iterable 
-(f=0x563c3b700c20, in_postcopy=false, inactivate_disks=true)
-    at ../migration/savevm.c:1571
-#8  0x0000563c3884dc1a in qemu_savevm_state_complete_precopy (f=0x563c3b700c20, 
-iterable_only=false, inactivate_disks=true) at ../migration/savevm.c:1631
-#9  0x0000563c3883a340 in migration_completion_precopy (s=0x563c3b4d51f0, 
-current_active_state=<optimized out>) at ../migration/migration.c:2780
-#10 migration_completion (s=0x563c3b4d51f0) at ../migration/migration.c:2844
-#11 migration_iteration_run (s=0x563c3b4d51f0) at ../migration/migration.c:3270
-#12 migration_thread (opaque=0x563c3b4d51f0) at ../migration/migration.c:3536
-#13 0x0000563c38dbcf14 in qemu_thread_start (args=0x563c3c2d5bf0) at 
-../util/qemu-thread-posix.c:541
-#14 0x00007f7eaa2ef6d7 in start_thread () at /lib64/libc.so.6
-#15 0x00007f7eaa373414 in clone () at /lib64/libc.so.6
-
-What happens here is that after 1st migration BDS related to HDD remains
-inactive as VM is still paused.  Then when we initiate 2nd migration,
-bdrv_inactivate_all() leads to the attempt to set BDRV_O_INACTIVE flag
-on that node which is already set, thus assert fails.
-
-Attached patch which simply skips setting flag if it's already set is more
-of a kludge than a clean solution.  Should we use more sophisticated logic
-which allows some of the nodes be in inactive state prior to the migration,
-and takes them into account during bdrv_inactivate_all()?  Comments would
-be appreciated.
-
-Andrey
-
-Andrey Drobyshev (1):
-  block: do not fail when inactivating node which is inactive
-
- block.c | 10 +++++++++-
- 1 file changed, 9 insertions(+), 1 deletion(-)
-
--- 
-2.39.3
-
-Instead of throwing an assert let's just ignore that flag is already set
-and return.  We assume that it's going to be safe to ignore.  Otherwise
-this assert fails when migrating a paused VM back and forth.
-
-Ideally we'd like to have a more sophisticated solution, e.g. not even
-scan the nodes which should be inactive at this point.
-
-Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
----
- block.c | 10 +++++++++-
- 1 file changed, 9 insertions(+), 1 deletion(-)
-
-diff --git a/block.c b/block.c
-index 7d90007cae..c1dcf906d1 100644
---- a/block.c
-+++ b/block.c
-@@ -6973,7 +6973,15 @@ static int GRAPH_RDLOCK 
-bdrv_inactivate_recurse(BlockDriverState *bs)
-         return 0;
-     }
- 
--    assert(!(bs->open_flags & BDRV_O_INACTIVE));
-+    if (bs->open_flags & BDRV_O_INACTIVE) {
-+        /*
-+         * Return here instead of throwing assert as a workaround to
-+         * prevent failure on migrating paused VM.
-+         * Here we assume that if we're trying to inactivate BDS that's
-+         * already inactive, it's safe to just ignore it.
-+         */
-+        return 0;
-+    }
- 
-     /* Inactivate this node */
-     if (bs->drv->bdrv_inactivate) {
--- 
-2.39.3
-
-[add migration maintainers]
-
-On 24.09.24 15:56, Andrey Drobyshev wrote:
-Instead of throwing an assert let's just ignore that flag is already set
-and return.  We assume that it's going to be safe to ignore.  Otherwise
-this assert fails when migrating a paused VM back and forth.
-
-Ideally we'd like to have a more sophisticated solution, e.g. not even
-scan the nodes which should be inactive at this point.
-
-Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
----
-  block.c | 10 +++++++++-
-  1 file changed, 9 insertions(+), 1 deletion(-)
-
-diff --git a/block.c b/block.c
-index 7d90007cae..c1dcf906d1 100644
---- a/block.c
-+++ b/block.c
-@@ -6973,7 +6973,15 @@ static int GRAPH_RDLOCK 
-bdrv_inactivate_recurse(BlockDriverState *bs)
-          return 0;
-      }
--    assert(!(bs->open_flags & BDRV_O_INACTIVE));
-+    if (bs->open_flags & BDRV_O_INACTIVE) {
-+        /*
-+         * Return here instead of throwing assert as a workaround to
-+         * prevent failure on migrating paused VM.
-+         * Here we assume that if we're trying to inactivate BDS that's
-+         * already inactive, it's safe to just ignore it.
-+         */
-+        return 0;
-+    }
-/* Inactivate this node */
-if (bs->drv->bdrv_inactivate) {
-I doubt that this is the correct way to go.
-
-As far as I understand, "inactive" actually means that "storage does not belong to
-QEMU, but to someone else (another QEMU process, for example), and may be changed
-transparently". In turn, this means that QEMU should do nothing with inactive disks. So the
-problem is that nobody called bdrv_activate_all() on the target, and we shouldn't ignore that.
-
-Hmm, I see that in process_incoming_migration_bh() we do call bdrv_activate_all(),
-but only in some scenarios. Maybe the condition should be less strict here.
-
-Why do we need any condition here at all? Don't we want to activate the block layer on
-the target after migration anyway?
-
---
-Best regards,
-Vladimir
-
-On 9/30/24 12:25 PM, Vladimir Sementsov-Ogievskiy wrote:
->
-[add migration maintainers]
->
->
-On 24.09.24 15:56, Andrey Drobyshev wrote:
->
-> [...]
->
->
-I doubt that this is the correct way to go.
->
->
-As far as I understand, "inactive" actually means that "storage does not
->
-belong to qemu, but to someone else (another qemu process for example),
->
-and may be changed transparently". In turn this means that Qemu should
->
-do nothing with inactive disks. So the problem is that nobody called
->
-bdrv_activate_all on target, and we shouldn't ignore that.
->
->
-Hmm, I see in process_incoming_migration_bh() we do call
->
-bdrv_activate_all(), but only in some scenarios. Maybe the condition
->
-should be less strict here.
->
->
-Why do we need any condition here at all? Don't we want to activate
->
-block-layer on target after migration anyway?
->
-Hmm I'm not sure about the unconditional activation, since we at least
-have to honor LATE_BLOCK_ACTIVATE cap if it's set (and probably delay it
-in such a case).  In current libvirt upstream I see such code:
-
->
-/* Migration capabilities which should always be enabled as long as they
->
->
-* are supported by QEMU. If the capability is supposed to be enabled on both
->
->
-* sides of migration, it won't be enabled unless both sides support it.
->
->
-*/
->
->
-static const qemuMigrationParamsAlwaysOnItem qemuMigrationParamsAlwaysOn[] =
->
-{
->
->
-{QEMU_MIGRATION_CAP_PAUSE_BEFORE_SWITCHOVER,
->
->
-QEMU_MIGRATION_SOURCE},
->
->
->
->
-{QEMU_MIGRATION_CAP_LATE_BLOCK_ACTIVATE,
->
->
-QEMU_MIGRATION_DESTINATION},
->
->
-};
-which means that libvirt always wants LATE_BLOCK_ACTIVATE to be set.
-
-The code from process_incoming_migration_bh() you're referring to:
-
->
-/* If capability late_block_activate is set:
->
->
-* Only fire up the block code now if we're going to restart the
->
->
-* VM, else 'cont' will do it.
->
->
-* This causes file locking to happen; so we don't want it to happen
->
->
-* unless we really are starting the VM.
->
->
-*/
->
->
-if (!migrate_late_block_activate() ||
->
->
-(autostart && (!global_state_received() ||
->
->
-runstate_is_live(global_state_get_runstate())))) {
->
->
-/* Make sure all file formats throw away their mutable metadata.
->
->
->
-* If we get an error here, just don't restart the VM yet. */
->
->
-bdrv_activate_all(&local_err);
->
->
-if (local_err) {
->
->
-error_report_err(local_err);
->
->
-local_err = NULL;
->
->
-autostart = false;
->
->
-}
->
->
-}
-It states explicitly that we're either going to start VM right at this
-point if (autostart == true), or we wait till "cont" command happens.
-None of this is going to happen if we start another migration while
-still being in PAUSED state.  So I think it seems reasonable to take
-such case into account.  For instance, this patch does prevent the crash:
-
->
-diff --git a/migration/migration.c b/migration/migration.c
->
-index ae2be31557..3222f6745b 100644
->
---- a/migration/migration.c
->
-+++ b/migration/migration.c
->
-@@ -733,7 +733,8 @@ static void process_incoming_migration_bh(void *opaque)
->
-*/
->
-if (!migrate_late_block_activate() ||
->
-(autostart && (!global_state_received() ||
->
--            runstate_is_live(global_state_get_runstate())))) {
->
-+            runstate_is_live(global_state_get_runstate()))) ||
->
-+         (!autostart && global_state_get_runstate() == RUN_STATE_PAUSED)) {
->
-/* Make sure all file formats throw away their mutable metadata.
->
-* If we get an error here, just don't restart the VM yet. */
->
-bdrv_activate_all(&local_err);
-What are your thoughts on it?
-
-Andrey
-
diff --git a/results/classifier/008/other/80604314 b/results/classifier/008/other/80604314
deleted file mode 100644
index 79b9997e8..000000000
--- a/results/classifier/008/other/80604314
+++ /dev/null
@@ -1,1490 +0,0 @@
-performance: 0.919
-device: 0.917
-debug: 0.901
-graphic: 0.901
-other: 0.898
-PID: 0.896
-permissions: 0.892
-KVM: 0.891
-semantic: 0.890
-socket: 0.884
-vnc: 0.881
-network: 0.865
-files: 0.861
-boot: 0.860
-
-[BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device
-
-When I start qemu with a second virtio-net-ccw device (i.e. adding
--device virtio-net-ccw in addition to the autogenerated device), I get
-a segfault. gdb points to
-
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, 
-    config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146
-146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
-
-(backtrace doesn't go further)
-
-Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
-the autogenerated virtio-net-ccw device is present) works. Specifying
-several "-device virtio-net-pci" works as well.
-
-Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
-client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
-works (in-between state does not compile).
-
-This is reproducible with tcg as well. Same problem both with
---enable-vhost-vdpa and --disable-vhost-vdpa.
-
-Have not yet tried to figure out what might be special with
-virtio-ccw... anyone have an idea?
-
-[This should probably be considered a blocker?]
-
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
-When I start qemu with a second virtio-net-ccw device (i.e. adding
->
--device virtio-net-ccw in addition to the autogenerated device), I get
->
-a segfault. gdb points to
->
->
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
->
-config=0x55d6ad9e3f80 "RT") at
->
-/home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
-146       if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
->
->
-(backtrace doesn't go further)
->
->
-Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
->
-the autogenerated virtio-net-ccw device is present) works. Specifying
->
-several "-device virtio-net-pci" works as well.
->
->
-Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
->
-client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
->
-works (in-between state does not compile).
-Ouch. I didn't test all in-between states :(
-But I wish we had a 0-day infrastructure like the kernel has,
-that catches things like that.
-
->
-This is reproducible with tcg as well. Same problem both with
->
---enable-vhost-vdpa and --disable-vhost-vdpa.
->
->
-Have not yet tried to figure out what might be special with
->
-virtio-ccw... anyone have an idea?
->
->
-[This should probably be considered a blocker?]
-
-On Fri, 24 Jul 2020 09:30:58 -0400
-"Michael S. Tsirkin" <mst@redhat.com> wrote:
-
->
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
-> When I start qemu with a second virtio-net-ccw device (i.e. adding
->
-> -device virtio-net-ccw in addition to the autogenerated device), I get
->
-> a segfault. gdb points to
->
->
->
-> #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
->
->     config=0x55d6ad9e3f80 "RT") at
->
-> /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
-> 146     if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
->
->
->
-> (backtrace doesn't go further)
-The core was incomplete, but running under gdb directly shows that it
-is just a bog-standard config space access (first for that device).
-
-The cause of the crash is that nc->peer is not set... no idea how that
-can happen, not that familiar with that part of QEMU. (Should the code
-check, or is that really something that should not happen?)
-
-What I don't understand is why it is set correctly for the first,
-autogenerated virtio-net-ccw device, but not for the second one, and
-why virtio-net-pci doesn't show these problems. The only difference
-between -ccw and -pci that comes to my mind here is that config space
-accesses for ccw are done via an asynchronous operation, so timing
-might be different.
-
->
->
->
-> Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
->
-> the autogenerated virtio-net-ccw device is present) works. Specifying
->
-> several "-device virtio-net-pci" works as well.
->
->
->
-> Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
->
-> client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
->
-> works (in-between state does not compile).
->
->
-Ouch. I didn't test all in-between states :(
->
-But I wish we had a 0-day infrastructure like the kernel has,
->
-that catches things like that.
-Yep, that would be useful... so patchew only builds the complete series?
-
->
->
-> This is reproducible with tcg as well. Same problem both with
->
-> --enable-vhost-vdpa and --disable-vhost-vdpa.
->
->
->
-> Have not yet tried to figure out what might be special with
->
-> virtio-ccw... anyone have an idea?
->
->
->
-> [This should probably be considered a blocker?]
-I think so, as it makes s390x unusable with more than one
-virtio-net-ccw device, and I don't even see a workaround.
-
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
->
-On Fri, 24 Jul 2020 09:30:58 -0400
->
-"Michael S. Tsirkin" <mst@redhat.com> wrote:
->
->
-> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
-> > When I start qemu with a second virtio-net-ccw device (i.e. adding
->
-> > -device virtio-net-ccw in addition to the autogenerated device), I get
->
-> > a segfault. gdb points to
->
-> >
->
-> > #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
->
-> >     config=0x55d6ad9e3f80 "RT") at
->
-> > /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
-> > 146           if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
->
-> >
->
-> > (backtrace doesn't go further)
->
->
-The core was incomplete, but running under gdb directly shows that it
->
-is just a bog-standard config space access (first for that device).
->
->
-The cause of the crash is that nc->peer is not set... no idea how that
->
-can happen, not that familiar with that part of QEMU. (Should the code
->
-check, or is that really something that should not happen?)
->
->
-What I don't understand is why it is set correctly for the first,
->
-autogenerated virtio-net-ccw device, but not for the second one, and
->
-why virtio-net-pci doesn't show these problems. The only difference
->
-between -ccw and -pci that comes to my mind here is that config space
->
-accesses for ccw are done via an asynchronous operation, so timing
->
-might be different.
-Hopefully Jason has an idea. Could you post a full command line
-please? Do you need a working guest to trigger this? Does this trigger
-on an x86 host?
-
->
-> >
->
-> > Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
->
-> > the autogenerated virtio-net-ccw device is present) works. Specifying
->
-> > several "-device virtio-net-pci" works as well.
->
-> >
->
-> > Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
->
-> > client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
->
-> > works (in-between state does not compile).
->
->
->
-> Ouch. I didn't test all in-between states :(
->
-> But I wish we had a 0-day infrastructure like the kernel has,
->
-> that catches things like that.
->
->
-Yep, that would be useful... so patchew only builds the complete series?
->
->
->
->
-> > This is reproducible with tcg as well. Same problem both with
->
-> > --enable-vhost-vdpa and --disable-vhost-vdpa.
->
-> >
->
-> > Have not yet tried to figure out what might be special with
->
-> > virtio-ccw... anyone have an idea?
->
-> >
->
-> > [This should probably be considered a blocker?]
->
->
-I think so, as it makes s390x unusable with more than one
->
-virtio-net-ccw device, and I don't even see a workaround.
-
-On Fri, 24 Jul 2020 11:17:57 -0400
-"Michael S. Tsirkin" <mst@redhat.com> wrote:
-
->
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
->
-> On Fri, 24 Jul 2020 09:30:58 -0400
->
-> "Michael S. Tsirkin" <mst@redhat.com> wrote:
->
->
->
-> > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
-> > > When I start qemu with a second virtio-net-ccw device (i.e. adding
->
-> > > -device virtio-net-ccw in addition to the autogenerated device), I get
->
-> > > a segfault. gdb points to
->
-> > >
->
-> > > #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
->
-> > >     config=0x55d6ad9e3f80 "RT") at
->
-> > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
-> > > 146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
->
-> > >
->
-> > > (backtrace doesn't go further)
->
->
->
-> The core was incomplete, but running under gdb directly shows that it
->
-> is just a bog-standard config space access (first for that device).
->
->
->
-> The cause of the crash is that nc->peer is not set... no idea how that
->
-> can happen, not that familiar with that part of QEMU. (Should the code
->
-> check, or is that really something that should not happen?)
->
->
->
-> What I don't understand is why it is set correctly for the first,
->
-> autogenerated virtio-net-ccw device, but not for the second one, and
->
-> why virtio-net-pci doesn't show these problems. The only difference
->
-> between -ccw and -pci that comes to my mind here is that config space
->
-> accesses for ccw are done via an asynchronous operation, so timing
->
-> might be different.
->
->
-Hopefully Jason has an idea. Could you post a full command line
->
-please? Do you need a working guest to trigger this? Does this trigger
->
-on an x86 host?
-Yes, it does trigger with tcg-on-x86 as well. I've been using
-
-s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on 
--m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 
--drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 
--device 
-scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
- 
--device virtio-net-ccw
-
-It seems it needs the guest actually doing something with the nics; I
-cannot reproduce the crash if I use the old advent calendar moon buggy
-image and just add a virtio-net-ccw device.
-
-(I don't think it's a problem with my local build, as I see the problem
-both on my laptop and on an LPAR.)
-
->
->
-> > >
->
-> > > Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
->
-> > > the autogenerated virtio-net-ccw device is present) works. Specifying
->
-> > > several "-device virtio-net-pci" works as well.
->
-> > >
->
-> > > Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
->
-> > > client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
->
-> > > works (in-between state does not compile).
->
-> >
->
-> > Ouch. I didn't test all in-between states :(
->
-> > But I wish we had a 0-day infrastructure like the kernel has,
->
-> > that catches things like that.
->
->
->
-> Yep, that would be useful... so patchew only builds the complete series?
->
->
->
-> >
->
-> > > This is reproducible with tcg as well. Same problem both with
->
-> > > --enable-vhost-vdpa and --disable-vhost-vdpa.
->
-> > >
->
-> > > Have not yet tried to figure out what might be special with
->
-> > > virtio-ccw... anyone have an idea?
->
-> > >
->
-> > > [This should probably be considered a blocker?]
->
->
->
-> I think so, as it makes s390x unusable with more than one
->
-> virtio-net-ccw device, and I don't even see a workaround.
->
-
-On 2020/7/24 11:34 PM, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 11:17:57 -0400
-"Michael S. Tsirkin"<mst@redhat.com>  wrote:
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 09:30:58 -0400
-"Michael S. Tsirkin"<mst@redhat.com>  wrote:
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
-When I start qemu with a second virtio-net-ccw device (i.e. adding
--device virtio-net-ccw in addition to the autogenerated device), I get
-a segfault. gdb points to
-
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
-     config=0x55d6ad9e3f80 "RT") at 
-/home/cohuck/git/qemu/hw/net/virtio-net.c:146
-146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
-
-(backtrace doesn't go further)
-The core was incomplete, but running under gdb directly shows that it
-is just a bog-standard config space access (first for that device).
-
-The cause of the crash is that nc->peer is not set... no idea how that
-can happen, not that familiar with that part of QEMU. (Should the code
-check, or is that really something that should not happen?)
-
-What I don't understand is why it is set correctly for the first,
-autogenerated virtio-net-ccw device, but not for the second one, and
-why virtio-net-pci doesn't show these problems. The only difference
-between -ccw and -pci that comes to my mind here is that config space
-accesses for ccw are done via an asynchronous operation, so timing
-might be different.
-Hopefully Jason has an idea. Could you post a full command line
-please? Do you need a working guest to trigger this? Does this trigger
-on an x86 host?
-Yes, it does trigger with tcg-on-x86 as well. I've been using
-
-s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
--m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
--drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
--device 
-scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
--device virtio-net-ccw
-
-It seems it needs the guest actually doing something with the nics; I
-cannot reproduce the crash if I use the old advent calendar moon buggy
-image and just add a virtio-net-ccw device.
-
-(I don't think it's a problem with my local build, as I see the problem
-both on my laptop and on an LPAR.)
-It looks to me like we forgot to check the existence of peer.
-
-Please try the attached patch to see if it works.
-
-Thanks
-0001-virtio-net-check-the-existence-of-peer-before-accesi.patch
-Description:
-Text Data
-
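The attached patch itself is not reproduced in this archive. As a rough illustration of the kind of check being discussed, the sketch below guards the peer pointer before reading its type. The struct and enum definitions are simplified, hypothetical stand-ins and not QEMU's real NetClientState or NET_CLIENT_DRIVER_* definitions; peer_is_vdpa() is likewise an invented helper name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for QEMU's net client types. */
typedef enum {
    DRIVER_NONE,
    DRIVER_TAP,
    DRIVER_VHOST_VDPA
} DriverType;

typedef struct NetClientInfo {
    DriverType type;
} NetClientInfo;

typedef struct NetClientState {
    const NetClientInfo *info;
    struct NetClientState *peer;   /* NULL when the NIC has no backend */
} NetClientState;

/*
 * The crashing line reads nc->peer->info->type unconditionally.
 * This variant returns false instead of dereferencing a NULL peer.
 */
bool peer_is_vdpa(const NetClientState *nc)
{
    return nc->peer != NULL &&
           nc->peer->info != NULL &&
           nc->peer->info->type == DRIVER_VHOST_VDPA;
}
```

With a guard of this shape, a NIC started without a backend simply skips the vhost-vdpa path when its config space is read.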
-On Sat, 25 Jul 2020 08:40:07 +0800
-Jason Wang <jasowang@redhat.com> wrote:
-
->
-On 2020/7/24 11:34 PM, Cornelia Huck wrote:
->
-> On Fri, 24 Jul 2020 11:17:57 -0400
->
-> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
->
->
->
->> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
->
->>> On Fri, 24 Jul 2020 09:30:58 -0400
->
->>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
->
->>>
->
->>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
->>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding
->
->>>>> -device virtio-net-ccw in addition to the autogenerated device), I get
->
->>>>> a segfault. gdb points to
->
->>>>>
->
->>>>> #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
->
->>>>>      config=0x55d6ad9e3f80 "RT") at
->
->>>>> /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
->>>>> 146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
->
->>>>>
->
->>>>> (backtrace doesn't go further)
->
->>> The core was incomplete, but running under gdb directly shows that it
->
->>> is just a bog-standard config space access (first for that device).
->
->>>
->
->>> The cause of the crash is that nc->peer is not set... no idea how that
->
->>> can happen, not that familiar with that part of QEMU. (Should the code
->
->>> check, or is that really something that should not happen?)
->
->>>
->
->>> What I don't understand is why it is set correctly for the first,
->
->>> autogenerated virtio-net-ccw device, but not for the second one, and
->
->>> why virtio-net-pci doesn't show these problems. The only difference
->
->>> between -ccw and -pci that comes to my mind here is that config space
->
->>> accesses for ccw are done via an asynchronous operation, so timing
->
->>> might be different.
->
->> Hopefully Jason has an idea. Could you post a full command line
->
->> please? Do you need a working guest to trigger this? Does this trigger
->
->> on an x86 host?
->
-> Yes, it does trigger with tcg-on-x86 as well. I've been using
->
->
->
-> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu
->
-> qemu,zpci=on
->
-> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
->
-> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
->
-> -device
->
-> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
->
-> -device virtio-net-ccw
->
->
->
-> It seems it needs the guest actually doing something with the nics; I
->
-> cannot reproduce the crash if I use the old advent calendar moon buggy
->
-> image and just add a virtio-net-ccw device.
->
->
->
-> (I don't think it's a problem with my local build, as I see the problem
->
-> both on my laptop and on an LPAR.)
->
->
->
-It looks to me like we forgot to check the existence of peer.
->
->
-Please try the attached patch to see if it works.
-Thanks, that patch gets my guest up and running again. So, FWIW,
-
-Tested-by: Cornelia Huck <cohuck@redhat.com>
-
-Any idea why this did not hit with virtio-net-pci (or the autogenerated
-virtio-net-ccw device)?
-
-On 2020/7/27 2:43 PM, Cornelia Huck wrote:
-On Sat, 25 Jul 2020 08:40:07 +0800
-Jason Wang <jasowang@redhat.com> wrote:
-On 2020/7/24 11:34 PM, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 11:17:57 -0400
-"Michael S. Tsirkin"<mst@redhat.com>  wrote:
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 09:30:58 -0400
-"Michael S. Tsirkin"<mst@redhat.com>  wrote:
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
-When I start qemu with a second virtio-net-ccw device (i.e. adding
--device virtio-net-ccw in addition to the autogenerated device), I get
-a segfault. gdb points to
-
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
-      config=0x55d6ad9e3f80 "RT") at 
-/home/cohuck/git/qemu/hw/net/virtio-net.c:146
-146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
-
-(backtrace doesn't go further)
-The core was incomplete, but running under gdb directly shows that it
-is just a bog-standard config space access (first for that device).
-
-The cause of the crash is that nc->peer is not set... no idea how that
-can happen, not that familiar with that part of QEMU. (Should the code
-check, or is that really something that should not happen?)
-
-What I don't understand is why it is set correctly for the first,
-autogenerated virtio-net-ccw device, but not for the second one, and
-why virtio-net-pci doesn't show these problems. The only difference
-between -ccw and -pci that comes to my mind here is that config space
-accesses for ccw are done via an asynchronous operation, so timing
-might be different.
-Hopefully Jason has an idea. Could you post a full command line
-please? Do you need a working guest to trigger this? Does this trigger
-on an x86 host?
-Yes, it does trigger with tcg-on-x86 as well. I've been using
-
-s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
--m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
--drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
--device 
-scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
--device virtio-net-ccw
-
-It seems it needs the guest actually doing something with the nics; I
-cannot reproduce the crash if I use the old advent calendar moon buggy
-image and just add a virtio-net-ccw device.
-
-(I don't think it's a problem with my local build, as I see the problem
-both on my laptop and on an LPAR.)
-It looks to me like we forgot to check the existence of peer.
-
-Please try the attached patch to see if it works.
-Thanks, that patch gets my guest up and running again. So, FWIW,
-
-Tested-by: Cornelia Huck <cohuck@redhat.com>
-
-Any idea why this did not hit with virtio-net-pci (or the autogenerated
-virtio-net-ccw device)?
-It can be hit with virtio-net-pci as well (just start without peer).
-For autogenerated virtio-net-ccw, I think the reason is that it has
-already had a peer set.
-Thanks
-
-On Mon, 27 Jul 2020 15:38:12 +0800
-Jason Wang <jasowang@redhat.com> wrote:
-
->
-On 2020/7/27 2:43 PM, Cornelia Huck wrote:
->
-> On Sat, 25 Jul 2020 08:40:07 +0800
->
-> Jason Wang <jasowang@redhat.com> wrote:
->
->
->
->> On 2020/7/24 11:34 PM, Cornelia Huck wrote:
->
->>> On Fri, 24 Jul 2020 11:17:57 -0400
->
->>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
->
->>>
->
->>>> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
->
->>>>> On Fri, 24 Jul 2020 09:30:58 -0400
->
->>>>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
->
->>>>>
->
->>>>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
->>>>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding
->
->>>>>>> -device virtio-net-ccw in addition to the autogenerated device), I get
->
->>>>>>> a segfault. gdb points to
->
->>>>>>>
->
->>>>>>> #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
->
->>>>>>>       config=0x55d6ad9e3f80 "RT") at
->
->>>>>>> /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
->>>>>>> 146       if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
->
->>>>>>>
->
->>>>>>> (backtrace doesn't go further)
->
->>>>> The core was incomplete, but running under gdb directly shows that it
->
->>>>> is just a bog-standard config space access (first for that device).
->
->>>>>
->
->>>>> The cause of the crash is that nc->peer is not set... no idea how that
->
->>>>> can happen, not that familiar with that part of QEMU. (Should the code
->
->>>>> check, or is that really something that should not happen?)
->
->>>>>
->
->>>>> What I don't understand is why it is set correctly for the first,
->
->>>>> autogenerated virtio-net-ccw device, but not for the second one, and
->
->>>>> why virtio-net-pci doesn't show these problems. The only difference
->
->>>>> between -ccw and -pci that comes to my mind here is that config space
->
->>>>> accesses for ccw are done via an asynchronous operation, so timing
->
->>>>> might be different.
->
->>>> Hopefully Jason has an idea. Could you post a full command line
->
->>>> please? Do you need a working guest to trigger this? Does this trigger
->
->>>> on an x86 host?
->
->>> Yes, it does trigger with tcg-on-x86 as well. I've been using
->
->>>
->
->>> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu
->
->>> qemu,zpci=on
->
->>> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
->
->>> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
->
->>> -device
->
->>> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
->
->>> -device virtio-net-ccw
->
->>>
->
->>> It seems it needs the guest actually doing something with the nics; I
->
->>> cannot reproduce the crash if I use the old advent calendar moon buggy
->
->>> image and just add a virtio-net-ccw device.
->
->>>
->
->>> (I don't think it's a problem with my local build, as I see the problem
->
->>> both on my laptop and on an LPAR.)
->
->>
->
->> It looks to me like we forgot to check the existence of peer.
->
->>
->
->> Please try the attached patch to see if it works.
->
-> Thanks, that patch gets my guest up and running again. So, FWIW,
->
->
->
-> Tested-by: Cornelia Huck <cohuck@redhat.com>
->
->
->
-> Any idea why this did not hit with virtio-net-pci (or the autogenerated
->
-> virtio-net-ccw device)?
->
->
->
-It can be hit with virtio-net-pci as well (just start without peer).
-Hm, I had not been able to reproduce the crash with a 'naked' -device
-virtio-net-pci. But checking seems to be the right idea anyway.
-
->
->
-For autogenerated virtio-net-ccw, I think the reason is that it has
->
-already had a peer set.
-Ok, that might well be.
-
-On 2020/7/27 4:41 PM, Cornelia Huck wrote:
-On Mon, 27 Jul 2020 15:38:12 +0800
-Jason Wang <jasowang@redhat.com> wrote:
-On 2020/7/27 2:43 PM, Cornelia Huck wrote:
-On Sat, 25 Jul 2020 08:40:07 +0800
-Jason Wang <jasowang@redhat.com> wrote:
-On 2020/7/24 11:34 PM, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 11:17:57 -0400
-"Michael S. Tsirkin"<mst@redhat.com>  wrote:
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 09:30:58 -0400
-"Michael S. Tsirkin"<mst@redhat.com>  wrote:
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
-When I start qemu with a second virtio-net-ccw device (i.e. adding
--device virtio-net-ccw in addition to the autogenerated device), I get
-a segfault. gdb points to
-
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
-       config=0x55d6ad9e3f80 "RT") at 
-/home/cohuck/git/qemu/hw/net/virtio-net.c:146
-146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
-
-(backtrace doesn't go further)
-The core was incomplete, but running under gdb directly shows that it
-is just a bog-standard config space access (first for that device).
-
-The cause of the crash is that nc->peer is not set... no idea how that
-can happen, not that familiar with that part of QEMU. (Should the code
-check, or is that really something that should not happen?)
-
-What I don't understand is why it is set correctly for the first,
-autogenerated virtio-net-ccw device, but not for the second one, and
-why virtio-net-pci doesn't show these problems. The only difference
-between -ccw and -pci that comes to my mind here is that config space
-accesses for ccw are done via an asynchronous operation, so timing
-might be different.
-Hopefully Jason has an idea. Could you post a full command line
-please? Do you need a working guest to trigger this? Does this trigger
-on an x86 host?
-Yes, it does trigger with tcg-on-x86 as well. I've been using
-
-s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
--m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
--drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
--device 
-scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
--device virtio-net-ccw
-
-It seems it needs the guest actually doing something with the nics; I
-cannot reproduce the crash if I use the old advent calendar moon buggy
-image and just add a virtio-net-ccw device.
-
-(I don't think it's a problem with my local build, as I see the problem
-both on my laptop and on an LPAR.)
-It looks to me like we forgot to check the existence of peer.
-
-Please try the attached patch to see if it works.
-Thanks, that patch gets my guest up and running again. So, FWIW,
-
-Tested-by: Cornelia Huck <cohuck@redhat.com>
-
-Any idea why this did not hit with virtio-net-pci (or the autogenerated
-virtio-net-ccw device)?
-It can be hit with virtio-net-pci as well (just start without peer).
-Hm, I had not been able to reproduce the crash with a 'naked' -device
-virtio-net-pci. But checking seems to be the right idea anyway.
-Sorry for being unclear, I meant for the networking part, you just need to
-start without peer, and you need a real guest (any Linux) that is trying
-to access the config space of virtio-net.
-Thanks
-For autogenerated virtio-net-ccw, I think the reason is that it has
-already had a peer set.
-Ok, that might well be.
-
-On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
->
->
-On 2020/7/27 4:41 PM, Cornelia Huck wrote:
->
-> On Mon, 27 Jul 2020 15:38:12 +0800
->
-> Jason Wang <jasowang@redhat.com> wrote:
->
->
->
-> > On 2020/7/27 2:43 PM, Cornelia Huck wrote:
->
-> > > On Sat, 25 Jul 2020 08:40:07 +0800
->
-> > > Jason Wang <jasowang@redhat.com> wrote:
->
-> > > > On 2020/7/24 11:34 PM, Cornelia Huck wrote:
->
-> > > > > On Fri, 24 Jul 2020 11:17:57 -0400
->
-> > > > > "Michael S. Tsirkin"<mst@redhat.com>  wrote:
->
-> > > > > > On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
->
-> > > > > > > On Fri, 24 Jul 2020 09:30:58 -0400
->
-> > > > > > > "Michael S. Tsirkin"<mst@redhat.com>  wrote:
->
-> > > > > > > > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
-> > > > > > > > > When I start qemu with a second virtio-net-ccw device (i.e.
->
-> > > > > > > > > adding
->
-> > > > > > > > > -device virtio-net-ccw in addition to the autogenerated
->
-> > > > > > > > > device), I get
->
-> > > > > > > > > a segfault. gdb points to
->
-> > > > > > > > >
->
-> > > > > > > > > #0  0x000055d6ab52681d in virtio_net_get_config
->
-> > > > > > > > > (vdev=<optimized out>,
->
-> > > > > > > > >        config=0x55d6ad9e3f80 "RT") at
->
-> > > > > > > > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
-> > > > > > > > > 146     if (nc->peer->info->type ==
->
-> > > > > > > > > NET_CLIENT_DRIVER_VHOST_VDPA) {
->
-> > > > > > > > >
->
-> > > > > > > > > (backtrace doesn't go further)
->
-> > > > > > > The core was incomplete, but running under gdb directly shows
->
-> > > > > > > that it
->
-> > > > > > > is just a bog-standard config space access (first for that
->
-> > > > > > > device).
->
-> > > > > > >
->
-> > > > > > > The cause of the crash is that nc->peer is not set... no idea
->
-> > > > > > > how that
->
-> > > > > > > can happen, not that familiar with that part of QEMU. (Should
->
-> > > > > > > the code
->
-> > > > > > > check, or is that really something that should not happen?)
->
-> > > > > > >
->
-> > > > > > > What I don't understand is why it is set correctly for the
->
-> > > > > > > first,
->
-> > > > > > > autogenerated virtio-net-ccw device, but not for the second
->
-> > > > > > > one, and
->
-> > > > > > > why virtio-net-pci doesn't show these problems. The only
->
-> > > > > > > difference
->
-> > > > > > > between -ccw and -pci that comes to my mind here is that config
->
-> > > > > > > space
->
-> > > > > > > accesses for ccw are done via an asynchronous operation, so
->
-> > > > > > > timing
->
-> > > > > > > might be different.
->
-> > > > > > Hopefully Jason has an idea. Could you post a full command line
->
-> > > > > > please? Do you need a working guest to trigger this? Does this
->
-> > > > > > trigger
->
-> > > > > > on an x86 host?
->
-> > > > > Yes, it does trigger with tcg-on-x86 as well. I've been using
->
-> > > > >
->
-> > > > > s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu
->
-> > > > > qemu,zpci=on
->
-> > > > > -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
->
-> > > > > -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
->
-> > > > > -device
->
-> > > > > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
->
-> > > > > -device virtio-net-ccw
->
-> > > > >
->
-> > > > > It seems it needs the guest actually doing something with the nics;
->
-> > > > > I
->
-> > > > > cannot reproduce the crash if I use the old advent calendar moon
->
-> > > > > buggy
->
-> > > > > image and just add a virtio-net-ccw device.
->
-> > > > >
->
-> > > > > (I don't think it's a problem with my local build, as I see the
->
-> > > > > problem
->
-> > > > > both on my laptop and on an LPAR.)
->
-> > > > It looks to me like we forgot to check the existence of peer.
->
-> > > >
->
-> > > > Please try the attached patch to see if it works.
->
-> > > Thanks, that patch gets my guest up and running again. So, FWIW,
->
-> > >
->
-> > > Tested-by: Cornelia Huck <cohuck@redhat.com>
->
-> > >
->
-> > > Any idea why this did not hit with virtio-net-pci (or the autogenerated
->
-> > > virtio-net-ccw device)?
->
-> >
->
-> > It can be hit with virtio-net-pci as well (just start without peer).
->
-> Hm, I had not been able to reproduce the crash with a 'naked' -device
->
-> virtio-net-pci. But checking seems to be the right idea anyway.
->
->
->
-Sorry for being unclear, I meant for the networking part, you just need to start
->
-without peer, and you need a real guest (any Linux) that is trying to access
->
-the config space of virtio-net.
->
->
-Thanks
-A pxe guest will do it, but that doesn't support ccw, right?
-
-I'm still unclear why this triggers with ccw but not pci -
-any idea?
-
->
->
->
->
-> > For autogenerated virtio-net-ccw, I think the reason is that it has
->
-> > already had a peer set.
->
-> Ok, that might well be.
->
->
->
->
-
-On 2020/7/27 7:43 PM, Michael S. Tsirkin wrote:
-On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
-On 2020/7/27 4:41 PM, Cornelia Huck wrote:
-On Mon, 27 Jul 2020 15:38:12 +0800
-Jason Wang<jasowang@redhat.com>  wrote:
-On 2020/7/27 2:43 PM, Cornelia Huck wrote:
-On Sat, 25 Jul 2020 08:40:07 +0800
-Jason Wang<jasowang@redhat.com>  wrote:
-On 2020/7/24 11:34 PM, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 11:17:57 -0400
-"Michael S. Tsirkin"<mst@redhat.com>   wrote:
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 09:30:58 -0400
-"Michael S. Tsirkin"<mst@redhat.com>   wrote:
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
-When I start qemu with a second virtio-net-ccw device (i.e. adding
--device virtio-net-ccw in addition to the autogenerated device), I get
-a segfault. gdb points to
-
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
-        config=0x55d6ad9e3f80 "RT") at 
-/home/cohuck/git/qemu/hw/net/virtio-net.c:146
-146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
-
-(backtrace doesn't go further)
-The core was incomplete, but running under gdb directly shows that it
-is just a bog-standard config space access (first for that device).
-
-The cause of the crash is that nc->peer is not set... no idea how that
-can happen, not that familiar with that part of QEMU. (Should the code
-check, or is that really something that should not happen?)
-
-What I don't understand is why it is set correctly for the first,
-autogenerated virtio-net-ccw device, but not for the second one, and
-why virtio-net-pci doesn't show these problems. The only difference
-between -ccw and -pci that comes to my mind here is that config space
-accesses for ccw are done via an asynchronous operation, so timing
-might be different.
-Hopefully Jason has an idea. Could you post a full command line
-please? Do you need a working guest to trigger this? Does this trigger
-on an x86 host?
-Yes, it does trigger with tcg-on-x86 as well. I've been using
-
-s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
--m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
--drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
--device 
-scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
--device virtio-net-ccw
-
-It seems it needs the guest actually doing something with the nics; I
-cannot reproduce the crash if I use the old advent calendar moon buggy
-image and just add a virtio-net-ccw device.
-
-(I don't think it's a problem with my local build, as I see the problem
-both on my laptop and on an LPAR.)
-It looks to me like we forgot to check the existence of peer.
-
-Please try the attached patch to see if it works.
-Thanks, that patch gets my guest up and running again. So, FWIW,
-
-Tested-by: Cornelia Huck<cohuck@redhat.com>
-
-Any idea why this did not hit with virtio-net-pci (or the autogenerated
-virtio-net-ccw device)?
-It can be hit with virtio-net-pci as well (just start without peer).
-Hm, I had not been able to reproduce the crash with a 'naked' -device
-virtio-net-pci. But checking seems to be the right idea anyway.
-Sorry for being unclear, I meant for the networking part, you just need to start
-without peer, and you need a real guest (any Linux) that is trying to access
-the config space of virtio-net.
-
-Thanks
-A pxe guest will do it, but that doesn't support ccw, right?
-Yes, it depends on the cli actually.
-I'm still unclear why this triggers with ccw but not pci -
-any idea?
-I didn't test pxe but I can reproduce this with pci (just start a linux
-guest without a peer).
-Thanks
-
-On Mon, Jul 27, 2020 at 08:44:09PM +0800, Jason Wang wrote:
->
->
-On 2020/7/27 7:43 PM, Michael S. Tsirkin wrote:
->
-> On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
->
-> > On 2020/7/27 4:41 PM, Cornelia Huck wrote:
->
-> > > On Mon, 27 Jul 2020 15:38:12 +0800
->
-> > > Jason Wang<jasowang@redhat.com>  wrote:
->
-> > >
->
-> > > > On 2020/7/27 2:43 PM, Cornelia Huck wrote:
->
-> > > > > On Sat, 25 Jul 2020 08:40:07 +0800
->
-> > > > > Jason Wang<jasowang@redhat.com>  wrote:
->
-> > > > > > On 2020/7/24 11:34 PM, Cornelia Huck wrote:
->
-> > > > > > > On Fri, 24 Jul 2020 11:17:57 -0400
->
-> > > > > > > "Michael S. Tsirkin"<mst@redhat.com>   wrote:
->
-> > > > > > > > On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
->
-> > > > > > > > > On Fri, 24 Jul 2020 09:30:58 -0400
->
-> > > > > > > > > "Michael S. Tsirkin"<mst@redhat.com>   wrote:
->
-> > > > > > > > > > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck
->
-> > > > > > > > > > wrote:
->
-> > > > > > > > > > > When I start qemu with a second virtio-net-ccw device
->
-> > > > > > > > > > > (i.e. adding
->
-> > > > > > > > > > > -device virtio-net-ccw in addition to the autogenerated
->
-> > > > > > > > > > > device), I get
->
-> > > > > > > > > > > a segfault. gdb points to
->
-> > > > > > > > > > >
->
-> > > > > > > > > > > #0  0x000055d6ab52681d in virtio_net_get_config
->
-> > > > > > > > > > > (vdev=<optimized out>,
->
-> > > > > > > > > > >         config=0x55d6ad9e3f80 "RT") at
->
-> > > > > > > > > > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
-> > > > > > > > > > > 146         if (nc->peer->info->type ==
->
-> > > > > > > > > > > NET_CLIENT_DRIVER_VHOST_VDPA) {
->
-> > > > > > > > > > >
->
-> > > > > > > > > > > (backtrace doesn't go further)
->
-> > > > > > > > > The core was incomplete, but running under gdb directly
->
-> > > > > > > > > shows that it
->
-> > > > > > > > > is just a bog-standard config space access (first for that
->
-> > > > > > > > > device).
->
-> > > > > > > > >
->
-> > > > > > > > > The cause of the crash is that nc->peer is not set... no
->
-> > > > > > > > > idea how that
->
-> > > > > > > > > can happen, not that familiar with that part of QEMU.
->
-> > > > > > > > > (Should the code
->
-> > > > > > > > > check, or is that really something that should not happen?)
->
-> > > > > > > > >
->
-> > > > > > > > > What I don't understand is why it is set correctly for the
->
-> > > > > > > > > first,
->
-> > > > > > > > > autogenerated virtio-net-ccw device, but not for the second
->
-> > > > > > > > > one, and
->
-> > > > > > > > > why virtio-net-pci doesn't show these problems. The only
->
-> > > > > > > > > difference
->
-> > > > > > > > > between -ccw and -pci that comes to my mind here is that
->
-> > > > > > > > > config space
->
-> > > > > > > > > accesses for ccw are done via an asynchronous operation, so
->
-> > > > > > > > > timing
->
-> > > > > > > > > might be different.
->
-> > > > > > > > Hopefully Jason has an idea. Could you post a full command
->
-> > > > > > > > line
->
-> > > > > > > > please? Do you need a working guest to trigger this? Does
->
-> > > > > > > > this trigger
->
-> > > > > > > > on an x86 host?
->
-> > > > > > > Yes, it does trigger with tcg-on-x86 as well. I've been using
->
-> > > > > > >
->
-> > > > > > > s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg
->
-> > > > > > > -cpu qemu,zpci=on
->
-> > > > > > > -m 1024 -nographic -device
->
-> > > > > > > virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
->
-> > > > > > > -drive
->
-> > > > > > > file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
->
-> > > > > > > -device
->
-> > > > > > > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
->
-> > > > > > > -device virtio-net-ccw
->
-> > > > > > >
->
-> > > > > > > It seems it needs the guest actually doing something with the
->
-> > > > > > > nics; I
->
-> > > > > > > cannot reproduce the crash if I use the old advent calendar
->
-> > > > > > > moon buggy
->
-> > > > > > > image and just add a virtio-net-ccw device.
->
-> > > > > > >
->
-> > > > > > > (I don't think it's a problem with my local build, as I see the
->
-> > > > > > > problem
->
-> > > > > > > both on my laptop and on an LPAR.)
->
-> > > > > > It looks to me like we forgot to check the existence of peer.
->
-> > > > > >
->
-> > > > > > Please try the attached patch to see if it works.
->
-> > > > > Thanks, that patch gets my guest up and running again. So, FWIW,
->
-> > > > >
->
-> > > > > Tested-by: Cornelia Huck<cohuck@redhat.com>
->
-> > > > >
->
-> > > > > Any idea why this did not hit with virtio-net-pci (or the
->
-> > > > > autogenerated
->
-> > > > > virtio-net-ccw device)?
->
-> > > > It can be hit with virtio-net-pci as well (just start without peer).
->
-> > > Hm, I had not been able to reproduce the crash with a 'naked' -device
->
-> > > virtio-net-pci. But checking seems to be the right idea anyway.
->
-> > Sorry for being unclear, I meant for the networking part, you just need to start
->
-> > without peer, and you need a real guest (any Linux) that is trying to
->
-> > access
->
-> > the config space of virtio-net.
->
-> >
->
-> > Thanks
->
-> A pxe guest will do it, but that doesn't support ccw, right?
->
->
->
-Yes, it depends on the cli actually.
->
->
->
->
->
-> I'm still unclear why this triggers with ccw but not pci -
->
-> any idea?
->
->
->
-I didn't test pxe but I can reproduce this with pci (just start a linux guest
->
-without a peer).
->
->
-Thanks
->
-Might be a good addition to a unit test. Not sure what the test
-would do exactly: just make sure the guest runs? Looks like a lot of work
-for an empty test ... maybe we can poke at the guest config with
-qtest commands at least.
-
--- 
-MST
-
-On 2020/7/27 9:16 PM, Michael S. Tsirkin wrote:
-On Mon, Jul 27, 2020 at 08:44:09PM +0800, Jason Wang wrote:
-On 2020/7/27 7:43 PM, Michael S. Tsirkin wrote:
-On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
-On 2020/7/27 4:41 PM, Cornelia Huck wrote:
-On Mon, 27 Jul 2020 15:38:12 +0800
-Jason Wang<jasowang@redhat.com>  wrote:
-On 2020/7/27 2:43 PM, Cornelia Huck wrote:
-On Sat, 25 Jul 2020 08:40:07 +0800
-Jason Wang<jasowang@redhat.com>  wrote:
-On 2020/7/24 11:34 PM, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 11:17:57 -0400
-"Michael S. Tsirkin"<mst@redhat.com>   wrote:
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 09:30:58 -0400
-"Michael S. Tsirkin"<mst@redhat.com>   wrote:
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
-When I start qemu with a second virtio-net-ccw device (i.e. adding
--device virtio-net-ccw in addition to the autogenerated device), I get
-a segfault. gdb points to
-
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
-         config=0x55d6ad9e3f80 "RT") at 
-/home/cohuck/git/qemu/hw/net/virtio-net.c:146
-146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
-
-(backtrace doesn't go further)
-The core was incomplete, but running under gdb directly shows that it
-is just a bog-standard config space access (first for that device).
-
-The cause of the crash is that nc->peer is not set... no idea how that
-can happen, not that familiar with that part of QEMU. (Should the code
-check, or is that really something that should not happen?)
-
-What I don't understand is why it is set correctly for the first,
-autogenerated virtio-net-ccw device, but not for the second one, and
-why virtio-net-pci doesn't show these problems. The only difference
-between -ccw and -pci that comes to my mind here is that config space
-accesses for ccw are done via an asynchronous operation, so timing
-might be different.
-Hopefully Jason has an idea. Could you post a full command line
-please? Do you need a working guest to trigger this? Does this trigger
-on an x86 host?
-Yes, it does trigger with tcg-on-x86 as well. I've been using
-
-s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
--m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
--drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
--device 
-scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
--device virtio-net-ccw
-
-It seems it needs the guest actually doing something with the nics; I
-cannot reproduce the crash if I use the old advent calendar moon buggy
-image and just add a virtio-net-ccw device.
-
-(I don't think it's a problem with my local build, as I see the problem
-both on my laptop and on an LPAR.)
-It looks to me like we forgot to check the existence of the peer.
-
-Please try the attached patch to see if it works.
-Thanks, that patch gets my guest up and running again. So, FWIW,
-
-Tested-by: Cornelia Huck<cohuck@redhat.com>
-
-Any idea why this did not hit with virtio-net-pci (or the autogenerated
-virtio-net-ccw device)?
-It can be hit with virtio-net-pci as well (just start without peer).
-Hm, I had not been able to reproduce the crash with a 'naked' -device
-virtio-net-pci. But checking seems to be the right idea anyway.
-Sorry for being unclear, I meant that for the networking part, you just need to
-start without a peer, and you need a real guest (any Linux) that tries to access
-the config space of virtio-net.
-
-Thanks
-A pxe guest will do it, but that doesn't support ccw, right?
-Yes, it depends on the cli actually.
-I'm still unclear why this triggers with ccw but not pci -
-any idea?
-I don't test pxe but I can reproduce this with pci (just start a linux guest
-without a peer).
-
-Thanks
-Might be a good addition to a unit test. Not sure what the test
-would do exactly: just make sure guest runs? Looks like a lot of work
-for an empty test ... maybe we can poke at the guest config with
-qtest commands at least.
-That should work, or we can simply extend the existing virtio-net qtest to
-do that.
-Thanks
-
diff --git a/results/classifier/008/other/80615920 b/results/classifier/008/other/80615920
deleted file mode 100644
index 3ce4056fd..000000000
--- a/results/classifier/008/other/80615920
+++ /dev/null
@@ -1,358 +0,0 @@
-KVM: 0.803
-other: 0.786
-vnc: 0.768
-performance: 0.758
-permissions: 0.758
-files: 0.751
-boot: 0.750
-device: 0.748
-debug: 0.746
-semantic: 0.737
-network: 0.732
-socket: 0.732
-graphic: 0.730
-PID: 0.727
-
-[BUG] accel/tcg: cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu)
-
-It seems there is a bug in SIGALRM handling when a 486 system emulates x86_64 
-code.
-
-This code: 
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <pthread.h>
-#include <signal.h>
-#include <unistd.h>
-
-pthread_t thread1, thread2;
-
-// Signal handler for SIGALRM
-void alarm_handler(int sig) {
-    // Do nothing, just wake up the other thread
-}
-
-// Thread 1 function
-void* thread1_func(void* arg) {
-    // Set up the signal handler for SIGALRM
-    signal(SIGALRM, alarm_handler);
-
-    // Wait for 1 second
-    sleep(1);
-
-    // Send SIGALRM signal to thread 2
-    pthread_kill(thread2, SIGALRM);
-
-    return NULL;
-}
-
-// Thread 2 function
-void* thread2_func(void* arg) {
-    // Wait for the SIGALRM signal
-    pause();
-
-    printf("Thread 2 woke up!\n");
-
-    return NULL;
-}
-
-int main() {
-    // Create thread 1
-    if (pthread_create(&thread1, NULL, thread1_func, NULL) != 0) {
-        fprintf(stderr, "Failed to create thread 1\n");
-        return 1;
-    }
-
-    // Create thread 2
-    if (pthread_create(&thread2, NULL, thread2_func, NULL) != 0) {
-        fprintf(stderr, "Failed to create thread 2\n");
-        return 1;
-    }
-
-    // Wait for both threads to finish
-    pthread_join(thread1, NULL);
-    pthread_join(thread2, NULL);
-
-    return 0;
-}
-
-
-Fails with this -strace log (there are also unsupported syscalls 334 and 435, 
-but they don't seem to affect the code much):
-
-...
-736 rt_sigaction(SIGALRM,0x000000001123ec20,0x000000001123ecc0) = 0
-736 clock_nanosleep(CLOCK_REALTIME,0,{tv_sec = 1,tv_nsec = 0},{tv_sec = 
-1,tv_nsec = 0})
-736 rt_sigprocmask(SIG_BLOCK,0x00000000109fad20,0x0000000010800b38,8) = 0
-736 Unknown syscall 435
-736 
-clone(CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|
- ...
-736 rt_sigprocmask(SIG_SETMASK,0x0000000010800b38,NULL,8)
-736 set_robust_list(0x11a419a0,0) = -1 errno=38 (Function not implemented)
-736 rt_sigprocmask(SIG_SETMASK,0x0000000011a41fb0,NULL,8) = 0
- = 0
-736 pause(0,0,2,277186368,0,295966400)
-736 
-futex(0x000000001123f990,FUTEX_CLOCK_REALTIME|FUTEX_WAIT_BITSET,738,NULL,NULL,0)
- = 0
-736 rt_sigprocmask(SIG_BLOCK,0x00000000109fad20,0x000000001123ee88,8) = 0
-736 getpid() = 736
-736 tgkill(736,739,SIGALRM) = 0
- = -1 errno=4 (Interrupted system call)
---- SIGALRM {si_signo=SIGALRM, si_code=SI_TKILL, si_pid=736, si_uid=0} ---
-0x48874a != 0x3c69e10
-736 rt_sigprocmask(SIG_SETMASK,0x000000001123ee88,NULL,8) = 0
-**
-ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: 
-(cpu == current_cpu)
-Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion 
-failed: (cpu == current_cpu)
-0x48874a != 0x3c69e10
-**
-ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: 
-(cpu == current_cpu)
-Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion 
-failed: (cpu == current_cpu)
-# 
-
-The code fails either with or without -singlestep, the command line:
-
-/usr/bin/qemu-x86_64 -L /opt/x86_64 -strace -singlestep  /opt/x86_64/alarm.bin
-
-Source code of QEMU 8.1.1 was modified with patch "[PATCH] qemu/timer: Don't 
-use RDTSC on i486" [1], 
-with a few added ioctls (not relevant), and cpu_exec_longjmp_cleanup() now prints 
-the current pointers of 
-cpu and current_cpu (line "0x48874a != 0x3c69e10").
-
-config.log (built as a part of buildroot, basically the minimal possible 
-configuration for running x86_64 on 486):
-
-# Configured with: 
-'/mnt/hd_8tb_p1/p1/home/crossgen/buildroot_486_2/output/build/qemu-8.1.1/configure'
- '--prefix=/usr' 
-'--cross-prefix=/mnt/hd_8tb_p1/p1/home/crossgen/buildroot_486_2/output/host/bin/i486-buildroot-linux-gnu-'
- '--audio-drv-list=' 
-'--python=/mnt/hd_8tb_p1/p1/home/crossgen/buildroot_486_2/output/host/bin/python3'
- 
-'--ninja=/mnt/hd_8tb_p1/p1/home/crossgen/buildroot_486_2/output/host/bin/ninja' 
-'--disable-alsa' '--disable-bpf' '--disable-brlapi' '--disable-bsd-user' 
-'--disable-cap-ng' '--disable-capstone' '--disable-containers' 
-'--disable-coreaudio' '--disable-curl' '--disable-curses' 
-'--disable-dbus-display' '--disable-docs' '--disable-dsound' '--disable-hvf' 
-'--disable-jack' '--disable-libiscsi' '--disable-linux-aio' 
-'--disable-linux-io-uring' '--disable-malloc-trim' '--disable-membarrier' 
-'--disable-mpath' '--disable-netmap' '--disable-opengl' '--disable-oss' 
-'--disable-pa' '--disable-rbd' '--disable-sanitizers' '--disable-selinux' 
-'--disable-sparse' '--disable-strip' '--disable-vde' '--disable-vhost-crypto' 
-'--disable-vhost-user-blk-server' '--disable-virtfs' '--disable-whpx' 
-'--disable-xen' '--disable-attr' '--disable-kvm' '--disable-vhost-net' 
-'--disable-download' '--disable-hexagon-idef-parser' '--disable-system' 
-'--enable-linux-user' '--target-list=x86_64-linux-user' '--disable-vhost-user' 
-'--disable-slirp' '--disable-sdl' '--disable-fdt' '--enable-trace-backends=nop' 
-'--disable-tools' '--disable-guest-agent' '--disable-fuse' 
-'--disable-fuse-lseek' '--disable-seccomp' '--disable-libssh' 
-'--disable-libusb' '--disable-vnc' '--disable-nettle' '--disable-numa' 
-'--disable-pipewire' '--disable-spice' '--disable-usb-redir' 
-'--disable-install-blobs'
-
-Emulation of the same x86_64 code with qemu 6.2.0 installed on another x86_64 
-native machine works fine.
-
-[1]
-https://lists.nongnu.org/archive/html/qemu-devel/2023-11/msg05387.html
-Best regards,
-Petr
-
-On Sat, 25 Nov 2023 at 13:09, Petr Cvek <petrcvekcz@gmail.com> wrote:
->
->
-It seems there is a bug in SIGALRM handling when 486 system emulates x86_64
->
-code.
-486 host is pretty well out of support currently. Can you reproduce
-this on a less ancient host CPU type ?
-
->
-ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed:
->
-(cpu == current_cpu)
->
-Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup:
->
-assertion failed: (cpu == current_cpu)
->
-0x48874a != 0x3c69e10
->
-**
->
-ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed:
->
-(cpu == current_cpu)
->
-Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup:
->
-assertion failed: (cpu == current_cpu)
-What compiler version do you build QEMU with? That
-assert is there because we have seen some buggy compilers
-in the past which don't correctly preserve the variable
-value as the setjmp/longjmp spec requires them to.
-
-thanks
--- PMM
-
-On 27. 11. 23 at 10:37, Peter Maydell wrote:
->
-On Sat, 25 Nov 2023 at 13:09, Petr Cvek <petrcvekcz@gmail.com> wrote:
->
->
->
-> It seems there is a bug in SIGALRM handling when 486 system emulates x86_64
->
-> code.
->
->
-486 host is pretty well out of support currently. Can you reproduce
->
-this on a less ancient host CPU type ?
->
-It seems it only fails when the code is compiled for i486. QEMU built with the 
-same compiler with -march=i586 and above runs on the same physical hardware 
-without a problem. All -march= variants were executed on ryzen 3600.
-
->
-> ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion
->
-> failed: (cpu == current_cpu)
->
-> Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup:
->
-> assertion failed: (cpu == current_cpu)
->
-> 0x48874a != 0x3c69e10
->
-> **
->
-> ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion
->
-> failed: (cpu == current_cpu)
->
-> Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup:
->
-> assertion failed: (cpu == current_cpu)
->
->
-What compiler version do you build QEMU with? That
->
-assert is there because we have seen some buggy compilers
->
-in the past which don't correctly preserve the variable
->
-value as the setjmp/longjmp spec requires them to.
->
-i486 and i586+ code variants were compiled with GCC 13.2.0 (more exactly, 
-slackware64 current multilib distribution).
-
-The i486 binary which runs on the real 486 is also built with GCC 13.2.0, installed as 
-part of the buildroot cross-compiler (an about two-week-old git snapshot).
-
->
-thanks
->
--- PMM
-best regards,
-Petr
-
-On 11/25/23 07:08, Petr Cvek wrote:
-ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: 
-(cpu == current_cpu)
-Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion 
-failed: (cpu == current_cpu)
-#
-
-The code fails either with or without -singlestep, the command line:
-
-/usr/bin/qemu-x86_64 -L /opt/x86_64 -strace -singlestep  /opt/x86_64/alarm.bin
-
-Source code of QEMU 8.1.1 was modified with patch "[PATCH] qemu/timer: Don't use 
-RDTSC on i486" [1],
-with added few ioctls (not relevant) and cpu_exec_longjmp_cleanup() now prints 
-current pointers of
-cpu and current_cpu (line "0x48874a != 0x3c69e10").
-If you try this again with 8.2-rc2, you should not see an assertion failure.
-You should see instead
-
-QEMU internal SIGILL {code=ILLOPC, addr=0x12345678}
-which I think more accurately summarizes the situation of attempting RDTSC on hardware
-that does not support it.
-r~
-
-On 29. 11. 23 at 15:25, Richard Henderson wrote:
->
-On 11/25/23 07:08, Petr Cvek wrote:
->
-> ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion
->
-> failed: (cpu == current_cpu)
->
-> Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup:
->
-> assertion failed: (cpu == current_cpu)
->
-> #
->
->
->
-> The code fails either with or without -singlestep, the command line:
->
->
->
-> /usr/bin/qemu-x86_64 -L /opt/x86_64 -strace -singlestep
->
-> /opt/x86_64/alarm.bin
->
->
->
-> Source code of QEMU 8.1.1 was modified with patch "[PATCH] qemu/timer: Don't
->
-> use RDTSC on i486" [1],
->
-> with added few ioctls (not relevant) and cpu_exec_longjmp_cleanup() now
->
-> prints current pointers of
->
-> cpu and current_cpu (line "0x48874a != 0x3c69e10").
->
->
->
-If you try this again with 8.2-rc2, you should not see an assertion failure.
->
-You should see instead
->
->
-QEMU internal SIGILL {code=ILLOPC, addr=0x12345678}
->
->
-which I think more accurately summarizes the situation of attempting RDTSC on
->
-hardware that does not support it.
->
->
-Compilation of vanilla qemu v8.2.0-rc2 with -march=i486 by GCC 13.2.0 and 
-running the resulting binary on ryzen still leads to:
-
-**
-ERROR:../accel/tcg/cpu-exec.c:533:cpu_exec_longjmp_cleanup: assertion failed: 
-(cpu == current_cpu)
-Bail out! ERROR:../accel/tcg/cpu-exec.c:533:cpu_exec_longjmp_cleanup: assertion 
-failed: (cpu == current_cpu)
-Aborted
-
->
->
-r~
-Petr
-
diff --git a/results/classifier/008/other/81775929 b/results/classifier/008/other/81775929
deleted file mode 100644
index 0c1809c02..000000000
--- a/results/classifier/008/other/81775929
+++ /dev/null
@@ -1,245 +0,0 @@
-other: 0.877
-permissions: 0.849
-PID: 0.847
-performance: 0.831
-vnc: 0.825
-semantic: 0.825
-graphic: 0.818
-KVM: 0.815
-socket: 0.810
-debug: 0.799
-files: 0.788
-device: 0.777
-network: 0.759
-boot: 0.742
-
-[Qemu-devel] [BUG] Monitor QMP is broken ?
-
-Hello!
-
- I have updated my qemu to the recent version and it seems to have lost 
-compatibility with
-libvirt. The error message is:
---- cut ---
-internal error: unable to execute QEMU command 'qmp_capabilities': QMP input 
-object member
-'id' is unexpected
---- cut ---
- What does it mean? Is it intentional or not?
-
-Kind regards,
-Pavel Fedin
-Expert Engineer
-Samsung Electronics Research center Russia
-
-Hello! 
-
->
-I have updated my qemu to the recent version and it seems to have lost
->
-compatibility
-with
->
-libvirt. The error message is:
->
---- cut ---
->
-internal error: unable to execute QEMU command 'qmp_capabilities': QMP input
->
-object
->
-member
->
-'id' is unexpected
->
---- cut ---
->
-What does it mean? Is it intentional or not?
-I have found the problem. It is caused by commit
-65207c59d99f2260c5f1d3b9c491146616a522aa. libvirt does not seem to use the 
-removed
-asynchronous interface but it still feeds in JSONs with 'id' field set to 
-something. So I
-think the related fragment in qmp_check_input_obj() function should be brought 
-back
-
-Kind regards,
-Pavel Fedin
-Expert Engineer
-Samsung Electronics Research center Russia
-
-On Fri, Jun 05, 2015 at 04:58:46PM +0300, Pavel Fedin wrote:
->
-Hello!
->
->
->  I have updated my qemu to the recent version and it seems to have lost
->
-> compatibility
->
-with
->
-> libvirt. The error message is:
->
-> --- cut ---
->
-> internal error: unable to execute QEMU command 'qmp_capabilities': QMP
->
-> input object
->
-> member
->
-> 'id' is unexpected
->
-> --- cut ---
->
->  What does it mean? Is it intentional or not?
->
->
-I have found the problem. It is caused by commit
->
-65207c59d99f2260c5f1d3b9c491146616a522aa. libvirt does not seem to use the
->
-removed
->
-asynchronous interface but it still feeds in JSONs with 'id' field set to
->
-something. So i
->
-think the related fragment in qmp_check_input_obj() function should be
->
-brought back
-If QMP is rejecting the 'id' parameter that is a regression bug.
-
-[quote]
-The QMP spec says
-
-2.3 Issuing Commands
---------------------
-
-The format for command execution is:
-
-{ "execute": json-string, "arguments": json-object, "id": json-value }
-
- Where,
-
-- The "execute" member identifies the command to be executed by the Server
-- The "arguments" member is used to pass any arguments required for the
-  execution of the command, it is optional when no arguments are
-  required. Each command documents what contents will be considered
-  valid when handling the json-argument
-- The "id" member is a transaction identification associated with the
-  command execution, it is optional and will be part of the response if
-  provided. The "id" member can be any json-value, although most
-  clients merely use a json-number incremented for each successive
-  command
-
-
-2.4 Commands Responses
-----------------------
-
-There are two possible responses which the Server will issue as the result
-of a command execution: success or error.
-
-2.4.1 success
--------------
-
-The format of a success response is:
-
-{ "return": json-value, "id": json-value }
-
- Where,
-
-- The "return" member contains the data returned by the command, which
-  is defined on a per-command basis (usually a json-object or
-  json-array of json-objects, but sometimes a json-number, json-string,
-  or json-array of json-strings); it is an empty json-object if the
-  command does not return data
-- The "id" member contains the transaction identification associated
-  with the command execution if issued by the Client
-
-[/quote]
-
-And as such, libvirt chose to /always/ send an 'id' parameter in all
-commands it issues.
-
-We don't however validate the id in the reply, though arguably we
-should have done so.
-
-Regards,
-Daniel
--- 
-|:
-http://berrange.com
--o-
-http://www.flickr.com/photos/dberrange/
-:|
-|:
-http://libvirt.org
--o-
-http://virt-manager.org
-:|
-|:
-http://autobuild.org
--o-
-http://search.cpan.org/~danberr/
-:|
-|:
-http://entangle-photo.org
--o-
-http://live.gnome.org/gtk-vnc
-:|
-
-"Daniel P. Berrange" <address@hidden> writes:
-
->
-On Fri, Jun 05, 2015 at 04:58:46PM +0300, Pavel Fedin wrote:
->
->  Hello!
->
->
->
-> >  I have updated my qemu to the recent version and it seems to have
->
-> > lost compatibility
->
-> with
->
-> > libvirt. The error message is:
->
-> > --- cut ---
->
-> > internal error: unable to execute QEMU command 'qmp_capabilities':
->
-> > QMP input object
->
-> > member
->
-> > 'id' is unexpected
->
-> > --- cut ---
->
-> >  What does it mean? Is it intentional or not?
->
->
->
->  I have found the problem. It is caused by commit
->
-> 65207c59d99f2260c5f1d3b9c491146616a522aa. libvirt does not seem to
->
-> use the removed
->
-> asynchronous interface but it still feeds in JSONs with 'id' field
->
-> set to something. So i
->
-> think the related fragment in qmp_check_input_obj() function should
->
-> be brought back
->
->
-If QMP is rejecting the 'id' parameter that is a regression bug.
-It is definitely a regression, my fault, and I'll get it fixed a.s.a.p.
-
-[...]
-
diff --git a/results/classifier/008/other/99674399 b/results/classifier/008/other/99674399
deleted file mode 100644
index 3e80733b0..000000000
--- a/results/classifier/008/other/99674399
+++ /dev/null
@@ -1,158 +0,0 @@
-permissions: 0.896
-device: 0.886
-other: 0.883
-debug: 0.857
-performance: 0.845
-semantic: 0.822
-boot: 0.822
-PID: 0.812
-graphic: 0.794
-files: 0.787
-socket: 0.747
-network: 0.711
-KVM: 0.698
-vnc: 0.673
-
-[BUG] qemu crashes on assertion in cpu_asidx_from_attrs when cpu is in smm mode
-
-Hi all!
-
-First, I see this issue:
-https://gitlab.com/qemu-project/qemu/-/issues/1198
-. 
-where some kvm/hardware failure leads to guest crash, and finally to this 
-assertion:
-
-   cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed.
-
-But in the ticket the talk is about the guest crash and fixing the kernel, not 
-about the final QEMU assertion (which definitely shows that something should be 
-fixed in QEMU code too).
-
-
-We've hit the same stack once:
-
-(gdb) bt
-#0  raise () from /lib/x86_64-linux-gnu/libc.so.6
-#1  abort () from /lib/x86_64-linux-gnu/libc.so.6
-#2  ?? () from /lib/x86_64-linux-gnu/libc.so.6
-#3  __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
-#4  cpu_asidx_from_attrs  at ../hw/core/cpu-sysemu.c:76
-#5  cpu_memory_rw_debug  at ../softmmu/physmem.c:3529
-#6  x86_cpu_dump_state  at ../target/i386/cpu-dump.c:560
-#7  kvm_cpu_exec  at ../accel/kvm/kvm-all.c:3000
-#8  kvm_vcpu_thread_fn  at ../accel/kvm/kvm-accel-ops.c:51
-#9  qemu_thread_start  at ../util/qemu-thread-posix.c:505
-#10 start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
-#11 clone () from /lib/x86_64-linux-gnu/libc.so.6
-
-
-And what I see:
-
-static inline int x86_asidx_from_attrs(CPUState *cs, MemTxAttrs attrs)
-{
-    return !!attrs.secure;
-}
-
-int cpu_asidx_from_attrs(CPUState *cpu, MemTxAttrs attrs)
-{
-    int ret = 0;
-
-    if (cpu->cc->sysemu_ops->asidx_from_attrs) {
-        ret = cpu->cc->sysemu_ops->asidx_from_attrs(cpu, attrs);
-        assert(ret < cpu->num_ases && ret >= 0);         <<<<<<<<<<<<<<<<<
-    }
-    return ret;
-}
-
-(gdb) p cpu->num_ases
-$3 = 1
-
-(gdb) fr 5
-#5  0x00005578c8814ba3 in cpu_memory_rw_debug (cpu=c...
-(gdb) p attrs
-$6 = {unspecified = 0, secure = 1, user = 0, memory = 0, requester_id = 0, 
-byte_swap = 0, target_tlb_bit0 = 0, target_tlb_bit1 = 0, target_tlb_bit2 = 0}
-
-So .secure is 1, therefore ret is 1; at the same time num_ases is 1 too, so the 
-assertion fails.
-
-
-
-Where is .secure from?
-
-static inline MemTxAttrs cpu_get_mem_attrs(CPUX86State *env)
-{
-    return ((MemTxAttrs) { .secure = (env->hflags & HF_SMM_MASK) != 0 });
-}
-
-OK, it means we are in SMM mode.
-
-
-
-On the other hand, it seems that num_ases is always 1 for x86:
-
-vsementsov@vsementsov-lin:~/work/src/qemu/yc-7.2$ git grep 'num_ases = '
-cpu.c:    cpu->num_ases = 0;
-softmmu/cpus.c:        cpu->num_ases = 1;
-target/arm/cpu.c:        cs->num_ases = 3 + has_secure;
-target/arm/cpu.c:        cs->num_ases = 1 + has_secure;
-target/i386/tcg/sysemu/tcg-cpu.c:    cs->num_ases = 2;
-
-
-So, something is wrong around cpu->num_ases and x86_asidx_from_attrs() which 
-may return more in SMM mode.
-
-
-The stack starts in
-//7  0x00005578c882f539 in kvm_cpu_exec (cpu=cpu@entry=0x5578ca2eb340) at 
-../accel/kvm/kvm-all.c:3000
-    if (ret < 0) {
-        cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
-        vm_stop(RUN_STATE_INTERNAL_ERROR);
-    }
-
-So that was some kvm error, and we decided to call cpu_dump_state(). And it 
-crashes. cpu_dump_state() is also called from hmp_info_registers, so I can 
-reproduce the crash with a tiny patch to master (as only CPU_DUMP_CODE path 
-calls cpu_memory_rw_debug(), as it is in kvm_cpu_exec()):
-
-diff --git a/monitor/hmp-cmds-target.c b/monitor/hmp-cmds-target.c
-index ff01cf9d8d..dcf0189048 100644
---- a/monitor/hmp-cmds-target.c
-+++ b/monitor/hmp-cmds-target.c
-@@ -116,7 +116,7 @@ void hmp_info_registers(Monitor *mon, const QDict *qdict)
-         }
-
-         monitor_printf(mon, "\nCPU#%d\n", cs->cpu_index);
--        cpu_dump_state(cs, NULL, CPU_DUMP_FPU);
-+        cpu_dump_state(cs, NULL, CPU_DUMP_CODE);
-     }
- }
-
-
-Then run
-
-yes "info registers" | ./build/qemu-system-x86_64 -accel kvm -monitor stdio \
-   -global driver=cfi.pflash01,property=secure,value=on \
-   -blockdev "{'driver': 'file', 'filename': 
-'/usr/share/OVMF/OVMF_CODE_4M.secboot.fd', 'node-name': 'ovmf-code', 'read-only': 
-true}" \
-   -blockdev "{'driver': 'file', 'filename': '/usr/share/OVMF/OVMF_VARS_4M.fd', 
-'node-name': 'ovmf-vars', 'read-only': true}" \
-   -machine q35,smm=on,pflash0=ovmf-code,pflash1=ovmf-vars -m 2G -nodefaults
-
-And after some time (less than 20 seconds for me) it leads to
-
-qemu-system-x86_64: ../hw/core/cpu-sysemu.c:76: cpu_asidx_from_attrs: Assertion `ret < 
-cpu->num_ases && ret >= 0' failed.
-Aborted (core dumped)
-
-
-I've no idea how to correctly fix this bug, but I hope that my reproducer and 
-investigation will help a bit.
-
---
-Best regards,
-Vladimir
-