9 files changed, 1509 insertions, 0 deletions
diff --git a/results/classifier/108/other/258 b/results/classifier/108/other/258
new file mode 100644
index 00000000..39100607
--- /dev/null
+++ b/results/classifier/108/other/258
@@ -0,0 +1,16 @@
+graphic: 0.894
+device: 0.621
+performance: 0.394
+vnc: 0.255
+permissions: 0.207
+other: 0.201
+semantic: 0.179
+boot: 0.124
+network: 0.060
+debug: 0.050
+files: 0.034
+PID: 0.022
+socket: 0.017
+KVM: 0.002
+
+Add Illumos VM image
diff --git a/results/classifier/108/other/2581 b/results/classifier/108/other/2581
new file mode 100644
index 00000000..0712adc2
--- /dev/null
+++ b/results/classifier/108/other/2581
@@ -0,0 +1,27 @@
+graphic: 0.901
+device: 0.882
+files: 0.791
+semantic: 0.749
+PID: 0.618
+socket: 0.492
+vnc: 0.482
+debug: 0.452
+permissions: 0.410
+boot: 0.279
+network: 0.158
+performance: 0.133
+other: 0.033
+KVM: 0.004
+
+Assert failure "target/i386/tcg/translate.c:748:gen_helper_out_func" when emulating Windows
+Description of problem:
+qemu crashes with:
+```
+ERROR:../target/i386/tcg/translate.c:748:gen_helper_out_func: code should not be reached
+```
+Steps to reproduce:
+1. Run the command listed above
+2. Wait a random amount of time (anywhere from 30 minutes to 2 hours)
+3. QEMU will crash at some point
+Additional information:
+- Relevant part of the macOS crash log: [qemu-crash.txt](/uploads/5cc296fd0e8c603ba08379749a67071d/qemu-crash.txt)
diff --git a/results/classifier/108/other/2583 b/results/classifier/108/other/2583
new file mode 100644
index 00000000..3a05ffb9
--- /dev/null
+++ b/results/classifier/108/other/2583
@@ -0,0 +1,40 @@
+device: 0.863
+graphic: 0.846
+boot: 0.840
+PID: 0.827
+vnc: 0.803
+KVM: 0.759
+performance: 0.755
+socket: 0.717
+debug: 0.701
+semantic: 0.657
+network: 0.599
+permissions: 0.553
+files: 0.553
+other: 0.514
+
+libvfio-user.so.0 missing in /lib/x86_64-linux-gnu/  in fresh install of 9.1.50
+Description of problem:
+Library libvfio-user.so.0  is missing from /lib/x86_64-linux-gnu. qemu-system-x86_64 does not start due to missing library.
+
+````
+root@jpbdeb:~# ls -al /usr/local/bin/qemu-system-x86_64 
+-rwxr-xr-x 1 root root 81734576 Sep 21 21:48 /usr/local/bin/qemu-system-x86_64
+root@jpbdeb:~# ldd /usr/local/bin/qemu-system-x86_64 
+	linux-vdso.so.1 (0x00007fff511de000)
+	libvfio-user.so.0 => not found
+	libslirp.so.0 => /lib/x86_64-linux-gnu/libslirp.so.0 (0x00007f73eba33000)
+	libxenctrl.so.4.17 => /lib/x86_64-linux-gnu/libxenctrl.so.4.17 (0x00007f73eba09000)
+	libxenstore.so.4 => /lib/x86_64-linux-gnu/libxenstore.so.4 (0x00007f73eb9fe000)
+	libxenforeignmemory.so.1 => /lib/x86_64-linux-gnu/libxenforeignmemory.so.1 (0x00007f73eb9f9000)
+        ...
+````
+Steps to reproduce:
+1. Fresh OS install, including all packages necessary to build from source.
+2. Download source from gitlab and proceed with documented build instructions.
+3. make install
+4. Attempting to run /usr/local/bin/qemu-system-x86_64 fails due to the missing library.
+Additional information:
+Adding the link to the library that exists in /usr/lib/x86_64-linux-gnu  resolves the issue:
+
+(as root) ln -s /usr/local/lib/x86_64-linux-gnu/libvfio-user.so.0  /lib/x86_64-linux-gnu/libvfio-user.so.0
diff --git a/results/classifier/108/other/25842545 b/results/classifier/108/other/25842545
new file mode 100644
index 00000000..103fe372
--- /dev/null
+++ b/results/classifier/108/other/25842545
@@ -0,0 +1,212 @@
+other: 0.912
+KVM: 0.867
+vnc: 0.862
+device: 0.847
+debug: 0.836
+performance: 0.831
+semantic: 0.829
+PID: 0.829
+boot: 0.824
+graphic: 0.822
+permissions: 0.817
+socket: 0.808
+files: 0.806
+network: 0.796
+
+[Qemu-devel] [Bug?] Guest pause because VMPTRLD failed in KVM
+
+Hello,
+
+  We encountered a problem where a guest was paused because the KVM kernel module
+reported that VMPTRLD failed.
+
+The related information is as follows:
+
+1) Qemu command:
+   /usr/bin/qemu-kvm -name omu1 -S -machine pc-i440fx-2.3,accel=kvm,usb=off -cpu
+host -m 15625 -realtime mlock=off -smp 8,sockets=1,cores=8,threads=1 -uuid
+a2aacfff-6583-48b4-b6a4-e6830e519931 -no-user-config -nodefaults -chardev
+socket,id=charmonitor,path=/var/lib/libvirt/qemu/omu1.monitor,server,nowait
+-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
+-boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
+virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive
+file=/home/env/guest1.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,aio=native
+  -device
+virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0
+  -drive
+file=/home/env/guest_300G.img,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native
+  -device
+virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1
+  -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=26 -device
+virtio-net-pci,netdev=hostnet0,id=net0,mac=00:00:80:05:00:00,bus=pci.0,addr=0x3
+-netdev tap,fd=27,id=hostnet1,vhost=on,vhostfd=28 -device
+virtio-net-pci,netdev=hostnet1,id=net1,mac=00:00:80:05:00:01,bus=pci.0,addr=0x4
+-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
+-device usb-tablet,id=input0 -vnc 0.0.0.0:0 -device
+cirrus-vga,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device
+virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
+
+   2) Qemu log:
+   KVM: entry failed, hardware error 0x4
+   RAX=00000000ffffffed RBX=ffff8803fa2d7fd8 RCX=0100000000000000
+RDX=0000000000000000
+   RSI=0000000000000000 RDI=0000000000000046 RBP=ffff8803fa2d7e90
+RSP=ffff8803fa2efe90
+   R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000
+R11=000000000000b69a
+   R12=0000000000000001 R13=ffffffff81a25b40 R14=0000000000000000
+R15=ffff8803fa2d7fd8
+   RIP=ffffffff81053e16 RFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
+   ES =0000 0000000000000000 ffffffff 00c00000
+   CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
+   SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
+   DS =0000 0000000000000000 ffffffff 00c00000
+   FS =0000 0000000000000000 ffffffff 00c00000
+   GS =0000 ffff88040f540000 ffffffff 00c00000
+   LDT=0000 0000000000000000 ffffffff 00c00000
+   TR =0040 ffff88040f550a40 00002087 00008b00 DPL=0 TSS64-busy
+   GDT=     ffff88040f549000 0000007f
+   IDT=     ffffffffff529000 00000fff
+   CR0=80050033 CR2=00007f81ca0c5000 CR3=00000003f5081000 CR4=000407e0
+   DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
+DR3=0000000000000000
+   DR6=00000000ffff0ff0 DR7=0000000000000400
+   EFER=0000000000000d01
+   Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ??
+?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
+
+   3) Dmesg
+   [347315.028339] kvm: vmptrld ffff8817ec5f0000/17ec5f0000 failed
+   klogd 1.4.1, ---------- state change ----------
+   [347315.039506] kvm: vmptrld ffff8817ec5f0000/17ec5f0000 failed
+   [347315.051728] kvm: vmptrld ffff8817ec5f0000/17ec5f0000 failed
+   [347315.057472] vmwrite error: reg 6c0a value ffff88307e66e480 (err
+2120672384)
+   [347315.064567] Pid: 69523, comm: qemu-kvm Tainted: GF           X
+3.0.93-0.8-default #1
+   [347315.064569] Call Trace:
+   [347315.064587]  [<ffffffff810049d5>] dump_trace+0x75/0x300
+   [347315.064595]  [<ffffffff8145e3e3>] dump_stack+0x69/0x6f
+   [347315.064617]  [<ffffffffa03738de>] vmx_vcpu_load+0x11e/0x1d0 [kvm_intel]
+   [347315.064647]  [<ffffffffa029a204>] kvm_arch_vcpu_load+0x44/0x1d0 [kvm]
+   [347315.064669]  [<ffffffff81054ee1>] finish_task_switch+0x81/0xe0
+   [347315.064676]  [<ffffffff8145f0b4>] thread_return+0x3b/0x2a7
+   [347315.064687]  [<ffffffffa028d9b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
+   [347315.064703]  [<ffffffffa02a16d1>] __vcpu_run+0xd1/0x260 [kvm]
+   [347315.064732]  [<ffffffffa02a2418>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 
+[kvm]
+   [347315.064759]  [<ffffffffa028ecee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
+   [347315.064771]  [<ffffffff8116bdfb>] do_vfs_ioctl+0x8b/0x3b0
+   [347315.064776]  [<ffffffff8116c1c1>] sys_ioctl+0xa1/0xb0
+   [347315.064783]  [<ffffffff81469272>] system_call_fastpath+0x16/0x1b
+   [347315.064797]  [<00007fee51969ce7>] 0x7fee51969ce6
+   [347315.064799] vmwrite error: reg 6c0c value ffff88307e664000 (err
+2120630272)
+   [347315.064802] Pid: 69523, comm: qemu-kvm Tainted: GF           X
+3.0.93-0.8-default #1
+   [347315.064803] Call Trace:
+   [347315.064807]  [<ffffffff810049d5>] dump_trace+0x75/0x300
+   [347315.064811]  [<ffffffff8145e3e3>] dump_stack+0x69/0x6f
+   [347315.064817]  [<ffffffffa03738ec>] vmx_vcpu_load+0x12c/0x1d0 [kvm_intel]
+   [347315.064832]  [<ffffffffa029a204>] kvm_arch_vcpu_load+0x44/0x1d0 [kvm]
+   [347315.064851]  [<ffffffff81054ee1>] finish_task_switch+0x81/0xe0
+   [347315.064855]  [<ffffffff8145f0b4>] thread_return+0x3b/0x2a7
+   [347315.064865]  [<ffffffffa028d9b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
+   [347315.064880]  [<ffffffffa02a16d1>] __vcpu_run+0xd1/0x260 [kvm]
+   [347315.064907]  [<ffffffffa02a2418>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 
+[kvm]
+   [347315.064933]  [<ffffffffa028ecee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
+   [347315.064943]  [<ffffffff8116bdfb>] do_vfs_ioctl+0x8b/0x3b0
+   [347315.064947]  [<ffffffff8116c1c1>] sys_ioctl+0xa1/0xb0
+   [347315.064951]  [<ffffffff81469272>] system_call_fastpath+0x16/0x1b
+   [347315.064957]  [<00007fee51969ce7>] 0x7fee51969ce6
+   [347315.064959] vmwrite error: reg 6c10 value 0 (err 0)
+
+   4) The issue can't be reproduced. I searched the Intel VMX spec for the reasons
+for a vmptrld failure:
+   The instruction fails if its operand is not properly aligned, sets
+unsupported physical-address bits, or is equal to the VMXON
+   pointer. In addition, the instruction fails if the 32 bits in memory
+referenced by the operand do not match the VMCS
+   revision identifier supported by this processor.
+
+   But I can't find any clues in the KVM source code. It seems that each
+   error condition is impossible in theory. :(
+
+Any suggestions will be appreciated! Paolo?
+
+-- 
+Regards,
+-Gonglei
+
+On 10/11/2016 15:10, gong lei wrote:
+>
+4) The issue can't be reproduced. I searched the Intel VMX spec for the
+>
+reasons
+>
+for a vmptrld failure:
+>
+The instruction fails if its operand is not properly aligned, sets
+>
+unsupported physical-address bits, or is equal to the VMXON
+>
+pointer. In addition, the instruction fails if the 32 bits in memory
+>
+referenced by the operand do not match the VMCS
+>
+revision identifier supported by this processor.
+>
+>
+But I can't find any clues in the KVM source code. It seems that each
+>
+error condition is impossible in theory. :(
+Yes, it should not happen. :(
+
+If it's not reproducible, it's really hard to say what it was, except a
+random memory corruption elsewhere or even a bit flip (!).
+
+Paolo
+
+On 2016/11/17 20:39, Paolo Bonzini wrote:
+>
+>
+On 10/11/2016 15:10, gong lei wrote:
+>
+>     4) The issue can't be reproduced. I searched the Intel VMX spec for the
+>
+> reasons
+>
+> for a vmptrld failure:
+>
+>     The instruction fails if its operand is not properly aligned, sets
+>
+> unsupported physical-address bits, or is equal to the VMXON
+>
+>     pointer. In addition, the instruction fails if the 32 bits in memory
+>
+> referenced by the operand do not match the VMCS
+>
+>     revision identifier supported by this processor.
+>
+>
+>
+>     But I can't find any clues in the KVM source code. It seems that each
+>
+>     error condition is impossible in theory. :(
+>
+Yes, it should not happen. :(
+>
+>
+If it's not reproducible, it's really hard to say what it was, except a
+>
+random memory corruption elsewhere or even a bit flip (!).
+>
+>
+Paolo
+Thanks for your reply, Paolo :)
+
+-- 
+Regards,
+-Gonglei
+
diff --git a/results/classifier/108/other/2585 b/results/classifier/108/other/2585
new file mode 100644
index 00000000..d114502e
--- /dev/null
+++ b/results/classifier/108/other/2585
@@ -0,0 +1,22 @@
+debug: 0.881
+boot: 0.828
+device: 0.822
+other: 0.817
+network: 0.725
+socket: 0.614
+graphic: 0.603
+semantic: 0.548
+performance: 0.528
+PID: 0.516
+vnc: 0.504
+files: 0.246
+KVM: 0.196
+permissions: 0.160
+
+qemu-system-arm highmem support broken with TCG
+Additional information:
+I initially bisected this to commit 39a1fd25287f ("target/arm: Fix handling of LPAE block descriptors"), which introduced an identical bug by masking the wrong address bits due to a type mismatch, but this was in turn fixed by commit c2360eaa0262 ("target/arm: Fix qemu-system-arm handling of LPAE block descriptors for highmem"). The bug resurfaced between qemu-7.1.0 and qemu-7.2.0 after commit f3639a64f602 ("target/arm: Use softmmu tlbs for page table walking"), but may be caused by the preceding 4a35855682ce ("target/arm: Plumb debug into S1Translate") which fails to boot for an unrelated reason.
+
+I reproduced this on qemu-7.2 as shipped by Debian as well as on qemu-9.1 (built locally).
+
+Part of this problem appeared to be hidden by the 'highmem=on' argument not having the intended effect during parts of the bisection, which I worked around by overriding the 'pa_bits' variable in machvirt_init().
diff --git a/results/classifier/108/other/2586 b/results/classifier/108/other/2586
new file mode 100644
index 00000000..51a9a8b7
--- /dev/null
+++ b/results/classifier/108/other/2586
@@ -0,0 +1,18 @@
+device: 0.705
+semantic: 0.637
+graphic: 0.503
+other: 0.353
+performance: 0.202
+debug: 0.197
+socket: 0.168
+files: 0.150
+network: 0.148
+vnc: 0.140
+boot: 0.115
+PID: 0.082
+permissions: 0.060
+KVM: 0.005
+
+qemu-system-x86_64: IGD "legacy mode" support with Q35?
+Additional information:
+Detailed discussion on https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12103
diff --git a/results/classifier/108/other/2587 b/results/classifier/108/other/2587
new file mode 100644
index 00000000..4daa788c
--- /dev/null
+++ b/results/classifier/108/other/2587
@@ -0,0 +1,16 @@
+device: 0.711
+network: 0.383
+semantic: 0.344
+performance: 0.323
+other: 0.179
+graphic: 0.178
+socket: 0.173
+boot: 0.163
+vnc: 0.081
+files: 0.068
+PID: 0.052
+debug: 0.037
+permissions: 0.007
+KVM: 0.001
+
+Avoid using error_setg(&error_fatal, ...)  in the QEMU sources
diff --git a/results/classifier/108/other/2589 b/results/classifier/108/other/2589
new file mode 100644
index 00000000..0bdae345
--- /dev/null
+++ b/results/classifier/108/other/2589
@@ -0,0 +1,71 @@
+performance: 0.761
+KVM: 0.756
+graphic: 0.753
+other: 0.747
+permissions: 0.740
+boot: 0.719
+semantic: 0.712
+device: 0.711
+PID: 0.706
+debug: 0.695
+files: 0.689
+vnc: 0.641
+network: 0.606
+socket: 0.513
+
+Support guest shutdown of Alpine Linux in guest agent
+Description of problem:
+The qemu-guest-agent's shutdown calls `/sbin/shutdown` with the appropriate flags to shut down a POSIX system. On Alpine Linux, which is based on busybox, there is no `/sbin/shutdown`; instead there are `/sbin/poweroff`, `/sbin/halt` and `/sbin/reboot`. We have used a downstream patch for years that will exec those as a fallback in case execing `/sbin/shutdown` fails.
+
+With qemu 9.2 this patch no longer applies and it is probably time to solve this properly in upstream qemu.
+
+The question is how?
+
+Some options:
+
+- Set the powerdown, halt and reboot commands via build time configure option
+- Add a fallback if the `execlp` fails (similar to what downstream Alpine's patch does now). We could for example give `ga_run_command` a `const char **argv[]`, and try `execvp` on each of them before erroring out.
+- Test the existence of `/sbin/shutdown` before calling `ga_run_command`.
+- Do nothing. Let downstream Alpine Linux handle it.
+Steps to reproduce:
+1. Build qemu-guest-agent for Alpine Linux
+2. Boot an Alpine Linux VM and install the qemu-guest-agent
+3. Try to shut down the VM via a QMP command.
+Additional information:
+The patch that we previously used that no longer applies:
+```diff
+diff --git a/qga/commands-posix.c b/qga/commands-posix.c
+index 954efed01..61427652c 100644
+--- a/qga/commands-posix.c
++++ b/qga/commands-posix.c
+@@ -84,6 +84,7 @@ static void ga_wait_child(pid_t pid, int *status, Error **errp)
+ void qmp_guest_shutdown(bool has_mode, const char *mode, Error **errp)
+ {
+     const char *shutdown_flag;
++    const char *fallback_cmd = NULL;
+     Error *local_err = NULL;
+     pid_t pid;
+     int status;
+@@ -101,10 +102,13 @@ void qmp_guest_shutdown(bool has_mode, const char *mode, Error **errp)
+     slog("guest-shutdown called, mode: %s", mode);
+     if (!has_mode || strcmp(mode, "powerdown") == 0) {
+         shutdown_flag = powerdown_flag;
++        fallback_cmd = "/sbin/poweroff";
+     } else if (strcmp(mode, "halt") == 0) {
+         shutdown_flag = halt_flag;
++        fallback_cmd = "/sbin/halt";
+     } else if (strcmp(mode, "reboot") == 0) {
+         shutdown_flag = reboot_flag;
++        fallback_cmd = "/sbin/reboot";
+     } else {
+         error_setg(errp,
+                    "mode is invalid (valid values are: halt|powerdown|reboot");
+@@ -125,6 +129,7 @@ void qmp_guest_shutdown(bool has_mode, const char *mode, Error **errp)
+ #else
+         execl("/sbin/shutdown", "shutdown", "-h", shutdown_flag, "+0",
+                "hypervisor initiated shutdown", (char *)NULL);
++        execle(fallback_cmd, fallback_cmd, (char*)NULL, environ);
+ #endif
+         _exit(EXIT_FAILURE);
+     } else if (pid < 0) {
+```
diff --git a/results/classifier/108/other/25892827 b/results/classifier/108/other/25892827
new file mode 100644
index 00000000..bccf4d81
--- /dev/null
+++ b/results/classifier/108/other/25892827
@@ -0,0 +1,1087 @@
+other: 0.892
+permissions: 0.881
+KVM: 0.872
+debug: 0.868
+vnc: 0.846
+boot: 0.839
+network: 0.839
+device: 0.839
+graphic: 0.832
+semantic: 0.825
+socket: 0.822
+performance: 0.819
+files: 0.804
+PID: 0.792
+
+[Qemu-devel] [BUG/RFC] Two cpus are not brought up normally in SLES11 sp3 VM after reboot
+
+Hi,
+
+Recently we encountered a problem in our project: 2 CPUs in a VM are not brought
+up normally after a reboot.
+
+Our host is using KVM kmod 3.6 and QEMU 2.1.
+A SLES 11 SP3 VM is configured with 8 vCPUs,
+and the CPU model is configured as 'host-passthrough'.
+
+After the VM first started up, everything seemed to be OK,
+and then the VM panicked and rebooted.
+After the reboot, only 6 CPUs were brought up in the VM; cpu1 and cpu7 were not online.
+
+This is the only message we can get from the VM.
+The VM dmesg shows:
+[    0.069867] Booting Node   0, Processors  #1
+[    5.060042] CPU1: Stuck ??
+[    5.060499]  #2
+[    5.088322] kvm-clock: cpu 2, msr 6:3fc90901, secondary cpu clock
+[    5.088335] KVM setup async PF for cpu 2
+[    5.092967] NMI watchdog enabled, takes one hw-pmu counter.
+[    5.094405]  #3
+[    5.108324] kvm-clock: cpu 3, msr 6:3fcd0901, secondary cpu clock
+[    5.108333] KVM setup async PF for cpu 3
+[    5.113553] NMI watchdog enabled, takes one hw-pmu counter.
+[    5.114970]  #4
+[    5.128325] kvm-clock: cpu 4, msr 6:3fd10901, secondary cpu clock
+[    5.128336] KVM setup async PF for cpu 4
+[    5.134576] NMI watchdog enabled, takes one hw-pmu counter.
+[    5.135998]  #5
+[    5.152324] kvm-clock: cpu 5, msr 6:3fd50901, secondary cpu clock
+[    5.152334] KVM setup async PF for cpu 5
+[    5.154764] NMI watchdog enabled, takes one hw-pmu counter.
+[    5.156467]  #6
+[    5.172327] kvm-clock: cpu 6, msr 6:3fd90901, secondary cpu clock
+[    5.172341] KVM setup async PF for cpu 6
+[    5.180738] NMI watchdog enabled, takes one hw-pmu counter.
+[    5.182173]  #7 Ok.
+[   10.170815] CPU7: Stuck ??
+[   10.171648] Brought up 6 CPUs
+[   10.172394] Total of 6 processors activated (28799.97 BogoMIPS).
+
+From the host, we found that the QEMU vcpu1 and vcpu7 threads were not consuming
+any CPU time (they should be in the idle state).
+All of the vCPUs' stacks on the host look like the one below:
+
+[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
+[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
+[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
+[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
+[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
+[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
+[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
+[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
+[<ffffffffffffffff>] 0xffffffffffffffff
+
+We looked into the kernel code that could lead to the above 'Stuck' warning,
+and found that the only possible cause is that the emulation of the 'cpuid'
+instruction in kvm/qemu has something wrong.
+But since we can't reproduce this problem, we are not quite sure.
+Is it possible that the cpuid emulation in kvm/qemu has a bug?
+
+Has anyone come across this problem before? Any ideas?
+
+Thanks,
+zhanghailiang
+
+On 06/07/2015 09:54, zhanghailiang wrote:
+>
+>
+From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
+>
+consuming any cpu (Should be in idle state),
+>
+All of VCPUs' stacks in host is like bellow:
+>
+>
+[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
+>
+[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
+>
+[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
+>
+[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
+>
+[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
+>
+[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
+>
+[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
+>
+[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
+>
+[<ffffffffffffffff>] 0xffffffffffffffff
+>
+>
+We looked into the kernel codes that could leading to the above 'Stuck'
+>
+warning,
+>
+and found that the only possible is the emulation of 'cpuid' instruct in
+>
+kvm/qemu has something wrong.
+>
+But since we can't reproduce this problem, we are not quite sure.
+>
+Is it possible that the cpuid emulation in kvm/qemu has a bug?
+Can you explain the relationship to the cpuid emulation?  What do the
+traces say about vcpus 1 and 7?
+
+Paolo
+
+On 2015/7/6 16:45, Paolo Bonzini wrote:
+On 06/07/2015 09:54, zhanghailiang wrote:
+From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
+consuming any cpu (Should be in idle state),
+All of VCPUs' stacks in host is like bellow:
+
+[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
+[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
+[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
+[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
+[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
+[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
+[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
+[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
+[<ffffffffffffffff>] 0xffffffffffffffff
+
+We looked into the kernel codes that could leading to the above 'Stuck'
+warning,
+and found that the only possible is the emulation of 'cpuid' instruct in
+kvm/qemu has something wrong.
+But since we can't reproduce this problem, we are not quite sure.
+Is it possible that the cpuid emulation in kvm/qemu has a bug?
+Can you explain the relationship to the cpuid emulation?  What do the
+traces say about vcpus 1 and 7?
+OK, we searched the VM's kernel code for the 'Stuck' message, and it is
+located in
+do_boot_cpu(). It runs in BSP context; the call path is:
+BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu()
+-> wakeup_secondary_cpu_via_init() to trigger the APs.
+It waits 5s for the APs to start up; if an AP does not start up normally, it
+prints 'CPU%d Stuck' or 'CPU%d: Not responding'.
+
+If it prints 'Stuck', it means the AP has received the SIPI interrupt and
+begun to execute the code at
+'ENTRY(trampoline_data)' (trampoline_64.S), but got stuck somewhere before
+smp_callin() (smpboot.c).
+The following is the startup process of the BSP and the APs.
+BSP:
+start_kernel()
+  ->smp_init()
+     ->smp_boot_cpus()
+       ->do_boot_cpu()
+           ->start_ip = trampoline_address(); //set the address that AP will go 
+to execute
+           ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
+           ->for (timeout = 0; timeout < 50000; timeout++)
+               if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if AP 
+startup or not
+
+APs:
+ENTRY(trampoline_data) (trampoline_64.S)
+      ->ENTRY(secondary_startup_64) (head_64.S)
+         ->start_secondary() (smpboot.c)
+            ->cpu_init();
+            ->smp_callin();
+                ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if the AP gets
+here, the BSP will not print the error message.
+
+From the above call path, we can be sure that the AP got stuck between
+trampoline_data and the cpumask_set_cpu() call in
+smp_callin(). We looked through this code path carefully and found only one
+'hlt' instruction that could block the process.
+It is located in trampoline_data():
+
+ENTRY(trampoline_data)
+        ...
+
+        call    verify_cpu              # Verify the cpu supports long mode
+        testl   %eax, %eax              # Check for return code
+        jnz     no_longmode
+
+        ...
+
+no_longmode:
+        hlt
+        jmp no_longmode
+
+In verify_cpu(),
+the only sensitive instruction we can find that could cause a VM exit from
+non-root mode is 'cpuid'.
+This is why we suspect that wrong cpuid emulation in KVM/QEMU leads to
+the failure in verify_cpu.
+
+From the messages in the VM, we know that something is wrong with vcpu1 and vcpu7.
+[    5.060042] CPU1: Stuck ??
+[   10.170815] CPU7: Stuck ??
+[   10.171648] Brought up 6 CPUs
+
+Besides, the following is the CPU information obtained from the host.
+80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command 
+instance-0000000
+* CPU #0: pc=0x00007f64160c683d thread_id=68570
+  CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
+  CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
+  CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
+  CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
+  CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
+  CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
+  CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
+
+Oh, I also forgot to mention above that we have bound each vCPU
+to a different physical CPU on the
+host.
+
+Thanks,
+zhanghailiang
+
+On 06/07/2015 11:59, zhanghailiang wrote:
+>
+>
+>
+Besides, the follow is the cpus message got from host.
+>
+80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh
+>
+qemu-monitor-command instance-0000000
+>
+* CPU #0: pc=0x00007f64160c683d thread_id=68570
+>
+CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
+>
+CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
+>
+CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
+>
+CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
+>
+CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
+>
+CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
+>
+CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
+>
+>
+Oh, i also forgot to mention in the above message that, we have bond
+>
+each vCPU to different physical CPU in
+>
+host.
+Can you capture a trace on the host (trace-cmd record -e kvm) and send
+it privately?  Please note which CPUs get stuck, since I guess it's not
+always 1 and 7.
+
+Paolo
+
+On Mon, 6 Jul 2015 17:59:10 +0800
+zhanghailiang <address@hidden> wrote:
+
+>
+On 2015/7/6 16:45, Paolo Bonzini wrote:
+>
+>
+>
+>
+>
+> On 06/07/2015 09:54, zhanghailiang wrote:
+>
+>>
+>
+>>  From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
+>
+>> consuming any cpu (Should be in idle state),
+>
+>> All of VCPUs' stacks in host is like bellow:
+>
+>>
+>
+>> [<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
+>
+>> [<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
+>
+>> [<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
+>
+>> [<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
+>
+>> [<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
+>
+>> [<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
+>
+>> [<ffffffff81468092>] system_call_fastpath+0x16/0x1b
+>
+>> [<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
+>
+>> [<ffffffffffffffff>] 0xffffffffffffffff
+>
+>>
+>
+>> We looked into the kernel codes that could leading to the above 'Stuck'
+>
+>> warning,
+In current upstream there isn't any printk(...Stuck...) left, since that code
+path
+has been reworked.
+I've often seen this on an over-committed host during guest CPU up/down torture
+tests.
+Could you update the guest kernel to upstream and see if the issue reproduces?
+
+>
+>> and found that the only possible is the emulation of 'cpuid' instruct in
+>
+>> kvm/qemu has something wrong.
+>
+>> But since we can't reproduce this problem, we are not quite sure.
+>
+>> Is it possible that the cpuid emulation in kvm/qemu has a bug?
+>
+>
+>
+> Can you explain the relationship to the cpuid emulation?  What do the
+>
+> traces say about vcpus 1 and 7?
+>
+>
+OK, we searched the VM's kernel codes with the 'Stuck' message, and  it is
+>
+located in
+>
+do_boot_cpu(). It's in BSP context, the call process is:
+>
+BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu()
+>
+-> wakeup_secondary_via_INIT() to trigger APs.
+>
+It will wait 5s for APs to startup, if some AP not startup normally, it will
+>
+print 'CPU%d Stuck' or 'CPU%d: Not responding'.
+>
+>
+If it prints 'Stuck', it means the AP has received the SIPI interrupt and
+>
+begins to execute the code
+>
+'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places
+>
+before smp_callin()(smpboot.c).
+>
+The follow is the starup process of BSP and AP.
+>
+BSP:
+>
+start_kernel()
+>
+->smp_init()
+>
+->smp_boot_cpus()
+>
+->do_boot_cpu()
+>
+->start_ip = trampoline_address(); //set the address that AP will
+>
+go to execute
+>
+->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
+>
+->for (timeout = 0; timeout < 50000; timeout++)
+>
+if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if
+>
+AP startup or not
+>
+>
+APs:
+>
+ENTRY(trampoline_data) (trampoline_64.S)
+>
+->ENTRY(secondary_startup_64) (head_64.S)
+>
+->start_secondary() (smpboot.c)
+>
+->cpu_init();
+>
+->smp_callin();
+>
+->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP
+>
+comes here, the BSP will not prints the error message.
+>
+>
+From above call process, we can be sure that, the AP has been stuck between
+>
+trampoline_data and the cpumask_set_cpu() in
+>
+smp_callin(), we look through these codes path carefully, and only found a
+>
+'hlt' instruct that could block the process.
+>
+It is located in trampoline_data():
+>
+>
+ENTRY(trampoline_data)
+>
+...
+>
+>
+call    verify_cpu              # Verify the cpu supports long mode
+>
+testl   %eax, %eax              # Check for return code
+>
+jnz     no_longmode
+>
+>
+...
+>
+>
+no_longmode:
+>
+hlt
+>
+jmp no_longmode
+>
+>
+For the process verify_cpu(),
+>
+we can only find the 'cpuid' sensitive instruct that could lead VM exit from
+>
+No-root mode.
+>
+This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to
+>
+the fail in verify_cpu.
+>
+>
+From the message in VM, we know vcpu1 and vcpu7 is something wrong.
+>
+[    5.060042] CPU1: Stuck ??
+>
+[   10.170815] CPU7: Stuck ??
+>
+[   10.171648] Brought up 6 CPUs
+>
+>
+Besides, the follow is the cpus message got from host.
+>
+80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh
+>
+qemu-monitor-command instance-0000000
+>
+* CPU #0: pc=0x00007f64160c683d thread_id=68570
+>
+CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
+>
+CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
+>
+CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
+>
+CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
+>
+CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
+>
+CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
+>
+CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
+>
+>
+Oh, i also forgot to mention in the above message that, we have bond each
+>
+vCPU to different physical CPU in
+>
+host.
+>
+>
+Thanks,
+>
+zhanghailiang
+>
+>
+>
+>
+>
+--
+>
+To unsubscribe from this list: send the line "unsubscribe kvm" in
+>
+the body of a message to address@hidden
+>
+More majordomo info at
+http://vger.kernel.org/majordomo-info.html
+
+On 2015/7/7 19:23, Igor Mammedov wrote:
+On Mon, 6 Jul 2015 17:59:10 +0800
+zhanghailiang <address@hidden> wrote:
+On 2015/7/6 16:45, Paolo Bonzini wrote:
+On 06/07/2015 09:54, zhanghailiang wrote:
+From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
+consuming any cpu (Should be in idle state),
+All of the vCPUs' stacks on the host look like the one below:
+
+[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
+[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
+[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
+[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
+[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
+[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
+[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
+[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
+[<ffffffffffffffff>] 0xffffffffffffffff
+
+We looked into the kernel code that could lead to the above 'Stuck'
+warning,
+in current upstream there isn't any printk(...Stuck...) left since that code 
+path
+has been reworked.
+I've often seen this on over-committed host during guest CPUs up/down torture 
+test.
+Could you update guest kernel to upstream and see if issue reproduces?
+Hmm, Unfortunately, it is very hard to reproduce, and we are still trying to 
+reproduce it.
+
+For your test case, is it a kernel bug?
+Or has any related patch that could solve your test problem been merged
+upstream?
+
+Thanks,
+zhanghailiang
+and found that the only possibility is that the emulation of the 'cpuid'
+instruction in kvm/qemu has something wrong.
+But since we can't reproduce this problem, we are not quite sure.
+Is there any possibility that the cpuid emulation in kvm/qemu has some bug?
+Can you explain the relationship to the cpuid emulation?  What do the
+traces say about vcpus 1 and 7?
+OK, we searched the VM's kernel code for the 'Stuck' message, and it is
+located in do_boot_cpu(). It runs in BSP context; the call sequence is:
+BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu()
+-> wakeup_secondary_cpu_via_init() to trigger the APs.
+It waits 5s for the APs to start up; if an AP does not start up normally, it
+prints 'CPU%d: Stuck ??' or 'CPU%d: Not responding'.
+
+If it prints 'Stuck', it means the AP has received the SIPI interrupt and
+has begun to execute the code at 'ENTRY(trampoline_data)' (trampoline_64.S),
+but got stuck somewhere before smp_callin() (smpboot.c).
+The following is the startup process of the BSP and the APs.
+BSP:
+start_kernel()
+    ->smp_init()
+       ->smp_boot_cpus()
+         ->do_boot_cpu()
+             ->start_ip = trampoline_address(); //set the address that AP will 
+go to execute
+             ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
+             ->for (timeout = 0; timeout < 50000; timeout++)
+                 if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if 
+AP startup or not
+
+APs:
+ENTRY(trampoline_data) (trampoline_64.S)
+        ->ENTRY(secondary_startup_64) (head_64.S)
+           ->start_secondary() (smpboot.c)
+              ->cpu_init();
+              ->smp_callin();
+                  ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP 
+comes here, the BSP will not print the error message.
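A minimal userspace sketch of the call-in handshake described in the two call chains above: the AP sets its bit in a mask, and the BSP polls that bit a bounded number of times before reporting the CPU. The type `cpumask_sketch_t` and the functions `ap_callin()` and `bsp_wait_for_callin()` are hypothetical names for illustration, not kernel APIs. The sketch is single-threaded and omits the delay between polls and the atomic bit operations the real code needs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpu_callin_mask: one bit per CPU (cpu < 64). */
typedef uint64_t cpumask_sketch_t;

/* AP side: roughly what cpumask_set_cpu(cpuid, cpu_callin_mask) amounts to. */
static void ap_callin(cpumask_sketch_t *mask, unsigned cpu)
{
    *mask |= (uint64_t)1 << cpu;
}

/* BSP side: poll for the AP's bit at most max_polls times.
 * Returns true if the AP checked in, false if the BSP would report it
 * (the 'Stuck' / 'Not responding' case in the thread). */
static bool bsp_wait_for_callin(const cpumask_sketch_t *mask, unsigned cpu,
                                unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (*mask & ((uint64_t)1 << cpu))
            return true;
        /* A real loop would delay here so the total wait is bounded in time. */
    }
    return false;
}
```

With this model, an AP that halts before reaching `ap_callin()` never sets its bit, so `bsp_wait_for_callin()` runs out of polls and returns false, which corresponds to the reported CPU1 and CPU7 messages.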
+
+From the above call sequence, we can be sure that the AP got stuck between
+trampoline_data and the cpumask_set_cpu() call in smp_callin(). We looked
+through this code path carefully and found only one 'hlt' instruction that
+could block the process.
+It is located in trampoline_data():
+
+ENTRY(trampoline_data)
+          ...
+
+        call    verify_cpu              # Verify the cpu supports long mode
+        testl   %eax, %eax              # Check for return code
+        jnz     no_longmode
+
+          ...
+
+no_longmode:
+        hlt
+        jmp no_longmode
+
+In verify_cpu(), the only sensitive instruction we can find that could cause
+a VM exit from non-root mode is 'cpuid'.
+This is why we suspect that the cpuid emulation in KVM/QEMU is wrong, leading
+to the failure in verify_cpu().
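To make the suspected failure mode concrete, here is a sketch of the long-mode part of such a check, written as a pure function over CPUID register values. It relies on the architectural fact that long mode is reported in bit 29 of EDX for CPUID leaf 0x80000001. The function name `verify_long_mode_sketch()` is hypothetical, and the kernel's verify_cpu presumably checks more than this one bit. The return convention (nonzero means failure) mirrors the `testl %eax, %eax` / `jnz no_longmode` sequence quoted above.

```c
#include <stdint.h>

#define CPUID_EXT_LEAF_FEATURES 0x80000001u
#define CPUID_EXT_EDX_LM        (1u << 29)   /* long mode available */

/* Returns 0 when long mode is reported, nonzero otherwise.
 *   max_ext_leaf: EAX returned by CPUID leaf 0x80000000
 *   ext_edx:      EDX returned by CPUID leaf 0x80000001 */
static int verify_long_mode_sketch(uint32_t max_ext_leaf, uint32_t ext_edx)
{
    if (max_ext_leaf < CPUID_EXT_LEAF_FEATURES)
        return 1;   /* extended feature leaf not implemented */
    if (!(ext_edx & CPUID_EXT_EDX_LM))
        return 1;   /* long mode bit clear */
    return 0;
}
```

If the hypervisor returned wrong CPUID values to one vCPU, a check of this shape would fail on that vCPU only and send it into the `hlt` loop, which is the scenario the message asks about.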
+
+From the messages in the VM, we know something is wrong with vcpu1 and vcpu7.
+[    5.060042] CPU1: Stuck ??
+[   10.170815] CPU7: Stuck ??
+[   10.171648] Brought up 6 CPUs
+
+Besides, the following is the CPU information obtained from the host.
+80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command 
+instance-0000000
+* CPU #0: pc=0x00007f64160c683d thread_id=68570
+    CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
+    CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
+    CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
+    CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
+    CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
+    CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
+    CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
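In the listing above, CPU #1 and CPU #7 report pc=0xffffffff810301f1, while the other halted vCPUs report pc=0xffffffff810301e2. A small parser makes that grouping easy to check on larger guests. This is a sketch that assumes the line format shown in the listing; `struct vcpu_state` and `parse_cpu_line()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct vcpu_state {
    int id;                 /* number after "CPU #" */
    unsigned long long pc;  /* value after "pc=" */
    bool halted;            /* true if the line contains "(halted)" */
};

/* Parse one line such as
 *   "    CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573"
 * A leading "* " marker on the line is skipped. Returns true on success. */
static bool parse_cpu_line(const char *line, struct vcpu_state *out)
{
    const char *p = strstr(line, "CPU #");
    if (p == NULL)
        return false;
    if (sscanf(p, "CPU #%d: pc=%llx", &out->id, &out->pc) != 2)
        return false;
    out->halted = strstr(p, "(halted)") != NULL;
    return true;
}
```

Feeding each line of the monitor output through `parse_cpu_line()` and comparing the `pc` fields separates the two stuck vCPUs from the ones that came up normally.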
+
+Oh, I also forgot to mention in the above message that we have bound each vCPU
+to a different physical CPU on the host.
+
+Thanks,
+zhanghailiang
+
+
+
+
+
+On Tue, 7 Jul 2015 19:43:35 +0800
+zhanghailiang <address@hidden> wrote:
+
+>
+On 2015/7/7 19:23, Igor Mammedov wrote:
+>
+> On Mon, 6 Jul 2015 17:59:10 +0800
+>
+> zhanghailiang <address@hidden> wrote:
+>
+>
+>
+>> On 2015/7/6 16:45, Paolo Bonzini wrote:
+>
+>>>
+>
+>>>
+>
+>>> On 06/07/2015 09:54, zhanghailiang wrote:
+>
+>>>>
+>
+>>>>   From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
+>
+>>>> consuming any cpu (Should be in idle state),
+>
+>>>> All of the vCPUs' stacks on the host look like the one below:
+>
+>>>>
+>
+>>>> [<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
+>
+>>>> [<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
+>
+>>>> [<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
+>
+>>>> [<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
+>
+>>>> [<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
+>
+>>>> [<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
+>
+>>>> [<ffffffff81468092>] system_call_fastpath+0x16/0x1b
+>
+>>>> [<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
+>
+>>>> [<ffffffffffffffff>] 0xffffffffffffffff
+>
+>>>>
+>
+>>>> We looked into the kernel code that could lead to the above 'Stuck'
+>
+>>>> warning,
+>
+> in current upstream there isn't any printk(...Stuck...) left since that
+>
+> code path
+>
+> has been reworked.
+>
+> I've often seen this on over-committed host during guest CPUs up/down
+>
+> torture test.
+>
+> Could you update guest kernel to upstream and see if issue reproduces?
+>
+>
+>
+>
+Hmm, Unfortunately, it is very hard to reproduce, and we are still trying to
+>
+reproduce it.
+>
+>
+For your test case, is it a kernel bug?
+>
+Or has any related patch that could solve your test problem been merged
+>
+upstream?
+I don't remember all prerequisite patches but you should be able to find
+http://marc.info/?l=linux-kernel&m=140326703108009&w=2
+"x86/smpboot: Initialize secondary CPU only if master CPU will wait for it"
+and then look for dependencies.
+
+
+>
+>
+Thanks,
+>
+zhanghailiang
+>
+>
+>>>> and found that the only possibility is that the emulation of the 'cpuid'
+>
+>>>> instruction in kvm/qemu has something wrong.
+>
+>>>> But since we can't reproduce this problem, we are not quite sure.
+>
+>>>> Is there any possibility that the cpuid emulation in kvm/qemu has some bug?
+>
+>>>
+>
+>>> Can you explain the relationship to the cpuid emulation?  What do the
+>
+>>> traces say about vcpus 1 and 7?
+>
+>>
+>
+>> OK, we searched the VM's kernel code for the 'Stuck' message, and it is
+>
+>> located in do_boot_cpu(). It runs in BSP context; the call sequence is:
+>
+>> BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() ->
+>
+>> do_boot_cpu() -> wakeup_secondary_cpu_via_init() to trigger the APs.
+>
+>> It waits 5s for the APs to start up; if an AP does not start up normally, it
+>
+>> prints 'CPU%d: Stuck ??' or 'CPU%d: Not responding'.
+>
+>>
+>
+>> If it prints 'Stuck', it means the AP has received the SIPI interrupt and
+>
+>> has begun to execute the code at 'ENTRY(trampoline_data)' (trampoline_64.S),
+>
+>> but got stuck somewhere before smp_callin() (smpboot.c).
+>
+>> The following is the startup process of the BSP and the APs.
+>
+>> BSP:
+>
+>> start_kernel()
+>
+>>     ->smp_init()
+>
+>>        ->smp_boot_cpus()
+>
+>>          ->do_boot_cpu()
+>
+>>              ->start_ip = trampoline_address(); //set the address that AP
+>
+>> will go to execute
+>
+>>              ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
+>
+>>              ->for (timeout = 0; timeout < 50000; timeout++)
+>
+>>                  if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;//
+>
+>> check if AP startup or not
+>
+>>
+>
+>> APs:
+>
+>> ENTRY(trampoline_data) (trampoline_64.S)
+>
+>>         ->ENTRY(secondary_startup_64) (head_64.S)
+>
+>>            ->start_secondary() (smpboot.c)
+>
+>>               ->cpu_init();
+>
+>>               ->smp_callin();
+>
+>>                   ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP
+>
+>> comes here, the BSP will not print the error message.
+>
+>>
+>
+>> From the above call sequence, we can be sure that the AP got stuck
+>
+>> between trampoline_data and the cpumask_set_cpu() call in
+>
+>> smp_callin(). We looked through this code path carefully and found only one
+>
+>> 'hlt' instruction that could block the process.
+>
+>> It is located in trampoline_data():
+>
+>>
+>
+>> ENTRY(trampoline_data)
+>
+>>           ...
+>
+>>
+>
+>>    call    verify_cpu              # Verify the cpu supports long mode
+>
+>>    testl   %eax, %eax              # Check for return code
+>
+>>    jnz     no_longmode
+>
+>>
+>
+>>           ...
+>
+>>
+>
+>> no_longmode:
+>
+>>    hlt
+>
+>>    jmp no_longmode
+>
+>>
+>
+>> In verify_cpu(), the only sensitive instruction we can find that could cause
+>
+>> a VM exit from non-root mode is 'cpuid'.
+>
+>> This is why we suspect that the cpuid emulation in KVM/QEMU is wrong, leading
+>
+>> to the failure in verify_cpu().
+>
+>>
+>
+>> From the messages in the VM, we know something is wrong with vcpu1 and vcpu7.
+>
+>> [    5.060042] CPU1: Stuck ??
+>
+>> [   10.170815] CPU7: Stuck ??
+>
+>> [   10.171648] Brought up 6 CPUs
+>
+>>
+>
+>> Besides, the following is the CPU information obtained from the host.
+>
+>> 80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh
+>
+>> qemu-monitor-command instance-0000000
+>
+>> * CPU #0: pc=0x00007f64160c683d thread_id=68570
+>
+>>     CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
+>
+>>     CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
+>
+>>     CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
+>
+>>     CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
+>
+>>     CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
+>
+>>     CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
+>
+>>     CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
+>
+>>
+>
+>> Oh, I also forgot to mention in the above message that we have bound each
+>
+>> vCPU to a different physical CPU on the host.
+>
+>>
+>
+>> Thanks,
+>
+>> zhanghailiang
+>
+>>
+>
+>>
+>
+>>
+>
+>>
+>
+
+On 2015/7/7 20:21, Igor Mammedov wrote:
+On Tue, 7 Jul 2015 19:43:35 +0800
+zhanghailiang <address@hidden> wrote:
+On 2015/7/7 19:23, Igor Mammedov wrote:
+On Mon, 6 Jul 2015 17:59:10 +0800
+zhanghailiang <address@hidden> wrote:
+On 2015/7/6 16:45, Paolo Bonzini wrote:
+On 06/07/2015 09:54, zhanghailiang wrote:
+From host, we found that QEMU vcpu1 thread and vcpu7 thread were not
+consuming any cpu (Should be in idle state),
+All of the vCPUs' stacks on the host look like the one below:
+
+[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
+[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
+[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
+[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
+[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
+[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
+[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
+[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
+[<ffffffffffffffff>] 0xffffffffffffffff
+
+We looked into the kernel code that could lead to the above 'Stuck'
+warning,
+in current upstream there isn't any printk(...Stuck...) left since that code 
+path
+has been reworked.
+I've often seen this on over-committed host during guest CPUs up/down torture 
+test.
+Could you update guest kernel to upstream and see if issue reproduces?
+Hmm, Unfortunately, it is very hard to reproduce, and we are still trying to 
+reproduce it.
+
+For your test case, is it a kernel bug?
+Or has any related patch that could solve your test problem been merged
+upstream?
+I don't remember all prerequisite patches but you should be able to find
+http://marc.info/?l=linux-kernel&m=140326703108009&w=2
+"x86/smpboot: Initialize secondary CPU only if master CPU will wait for it"
+and then look for dependencies.
+Er, we have investigated this patch, and it is not related to our problem, :)
+
+Thanks.
+Thanks,
+zhanghailiang
+and found that the only possibility is that the emulation of the 'cpuid'
+instruction in kvm/qemu has something wrong.
+But since we can't reproduce this problem, we are not quite sure.
+Is there any possibility that the cpuid emulation in kvm/qemu has some bug?
+Can you explain the relationship to the cpuid emulation?  What do the
+traces say about vcpus 1 and 7?
+OK, we searched the VM's kernel code for the 'Stuck' message, and it is
+located in do_boot_cpu(). It runs in BSP context; the call sequence is:
+BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu()
+-> wakeup_secondary_cpu_via_init() to trigger the APs.
+It waits 5s for the APs to start up; if an AP does not start up normally, it
+prints 'CPU%d: Stuck ??' or 'CPU%d: Not responding'.
+
+If it prints 'Stuck', it means the AP has received the SIPI interrupt and
+has begun to execute the code at 'ENTRY(trampoline_data)' (trampoline_64.S),
+but got stuck somewhere before smp_callin() (smpboot.c).
+The following is the startup process of the BSP and the APs.
+BSP:
+start_kernel()
+     ->smp_init()
+        ->smp_boot_cpus()
+          ->do_boot_cpu()
+              ->start_ip = trampoline_address(); //set the address that AP will 
+go to execute
+              ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
+              ->for (timeout = 0; timeout < 50000; timeout++)
+                  if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if 
+AP startup or not
+
+APs:
+ENTRY(trampoline_data) (trampoline_64.S)
+         ->ENTRY(secondary_startup_64) (head_64.S)
+            ->start_secondary() (smpboot.c)
+               ->cpu_init();
+               ->smp_callin();
+                   ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP 
+comes here, the BSP will not print the error message.
+
+From the above call sequence, we can be sure that the AP got stuck between
+trampoline_data and the cpumask_set_cpu() call in smp_callin(). We looked
+through this code path carefully and found only one 'hlt' instruction that
+could block the process.
+It is located in trampoline_data():
+
+ENTRY(trampoline_data)
+           ...
+
+        call    verify_cpu              # Verify the cpu supports long mode
+        testl   %eax, %eax              # Check for return code
+        jnz     no_longmode
+
+           ...
+
+no_longmode:
+        hlt
+        jmp no_longmode
+
+In verify_cpu(), the only sensitive instruction we can find that could cause
+a VM exit from non-root mode is 'cpuid'.
+This is why we suspect that the cpuid emulation in KVM/QEMU is wrong, leading
+to the failure in verify_cpu().
+
+From the messages in the VM, we know something is wrong with vcpu1 and vcpu7.
+[    5.060042] CPU1: Stuck ??
+[   10.170815] CPU7: Stuck ??
+[   10.171648] Brought up 6 CPUs
+
+Besides, the following is the CPU information obtained from the host.
+80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command 
+instance-0000000
+* CPU #0: pc=0x00007f64160c683d thread_id=68570
+     CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
+     CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
+     CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
+     CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
+     CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
+     CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
+     CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
+
+Oh, I also forgot to mention in the above message that we have bound each vCPU
+to a different physical CPU on the host.
+
+Thanks,
+zhanghailiang
+
+
+
+
+