diff --git a/results/classifier/017/unknown/25892827 b/results/classifier/017/unknown/25892827
new file mode 100644
index 00000000..52545cdf
--- /dev/null
+++ b/results/classifier/017/unknown/25892827
@@ -0,0 +1,1136 @@
+risc-v: 0.908
+user-level: 0.889
+permissions: 0.881
+register: 0.876
+KVM: 0.872
+hypervisor: 0.871
+operating system: 0.871
+debug: 0.868
+x86: 0.849
+vnc: 0.846
+mistranslation: 0.842
+boot: 0.839
+network: 0.839
+VMM: 0.839
+device: 0.839
+TCG: 0.837
+virtual: 0.835
+i386: 0.835
+peripherals: 0.833
+graphic: 0.832
+assembly: 0.829
+architecture: 0.825
+semantic: 0.825
+ppc: 0.824
+socket: 0.822
+arm: 0.821
+performance: 0.819
+alpha: 0.816
+kernel: 0.810
+files: 0.804
+PID: 0.792
+--------------------
+virtual: 0.955
+KVM: 0.881
+x86: 0.715
+debug: 0.617
+hypervisor: 0.546
+PID: 0.132
+register: 0.102
+performance: 0.058
+files: 0.056
+operating system: 0.051
+kernel: 0.049
+boot: 0.025
+assembly: 0.017
+device: 0.014
+socket: 0.013
+semantic: 0.009
+user-level: 0.008
+TCG: 0.008
+risc-v: 0.007
+architecture: 0.006
+ppc: 0.005
+VMM: 0.005
+permissions: 0.004
+vnc: 0.004
+network: 0.004
+peripherals: 0.003
+arm: 0.003
+i386: 0.003
+alpha: 0.002
+graphic: 0.002
+mistranslation: 0.000
+
+[Qemu-devel] [BUG/RFC] Two cpus are not brought up normally in SLES11 sp3 VM after reboot
+
+Hi,
+
+Recently we encountered a problem in our project: 2 CPUs in a VM are not
+brought up normally after a reboot.
+
+Our host is using KVM kmod 3.6 and QEMU 2.1.
+The guest is a SLES 11 SP3 VM configured with 8 vCPUs,
+and the CPU model is configured as 'host-passthrough'.
+
+The first time the VM started up, everything seemed to be OK;
+then the VM panicked and rebooted.
+After the reboot, only 6 CPUs are brought up in the VM; cpu1 and cpu7 are not online.
+
+This is the only message we can get from the VM; its dmesg shows:
+[ 0.069867] Booting Node 0, Processors #1
+[ 5.060042] CPU1: Stuck ??
+[ 5.060499] #2
+[ 5.088322] kvm-clock: cpu 2, msr 6:3fc90901, secondary cpu clock
+[ 5.088335] KVM setup async PF for cpu 2
+[ 5.092967] NMI watchdog enabled, takes one hw-pmu counter.
+[ 5.094405] #3
+[ 5.108324] kvm-clock: cpu 3, msr 6:3fcd0901, secondary cpu clock
+[ 5.108333] KVM setup async PF for cpu 3
+[ 5.113553] NMI watchdog enabled, takes one hw-pmu counter.
+[ 5.114970] #4
+[ 5.128325] kvm-clock: cpu 4, msr 6:3fd10901, secondary cpu clock
+[ 5.128336] KVM setup async PF for cpu 4
+[ 5.134576] NMI watchdog enabled, takes one hw-pmu counter.
+[ 5.135998] #5
+[ 5.152324] kvm-clock: cpu 5, msr 6:3fd50901, secondary cpu clock
+[ 5.152334] KVM setup async PF for cpu 5
+[ 5.154764] NMI watchdog enabled, takes one hw-pmu counter.
+[ 5.156467] #6
+[ 5.172327] kvm-clock: cpu 6, msr 6:3fd90901, secondary cpu clock
+[ 5.172341] KVM setup async PF for cpu 6
+[ 5.180738] NMI watchdog enabled, takes one hw-pmu counter.
+[ 5.182173] #7 Ok.
+[ 10.170815] CPU7: Stuck ??
+[ 10.171648] Brought up 6 CPUs
+[ 10.172394] Total of 6 processors activated (28799.97 BogoMIPS).
+
+From the host, we found that the QEMU vcpu1 and vcpu7 threads were not
+consuming any CPU (they should be in an idle state).
+All of the vCPUs' kernel stacks on the host look like the one below:
+
+[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
+[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm]
+[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
+[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
+[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0
+[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0
+[<ffffffff81468092>] system_call_fastpath+0x16/0x1b
+[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7
+[<ffffffffffffffff>] 0xffffffffffffffff
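+
+(For reference, a per-thread kernel stack like the one above can be read on
+the host from procfs, e.g. "cat /proc/<qemu-pid>/task/<vcpu-thread-id>/stack";
+the exact command is only an illustration.)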
+
+We looked into the guest kernel code that could lead to the above 'Stuck'
+warning, and the only possibility we found is that the emulation of the
+'cpuid' instruction in kvm/qemu is doing something wrong.
+But since we can't reproduce this problem, we are not quite sure.
+Is it possible that the cpuid emulation in kvm/qemu has a bug?
+
+Has anyone come across this problem before? Any ideas?
+
+Thanks,
+zhanghailiang
+
+On 06/07/2015 09:54, zhanghailiang wrote:
+> From the host, we found that the QEMU vcpu1 and vcpu7 threads were not
+> consuming any CPU (they should be in an idle state).
+> All of the vCPUs' kernel stacks on the host look like the one below:
+>
+> [<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
+> [...]
+>
+> We looked into the guest kernel code that could lead to the above 'Stuck'
+> warning, and the only possibility we found is that the emulation of the
+> 'cpuid' instruction in kvm/qemu is doing something wrong.
+> But since we can't reproduce this problem, we are not quite sure.
+> Is it possible that the cpuid emulation in kvm/qemu has a bug?
+
+Can you explain the relationship to the cpuid emulation? What do the
+traces say about vcpus 1 and 7?
+
+Paolo
+
+On 2015/7/6 16:45, Paolo Bonzini wrote:
+> On 06/07/2015 09:54, zhanghailiang wrote:
+>> From the host, we found that the QEMU vcpu1 and vcpu7 threads were not
+>> consuming any CPU (they should be in an idle state).
+>> All of the vCPUs' kernel stacks on the host look like the one below:
+>> [...]
+>> Is it possible that the cpuid emulation in kvm/qemu has a bug?
+>
+> Can you explain the relationship to the cpuid emulation?  What do the
+> traces say about vcpus 1 and 7?
+
+OK, we searched the VM's kernel code for the 'Stuck' message, and it is
+located in do_boot_cpu(). It runs in BSP context; the call path is:
+the BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu()
+-> wakeup_secondary_cpu_via_init() to trigger the APs.
+It then waits 5s for each AP to start up; if an AP does not start up normally,
+it prints 'CPU%d Stuck' or 'CPU%d: Not responding'.
+
+If it prints 'Stuck', it means the AP has received the SIPI interrupt and has
+begun to execute the code at 'ENTRY(trampoline_data)' (trampoline_64.S), but
+got stuck somewhere before smp_callin() (smpboot.c).
+The following is the startup flow of the BSP and the APs.
+BSP:
+start_kernel()
+    ->smp_init()
+        ->smp_boot_cpus()
+            ->do_boot_cpu()
+                ->start_ip = trampoline_address();   // set the address the AP will jump to
+                ->wakeup_secondary_cpu_via_init();   // kick the secondary CPU
+                ->for (timeout = 0; timeout < 50000; timeout++)
+                      if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;  // check whether the AP has called in
+
+APs:
+ENTRY(trampoline_data)  (trampoline_64.S)
+    ->ENTRY(secondary_startup_64)  (head_64.S)
+        ->start_secondary()  (smpboot.c)
+            ->cpu_init();
+            ->smp_callin();
+                ->cpumask_set_cpu(cpuid, cpu_callin_mask);  // Note: if the AP gets here,
+                                                            //       the BSP will not print the error message.
+
+From the above call flow we can be sure that the AP got stuck somewhere between
+trampoline_data and the cpumask_set_cpu() in smp_callin(). We looked through
+this code path carefully, and the only thing we found that could block the AP
+is a 'hlt' instruction.
+It is located in trampoline_data():
+
+ENTRY(trampoline_data)
+ ...
+
+ call verify_cpu # Verify the cpu supports long mode
+ testl %eax, %eax # Check for return code
+ jnz no_longmode
+
+ ...
+
+no_longmode:
+ hlt
+ jmp no_longmode
+
+In verify_cpu(),
+the only sensitive instruction we can find that could cause a VM exit from
+non-root mode is 'cpuid'.
+This is why we suspect that the cpuid emulation in KVM/QEMU is wrong and leads
+to the failure in verify_cpu.
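+
+For reference, the check in verify_cpu() that matters here boils down to the
+CPUID long-mode bit: leaf 0x80000000 must report at least 0x80000001, and leaf
+0x80000001 must have EDX bit 29 (Long Mode) set. The following is a minimal
+user-space C sketch of that check (only an illustration using GCC's <cpuid.h>,
+not the kernel's verify_cpu.S itself); if either CPUID result were emulated
+wrongly, verify_cpu would fail and the AP would land in the no_longmode hlt
+loop shown above:
+
+#include <cpuid.h>
+#include <stdio.h>
+
+int main(void)
+{
+    unsigned int eax, ebx, ecx, edx;
+
+    /* Highest extended CPUID leaf must be at least 0x80000001. */
+    if (!__get_cpuid(0x80000000, &eax, &ebx, &ecx, &edx) || eax < 0x80000001) {
+        puts("no extended CPUID leaves -> verify_cpu would fail");
+        return 1;
+    }
+
+    /* Leaf 0x80000001: EDX bit 29 is the Long Mode (LM) flag. */
+    __get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx);
+    if (!(edx & (1u << 29))) {
+        puts("long mode not reported -> verify_cpu would fail");
+        return 1;
+    }
+
+    puts("CPUID reports long mode -> verify_cpu would pass");
+    return 0;
+}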
+
+From the messages in the VM, we know that something is wrong with vcpu1 and vcpu7.
+[ 5.060042] CPU1: Stuck ??
+[ 10.170815] CPU7: Stuck ??
+[ 10.171648] Brought up 6 CPUs
+
+Besides, the following is the CPU info obtained from the host:
+80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command
+instance-0000000
+* CPU #0: pc=0x00007f64160c683d thread_id=68570
+ CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
+ CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
+ CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
+ CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
+ CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
+ CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
+ CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
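+
+(For reference, this output is what the HMP 'info cpus' command prints; it can
+be obtained with something like
+    virsh qemu-monitor-command <domain> --hmp 'info cpus'
+where <domain> is the guest's libvirt domain name.)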
+
+Oh, I also forgot to mention that we have bound each vCPU to a different
+physical CPU on the host.
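+(For illustration, such 1:1 pinning is typically configured with something
+like "virsh vcpupin <domain> <vcpu> <host-cpu>" for each vCPU, or with
+<cputune>/<vcpupin> elements in the domain XML.)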
+
+Thanks,
+zhanghailiang
+
+On 06/07/2015 11:59, zhanghailiang wrote:
+> Besides, the following is the CPU info obtained from the host:
+> 80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh
+> qemu-monitor-command instance-0000000
+> * CPU #0: pc=0x00007f64160c683d thread_id=68570
+>   CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
+>   [...]
+>   CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
+>
+> Oh, I also forgot to mention that we have bound each vCPU to a different
+> physical CPU on the host.
+
+Can you capture a trace on the host (trace-cmd record -e kvm) and send
+it privately? Please note which CPUs get stuck, since I guess it's not
+always 1 and 7.
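+
+(For reference, "trace-cmd record -e kvm" writes a trace.dat file in the
+current directory; it can be inspected afterwards with "trace-cmd report".)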
+
+Paolo
+
+On Mon, 6 Jul 2015 17:59:10 +0800
+zhanghailiang <address@hidden> wrote:
+
+> On 2015/7/6 16:45, Paolo Bonzini wrote:
+>> On 06/07/2015 09:54, zhanghailiang wrote:
+>>> From the host, we found that the QEMU vcpu1 and vcpu7 threads were not
+>>> consuming any CPU (they should be in an idle state).
+>>> All of the vCPUs' kernel stacks on the host look like the one below:
+>>>
+>>> [<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm]
+>>> [...]
+>>>
+>>> We looked into the guest kernel code that could lead to the above 'Stuck'
+>>> warning,
+In current upstream there isn't any printk(...Stuck...) left, since that code
+path has been reworked.
+I've often seen this on an over-committed host during guest CPU up/down
+torture tests.
+Could you update the guest kernel to upstream and see if the issue reproduces?
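+
+(Such an up/down torture test is typically just a loop in the guest that
+writes 0 and then 1 to /sys/devices/system/cpu/cpuN/online for the non-boot
+CPUs.)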
+
+
+On 2015/7/7 19:23, Igor Mammedov wrote:
+> On Mon, 6 Jul 2015 17:59:10 +0800
+> zhanghailiang <address@hidden> wrote:
+>> On 2015/7/6 16:45, Paolo Bonzini wrote:
+>>> On 06/07/2015 09:54, zhanghailiang wrote:
+>>>> From the host, we found that the QEMU vcpu1 and vcpu7 threads were not
+>>>> consuming any CPU (they should be in an idle state). [...]
+>>>> We looked into the guest kernel code that could lead to the above 'Stuck'
+>>>> warning,
+>
+> In current upstream there isn't any printk(...Stuck...) left, since that
+> code path has been reworked.
+> I've often seen this on an over-committed host during guest CPU up/down
+> torture tests.
+> Could you update the guest kernel to upstream and see if the issue reproduces?
+Hmm, unfortunately, it is very hard to reproduce, and we are still trying
+to reproduce it.
+
+For your test case, is it a kernel bug?
+Or is there any related patch that could solve your test problem that has
+already been merged into upstream?
+
+Thanks,
+zhanghailiang
+
+On Tue, 7 Jul 2015 19:43:35 +0800
+zhanghailiang <address@hidden> wrote:
+
+> On 2015/7/7 19:23, Igor Mammedov wrote:
+>> On Mon, 6 Jul 2015 17:59:10 +0800
+>> zhanghailiang <address@hidden> wrote:
+>>> [...]
+>>
+>> In current upstream there isn't any printk(...Stuck...) left, since that
+>> code path has been reworked.
+>> I've often seen this on an over-committed host during guest CPU up/down
+>> torture tests.
+>> Could you update the guest kernel to upstream and see if the issue
+>> reproduces?
+>
+> Hmm, unfortunately, it is very hard to reproduce, and we are still trying
+> to reproduce it.
+>
+> For your test case, is it a kernel bug?
+> Or is there any related patch that could solve your test problem that has
+> already been merged into upstream?
+I don't remember all prerequisite patches but you should be able to find
+http://marc.info/?l=linux-kernel&m=140326703108009&w=2
+"x86/smpboot: Initialize secondary CPU only if master CPU will wait for it"
+and then look for dependencies.
+
+
+>
+>
+Thanks,
+>
+zhanghailiang
+>
+>
+>>>> and found that the only possible is the emulation of 'cpuid' instruct in
+>
+>>>> kvm/qemu has something wrong.
+>
+>>>> But since we can’t reproduce this problem, we are not quite sure.
+>
+>>>> Is there any possible that the cupid emulation in kvm/qemu has some bug ?
+>
+>>>
+>
+>>> Can you explain the relationship to the cpuid emulation? What do the
+>
+>>> traces say about vcpus 1 and 7?
+>
+>>
+>
+>> OK, we searched the VM's kernel codes with the 'Stuck' message, and it is
+>
+>> located in
+>
+>> do_boot_cpu(). It's in BSP context, the call process is:
+>
+>> BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() ->
+>
+>> do_boot_cpu() -> wakeup_secondary_via_INIT() to trigger APs.
+>
+>> It will wait 5s for APs to startup, if some AP not startup normally, it
+>
+>> will print 'CPU%d Stuck' or 'CPU%d: Not responding'.
+>
+>>
+>
+>> If it prints 'Stuck', it means the AP has received the SIPI interrupt and
+>
+>> begins to execute the code
+>
+>> 'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places
+>
+>> before smp_callin()(smpboot.c).
+>
+>> The follow is the starup process of BSP and AP.
+>
+>> BSP:
+>
+>> start_kernel()
+>
+>> ->smp_init()
+>
+>> ->smp_boot_cpus()
+>
+>> ->do_boot_cpu()
+>
+>> ->start_ip = trampoline_address(); //set the address that AP
+>
+>> will go to execute
+>
+>> ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU
+>
+>> ->for (timeout = 0; timeout < 50000; timeout++)
+>
+>> if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;//
+>
+>> check if AP startup or not
+>
+>>
+>
+>> APs:
+>
+>> ENTRY(trampoline_data) (trampoline_64.S)
+>
+>> ->ENTRY(secondary_startup_64) (head_64.S)
+>
+>> ->start_secondary() (smpboot.c)
+>
+>> ->cpu_init();
+>
+>> ->smp_callin();
+>
+>> ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP
+>
+>> comes here, the BSP will not prints the error message.
+>
+>>
+>
+>> From above call process, we can be sure that, the AP has been stuck
+>
+>> between trampoline_data and the cpumask_set_cpu() in
+>
+>> smp_callin(), we look through these codes path carefully, and only found a
+>
+>> 'hlt' instruct that could block the process.
+>
+>> It is located in trampoline_data():
+>
+>>
+>
+>> ENTRY(trampoline_data)
+>
+>> ...
+>
+>>
+>
+>> call verify_cpu # Verify the cpu supports long mode
+>
+>> testl %eax, %eax # Check for return code
+>
+>> jnz no_longmode
+>
+>>
+>
+>> ...
+>
+>>
+>
+>> no_longmode:
+>
+>> hlt
+>
+>> jmp no_longmode
+>
+>>
+>
+>> For the process verify_cpu(),
+>
+>> we can only find the 'cpuid' sensitive instruct that could lead VM exit
+>
+>> from No-root mode.
+>
+>> This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading
+>
+>> to the fail in verify_cpu.
+>
+>>
+>
+>> From the message in VM, we know vcpu1 and vcpu7 is something wrong.
+>
+>> [ 5.060042] CPU1: Stuck ??
+>
+>> [ 10.170815] CPU7: Stuck ??
+>
+>> [ 10.171648] Brought up 6 CPUs
+>
+>>
+>
+>> Besides, the follow is the cpus message got from host.
+>
+>> 80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh
+>
+>> qemu-monitor-command instance-0000000
+>
+>> * CPU #0: pc=0x00007f64160c683d thread_id=68570
+>
+>> CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573
+>
+>> CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575
+>
+>> CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576
+>
+>> CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577
+>
+>> CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578
+>
+>> CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583
+>
+>> CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584
+>
+>>
+>
+>> Oh, i also forgot to mention in the above message that, we have bond each
+>
+>> vCPU to different physical CPU in
+>
+>> host.
+>
+>>
+>
+>> Thanks,
+>
+>> zhanghailiang
+>
+>>
+>
+>>
+>
+>>
+>
+>>
+>
+>> --
+>
+>> To unsubscribe from this list: send the line "unsubscribe kvm" in
+>
+>> the body of a message to address@hidden
+>
+>> More majordomo info at
+http://vger.kernel.org/majordomo-info.html
+>
+>
+>
+>
+>
+> .
+>
+>
+>
+>
+>
+
+On 2015/7/7 20:21, Igor Mammedov wrote:
+> On Tue, 7 Jul 2015 19:43:35 +0800
+> zhanghailiang <address@hidden> wrote:
+>> On 2015/7/7 19:23, Igor Mammedov wrote:
+>>> [...]
+>>
+>> For your test case, is it a kernel bug?
+>> Or is there any related patch that could solve your test problem that has
+>> already been merged into upstream?
+>
+> I don't remember all prerequisite patches but you should be able to find
+> http://marc.info/?l=linux-kernel&m=140326703108009&w=2
+> "x86/smpboot: Initialize secondary CPU only if master CPU will wait for it"
+> and then look for dependencies.
+Er, we have investigated this patch, and it is not related to our problem. :)
+
+Thanks.
+Thanks,
+zhanghailiang