diff options
| author | Christian Krinitsin <mail@krinitsin.com> | 2025-07-03 19:39:53 +0200 |
|---|---|---|
| committer | Christian Krinitsin <mail@krinitsin.com> | 2025-07-03 19:39:53 +0200 |
| commit | dee4dcba78baf712cab403d47d9db319ab7f95d6 (patch) | |
| tree | 418478faf06786701a56268672f73d6b0b4eb239 /results/classifier/016/boot | |
| parent | 4d9e26c0333abd39bdbd039dcdb30ed429c475ba (diff) | |
| download | qemu-analysis-dee4dcba78baf712cab403d47d9db319ab7f95d6.tar.gz qemu-analysis-dee4dcba78baf712cab403d47d9db319ab7f95d6.zip | |
restructure results
Diffstat (limited to 'results/classifier/016/boot')
| -rw-r--r-- | results/classifier/016/boot/16056596 | 125 | ||||
| -rw-r--r-- | results/classifier/016/boot/43643137 | 565 | ||||
| -rw-r--r-- | results/classifier/016/boot/51610399 | 335 |
3 files changed, 0 insertions, 1025 deletions
diff --git a/results/classifier/016/boot/16056596 b/results/classifier/016/boot/16056596 deleted file mode 100644 index bb94452b0..000000000 --- a/results/classifier/016/boot/16056596 +++ /dev/null @@ -1,125 +0,0 @@ -ppc: 0.980 -boot: 0.952 -KVM: 0.879 -virtual: 0.562 -debug: 0.385 -register: 0.363 -operating system: 0.345 -kernel: 0.272 -hypervisor: 0.061 -PID: 0.058 -TCG: 0.038 -device: 0.033 -socket: 0.032 -files: 0.029 -user-level: 0.015 -performance: 0.007 -semantic: 0.006 -network: 0.003 -architecture: 0.003 -VMM: 0.002 -assembly: 0.002 -graphic: 0.002 -peripherals: 0.001 -permissions: 0.001 -risc-v: 0.001 -vnc: 0.001 -mistranslation: 0.000 -x86: 0.000 -alpha: 0.000 -i386: 0.000 -arm: 0.000 - -[BUG][powerpc] KVM Guest Boot Failure and Hang at "Booting Linux via __start()" - -Bug Description: -Encountering a boot failure when launching a KVM guest with -'qemu-system-ppc64'. The guest hangs at boot, and the QEMU monitor -crashes. -Reproduction Steps: -# qemu-system-ppc64 --version -QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f) -Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers -# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine -pseries,accel=kvm \ --m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \ - -device virtio-scsi-pci,id=scsi \ --drive -file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2 -\ --device scsi-hd,drive=drive0,bus=scsi.0 \ - -netdev bridge,id=net0,br=virbr0 \ - -device virtio-net-pci,netdev=net0 \ - -serial pty \ - -device virtio-balloon-pci \ - -cpu host -QEMU 9.2.50 monitor - type 'help' for more information -char device redirected to /dev/pts/2 (label serial0) -(qemu) -(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but -unavailable: IRQ_XIVE capability must be present for KVM -Falling back to kernel-irqchip=off -** Qemu Hang - -(In another ssh session) -# screen /dev/pts/2 -Preparing to boot Linux version 6.10.4-200.fc40.ppc64le -(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801 -(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11 -15:20:17 UTC 2024 -Detected machine type: 0000000000000101 -command line: -BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le -root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M -Max number of cores passed to firmware: 2048 (NR_CPUS = 2048) -Calling ibm,client-architecture-support... done -memory layout at init: - memory_limit : 0000000000000000 (16 MB aligned) - alloc_bottom : 0000000008200000 - alloc_top : 0000000030000000 - alloc_top_hi : 0000000800000000 - rmo_top : 0000000030000000 - ram_top : 0000000800000000 -instantiating rtas at 0x000000002fff0000... done -prom_hold_cpus: skipped -copying OF device tree... -Building dt strings... -Building dt structure... -Device tree strings 0x0000000008210000 -> 0x0000000008210bd0 -Device tree struct 0x0000000008220000 -> 0x0000000008230000 -Quiescing Open Firmware ... -Booting Linux via __start() @ 0x0000000000440000 ... -** Guest Console Hang - - -Git Bisect: -Performing git bisect points to the following patch: -# git bisect bad -e8291ec16da80566c121c68d9112be458954d90b is the first bad commit -commit e8291ec16da80566c121c68d9112be458954d90b (HEAD) -Author: Nicholas Piggin <npiggin@gmail.com> -Date: Thu Dec 19 13:40:31 2024 +1000 - - target/ppc: fix timebase register reset state -(H)DEC and PURR get reset before icount does, which causes them to -be -skewed and not match the init state. This can cause replay to not -match the recorded trace exactly. For DEC and HDEC this is usually -not -noticable since they tend to get programmed before affecting the - target machine. PURR has been observed to cause replay bugs when - running Linux. - - Fix this by resetting using a time of 0. - - Message-ID: <20241219034035.1826173-2-npiggin@gmail.com> - Signed-off-by: Nicholas Piggin <npiggin@gmail.com> - - hw/ppc/ppc.c | 11 ++++++++--- - 1 file changed, 8 insertions(+), 3 deletions(-) - - -Reverting the patch helps boot the guest. -Thanks, -Misbah Anjum N - diff --git a/results/classifier/016/boot/43643137 b/results/classifier/016/boot/43643137 deleted file mode 100644 index e4bebc960..000000000 --- a/results/classifier/016/boot/43643137 +++ /dev/null @@ -1,565 +0,0 @@ -boot: 0.933 -virtual: 0.827 -debug: 0.726 -hypervisor: 0.691 -KVM: 0.449 -x86: 0.204 -register: 0.084 -operating system: 0.049 -kernel: 0.047 -PID: 0.047 -TCG: 0.036 -assembly: 0.023 -architecture: 0.022 -VMM: 0.021 -risc-v: 0.016 -user-level: 0.014 -files: 0.014 -performance: 0.013 -semantic: 0.011 -socket: 0.008 -network: 0.007 -ppc: 0.007 -device: 0.007 -arm: 0.004 -vnc: 0.004 -i386: 0.004 -permissions: 0.004 -alpha: 0.002 -peripherals: 0.001 -graphic: 0.001 -mistranslation: 0.000 - -[Qemu-devel] [BUG/RFC] INIT IPI lost when VM starts - -Hi, -We encountered a problem that when a domain starts, seabios failed to online a -vCPU. - -After investigation, we found that the reason is in kvm-kmod, KVM_APIC_INIT bit -in -vcpu->arch.apic->pending_events was overwritten by qemu, and thus an INIT IPI -sent -to AP was lost. Qemu does this since libvirtd sends a âquery-cpusâ qmp command -to qemu -on VM start. - -In qemu, qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state-> -do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from kvm-kmod and -sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call -kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus pending_events is -overwritten by qemu. - -I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true after -âquery-cpusâ, -and kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am not sure -whether -it is OK for qemu to set cpu->kvm_vcpu_dirty in do_kvm_cpu_synchronize_state in -each caller. - -Whatâs your opinion? - -Let me clarify it more clearly. Time sequence is that qemu handles âquery-cpusâ qmp -command, vcpu 1 (and vcpu 0) got registers from kvm-kmod (qmp_query_cpus-> -cpu_synchronize_state-> kvm_cpu_synchronize_state-> -> do_kvm_cpu_synchronize_state-> kvm_arch_get_registers), then vcpu 0 (BSP) -sends INIT-SIPI to vcpu 1(AP). In kvm-kmod, vcpu 1âs pending_eventsâs KVM_APIC_INIT -bit set. -Then vcpu 1 continue running, vcpu1 thread in qemu calls -kvm_arch_put_registers-> kvm_put_vcpu_events, so KVM_APIC_INIT bit in vcpu 1âs -pending_events got cleared, i.e., lost. - -In kvm-kmod, except for pending_events, sipi_vector may also be overwritten., -so I am not sure if there are other fields/registers in danger, i.e., those may -be modified asynchronously with vcpu thread itself. - -BTW, using a sleep like following can reliably reproduce this problem, if VM -equipped with more than 2 vcpus and starting VM using libvirtd. - -diff --git a/target/i386/kvm.c b/target/i386/kvm.c -index 55865db..5099290 100644 ---- a/target/i386/kvm.c -+++ b/target/i386/kvm.c -@@ -2534,6 +2534,11 @@ static int kvm_put_vcpu_events(X86CPU *cpu, int level) - KVM_VCPUEVENT_VALID_NMI_PENDING | KVM_VCPUEVENT_VALID_SIPI_VECTOR; - } - -+ if (CPU(cpu)->cpu_index == 1) { -+ fprintf(stderr, "vcpu 1 sleep!!!!\n"); -+ sleep(10); -+ } -+ - return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events); - } - - -On 2017/3/20 22:21, Herongguang (Stephen) wrote: -Hi, -We encountered a problem that when a domain starts, seabios failed to online a -vCPU. - -After investigation, we found that the reason is in kvm-kmod, KVM_APIC_INIT bit -in -vcpu->arch.apic->pending_events was overwritten by qemu, and thus an INIT IPI -sent -to AP was lost. Qemu does this since libvirtd sends a âquery-cpusâ qmp command -to qemu -on VM start. - -In qemu, qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state-> -do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from kvm-kmod and -sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call -kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus pending_events is -overwritten by qemu. - -I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true after -âquery-cpusâ, -and kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am not sure -whether -it is OK for qemu to set cpu->kvm_vcpu_dirty in do_kvm_cpu_synchronize_state in -each caller. - -Whatâs your opinion? - -On 20/03/2017 15:21, Herongguang (Stephen) wrote: -> -> -We encountered a problem that when a domain starts, seabios failed to -> -online a vCPU. -> -> -After investigation, we found that the reason is in kvm-kmod, -> -KVM_APIC_INIT bit in -> -vcpu->arch.apic->pending_events was overwritten by qemu, and thus an -> -INIT IPI sent -> -to AP was lost. Qemu does this since libvirtd sends a âquery-cpusâ qmp -> -command to qemu -> -on VM start. -> -> -In qemu, qmp_query_cpus-> cpu_synchronize_state-> -> -kvm_cpu_synchronize_state-> -> -do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from -> -kvm-kmod and -> -sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call -> -kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus -> -pending_events is -> -overwritten by qemu. -> -> -I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true -> -after âquery-cpusâ, -> -and kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am -> -not sure whether -> -it is OK for qemu to set cpu->kvm_vcpu_dirty in -> -do_kvm_cpu_synchronize_state in each caller. -> -> -Whatâs your opinion? -Hi Rongguang, - -sorry for the late response. - -Where exactly is KVM_APIC_INIT dropped? kvm_get_mp_state does clear the -bit, but the result of the INIT is stored in mp_state. - -kvm_get_vcpu_events is called after kvm_get_mp_state; it retrieves -KVM_APIC_INIT in events.smi.latched_init and kvm_set_vcpu_events passes -it back. Maybe it should ignore events.smi.latched_init if not in SMM, -but I would like to understand the exact sequence of events. - -Thanks, - -paolo - -On 2017/4/6 0:16, Paolo Bonzini wrote: -On 20/03/2017 15:21, Herongguang (Stephen) wrote: -We encountered a problem that when a domain starts, seabios failed to -online a vCPU. - -After investigation, we found that the reason is in kvm-kmod, -KVM_APIC_INIT bit in -vcpu->arch.apic->pending_events was overwritten by qemu, and thus an -INIT IPI sent -to AP was lost. Qemu does this since libvirtd sends a âquery-cpusâ qmp -command to qemu -on VM start. - -In qemu, qmp_query_cpus-> cpu_synchronize_state-> -kvm_cpu_synchronize_state-> -do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from -kvm-kmod and -sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call -kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus -pending_events is -overwritten by qemu. - -I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true -after âquery-cpusâ, -and kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am -not sure whether -it is OK for qemu to set cpu->kvm_vcpu_dirty in -do_kvm_cpu_synchronize_state in each caller. - -Whatâs your opinion? -Hi Rongguang, - -sorry for the late response. - -Where exactly is KVM_APIC_INIT dropped? kvm_get_mp_state does clear the -bit, but the result of the INIT is stored in mp_state. -It's dropped in KVM_SET_VCPU_EVENTS, see below. -kvm_get_vcpu_events is called after kvm_get_mp_state; it retrieves -KVM_APIC_INIT in events.smi.latched_init and kvm_set_vcpu_events passes -it back. Maybe it should ignore events.smi.latched_init if not in SMM, -but I would like to understand the exact sequence of events. -time0: -vcpu1: -qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state-> -> do_kvm_cpu_synchronize_state(and set vcpu1's cpu->kvm_vcpu_dirty to true)-> -kvm_arch_get_registers(KVM_APIC_INIT bit in vcpu->arch.apic->pending_events was not set) - -time1: -vcpu0: -send INIT-SIPI to all AP->(in vcpu 0's context)__apic_accept_irq(KVM_APIC_INIT bit -in vcpu1's arch.apic->pending_events is set) - -time2: -vcpu1: -kvm_cpu_exec->(if cpu->kvm_vcpu_dirty is -true)kvm_arch_put_registers->kvm_put_vcpu_events(overwritten KVM_APIC_INIT bit in -vcpu->arch.apic->pending_events!) - -So it's a race between vcpu1 get/put registers with kvm/other vcpus changing -vcpu1's status/structure fields in the mean time, I am in worry of if there are -other fields may be overwritten, -sipi_vector is one. - -also see: -https://www.mail-archive.com/address@hidden/msg438675.html -Thanks, - -paolo - -. - -Hi Paolo, - -What's your opinion about this patch? We found it just before finishing patches -for the past two days. - - -Thanks, --Gonglei - - -> ------Original Message----- -> -From: address@hidden [ -mailto:address@hidden -On -> -Behalf Of Herongguang (Stephen) -> -Sent: Thursday, April 06, 2017 9:47 AM -> -To: Paolo Bonzini; address@hidden; address@hidden; -> -address@hidden; address@hidden; address@hidden; -> -wangxin (U); Huangweidong (C) -> -Subject: Re: [BUG/RFC] INIT IPI lost when VM starts -> -> -> -> -On 2017/4/6 0:16, Paolo Bonzini wrote: -> -> -> -> On 20/03/2017 15:21, Herongguang (Stephen) wrote: -> ->> We encountered a problem that when a domain starts, seabios failed to -> ->> online a vCPU. -> ->> -> ->> After investigation, we found that the reason is in kvm-kmod, -> ->> KVM_APIC_INIT bit in -> ->> vcpu->arch.apic->pending_events was overwritten by qemu, and thus an -> ->> INIT IPI sent -> ->> to AP was lost. Qemu does this since libvirtd sends a âquery-cpusâ qmp -> ->> command to qemu -> ->> on VM start. -> ->> -> ->> In qemu, qmp_query_cpus-> cpu_synchronize_state-> -> ->> kvm_cpu_synchronize_state-> -> ->> do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from -> ->> kvm-kmod and -> ->> sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call -> ->> kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus -> ->> pending_events is -> ->> overwritten by qemu. -> ->> -> ->> I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true -> ->> after âquery-cpusâ, -> ->> and kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am -> ->> not sure whether -> ->> it is OK for qemu to set cpu->kvm_vcpu_dirty in -> ->> do_kvm_cpu_synchronize_state in each caller. -> ->> -> ->> Whatâs your opinion? -> -> Hi Rongguang, -> -> -> -> sorry for the late response. -> -> -> -> Where exactly is KVM_APIC_INIT dropped? kvm_get_mp_state does clear -> -the -> -> bit, but the result of the INIT is stored in mp_state. -> -> -It's dropped in KVM_SET_VCPU_EVENTS, see below. -> -> -> -> -> kvm_get_vcpu_events is called after kvm_get_mp_state; it retrieves -> -> KVM_APIC_INIT in events.smi.latched_init and kvm_set_vcpu_events passes -> -> it back. Maybe it should ignore events.smi.latched_init if not in SMM, -> -> but I would like to understand the exact sequence of events. -> -> -time0: -> -vcpu1: -> -qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state-> -> -> do_kvm_cpu_synchronize_state(and set vcpu1's cpu->kvm_vcpu_dirty to -> -true)-> kvm_arch_get_registers(KVM_APIC_INIT bit in -> -vcpu->arch.apic->pending_events was not set) -> -> -time1: -> -vcpu0: -> -send INIT-SIPI to all AP->(in vcpu 0's -> -context)__apic_accept_irq(KVM_APIC_INIT bit in vcpu1's -> -arch.apic->pending_events is set) -> -> -time2: -> -vcpu1: -> -kvm_cpu_exec->(if cpu->kvm_vcpu_dirty is -> -true)kvm_arch_put_registers->kvm_put_vcpu_events(overwritten -> -KVM_APIC_INIT bit in vcpu->arch.apic->pending_events!) -> -> -So it's a race between vcpu1 get/put registers with kvm/other vcpus changing -> -vcpu1's status/structure fields in the mean time, I am in worry of if there -> -are -> -other fields may be overwritten, -> -sipi_vector is one. -> -> -also see: -> -https://www.mail-archive.com/address@hidden/msg438675.html -> -> -> Thanks, -> -> -> -> paolo -> -> -> -> . -> -> -> - -2017-11-20 06:57+0000, Gonglei (Arei): -> -Hi Paolo, -> -> -What's your opinion about this patch? We found it just before finishing -> -patches -> -for the past two days. -I think your case was fixed by f4ef19108608 ("KVM: X86: Fix loss of -pending INIT due to race"), but that patch didn't fix it perfectly, so -maybe you're hitting a similar case that happens in SMM ... - -> -> -----Original Message----- -> -> From: address@hidden [ -mailto:address@hidden -On -> -> Behalf Of Herongguang (Stephen) -> -> On 2017/4/6 0:16, Paolo Bonzini wrote: -> -> > Hi Rongguang, -> -> > -> -> > sorry for the late response. -> -> > -> -> > Where exactly is KVM_APIC_INIT dropped? kvm_get_mp_state does clear -> -> the -> -> > bit, but the result of the INIT is stored in mp_state. -> -> -> -> It's dropped in KVM_SET_VCPU_EVENTS, see below. -> -> -> -> > -> -> > kvm_get_vcpu_events is called after kvm_get_mp_state; it retrieves -> -> > KVM_APIC_INIT in events.smi.latched_init and kvm_set_vcpu_events passes -> -> > it back. Maybe it should ignore events.smi.latched_init if not in SMM, -> -> > but I would like to understand the exact sequence of events. -> -> -> -> time0: -> -> vcpu1: -> -> qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state-> -> -> > do_kvm_cpu_synchronize_state(and set vcpu1's cpu->kvm_vcpu_dirty to -> -> true)-> kvm_arch_get_registers(KVM_APIC_INIT bit in -> -> vcpu->arch.apic->pending_events was not set) -> -> -> -> time1: -> -> vcpu0: -> -> send INIT-SIPI to all AP->(in vcpu 0's -> -> context)__apic_accept_irq(KVM_APIC_INIT bit in vcpu1's -> -> arch.apic->pending_events is set) -> -> -> -> time2: -> -> vcpu1: -> -> kvm_cpu_exec->(if cpu->kvm_vcpu_dirty is -> -> true)kvm_arch_put_registers->kvm_put_vcpu_events(overwritten -> -> KVM_APIC_INIT bit in vcpu->arch.apic->pending_events!) -> -> -> -> So it's a race between vcpu1 get/put registers with kvm/other vcpus changing -> -> vcpu1's status/structure fields in the mean time, I am in worry of if there -> -> are -> -> other fields may be overwritten, -> -> sipi_vector is one. -Fields that can be asynchronously written by other VCPUs (like SIPI, -NMI) must not be SET if other VCPUs were not paused since the last GET. -(Looking at the interface, we can currently lose pending SMI.) - -INIT is one of the restricted fields, but the API unconditionally -couples SMM with latched INIT, which means that we can lose an INIT if -the VCPU is in SMM mode -- do you see SMM in kvm_vcpu_events? - -Thanks. - diff --git a/results/classifier/016/boot/51610399 b/results/classifier/016/boot/51610399 deleted file mode 100644 index 615e86a8c..000000000 --- a/results/classifier/016/boot/51610399 +++ /dev/null @@ -1,335 +0,0 @@ -ppc: 0.974 -boot: 0.964 -KVM: 0.904 -operating system: 0.593 -register: 0.561 -virtual: 0.492 -debug: 0.468 -kernel: 0.418 -TCG: 0.156 -PID: 0.089 -hypervisor: 0.070 -device: 0.058 -socket: 0.048 -files: 0.031 -user-level: 0.019 -performance: 0.010 -semantic: 0.007 -VMM: 0.004 -network: 0.004 -architecture: 0.003 -assembly: 0.002 -graphic: 0.002 -permissions: 0.002 -peripherals: 0.001 -risc-v: 0.001 -vnc: 0.001 -x86: 0.001 -alpha: 0.001 -mistranslation: 0.000 -arm: 0.000 -i386: 0.000 - -[BUG][powerpc] KVM Guest Boot Failure – Hangs at "Booting Linux via __start()” - -Bug Description: -Encountering a boot failure when launching a KVM guest with -qemu-system-ppc64. The guest hangs at boot, and the QEMU monitor -crashes. -Reproduction Steps: -# qemu-system-ppc64 --version -QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f) -Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers -# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine -pseries,accel=kvm \ --m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \ - -device virtio-scsi-pci,id=scsi \ --drive -file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2 -\ --device scsi-hd,drive=drive0,bus=scsi.0 \ - -netdev bridge,id=net0,br=virbr0 \ - -device virtio-net-pci,netdev=net0 \ - -serial pty \ - -device virtio-balloon-pci \ - -cpu host -QEMU 9.2.50 monitor - type 'help' for more information -char device redirected to /dev/pts/2 (label serial0) -(qemu) -(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but -unavailable: IRQ_XIVE capability must be present for KVM -Falling back to kernel-irqchip=off -** Qemu Hang - -(In another ssh session) -# screen /dev/pts/2 -Preparing to boot Linux version 6.10.4-200.fc40.ppc64le -(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801 -(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11 -15:20:17 UTC 2024 -Detected machine type: 0000000000000101 -command line: -BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le -root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M -Max number of cores passed to firmware: 2048 (NR_CPUS = 2048) -Calling ibm,client-architecture-support... done -memory layout at init: - memory_limit : 0000000000000000 (16 MB aligned) - alloc_bottom : 0000000008200000 - alloc_top : 0000000030000000 - alloc_top_hi : 0000000800000000 - rmo_top : 0000000030000000 - ram_top : 0000000800000000 -instantiating rtas at 0x000000002fff0000... done -prom_hold_cpus: skipped -copying OF device tree... -Building dt strings... -Building dt structure... -Device tree strings 0x0000000008210000 -> 0x0000000008210bd0 -Device tree struct 0x0000000008220000 -> 0x0000000008230000 -Quiescing Open Firmware ... -Booting Linux via __start() @ 0x0000000000440000 ... -** Guest Console Hang - - -Git Bisect: -Performing git bisect points to the following patch: -# git bisect bad -e8291ec16da80566c121c68d9112be458954d90b is the first bad commit -commit e8291ec16da80566c121c68d9112be458954d90b (HEAD) -Author: Nicholas Piggin <npiggin@gmail.com> -Date: Thu Dec 19 13:40:31 2024 +1000 - - target/ppc: fix timebase register reset state -(H)DEC and PURR get reset before icount does, which causes them to -be -skewed and not match the init state. This can cause replay to not -match the recorded trace exactly. For DEC and HDEC this is usually -not -noticable since they tend to get programmed before affecting the - target machine. PURR has been observed to cause replay bugs when - running Linux. - - Fix this by resetting using a time of 0. - - Message-ID: <20241219034035.1826173-2-npiggin@gmail.com> - Signed-off-by: Nicholas Piggin <npiggin@gmail.com> - - hw/ppc/ppc.c | 11 ++++++++--- - 1 file changed, 8 insertions(+), 3 deletions(-) - - -Reverting the patch helps boot the guest. -Thanks, -Misbah Anjum N - -Thanks for the report. - -Tricky problem. A secondary CPU is hanging before it is started by the -primary via rtas call. - -That secondary keeps calling kvm_cpu_exec(), which keeps exiting out -early with EXCP_HLT because kvm_arch_process_async_events() returns -true because that cpu has ->halted=1. That just goes around he run -loop because there is an interrupt pending (DEC). - -So it never runs. It also never releases the BQL, and another CPU, -the primary which is actually supposed to be running, is stuck in -spapr_set_all_lpcrs() in run_on_cpu() waiting for the BQL. - -This patch just exposes the bug I think, by causing the interrupt. -although I'm not quite sure why it's okay previously (-ve decrementer -values should be causing a timer exception too). The timer exception -should not be taken as an interrupt by those secondary CPUs, and it -doesn't because it is masked, until set_all_lpcrs sets an LPCR value -that enables powersave wakeup on decrementer interrupt. - -The start_powered_off sate just sets ->halted, which makes it look -like a powersaving state. Logically I think it's not the same thing -as far as spapr goes. I don't know why start_powered_off only sets -->halted, and not ->stop/stopped as well. - -Not sure how best to solve it cleanly. I'll send a revert if I can't -get something working soon. - -Thanks, -Nick - -On Tue Mar 18, 2025 at 7:09 AM AEST, misanjum wrote: -> -Bug Description: -> -Encountering a boot failure when launching a KVM guest with -> -qemu-system-ppc64. The guest hangs at boot, and the QEMU monitor -> -crashes. -> -> -> -Reproduction Steps: -> -# qemu-system-ppc64 --version -> -QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f) -> -Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers -> -> -# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine -> -pseries,accel=kvm \ -> --m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \ -> --device virtio-scsi-pci,id=scsi \ -> --drive -> -file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2 -> -> -\ -> --device scsi-hd,drive=drive0,bus=scsi.0 \ -> --netdev bridge,id=net0,br=virbr0 \ -> --device virtio-net-pci,netdev=net0 \ -> --serial pty \ -> --device virtio-balloon-pci \ -> --cpu host -> -QEMU 9.2.50 monitor - type 'help' for more information -> -char device redirected to /dev/pts/2 (label serial0) -> -(qemu) -> -(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but -> -unavailable: IRQ_XIVE capability must be present for KVM -> -Falling back to kernel-irqchip=off -> -** Qemu Hang -> -> -(In another ssh session) -> -# screen /dev/pts/2 -> -Preparing to boot Linux version 6.10.4-200.fc40.ppc64le -> -(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801 -> -(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11 -> -15:20:17 UTC 2024 -> -Detected machine type: 0000000000000101 -> -command line: -> -BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le -> -root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M -> -Max number of cores passed to firmware: 2048 (NR_CPUS = 2048) -> -Calling ibm,client-architecture-support... done -> -memory layout at init: -> -memory_limit : 0000000000000000 (16 MB aligned) -> -alloc_bottom : 0000000008200000 -> -alloc_top : 0000000030000000 -> -alloc_top_hi : 0000000800000000 -> -rmo_top : 0000000030000000 -> -ram_top : 0000000800000000 -> -instantiating rtas at 0x000000002fff0000... done -> -prom_hold_cpus: skipped -> -copying OF device tree... -> -Building dt strings... -> -Building dt structure... -> -Device tree strings 0x0000000008210000 -> 0x0000000008210bd0 -> -Device tree struct 0x0000000008220000 -> 0x0000000008230000 -> -Quiescing Open Firmware ... -> -Booting Linux via __start() @ 0x0000000000440000 ... -> -** Guest Console Hang -> -> -> -Git Bisect: -> -Performing git bisect points to the following patch: -> -# git bisect bad -> -e8291ec16da80566c121c68d9112be458954d90b is the first bad commit -> -commit e8291ec16da80566c121c68d9112be458954d90b (HEAD) -> -Author: Nicholas Piggin <npiggin@gmail.com> -> -Date: Thu Dec 19 13:40:31 2024 +1000 -> -> -target/ppc: fix timebase register reset state -> -> -(H)DEC and PURR get reset before icount does, which causes them to -> -be -> -skewed and not match the init state. This can cause replay to not -> -match the recorded trace exactly. For DEC and HDEC this is usually -> -not -> -noticable since they tend to get programmed before affecting the -> -target machine. PURR has been observed to cause replay bugs when -> -running Linux. -> -> -Fix this by resetting using a time of 0. -> -> -Message-ID: <20241219034035.1826173-2-npiggin@gmail.com> -> -Signed-off-by: Nicholas Piggin <npiggin@gmail.com> -> -> -hw/ppc/ppc.c | 11 ++++++++--- -> -1 file changed, 8 insertions(+), 3 deletions(-) -> -> -> -Reverting the patch helps boot the guest. -> -Thanks, -> -Misbah Anjum N - |