summary refs log tree commit diff stats
path: root/results/classifier/016/boot
diff options
context:
space:
mode:
authorChristian Krinitsin <mail@krinitsin.com>2025-06-10 17:04:21 +0000
committerChristian Krinitsin <mail@krinitsin.com>2025-06-10 17:04:21 +0000
commit7b681b9f9eedaad2f081ae11a32f459f5a1312ff (patch)
tree447529eab427f2cb024d33933794a27f30369c4d /results/classifier/016/boot
parentd804cb5b8f55b5e32c217e728fe02f6e53ecdf78 (diff)
downloademulator-bug-study-7b681b9f9eedaad2f081ae11a32f459f5a1312ff.tar.gz
emulator-bug-study-7b681b9f9eedaad2f081ae11a32f459f5a1312ff.zip
add 17th version of the classifier, including results
Diffstat (limited to 'results/classifier/016/boot')
-rw-r--r--results/classifier/016/boot/16056596125
-rw-r--r--results/classifier/016/boot/43643137565
-rw-r--r--results/classifier/016/boot/51610399335
3 files changed, 1025 insertions, 0 deletions
diff --git a/results/classifier/016/boot/16056596 b/results/classifier/016/boot/16056596
new file mode 100644
index 00000000..bb94452b
--- /dev/null
+++ b/results/classifier/016/boot/16056596
@@ -0,0 +1,125 @@
+ppc: 0.980
+boot: 0.952
+KVM: 0.879
+virtual: 0.562
+debug: 0.385
+register: 0.363
+operating system: 0.345
+kernel: 0.272
+hypervisor: 0.061
+PID: 0.058
+TCG: 0.038
+device: 0.033
+socket: 0.032
+files: 0.029
+user-level: 0.015
+performance: 0.007
+semantic: 0.006
+network: 0.003
+architecture: 0.003
+VMM: 0.002
+assembly: 0.002
+graphic: 0.002
+peripherals: 0.001
+permissions: 0.001
+risc-v: 0.001
+vnc: 0.001
+mistranslation: 0.000
+x86: 0.000
+alpha: 0.000
+i386: 0.000
+arm: 0.000
+
+[BUG][powerpc] KVM Guest Boot Failure and Hang at "Booting Linux via __start()"
+
+Bug Description:
+Encountering a boot failure when launching a KVM guest with
+'qemu-system-ppc64'. The guest hangs at boot, and the QEMU monitor
+crashes.
+Reproduction Steps:
+# qemu-system-ppc64 --version
+QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f)
+Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers
+# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine
+pseries,accel=kvm \
+-m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \
+  -device virtio-scsi-pci,id=scsi \
+-drive
+file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2
+\
+-device scsi-hd,drive=drive0,bus=scsi.0 \
+  -netdev bridge,id=net0,br=virbr0 \
+  -device virtio-net-pci,netdev=net0 \
+  -serial pty \
+  -device virtio-balloon-pci \
+  -cpu host
+QEMU 9.2.50 monitor - type 'help' for more information
+char device redirected to /dev/pts/2 (label serial0)
+(qemu)
+(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but
+unavailable: IRQ_XIVE capability must be present for KVM
+Falling back to kernel-irqchip=off
+** Qemu Hang
+
+(In another ssh session)
+# screen /dev/pts/2
+Preparing to boot Linux version 6.10.4-200.fc40.ppc64le
+(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801
+(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11
+15:20:17 UTC 2024
+Detected machine type: 0000000000000101
+command line:
+BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le
+root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M
+Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)
+Calling ibm,client-architecture-support... done
+memory layout at init:
+  memory_limit : 0000000000000000 (16 MB aligned)
+  alloc_bottom : 0000000008200000
+  alloc_top    : 0000000030000000
+  alloc_top_hi : 0000000800000000
+  rmo_top      : 0000000030000000
+  ram_top      : 0000000800000000
+instantiating rtas at 0x000000002fff0000... done
+prom_hold_cpus: skipped
+copying OF device tree...
+Building dt strings...
+Building dt structure...
+Device tree strings 0x0000000008210000 -> 0x0000000008210bd0
+Device tree struct  0x0000000008220000 -> 0x0000000008230000
+Quiescing Open Firmware ...
+Booting Linux via __start() @ 0x0000000000440000 ...
+** Guest Console Hang
+
+
+Git Bisect:
+Performing git bisect points to the following patch:
+# git bisect bad
+e8291ec16da80566c121c68d9112be458954d90b is the first bad commit
+commit e8291ec16da80566c121c68d9112be458954d90b (HEAD)
+Author: Nicholas Piggin <npiggin@gmail.com>
+Date:   Thu Dec 19 13:40:31 2024 +1000
+
+    target/ppc: fix timebase register reset state
+(H)DEC and PURR get reset before icount does, which causes them to
+be
+skewed and not match the init state. This can cause replay to not
+match the recorded trace exactly. For DEC and HDEC this is usually
+not
+noticable since they tend to get programmed before affecting the
+    target machine. PURR has been observed to cause replay bugs when
+    running Linux.
+
+    Fix this by resetting using a time of 0.
+
+    Message-ID: <20241219034035.1826173-2-npiggin@gmail.com>
+    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
+
+ hw/ppc/ppc.c | 11 ++++++++---
+ 1 file changed, 8 insertions(+), 3 deletions(-)
+
+
+Reverting the patch helps boot the guest.
+Thanks,
+Misbah Anjum N
+
diff --git a/results/classifier/016/boot/43643137 b/results/classifier/016/boot/43643137
new file mode 100644
index 00000000..e4bebc96
--- /dev/null
+++ b/results/classifier/016/boot/43643137
@@ -0,0 +1,565 @@
+boot: 0.933
+virtual: 0.827
+debug: 0.726
+hypervisor: 0.691
+KVM: 0.449
+x86: 0.204
+register: 0.084
+operating system: 0.049
+kernel: 0.047
+PID: 0.047
+TCG: 0.036
+assembly: 0.023
+architecture: 0.022
+VMM: 0.021
+risc-v: 0.016
+user-level: 0.014
+files: 0.014
+performance: 0.013
+semantic: 0.011
+socket: 0.008
+network: 0.007
+ppc: 0.007
+device: 0.007
+arm: 0.004
+vnc: 0.004
+i386: 0.004
+permissions: 0.004
+alpha: 0.002
+peripherals: 0.001
+graphic: 0.001
+mistranslation: 0.000
+
+[Qemu-devel] [BUG/RFC] INIT IPI lost when VM starts
+
+Hi,
+We encountered a problem that when a domain starts, seabios failed to online a 
+vCPU.
+
+After investigation, we found that the reason is in kvm-kmod, KVM_APIC_INIT bit 
+in
+vcpu->arch.apic->pending_events was overwritten by qemu, and thus an INIT IPI 
+sent
+to AP was lost. Qemu does this since libvirtd sends a ‘query-cpus’ qmp command 
+to qemu
+on VM start.
+
+In qemu, qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state->
+do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from kvm-kmod and
+sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call
+kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus pending_events is
+overwritten by qemu.
+
+I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true after 
+‘query-cpus’,
+and  kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am not sure 
+whether
+it is OK for qemu to set cpu->kvm_vcpu_dirty in do_kvm_cpu_synchronize_state in 
+each caller.
+
+What’s your opinion?
+
+Let me clarify it more clearly. Time sequence is that qemu handles ‘query-cpus’ qmp 
+command, vcpu 1 (and vcpu 0) got registers from kvm-kmod (qmp_query_cpus-> 
+cpu_synchronize_state-> kvm_cpu_synchronize_state->
+> do_kvm_cpu_synchronize_state-> kvm_arch_get_registers), then vcpu 0 (BSP) 
+sends INIT-SIPI to vcpu 1(AP). In kvm-kmod, vcpu 1’s pending_events’s KVM_APIC_INIT 
+bit set.
+Then vcpu 1 continue running, vcpu1 thread in qemu calls 
+kvm_arch_put_registers-> kvm_put_vcpu_events, so KVM_APIC_INIT bit in vcpu 1’s 
+pending_events got cleared, i.e., lost.
+
+In kvm-kmod, except for pending_events, sipi_vector may also be overwritten., 
+so I am not sure if there are other fields/registers in danger, i.e., those may 
+be modified asynchronously with vcpu thread itself.
+
+BTW, using a sleep like following can reliably reproduce this problem, if VM 
+equipped with more than 2 vcpus and starting VM using libvirtd.
+
+diff --git a/target/i386/kvm.c b/target/i386/kvm.c
+index 55865db..5099290 100644
+--- a/target/i386/kvm.c
++++ b/target/i386/kvm.c
+@@ -2534,6 +2534,11 @@ static int kvm_put_vcpu_events(X86CPU *cpu, int level)
+             KVM_VCPUEVENT_VALID_NMI_PENDING | KVM_VCPUEVENT_VALID_SIPI_VECTOR;
+     }
+
++    if (CPU(cpu)->cpu_index == 1) {
++        fprintf(stderr, "vcpu 1 sleep!!!!\n");
++        sleep(10);
++    }
++
+     return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
+ }
+
+
+On 2017/3/20 22:21, Herongguang (Stephen) wrote:
+Hi,
+We encountered a problem that when a domain starts, seabios failed to online a 
+vCPU.
+
+After investigation, we found that the reason is in kvm-kmod, KVM_APIC_INIT bit 
+in
+vcpu->arch.apic->pending_events was overwritten by qemu, and thus an INIT IPI 
+sent
+to AP was lost. Qemu does this since libvirtd sends a ‘query-cpus’ qmp command 
+to qemu
+on VM start.
+
+In qemu, qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state->
+do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from kvm-kmod and
+sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call
+kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus pending_events is
+overwritten by qemu.
+
+I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true after 
+‘query-cpus’,
+and  kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am not sure 
+whether
+it is OK for qemu to set cpu->kvm_vcpu_dirty in do_kvm_cpu_synchronize_state in 
+each caller.
+
+What’s your opinion?
+
+On 20/03/2017 15:21, Herongguang (Stephen) wrote:
+>
+>
+We encountered a problem that when a domain starts, seabios failed to
+>
+online a vCPU.
+>
+>
+After investigation, we found that the reason is in kvm-kmod,
+>
+KVM_APIC_INIT bit in
+>
+vcpu->arch.apic->pending_events was overwritten by qemu, and thus an
+>
+INIT IPI sent
+>
+to AP was lost. Qemu does this since libvirtd sends a ‘query-cpus’ qmp
+>
+command to qemu
+>
+on VM start.
+>
+>
+In qemu, qmp_query_cpus-> cpu_synchronize_state->
+>
+kvm_cpu_synchronize_state->
+>
+do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from
+>
+kvm-kmod and
+>
+sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call
+>
+kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus
+>
+pending_events is
+>
+overwritten by qemu.
+>
+>
+I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true
+>
+after ‘query-cpus’,
+>
+and  kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am
+>
+not sure whether
+>
+it is OK for qemu to set cpu->kvm_vcpu_dirty in
+>
+do_kvm_cpu_synchronize_state in each caller.
+>
+>
+What’s your opinion?
+Hi Rongguang,
+
+sorry for the late response.
+
+Where exactly is KVM_APIC_INIT dropped?  kvm_get_mp_state does clear the
+bit, but the result of the INIT is stored in mp_state.
+
+kvm_get_vcpu_events is called after kvm_get_mp_state; it retrieves
+KVM_APIC_INIT in events.smi.latched_init and kvm_set_vcpu_events passes
+it back.  Maybe it should ignore events.smi.latched_init if not in SMM,
+but I would like to understand the exact sequence of events.
+
+Thanks,
+
+paolo
+
+On 2017/4/6 0:16, Paolo Bonzini wrote:
+On 20/03/2017 15:21, Herongguang (Stephen) wrote:
+We encountered a problem that when a domain starts, seabios failed to
+online a vCPU.
+
+After investigation, we found that the reason is in kvm-kmod,
+KVM_APIC_INIT bit in
+vcpu->arch.apic->pending_events was overwritten by qemu, and thus an
+INIT IPI sent
+to AP was lost. Qemu does this since libvirtd sends a ‘query-cpus’ qmp
+command to qemu
+on VM start.
+
+In qemu, qmp_query_cpus-> cpu_synchronize_state->
+kvm_cpu_synchronize_state->
+do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from
+kvm-kmod and
+sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call
+kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus
+pending_events is
+overwritten by qemu.
+
+I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true
+after ‘query-cpus’,
+and  kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am
+not sure whether
+it is OK for qemu to set cpu->kvm_vcpu_dirty in
+do_kvm_cpu_synchronize_state in each caller.
+
+What’s your opinion?
+Hi Rongguang,
+
+sorry for the late response.
+
+Where exactly is KVM_APIC_INIT dropped?  kvm_get_mp_state does clear the
+bit, but the result of the INIT is stored in mp_state.
+It's dropped in KVM_SET_VCPU_EVENTS, see below.
+kvm_get_vcpu_events is called after kvm_get_mp_state; it retrieves
+KVM_APIC_INIT in events.smi.latched_init and kvm_set_vcpu_events passes
+it back.  Maybe it should ignore events.smi.latched_init if not in SMM,
+but I would like to understand the exact sequence of events.
+time0:
+vcpu1:
+qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state->
+> do_kvm_cpu_synchronize_state(and set vcpu1's cpu->kvm_vcpu_dirty to true)-> 
+kvm_arch_get_registers(KVM_APIC_INIT bit in vcpu->arch.apic->pending_events was not set)
+
+time1:
+vcpu0:
+send INIT-SIPI to all AP->(in vcpu 0's context)__apic_accept_irq(KVM_APIC_INIT bit 
+in vcpu1's arch.apic->pending_events is set)
+
+time2:
+vcpu1:
+kvm_cpu_exec->(if cpu->kvm_vcpu_dirty is 
+true)kvm_arch_put_registers->kvm_put_vcpu_events(overwritten KVM_APIC_INIT bit in 
+vcpu->arch.apic->pending_events!)
+
+So it's a race between vcpu1 get/put registers with kvm/other vcpus changing 
+vcpu1's status/structure fields in the mean time, I am in worry of if there are 
+other fields may be overwritten,
+sipi_vector is one.
+
+also see:
+https://www.mail-archive.com/address@hidden/msg438675.html
+Thanks,
+
+paolo
+
+.
+
+Hi Paolo,
+
+What's your opinion about this patch? We found it just before finishing patches 
+for the past two days.
+
+
+Thanks,
+-Gonglei
+
+
+>
+-----Original Message-----
+>
+From: address@hidden [
+mailto:address@hidden
+On
+>
+Behalf Of Herongguang (Stephen)
+>
+Sent: Thursday, April 06, 2017 9:47 AM
+>
+To: Paolo Bonzini; address@hidden; address@hidden;
+>
+address@hidden; address@hidden; address@hidden;
+>
+wangxin (U); Huangweidong (C)
+>
+Subject: Re: [BUG/RFC] INIT IPI lost when VM starts
+>
+>
+>
+>
+On 2017/4/6 0:16, Paolo Bonzini wrote:
+>
+>
+>
+> On 20/03/2017 15:21, Herongguang (Stephen) wrote:
+>
+>> We encountered a problem that when a domain starts, seabios failed to
+>
+>> online a vCPU.
+>
+>>
+>
+>> After investigation, we found that the reason is in kvm-kmod,
+>
+>> KVM_APIC_INIT bit in
+>
+>> vcpu->arch.apic->pending_events was overwritten by qemu, and thus an
+>
+>> INIT IPI sent
+>
+>> to AP was lost. Qemu does this since libvirtd sends a ‘query-cpus’ qmp
+>
+>> command to qemu
+>
+>> on VM start.
+>
+>>
+>
+>> In qemu, qmp_query_cpus-> cpu_synchronize_state->
+>
+>> kvm_cpu_synchronize_state->
+>
+>> do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from
+>
+>> kvm-kmod and
+>
+>> sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call
+>
+>> kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus
+>
+>> pending_events is
+>
+>> overwritten by qemu.
+>
+>>
+>
+>> I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true
+>
+>> after ‘query-cpus’,
+>
+>> and  kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am
+>
+>> not sure whether
+>
+>> it is OK for qemu to set cpu->kvm_vcpu_dirty in
+>
+>> do_kvm_cpu_synchronize_state in each caller.
+>
+>>
+>
+>> What’s your opinion?
+>
+> Hi Rongguang,
+>
+>
+>
+> sorry for the late response.
+>
+>
+>
+> Where exactly is KVM_APIC_INIT dropped?  kvm_get_mp_state does clear
+>
+the
+>
+> bit, but the result of the INIT is stored in mp_state.
+>
+>
+It's dropped in KVM_SET_VCPU_EVENTS, see below.
+>
+>
+>
+>
+> kvm_get_vcpu_events is called after kvm_get_mp_state; it retrieves
+>
+> KVM_APIC_INIT in events.smi.latched_init and kvm_set_vcpu_events passes
+>
+> it back.  Maybe it should ignore events.smi.latched_init if not in SMM,
+>
+> but I would like to understand the exact sequence of events.
+>
+>
+time0:
+>
+vcpu1:
+>
+qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state->
+>
+> do_kvm_cpu_synchronize_state(and set vcpu1's cpu->kvm_vcpu_dirty to
+>
+true)-> kvm_arch_get_registers(KVM_APIC_INIT bit in
+>
+vcpu->arch.apic->pending_events was not set)
+>
+>
+time1:
+>
+vcpu0:
+>
+send INIT-SIPI to all AP->(in vcpu 0's
+>
+context)__apic_accept_irq(KVM_APIC_INIT bit in vcpu1's
+>
+arch.apic->pending_events is set)
+>
+>
+time2:
+>
+vcpu1:
+>
+kvm_cpu_exec->(if cpu->kvm_vcpu_dirty is
+>
+true)kvm_arch_put_registers->kvm_put_vcpu_events(overwritten
+>
+KVM_APIC_INIT bit in vcpu->arch.apic->pending_events!)
+>
+>
+So it's a race between vcpu1 get/put registers with kvm/other vcpus changing
+>
+vcpu1's status/structure fields in the mean time, I am in worry of if there
+>
+are
+>
+other fields may be overwritten,
+>
+sipi_vector is one.
+>
+>
+also see:
+>
+https://www.mail-archive.com/address@hidden/msg438675.html
+>
+>
+> Thanks,
+>
+>
+>
+> paolo
+>
+>
+>
+> .
+>
+>
+>
+
+2017-11-20 06:57+0000, Gonglei (Arei):
+>
+Hi Paolo,
+>
+>
+What's your opinion about this patch? We found it just before finishing
+>
+patches
+>
+for the past two days.
+I think your case was fixed by f4ef19108608 ("KVM: X86: Fix loss of
+pending INIT due to race"), but that patch didn't fix it perfectly, so
+maybe you're hitting a similar case that happens in SMM ...
+
+>
+> -----Original Message-----
+>
+> From: address@hidden [
+mailto:address@hidden
+On
+>
+> Behalf Of Herongguang (Stephen)
+>
+> On 2017/4/6 0:16, Paolo Bonzini wrote:
+>
+> > Hi Rongguang,
+>
+> >
+>
+> > sorry for the late response.
+>
+> >
+>
+> > Where exactly is KVM_APIC_INIT dropped?  kvm_get_mp_state does clear
+>
+> the
+>
+> > bit, but the result of the INIT is stored in mp_state.
+>
+>
+>
+> It's dropped in KVM_SET_VCPU_EVENTS, see below.
+>
+>
+>
+> >
+>
+> > kvm_get_vcpu_events is called after kvm_get_mp_state; it retrieves
+>
+> > KVM_APIC_INIT in events.smi.latched_init and kvm_set_vcpu_events passes
+>
+> > it back.  Maybe it should ignore events.smi.latched_init if not in SMM,
+>
+> > but I would like to understand the exact sequence of events.
+>
+>
+>
+> time0:
+>
+> vcpu1:
+>
+> qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state->
+>
+>  > do_kvm_cpu_synchronize_state(and set vcpu1's cpu->kvm_vcpu_dirty to
+>
+> true)-> kvm_arch_get_registers(KVM_APIC_INIT bit in
+>
+> vcpu->arch.apic->pending_events was not set)
+>
+>
+>
+> time1:
+>
+> vcpu0:
+>
+> send INIT-SIPI to all AP->(in vcpu 0's
+>
+> context)__apic_accept_irq(KVM_APIC_INIT bit in vcpu1's
+>
+> arch.apic->pending_events is set)
+>
+>
+>
+> time2:
+>
+> vcpu1:
+>
+> kvm_cpu_exec->(if cpu->kvm_vcpu_dirty is
+>
+> true)kvm_arch_put_registers->kvm_put_vcpu_events(overwritten
+>
+> KVM_APIC_INIT bit in vcpu->arch.apic->pending_events!)
+>
+>
+>
+> So it's a race between vcpu1 get/put registers with kvm/other vcpus changing
+>
+> vcpu1's status/structure fields in the mean time, I am in worry of if there
+>
+> are
+>
+> other fields may be overwritten,
+>
+> sipi_vector is one.
+Fields that can be asynchronously written by other VCPUs (like SIPI,
+NMI) must not be SET if other VCPUs were not paused since the last GET.
+(Looking at the interface, we can currently lose pending SMI.)
+
+INIT is one of the restricted fields, but the API unconditionally
+couples SMM with latched INIT, which means that we can lose an INIT if
+the VCPU is in SMM mode -- do you see SMM in kvm_vcpu_events?
+
+Thanks.
+
diff --git a/results/classifier/016/boot/51610399 b/results/classifier/016/boot/51610399
new file mode 100644
index 00000000..615e86a8
--- /dev/null
+++ b/results/classifier/016/boot/51610399
@@ -0,0 +1,335 @@
+ppc: 0.974
+boot: 0.964
+KVM: 0.904
+operating system: 0.593
+register: 0.561
+virtual: 0.492
+debug: 0.468
+kernel: 0.418
+TCG: 0.156
+PID: 0.089
+hypervisor: 0.070
+device: 0.058
+socket: 0.048
+files: 0.031
+user-level: 0.019
+performance: 0.010
+semantic: 0.007
+VMM: 0.004
+network: 0.004
+architecture: 0.003
+assembly: 0.002
+graphic: 0.002
+permissions: 0.002
+peripherals: 0.001
+risc-v: 0.001
+vnc: 0.001
+x86: 0.001
+alpha: 0.001
+mistranslation: 0.000
+arm: 0.000
+i386: 0.000
+
+[BUG][powerpc] KVM Guest Boot Failure – Hangs at "Booting Linux via __start()”
+
+Bug Description:
+Encountering a boot failure when launching a KVM guest with
+qemu-system-ppc64. The guest hangs at boot, and the QEMU monitor
+crashes.
+Reproduction Steps:
+# qemu-system-ppc64 --version
+QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f)
+Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers
+# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine
+pseries,accel=kvm \
+-m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \
+  -device virtio-scsi-pci,id=scsi \
+-drive
+file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2
+\
+-device scsi-hd,drive=drive0,bus=scsi.0 \
+  -netdev bridge,id=net0,br=virbr0 \
+  -device virtio-net-pci,netdev=net0 \
+  -serial pty \
+  -device virtio-balloon-pci \
+  -cpu host
+QEMU 9.2.50 monitor - type 'help' for more information
+char device redirected to /dev/pts/2 (label serial0)
+(qemu)
+(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but
+unavailable: IRQ_XIVE capability must be present for KVM
+Falling back to kernel-irqchip=off
+** Qemu Hang
+
+(In another ssh session)
+# screen /dev/pts/2
+Preparing to boot Linux version 6.10.4-200.fc40.ppc64le
+(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801
+(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11
+15:20:17 UTC 2024
+Detected machine type: 0000000000000101
+command line:
+BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le
+root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M
+Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)
+Calling ibm,client-architecture-support... done
+memory layout at init:
+  memory_limit : 0000000000000000 (16 MB aligned)
+  alloc_bottom : 0000000008200000
+  alloc_top    : 0000000030000000
+  alloc_top_hi : 0000000800000000
+  rmo_top      : 0000000030000000
+  ram_top      : 0000000800000000
+instantiating rtas at 0x000000002fff0000... done
+prom_hold_cpus: skipped
+copying OF device tree...
+Building dt strings...
+Building dt structure...
+Device tree strings 0x0000000008210000 -> 0x0000000008210bd0
+Device tree struct  0x0000000008220000 -> 0x0000000008230000
+Quiescing Open Firmware ...
+Booting Linux via __start() @ 0x0000000000440000 ...
+** Guest Console Hang
+
+
+Git Bisect:
+Performing git bisect points to the following patch:
+# git bisect bad
+e8291ec16da80566c121c68d9112be458954d90b is the first bad commit
+commit e8291ec16da80566c121c68d9112be458954d90b (HEAD)
+Author: Nicholas Piggin <npiggin@gmail.com>
+Date:   Thu Dec 19 13:40:31 2024 +1000
+
+    target/ppc: fix timebase register reset state
+(H)DEC and PURR get reset before icount does, which causes them to
+be
+skewed and not match the init state. This can cause replay to not
+match the recorded trace exactly. For DEC and HDEC this is usually
+not
+noticable since they tend to get programmed before affecting the
+    target machine. PURR has been observed to cause replay bugs when
+    running Linux.
+
+    Fix this by resetting using a time of 0.
+
+    Message-ID: <20241219034035.1826173-2-npiggin@gmail.com>
+    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
+
+ hw/ppc/ppc.c | 11 ++++++++---
+ 1 file changed, 8 insertions(+), 3 deletions(-)
+
+
+Reverting the patch helps boot the guest.
+Thanks,
+Misbah Anjum N
+
+Thanks for the report.
+
+Tricky problem. A secondary CPU is hanging before it is started by the
+primary via rtas call.
+
+That secondary keeps calling kvm_cpu_exec(), which keeps exiting out
+early with EXCP_HLT because kvm_arch_process_async_events() returns
+true because that cpu has ->halted=1. That just goes around he run
+loop because there is an interrupt pending (DEC).
+
+So it never runs. It also never releases the BQL, and another CPU,
+the primary which is actually supposed to be running, is stuck in
+spapr_set_all_lpcrs() in run_on_cpu() waiting for the BQL.
+
+This patch just exposes the bug I think, by causing the interrupt.
+although I'm not quite sure why it's okay previously (-ve decrementer
+values should be causing a timer exception too). The timer exception
+should not be taken as an interrupt by those secondary CPUs, and it
+doesn't because it is masked, until set_all_lpcrs sets an LPCR value
+that enables powersave wakeup on decrementer interrupt.
+
+The start_powered_off sate just sets ->halted, which makes it look
+like a powersaving state. Logically I think it's not the same thing
+as far as spapr goes. I don't know why start_powered_off only sets
+->halted, and not ->stop/stopped as well.
+
+Not sure how best to solve it cleanly. I'll send a revert if I can't
+get something working soon.
+
+Thanks,
+Nick
+
+On Tue Mar 18, 2025 at 7:09 AM AEST, misanjum wrote:
+>
+Bug Description:
+>
+Encountering a boot failure when launching a KVM guest with
+>
+qemu-system-ppc64. The guest hangs at boot, and the QEMU monitor
+>
+crashes.
+>
+>
+>
+Reproduction Steps:
+>
+# qemu-system-ppc64 --version
+>
+QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f)
+>
+Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers
+>
+>
+# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine
+>
+pseries,accel=kvm \
+>
+-m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \
+>
+-device virtio-scsi-pci,id=scsi \
+>
+-drive
+>
+file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2
+>
+>
+\
+>
+-device scsi-hd,drive=drive0,bus=scsi.0 \
+>
+-netdev bridge,id=net0,br=virbr0 \
+>
+-device virtio-net-pci,netdev=net0 \
+>
+-serial pty \
+>
+-device virtio-balloon-pci \
+>
+-cpu host
+>
+QEMU 9.2.50 monitor - type 'help' for more information
+>
+char device redirected to /dev/pts/2 (label serial0)
+>
+(qemu)
+>
+(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but
+>
+unavailable: IRQ_XIVE capability must be present for KVM
+>
+Falling back to kernel-irqchip=off
+>
+** Qemu Hang
+>
+>
+(In another ssh session)
+>
+# screen /dev/pts/2
+>
+Preparing to boot Linux version 6.10.4-200.fc40.ppc64le
+>
+(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801
+>
+(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11
+>
+15:20:17 UTC 2024
+>
+Detected machine type: 0000000000000101
+>
+command line:
+>
+BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le
+>
+root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M
+>
+Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)
+>
+Calling ibm,client-architecture-support... done
+>
+memory layout at init:
+>
+memory_limit : 0000000000000000 (16 MB aligned)
+>
+alloc_bottom : 0000000008200000
+>
+alloc_top    : 0000000030000000
+>
+alloc_top_hi : 0000000800000000
+>
+rmo_top      : 0000000030000000
+>
+ram_top      : 0000000800000000
+>
+instantiating rtas at 0x000000002fff0000... done
+>
+prom_hold_cpus: skipped
+>
+copying OF device tree...
+>
+Building dt strings...
+>
+Building dt structure...
+>
+Device tree strings 0x0000000008210000 -> 0x0000000008210bd0
+>
+Device tree struct  0x0000000008220000 -> 0x0000000008230000
+>
+Quiescing Open Firmware ...
+>
+Booting Linux via __start() @ 0x0000000000440000 ...
+>
+** Guest Console Hang
+>
+>
+>
+Git Bisect:
+>
+Performing git bisect points to the following patch:
+>
+# git bisect bad
+>
+e8291ec16da80566c121c68d9112be458954d90b is the first bad commit
+>
+commit e8291ec16da80566c121c68d9112be458954d90b (HEAD)
+>
+Author: Nicholas Piggin <npiggin@gmail.com>
+>
+Date:   Thu Dec 19 13:40:31 2024 +1000
+>
+>
+target/ppc: fix timebase register reset state
+>
+>
+(H)DEC and PURR get reset before icount does, which causes them to
+>
+be
+>
+skewed and not match the init state. This can cause replay to not
+>
+match the recorded trace exactly. For DEC and HDEC this is usually
+>
+not
+>
+noticable since they tend to get programmed before affecting the
+>
+target machine. PURR has been observed to cause replay bugs when
+>
+running Linux.
+>
+>
+Fix this by resetting using a time of 0.
+>
+>
+Message-ID: <20241219034035.1826173-2-npiggin@gmail.com>
+>
+Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
+>
+>
+hw/ppc/ppc.c | 11 ++++++++---
+>
+1 file changed, 8 insertions(+), 3 deletions(-)
+>
+>
+>
+Reverting the patch helps boot the guest.
+>
+Thanks,
+>
+Misbah Anjum N
+