diff options
Diffstat (limited to 'results/classifier/003/boot/51610399')
| -rw-r--r-- | results/classifier/003/boot/51610399 | 311 |
1 files changed, 311 insertions, 0 deletions
diff --git a/results/classifier/003/boot/51610399 b/results/classifier/003/boot/51610399 new file mode 100644 index 000000000..481231d26 --- /dev/null +++ b/results/classifier/003/boot/51610399 @@ -0,0 +1,311 @@ +boot: 0.986 +instruction: 0.985 +other: 0.985 +semantic: 0.984 +mistranslation: 0.983 +KVM: 0.975 +network: 0.973 + +[BUG][powerpc] KVM Guest Boot Failure – Hangs at "Booting Linux via __start()” + +Bug Description: +Encountering a boot failure when launching a KVM guest with +qemu-system-ppc64. The guest hangs at boot, and the QEMU monitor +crashes. +Reproduction Steps: +# qemu-system-ppc64 --version +QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f) +Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers +# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine +pseries,accel=kvm \ +-m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \ + -device virtio-scsi-pci,id=scsi \ +-drive +file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2 +\ +-device scsi-hd,drive=drive0,bus=scsi.0 \ + -netdev bridge,id=net0,br=virbr0 \ + -device virtio-net-pci,netdev=net0 \ + -serial pty \ + -device virtio-balloon-pci \ + -cpu host +QEMU 9.2.50 monitor - type 'help' for more information +char device redirected to /dev/pts/2 (label serial0) +(qemu) +(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but +unavailable: IRQ_XIVE capability must be present for KVM +Falling back to kernel-irqchip=off +** Qemu Hang + +(In another ssh session) +# screen /dev/pts/2 +Preparing to boot Linux version 6.10.4-200.fc40.ppc64le +(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801 +(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11 +15:20:17 UTC 2024 +Detected machine type: 0000000000000101 +command line: +BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le +root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M +Max number of cores passed to firmware: 2048 (NR_CPUS = 2048) +Calling ibm,client-architecture-support... done +memory layout at init: + memory_limit : 0000000000000000 (16 MB aligned) + alloc_bottom : 0000000008200000 + alloc_top : 0000000030000000 + alloc_top_hi : 0000000800000000 + rmo_top : 0000000030000000 + ram_top : 0000000800000000 +instantiating rtas at 0x000000002fff0000... done +prom_hold_cpus: skipped +copying OF device tree... +Building dt strings... +Building dt structure... +Device tree strings 0x0000000008210000 -> 0x0000000008210bd0 +Device tree struct 0x0000000008220000 -> 0x0000000008230000 +Quiescing Open Firmware ... +Booting Linux via __start() @ 0x0000000000440000 ... +** Guest Console Hang + + +Git Bisect: +Performing git bisect points to the following patch: +# git bisect bad +e8291ec16da80566c121c68d9112be458954d90b is the first bad commit +commit e8291ec16da80566c121c68d9112be458954d90b (HEAD) +Author: Nicholas Piggin <npiggin@gmail.com> +Date: Thu Dec 19 13:40:31 2024 +1000 + + target/ppc: fix timebase register reset state +(H)DEC and PURR get reset before icount does, which causes them to +be +skewed and not match the init state. This can cause replay to not +match the recorded trace exactly. For DEC and HDEC this is usually +not +noticable since they tend to get programmed before affecting the + target machine. PURR has been observed to cause replay bugs when + running Linux. + + Fix this by resetting using a time of 0. + + Message-ID: <20241219034035.1826173-2-npiggin@gmail.com> + Signed-off-by: Nicholas Piggin <npiggin@gmail.com> + + hw/ppc/ppc.c | 11 ++++++++--- + 1 file changed, 8 insertions(+), 3 deletions(-) + + +Reverting the patch helps boot the guest. +Thanks, +Misbah Anjum N + +Thanks for the report. + +Tricky problem. A secondary CPU is hanging before it is started by the +primary via rtas call. + +That secondary keeps calling kvm_cpu_exec(), which keeps exiting out +early with EXCP_HLT because kvm_arch_process_async_events() returns +true because that cpu has ->halted=1. That just goes around he run +loop because there is an interrupt pending (DEC). + +So it never runs. It also never releases the BQL, and another CPU, +the primary which is actually supposed to be running, is stuck in +spapr_set_all_lpcrs() in run_on_cpu() waiting for the BQL. + +This patch just exposes the bug I think, by causing the interrupt. +although I'm not quite sure why it's okay previously (-ve decrementer +values should be causing a timer exception too). The timer exception +should not be taken as an interrupt by those secondary CPUs, and it +doesn't because it is masked, until set_all_lpcrs sets an LPCR value +that enables powersave wakeup on decrementer interrupt. + +The start_powered_off sate just sets ->halted, which makes it look +like a powersaving state. Logically I think it's not the same thing +as far as spapr goes. I don't know why start_powered_off only sets +->halted, and not ->stop/stopped as well. + +Not sure how best to solve it cleanly. I'll send a revert if I can't +get something working soon. + +Thanks, +Nick + +On Tue Mar 18, 2025 at 7:09 AM AEST, misanjum wrote: +> +Bug Description: +> +Encountering a boot failure when launching a KVM guest with +> +qemu-system-ppc64. The guest hangs at boot, and the QEMU monitor +> +crashes. +> +> +> +Reproduction Steps: +> +# qemu-system-ppc64 --version +> +QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f) +> +Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers +> +> +# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine +> +pseries,accel=kvm \ +> +-m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \ +> +-device virtio-scsi-pci,id=scsi \ +> +-drive +> +file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2 +> +> +\ +> +-device scsi-hd,drive=drive0,bus=scsi.0 \ +> +-netdev bridge,id=net0,br=virbr0 \ +> +-device virtio-net-pci,netdev=net0 \ +> +-serial pty \ +> +-device virtio-balloon-pci \ +> +-cpu host +> +QEMU 9.2.50 monitor - type 'help' for more information +> +char device redirected to /dev/pts/2 (label serial0) +> +(qemu) +> +(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but +> +unavailable: IRQ_XIVE capability must be present for KVM +> +Falling back to kernel-irqchip=off +> +** Qemu Hang +> +> +(In another ssh session) +> +# screen /dev/pts/2 +> +Preparing to boot Linux version 6.10.4-200.fc40.ppc64le +> +(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801 +> +(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11 +> +15:20:17 UTC 2024 +> +Detected machine type: 0000000000000101 +> +command line: +> +BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le +> +root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M +> +Max number of cores passed to firmware: 2048 (NR_CPUS = 2048) +> +Calling ibm,client-architecture-support... done +> +memory layout at init: +> +memory_limit : 0000000000000000 (16 MB aligned) +> +alloc_bottom : 0000000008200000 +> +alloc_top : 0000000030000000 +> +alloc_top_hi : 0000000800000000 +> +rmo_top : 0000000030000000 +> +ram_top : 0000000800000000 +> +instantiating rtas at 0x000000002fff0000... done +> +prom_hold_cpus: skipped +> +copying OF device tree... +> +Building dt strings... +> +Building dt structure... +> +Device tree strings 0x0000000008210000 -> 0x0000000008210bd0 +> +Device tree struct 0x0000000008220000 -> 0x0000000008230000 +> +Quiescing Open Firmware ... +> +Booting Linux via __start() @ 0x0000000000440000 ... +> +** Guest Console Hang +> +> +> +Git Bisect: +> +Performing git bisect points to the following patch: +> +# git bisect bad +> +e8291ec16da80566c121c68d9112be458954d90b is the first bad commit +> +commit e8291ec16da80566c121c68d9112be458954d90b (HEAD) +> +Author: Nicholas Piggin <npiggin@gmail.com> +> +Date: Thu Dec 19 13:40:31 2024 +1000 +> +> +target/ppc: fix timebase register reset state +> +> +(H)DEC and PURR get reset before icount does, which causes them to +> +be +> +skewed and not match the init state. This can cause replay to not +> +match the recorded trace exactly. For DEC and HDEC this is usually +> +not +> +noticable since they tend to get programmed before affecting the +> +target machine. PURR has been observed to cause replay bugs when +> +running Linux. +> +> +Fix this by resetting using a time of 0. +> +> +Message-ID: <20241219034035.1826173-2-npiggin@gmail.com> +> +Signed-off-by: Nicholas Piggin <npiggin@gmail.com> +> +> +hw/ppc/ppc.c | 11 ++++++++--- +> +1 file changed, 8 insertions(+), 3 deletions(-) +> +> +> +Reverting the patch helps boot the guest. +> +Thanks, +> +Misbah Anjum N + |