permissions: 0.988
kernel virtual machine: 0.987
debug: 0.986
boot: 0.986
graphic: 0.986
assembly: 0.985
other: 0.985
semantic: 0.984
device: 0.984
register: 0.983
mistranslation: 0.983
performance: 0.983
architecture: 0.982
files: 0.981
arm: 0.981
risc-v: 0.979
PID: 0.978
socket: 0.978
vnc: 0.974
network: 0.973
TCG: 0.973
x86: 0.952

[BUG][powerpc] KVM Guest Boot Failure – Hangs at "Booting Linux via __start()"

Bug Description:
Encountering a boot failure when launching a KVM guest with
qemu-system-ppc64. The guest hangs at boot, and the QEMU monitor
crashes.
Reproduction Steps:
# qemu-system-ppc64 --version
QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f)
Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers
# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 \
  -machine pseries,accel=kvm \
  -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \
  -device virtio-scsi-pci,id=scsi \
  -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2 \
  -device scsi-hd,drive=drive0,bus=scsi.0 \
  -netdev bridge,id=net0,br=virbr0 \
  -device virtio-net-pci,netdev=net0 \
  -serial pty \
  -device virtio-balloon-pci \
  -cpu host
QEMU 9.2.50 monitor - type 'help' for more information
char device redirected to /dev/pts/2 (label serial0)
(qemu)
(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM
Falling back to kernel-irqchip=off
** Qemu Hang

(In another ssh session)
# screen /dev/pts/2
Preparing to boot Linux version 6.10.4-200.fc40.ppc64le
(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801
(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11
15:20:17 UTC 2024
Detected machine type: 0000000000000101
command line:
BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le
root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M
Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)
Calling ibm,client-architecture-support... done
memory layout at init:
  memory_limit : 0000000000000000 (16 MB aligned)
  alloc_bottom : 0000000008200000
  alloc_top    : 0000000030000000
  alloc_top_hi : 0000000800000000
  rmo_top      : 0000000030000000
  ram_top      : 0000000800000000
instantiating rtas at 0x000000002fff0000... done
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000008210000 -> 0x0000000008210bd0
Device tree struct  0x0000000008220000 -> 0x0000000008230000
Quiescing Open Firmware ...
Booting Linux via __start() @ 0x0000000000440000 ...
** Guest Console Hang


Git Bisect:
Performing git bisect points to the following patch:
# git bisect bad
e8291ec16da80566c121c68d9112be458954d90b is the first bad commit
commit e8291ec16da80566c121c68d9112be458954d90b (HEAD)
Author: Nicholas Piggin <npiggin@gmail.com>
Date:   Thu Dec 19 13:40:31 2024 +1000

    target/ppc: fix timebase register reset state

    (H)DEC and PURR get reset before icount does, which causes them to be
    skewed and not match the init state. This can cause replay to not
    match the recorded trace exactly. For DEC and HDEC this is usually not
    noticable since they tend to get programmed before affecting the
    target machine. PURR has been observed to cause replay bugs when
    running Linux.

    Fix this by resetting using a time of 0.

    Message-ID: <20241219034035.1826173-2-npiggin@gmail.com>
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

 hw/ppc/ppc.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)


Reverting this patch allows the guest to boot.
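
A bisect like the one above could be automated with `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. The following is only a sketch of the log-classification step, not the procedure used for this report. It assumes the guest serial console has already been captured to text by some other means; the function name `classify_console` and the rule "the handoff line is the last non-empty line" are assumptions made for illustration.

```python
# Hypothetical helper: map a captured guest console log to a
# `git bisect run` exit code. Capturing the log (booting the guest,
# reading the pty, applying a timeout) is outside this sketch.

HANG_MARKER = "Booting Linux via __start()"

GOOD = 0    # git bisect run: commit is good
BAD = 1     # git bisect run: commit is bad
SKIP = 125  # git bisect run: commit cannot be tested


def classify_console(log_text: str) -> int:
    """Return GOOD, BAD or SKIP for a captured console log."""
    lines = [ln.strip() for ln in log_text.splitlines() if ln.strip()]
    marker_idx = None
    for i, ln in enumerate(lines):
        if HANG_MARKER in ln:
            marker_idx = i
    if marker_idx is None:
        # Never reached the kernel handoff: inconclusive for this bug.
        return SKIP
    if marker_idx == len(lines) - 1:
        # Nothing was printed after the handoff line: treat as the hang.
        return BAD
    # Some output followed the handoff, so the kernel made progress.
    return GOOD
```

A wrapper script would boot the guest with a timeout, save the console output, and exit with the value returned by `classify_console`.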
Thanks,
Misbah Anjum N

Thanks for the report.

Tricky problem. A secondary CPU is hanging before it is started by the
primary via an RTAS call.

That secondary keeps calling kvm_cpu_exec(), which keeps exiting
early with EXCP_HLT because kvm_arch_process_async_events() returns
true, since that CPU has ->halted=1. It then just goes around the run
loop again because there is an interrupt pending (DEC).

So it never runs. It also never releases the BQL, and another CPU,
the primary which is actually supposed to be running, is stuck in
spapr_set_all_lpcrs() in run_on_cpu() waiting for the BQL.
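
The spin described above can be illustrated with a toy model. This is not QEMU code: the class `Vcpu`, the field `wake_on_dec` (standing in for the LPCR powersave-wakeup enable), and the function `run_loop` are hypothetical names, and the model assumes the lock is only dropped when the guest is actually entered or when the thread goes idle.

```python
from dataclasses import dataclass


@dataclass
class Vcpu:
    halted: bool = True        # what start_powered_off leaves behind
    dec_pending: bool = True   # decrementer interrupt is pending
    wake_on_dec: bool = False  # models the LPCR wakeup-on-DEC enable

    def has_work(self) -> bool:
        # A pending interrupt only counts as work if wakeup is enabled.
        return self.dec_pending and self.wake_on_dec

    def process_async_events(self) -> bool:
        # Models the early-exit condition: true while the CPU is halted.
        return self.halted


def run_loop(cpu: Vcpu, max_iters: int = 1000) -> dict:
    """Count what one vCPU thread does in at most max_iters iterations."""
    stats = {"early_exits": 0, "guest_entries": 0, "bql_releases": 0}
    for _ in range(max_iters):
        if cpu.process_async_events():
            stats["early_exits"] += 1    # EXCP_HLT path, lock still held
        else:
            stats["bql_releases"] += 1   # lock dropped around guest entry
            stats["guest_entries"] += 1
        if cpu.halted and not cpu.has_work():
            stats["bql_releases"] += 1   # idle wait drops the lock
            break
    return stats
```

With `Vcpu(wake_on_dec=True)` the loop in this model never enters the guest and never releases the lock; with `wake_on_dec=False` it goes idle after one pass.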

I think this patch just exposes the bug by causing the interrupt,
although I'm not quite sure why it was okay previously (negative
decrementer values should be causing a timer exception too). The timer
exception should not be taken as an interrupt by those secondary CPUs,
and it isn't, because it is masked, until set_all_lpcrs sets an LPCR
value that enables powersave wakeup on decrementer interrupt.
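
To make the remark about negative decrementer values concrete, here is a small sketch of a down-counting register whose exception is considered pending while its sign bit is set. It is a simplified model, not QEMU's implementation: the function names and the default 32-bit width are assumptions, and wrap-around after a very long run is ignored.

```python
def dec_signed(raw: int, bits: int = 32) -> int:
    """Interpret the low `bits` bits of raw as a signed value."""
    raw &= (1 << bits) - 1
    if raw >> (bits - 1):        # sign bit set
        return raw - (1 << bits)
    return raw


def dec_after(start: int, ticks: int, bits: int = 32) -> int:
    """Signed decrementer value after `ticks` ticks from `start`."""
    return dec_signed(start - ticks, bits)


def dec_exception_pending(start: int, ticks: int, bits: int = 32) -> bool:
    """In this model the exception is pending while DEC is negative."""
    return dec_after(start, ticks, bits) < 0
```

In this model a decrementer that has counted past zero reports a pending exception. Whether the exception is then delivered depends on masking, which is the point made in the paragraph above.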

The start_powered_off state just sets ->halted, which makes it look
like a powersaving state. Logically I think it's not the same thing
as far as spapr goes. I don't know why start_powered_off only sets
->halted, and not ->stop/stopped as well.
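
The difference between "halted" and "stopped" can be sketched as an idle predicate. This is a hedged illustration only: `CpuState` and `thread_is_idle` are hypothetical, the rule that a stopped CPU is idle regardless of pending work is an assumption of the model, and nothing here is a proposed fix.

```python
from dataclasses import dataclass


@dataclass
class CpuState:
    halted: bool = False    # looks like a powersave state
    stopped: bool = False   # explicitly not allowed to run
    has_work: bool = False  # e.g. a pending, wakeup-enabled interrupt


def thread_is_idle(cpu: CpuState) -> bool:
    """Decide whether the vCPU thread may sleep (model assumption)."""
    if cpu.stopped:
        return True          # stopped wins over any pending work
    if not cpu.halted or cpu.has_work:
        return False         # running, or halted with a wakeup event
    return True              # halted and nothing to wake it
```

In this model a powered-off CPU marked only as halted stops being idle as soon as a wakeup-enabled interrupt is pending, while one that is also marked stopped stays idle.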

Not sure how best to solve it cleanly. I'll send a revert if I can't
get something working soon.

Thanks,
Nick

On Tue Mar 18, 2025 at 7:09 AM AEST, misanjum wrote:
> [...]