graphic: 0.965
semantic: 0.953
other: 0.944
assembly: 0.940
device: 0.935
instruction: 0.919
socket: 0.864
boot: 0.840
vnc: 0.815
network: 0.813
mistranslation: 0.799
KVM: 0.701
[BUG]QEMU jump into interrupt when single-stepping on aarch64
Dear, folks,
I try to debug Linux kernel with QEMU in single-stepping mode on aarch64
platform,
the added breakpoint hits but after I type `step`, the gdb always jumps into
interrupt.
My env:
gdb-10.2
qemu-6.2.0
host kernel: 5.10.84
VM kernel: 5.10.84
The steps to reproduce:
# host console: run a VM with only one core, the import arg:
# details can be found here:
https://www.redhat.com/en/blog/debugging-kernel-qemulibvirt
virsh create dev_core0.xml
# run gdb client
gdb ./vmlinux
# gdb client on host console
(gdb) dir
./usr/src/debug/kernel-5.10.84/linux-5.10.84-004.alpha.ali5000.alios7.aarch64
(gdb) target remote localhost:1234
(gdb) info b
Num Type Disp Enb Address What
1 breakpoint keep y
1.1 y 0xffff800010361444
mm/memory-failure.c:1318
1.2 y 0xffff800010361450 in memory_failure
at mm/memory-failure.c:1488
(gdb) c
Continuing.
# console in VM, use madvise to inject a hwposion at virtual address
vaddr,
# which will hit the b inmemory_failur: madvise(vaddr, pagesize,
MADV_HWPOISON);
# and the VM pause
./run_madvise.c
# gdb client on host console
(gdb)
Continuing.
Breakpoint 1, 0xffff800010361444 in memory_failure () at
mm/memory-failure.c:1318
1318 res = -EHWPOISON;
(gdb) n
vectors () at arch/arm64/kernel/entry.S:552
552 kernel_ventry 1, irq // IRQ
EL1h
(gdb) n
(gdb) n
(gdb) n
(gdb) n
gic_handle_irq (regs=0xffff8000147c3b80) at
drivers/irqchip/irq-gic-v3.c:721
# after several step, I got the irqnr
(gdb) p irqnr
$5 = 8262
Sometimes, the irqnr is 27ï¼ which is used for arch_timer.
I was wondering do you have any comments on this? And feedback are welcomed.
Thank you.
Best Regards.
Shuai
On 4/6/22 09:30, Shuai Xue wrote:
Dear, folks,
I try to debug Linux kernel with QEMU in single-stepping mode on aarch64
platform,
the added breakpoint hits but after I type `step`, the gdb always jumps into
interrupt.
My env:
gdb-10.2
qemu-6.2.0
host kernel: 5.10.84
VM kernel: 5.10.84
The steps to reproduce:
# host console: run a VM with only one core, the import arg:
# details can be found here:
https://www.redhat.com/en/blog/debugging-kernel-qemulibvirt
virsh create dev_core0.xml
# run gdb client
gdb ./vmlinux
# gdb client on host console
(gdb) dir
./usr/src/debug/kernel-5.10.84/linux-5.10.84-004.alpha.ali5000.alios7.aarch64
(gdb) target remote localhost:1234
(gdb) info b
Num Type Disp Enb Address What
1 breakpoint keep y
1.1 y 0xffff800010361444
mm/memory-failure.c:1318
1.2 y 0xffff800010361450 in memory_failure
at mm/memory-failure.c:1488
(gdb) c
Continuing.
# console in VM, use madvise to inject a hwposion at virtual address
vaddr,
# which will hit the b inmemory_failur: madvise(vaddr, pagesize,
MADV_HWPOISON);
# and the VM pause
./run_madvise.c
# gdb client on host console
(gdb)
Continuing.
Breakpoint 1, 0xffff800010361444 in memory_failure () at
mm/memory-failure.c:1318
1318 res = -EHWPOISON;
(gdb) n
vectors () at arch/arm64/kernel/entry.S:552
552 kernel_ventry 1, irq // IRQ
EL1h
The 'n' command is not a single-step: use stepi, which will suppress interrupts.
Anyway, not a bug.
r~
å¨ 2022/4/7 AM12:57, Richard Henderson åé:
>
On 4/6/22 09:30, Shuai Xue wrote:
>
> Dear, folks,
>
>
>
> I try to debug Linux kernel with QEMU in single-stepping mode on aarch64
>
> platform,
>
> the added breakpoint hits but after I type `step`, the gdb always jumps into
>
> interrupt.
>
>
>
> My env:
>
>
>
> Â Â Â Â gdb-10.2
>
> Â Â Â Â qemu-6.2.0
>
> Â Â Â Â host kernel: 5.10.84
>
> Â Â Â Â VM kernel: 5.10.84
>
>
>
> The steps to reproduce:
>
> Â Â Â Â # host console: run a VM with only one core, the import arg:
> value='-s'/>
>
> Â Â Â Â # details can be found here:
>
>
https://www.redhat.com/en/blog/debugging-kernel-qemulibvirt
>
> Â Â Â Â virsh create dev_core0.xml
>
> Â Â Â Â
>
> Â Â Â Â # run gdb client
>
> Â Â Â Â gdb ./vmlinux
>
>
>
> Â Â Â Â # gdb client on host console
>
> Â Â Â Â (gdb) dir
>
> ./usr/src/debug/kernel-5.10.84/linux-5.10.84-004.alpha.ali5000.alios7.aarch64
>
> Â Â Â Â (gdb) target remote localhost:1234
>
> Â Â Â Â (gdb) info b
>
>     Num    Type          Disp Enb Address           What
>
>     1      breakpoint    keep y Â
>
>     1.1                        y  0xffff800010361444
>
> mm/memory-failure.c:1318
>
>     1.2                        y  0xffff800010361450 in memory_failure
>
> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â at
>
> mm/memory-failure.c:1488
>
> Â Â Â Â (gdb) c
>
> Â Â Â Â Continuing.
>
>
>
> Â Â Â Â # console in VM, use madvise to inject a hwposion at virtual address
>
> vaddr,
>
> Â Â Â Â # which will hit the b inmemory_failur: madvise(vaddr, pagesize,
>
> MADV_HWPOISON);
>
> Â Â Â Â # and the VM pause
>
> Â Â Â Â ./run_madvise.c
>
>
>
> Â Â Â Â # gdb client on host console
>
> Â Â Â Â (gdb)
>
> Â Â Â Â Continuing.
>
> Â Â Â Â Breakpoint 1, 0xffff800010361444 in memory_failure () at
>
> mm/memory-failure.c:1318
>
> Â Â Â Â 1318Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â res = -EHWPOISON;
>
> Â Â Â Â (gdb) n
>
> Â Â Â Â vectors () at arch/arm64/kernel/entry.S:552
>
>     552            kernel_ventry  1, irq                         // IRQ
>
> EL1h
>
>
The 'n' command is not a single-step: use stepi, which will suppress
>
interrupts.
>
Anyway, not a bug.
>
>
r~
Hi, Richard,
Thank you for your quick reply, I also try `stepi`, but it does NOT work either.
(gdb) c
Continuing.
Breakpoint 1, memory_failure (pfn=1273982, flags=1) at
mm/memory-failure.c:1488
1488 {
(gdb) stepi
vectors () at arch/arm64/kernel/entry.S:552
552 kernel_ventry 1, irq // IRQ
EL1h
According to QEMU doc[1]: the default single stepping behavior is step with the
IRQs
and timer service routines off. I checked the MASK bits used to control the
single
stepping IE on my machine as bellow:
# gdb client on host (x86 plafrom)
(gdb) maintenance packet qqemu.sstepbits
sending: "qqemu.sstepbits"
received: "ENABLE=1,NOIRQ=2,NOTIMER=4"
The sstep MASK looks as expected, but does not work as expected.
I also try the same kernel and qemu version on X86 platform:
>
> gdb-10.2
>
> qemu-6.2.0
>
> host kernel: 5.10.84
>
> VM kernel: 5.10.84
The command `n` jumps to the next instruction.
# gdb client on host (x86 plafrom)
(gdb) b memory-failure.c:1488
Breakpoint 1, memory_failure (pfn=1128931, flags=1) at
mm/memory-failure.c:1488
1488 {
(gdb) n
1497 if (!sysctl_memory_failure_recovery)
(gdb) stepi
0xffffffff812efdbc 1497 if
(!sysctl_memory_failure_recovery)
(gdb) stepi
0xffffffff812efdbe 1497 if
(!sysctl_memory_failure_recovery)
(gdb) n
1500 p = pfn_to_online_page(pfn);
(gdb) l
1496
1497 if (!sysctl_memory_failure_recovery)
1498 panic("Memory failure on page %lx", pfn);
1499
1500 p = pfn_to_online_page(pfn);
1501 if (!p) {
Best Regrades,
Shuai
[1]
https://github.com/qemu/qemu/blob/master/docs/system/gdb.rst
å¨ 2022/4/7 PM12:10, Shuai Xue åé:
>
å¨ 2022/4/7 AM12:57, Richard Henderson åé:
>
> On 4/6/22 09:30, Shuai Xue wrote:
>
>> Dear, folks,
>
>>
>
>> I try to debug Linux kernel with QEMU in single-stepping mode on aarch64
>
>> platform,
>
>> the added breakpoint hits but after I type `step`, the gdb always jumps
>
>> into interrupt.
>
>>
>
>> My env:
>
>>
>
>> Â Â Â Â gdb-10.2
>
>> Â Â Â Â qemu-6.2.0
>
>> Â Â Â Â host kernel: 5.10.84
>
>> Â Â Â Â VM kernel: 5.10.84
>
>>
>
>> The steps to reproduce:
>
>> Â Â Â Â # host console: run a VM with only one core, the import arg:
>> value='-s'/>
>
>> Â Â Â Â # details can be found here:
>
>>
https://www.redhat.com/en/blog/debugging-kernel-qemulibvirt
>
>> Â Â Â Â virsh create dev_core0.xml
>
>> Â Â Â Â
>
>> Â Â Â Â # run gdb client
>
>> Â Â Â Â gdb ./vmlinux
>
>>
>
>> Â Â Â Â # gdb client on host console
>
>> Â Â Â Â (gdb) dir
>
>> ./usr/src/debug/kernel-5.10.84/linux-5.10.84-004.alpha.ali5000.alios7.aarch64
>
>> Â Â Â Â (gdb) target remote localhost:1234
>
>> Â Â Â Â (gdb) info b
>
>>     Num    Type          Disp Enb Address           What
>
>>     1      breakpoint    keep y Â
>
>>     1.1                        y  0xffff800010361444
>
>> mm/memory-failure.c:1318
>
>>     1.2                        y  0xffff800010361450 in memory_failure
>
>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â at
>
>> mm/memory-failure.c:1488
>
>> Â Â Â Â (gdb) c
>
>> Â Â Â Â Continuing.
>
>>
>
>> Â Â Â Â # console in VM, use madvise to inject a hwposion at virtual address
>
>> vaddr,
>
>> Â Â Â Â # which will hit the b inmemory_failur: madvise(vaddr, pagesize,
>
>> MADV_HWPOISON);
>
>> Â Â Â Â # and the VM pause
>
>> Â Â Â Â ./run_madvise.c
>
>>
>
>> Â Â Â Â # gdb client on host console
>
>> Â Â Â Â (gdb)
>
>> Â Â Â Â Continuing.
>
>> Â Â Â Â Breakpoint 1, 0xffff800010361444 in memory_failure () at
>
>> mm/memory-failure.c:1318
>
>> Â Â Â Â 1318Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â res = -EHWPOISON;
>
>> Â Â Â Â (gdb) n
>
>> Â Â Â Â vectors () at arch/arm64/kernel/entry.S:552
>
>>     552            kernel_ventry  1, irq                         // IRQ
>
>> EL1h
>
>
>
> The 'n' command is not a single-step: use stepi, which will suppress
>
> interrupts.
>
> Anyway, not a bug.
>
>
>
> r~
>
>
Hi, Richard,
>
>
Thank you for your quick reply, I also try `stepi`, but it does NOT work
>
either.
>
>
(gdb) c
>
Continuing.
>
>
Breakpoint 1, memory_failure (pfn=1273982, flags=1) at
>
mm/memory-failure.c:1488
>
1488 {
>
(gdb) stepi
>
vectors () at arch/arm64/kernel/entry.S:552
>
552 kernel_ventry 1, irq // IRQ
>
EL1h
>
>
According to QEMU doc[1]: the default single stepping behavior is step with
>
the IRQs
>
and timer service routines off. I checked the MASK bits used to control the
>
single
>
stepping IE on my machine as bellow:
>
>
# gdb client on host (x86 plafrom)
>
(gdb) maintenance packet qqemu.sstepbits
>
sending: "qqemu.sstepbits"
>
received: "ENABLE=1,NOIRQ=2,NOTIMER=4"
>
>
The sstep MASK looks as expected, but does not work as expected.
>
>
I also try the same kernel and qemu version on X86 platform:
>
>> gdb-10.2
>
>> qemu-6.2.0
>
>> host kernel: 5.10.84
>
>> VM kernel: 5.10.84
>
>
>
The command `n` jumps to the next instruction.
>
>
# gdb client on host (x86 plafrom)
>
(gdb) b memory-failure.c:1488
>
Breakpoint 1, memory_failure (pfn=1128931, flags=1) at
>
mm/memory-failure.c:1488
>
1488 {
>
(gdb) n
>
1497 if (!sysctl_memory_failure_recovery)
>
(gdb) stepi
>
0xffffffff812efdbc 1497 if
>
(!sysctl_memory_failure_recovery)
>
(gdb) stepi
>
0xffffffff812efdbe 1497 if
>
(!sysctl_memory_failure_recovery)
>
(gdb) n
>
1500 p = pfn_to_online_page(pfn);
>
(gdb) l
>
1496
>
1497 if (!sysctl_memory_failure_recovery)
>
1498 panic("Memory failure on page %lx", pfn);
>
1499
>
1500 p = pfn_to_online_page(pfn);
>
1501 if (!p) {
>
>
Best Regrades,
>
Shuai
>
>
>
[1]
https://github.com/qemu/qemu/blob/master/docs/system/gdb.rst
Hi, Richard,
I was wondering that do you have any comments to this?
Best Regrades,
Shuai