[BUG] QEMU jump into interrupt when single-stepping on aarch64

Dear folks,

I am trying to debug the Linux kernel with QEMU in single-stepping mode on the
aarch64 platform. The breakpoint I added hits, but after I type `step`, gdb
always jumps into an interrupt.

My env:

    gdb-10.2
    qemu-6.2.0
    host kernel: 5.10.84
    VM kernel: 5.10.84

The steps to reproduce:

    # host console: run a VM with only one core, the important arg: value='-s'
    # details can be found here:
    # https://www.redhat.com/en/blog/debugging-kernel-qemulibvirt
    virsh create dev_core0.xml

    # run gdb client
    gdb ./vmlinux

    # gdb client on host console
    (gdb) dir ./usr/src/debug/kernel-5.10.84/linux-5.10.84-004.alpha.ali5000.alios7.aarch64
    (gdb) target remote localhost:1234
    (gdb) info b
    Num     Type           Disp Enb Address            What
    1       breakpoint     keep y
    1.1                         y   0xffff800010361444 mm/memory-failure.c:1318
    1.2                         y   0xffff800010361450 in memory_failure
                                                       at mm/memory-failure.c:1488
    (gdb) c
    Continuing.

    # console in VM: use madvise to inject a hwpoison at virtual address vaddr,
    # which will hit the breakpoint in memory_failure:
    #     madvise(vaddr, pagesize, MADV_HWPOISON);
    # and the VM pauses
    ./run_madvise.c

    # gdb client on host console
    (gdb)
    Continuing.
    Breakpoint 1, 0xffff800010361444 in memory_failure () at mm/memory-failure.c:1318
    1318                    res = -EHWPOISON;
    (gdb) n
    vectors () at arch/arm64/kernel/entry.S:552
    552             kernel_ventry   1, irq                          // IRQ EL1h
    (gdb) n
    (gdb) n
    (gdb) n
    (gdb) n
    gic_handle_irq (regs=0xffff8000147c3b80) at drivers/irqchip/irq-gic-v3.c:721
    # after several steps, I got the irqnr
    (gdb) p irqnr
    $5 = 8262

Sometimes the irqnr is 27, which is used for arch_timer.

I was wondering whether you have any comments on this? Any feedback is welcome.

Thank you.

Best Regards,
Shuai

On 4/6/22 09:30, Shuai Xue wrote:
> Dear folks,
>
> I am trying to debug the Linux kernel with QEMU in single-stepping mode on the
> aarch64 platform. The breakpoint I added hits, but after I type `step`, gdb
> always jumps into an interrupt.
>
> [...]
>
>     Breakpoint 1, 0xffff800010361444 in memory_failure () at mm/memory-failure.c:1318
>     1318                    res = -EHWPOISON;
>     (gdb) n
>     vectors () at arch/arm64/kernel/entry.S:552
>     552             kernel_ventry   1, irq                          // IRQ EL1h

The 'n' command is not a single-step: use stepi, which will suppress interrupts.
Anyway, not a bug.

r~

On 2022/4/7 12:57 AM, Richard Henderson wrote:
> On 4/6/22 09:30, Shuai Xue wrote:
> > [...]
> >     (gdb) n
> >     vectors () at arch/arm64/kernel/entry.S:552
> >     552             kernel_ventry   1, irq                          // IRQ EL1h
>
> The 'n' command is not a single-step: use stepi, which will suppress
> interrupts.
> Anyway, not a bug.
>
> r~

Hi Richard,

Thank you for your quick reply. I also tried `stepi`, but it does NOT work
either.

    (gdb) c
    Continuing.

    Breakpoint 1, memory_failure (pfn=1273982, flags=1) at mm/memory-failure.c:1488
    1488    {
    (gdb) stepi
    vectors () at arch/arm64/kernel/entry.S:552
    552             kernel_ventry   1, irq                          // IRQ EL1h

According to the QEMU doc [1], the default single-stepping behavior is to step
with the IRQs and timer service routines off. I checked the mask bits used to
control single stepping on my machine, as below:

    # gdb client on host (x86 platform)
    (gdb) maintenance packet qqemu.sstepbits
    sending: "qqemu.sstepbits"
    received: "ENABLE=1,NOIRQ=2,NOTIMER=4"

The sstep mask looks as expected, but single stepping does not behave as
expected.

I also tried the same kernel and QEMU version on the x86 platform:

> >     gdb-10.2
> >     qemu-6.2.0
> >     host kernel: 5.10.84
> >     VM kernel: 5.10.84

There, the command `n` jumps to the next instruction.

    # gdb client on host (x86 platform)
    (gdb) b memory-failure.c:1488
    Breakpoint 1, memory_failure (pfn=1128931, flags=1) at mm/memory-failure.c:1488
    1488    {
    (gdb) n
    1497            if (!sysctl_memory_failure_recovery)
    (gdb) stepi
    0xffffffff812efdbc      1497            if (!sysctl_memory_failure_recovery)
    (gdb) stepi
    0xffffffff812efdbe      1497            if (!sysctl_memory_failure_recovery)
    (gdb) n
    1500            p = pfn_to_online_page(pfn);
    (gdb) l
    1496
    1497            if (!sysctl_memory_failure_recovery)
    1498                    panic("Memory failure on page %lx", pfn);
    1499
    1500            p = pfn_to_online_page(pfn);
    1501            if (!p) {

Best Regards,
Shuai

[1] https://github.com/qemu/qemu/blob/master/docs/system/gdb.rst

On 2022/4/7 12:10 PM, Shuai Xue wrote:
> On 2022/4/7 12:57 AM, Richard Henderson wrote:
> > The 'n' command is not a single-step: use stepi, which will suppress
> > interrupts.
> > Anyway, not a bug.
>
> Thank you for your quick reply. I also tried `stepi`, but it does NOT work
> either.
>
> [...]
>
> The sstep mask looks as expected, but single stepping does not behave as
> expected.
>
> [...]

Hi Richard,

I was wondering whether you have any comments on this?

Best Regards,
Shuai
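
For reference, the mask bits reported by `qqemu.sstepbits` in the thread can be
combined into a single value. The QEMU gdb documentation [1] also describes
`maintenance packet qqemu.sstep` to read the current mask and
`maintenance packet Qqemu.sstep=HEX_VALUE` to set it. The following is only a
minimal Python sketch of that arithmetic, assuming the reply format shown above;
the helper names `parse_sstepbits` and `sstep_packet` are hypothetical and are
not part of gdb or QEMU.

```python
def parse_sstepbits(reply):
    """Parse a reply such as 'ENABLE=1,NOIRQ=2,NOTIMER=4' into a name -> bit dict."""
    bits = {}
    for field in reply.split(","):
        name, _, value = field.partition("=")
        # int(value, 0) accepts both decimal ("4") and hex ("0x4") spellings.
        bits[name.strip()] = int(value, 0)
    return bits


def sstep_packet(bits, *flags):
    """Build the text of a 'Qqemu.sstep=HEX' packet that ORs the requested flags.

    Hypothetical helper: it only formats the string that would be typed after
    'maintenance packet' in gdb; it does not talk to gdb or QEMU.
    """
    mask = 0
    for flag in flags:
        mask |= bits[flag]
    return f"Qqemu.sstep={mask:#x}"


if __name__ == "__main__":
    # Reply string copied from the gdb session in the thread.
    bits = parse_sstepbits("ENABLE=1,NOIRQ=2,NOTIMER=4")
    print(sstep_packet(bits, "ENABLE", "NOIRQ", "NOTIMER"))  # prints Qqemu.sstep=0x7
```

With the reply from the thread, asking for all three flags gives
`Qqemu.sstep=0x7`; that string is what would follow `maintenance packet` at the
gdb prompt.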