add new classifier result

author: Christian Krinitsin <mail@krinitsin.com> 2025-06-03 12:04:13 +0000
committer: Christian Krinitsin <mail@krinitsin.com> 2025-06-03 12:04:13 +0000
commit: 256709d2eb3fd80d768a99964be5caa61effa2a0 (patch)
tree: 05b2352fba70923126836a64b6a0de43902e976a /results/classifier/105/other/1647683
parent: 2ab14fa96a6c5484b5e4ba8337551bb8dcc79cc5 (diff)
download: qemu-analysis-256709d2eb3fd80d768a99964be5caa61effa2a0.tar.gz
qemu-analysis-256709d2eb3fd80d768a99964be5caa61effa2a0.zip
1 files changed, 452 insertions, 0 deletions
diff --git a/results/classifier/105/other/1647683 b/results/classifier/105/other/1647683
new file mode 100644
index 000000000..22d8d0ac9
--- /dev/null
+++ b/results/classifier/105/other/1647683
@@ -0,0 +1,452 @@
+other: 0.960
+network: 0.953
+graphic: 0.943
+instruction: 0.931
+semantic: 0.931
+assembly: 0.922
+vnc: 0.913
+socket: 0.910
+device: 0.897
+boot: 0.864
+KVM: 0.863
+mistranslation: 0.838
+
+Bad interaction between tb flushing & gdb stub
+
+I have been working on a series of patches for ARM big-endian system mode support, using QEMU as a bare-metal simulator for the GDB test suite. At some point I realised that these tests were not running reliably on the QEMU master branch, even without my patches applied. (I.e., in little-endian mode.)
+
+Running QEMU under GDB in the test harness via Valgrind, using something akin to:
+
+(gdb) target remote | valgrind --tool=memcheck qemu-arm-system [...]
+
+leads to intermittent (and quite hard-to-reproduce) segfaults in QEMU of the form:
+
+==52333== Process terminating with default action of signal 11 (SIGSEGV)
+==52333==  Access not within mapped region at address 0x24
+==52333==    at 0x1D55F2: tb_page_remove (translate-all.c:1026)
+==52333==    by 0x1D58B4: tb_phys_invalidate (translate-all.c:1119)
+==52333==    by 0x1D63AA: tb_invalidate_phys_page_range (translate-all.c:1519)
+==52333==    by 0x1D66D7: tb_invalidate_phys_addr (translate-all.c:1714)
+==52333==    by 0x1CBA7F: breakpoint_invalidate (exec.c:704)
+==52333==    by 0x1CC01F: cpu_breakpoint_remove_by_ref (exec.c:869)
+==52333==    by 0x1CBF97: cpu_breakpoint_remove (exec.c:857)
+==52333==    by 0x218FAA: gdb_breakpoint_remove (gdbstub.c:717)
+==52333==    by 0x219E35: gdb_handle_packet (gdbstub.c:1035)
+==52333==    by 0x21AF62: gdb_read_byte (gdbstub.c:1459)
+==52333==    by 0x21B096: gdb_chr_receive (gdbstub.c:1672)
+==52333==    by 0x3AF2BC: qemu_chr_be_write_impl (qemu-char.c:419)
+
+These crashes didn't happen on a 2.6-era QEMU, so I bisected and discovered the commit 3359baad36889b83df40b637ed993a4b816c4906 ("tcg: Make tb_flush() thread safe") appears to be the thing that triggers this intermittent failure. Reverting the patch on the branch tip makes the crashes go away.
+
+Unfortunately I don't currently have a way to trigger the segfaults outside of Mentor Graphics's test infrastructure, which I can't share.
+
+Does anyone know a reason that this might be happening, or suggestions of how I might further debug this? Maybe a missed tb flush in the gdb stub code, somewhere?
+
+Thanks!
+
+Julian
+
+(FAOD, the crashes happen without Valgrind too, and the above backtrace may be a red herring.)
+
+On 6 December 2016 at 11:39, Julian Brown <email address hidden> wrote:
+> Running QEMU under GDB in the test harness via Valgrind, using something
+> akin to:
+>
+> (gdb) target remote | valgrind --tool=memcheck qemu-arm-system [...]
+>
+> leads to intermittent (and quite hard-to-reproduce) segfaults in QEMU of
+> the form:
+>
+> ==52333== Process terminating with default action of signal 11 (SIGSEGV)
+> ==52333==  Access not within mapped region at address 0x24
+> ==52333==    at 0x1D55F2: tb_page_remove (translate-all.c:1026)
+> ==52333==    by 0x1D58B4: tb_phys_invalidate (translate-all.c:1119)
+> ==52333==    by 0x1D63AA: tb_invalidate_phys_page_range (translate-all.c:1519)
+> ==52333==    by 0x1D66D7: tb_invalidate_phys_addr (translate-all.c:1714)
+> ==52333==    by 0x1CBA7F: breakpoint_invalidate (exec.c:704)
+> ==52333==    by 0x1CC01F: cpu_breakpoint_remove_by_ref (exec.c:869)
+> ==52333==    by 0x1CBF97: cpu_breakpoint_remove (exec.c:857)
+> ==52333==    by 0x218FAA: gdb_breakpoint_remove (gdbstub.c:717)
+> ==52333==    by 0x219E35: gdb_handle_packet (gdbstub.c:1035)
+> ==52333==    by 0x21AF62: gdb_read_byte (gdbstub.c:1459)
+> ==52333==    by 0x21B096: gdb_chr_receive (gdbstub.c:1672)
+> ==52333==    by 0x3AF2BC: qemu_chr_be_write_impl (qemu-char.c:419)
+>
+> These crashes didn't happen on a 2.6-era QEMU, so I bisected and
+> discovered the commit 3359baad36889b83df40b637ed993a4b816c4906 ("tcg:
+> Make tb_flush() thread safe") appears to be the thing that triggers this
+> intermittent failure. Reverting the patch on the branch tip makes the
+> crashes go away.
+
+I saw something similar the other day as well, not involving valgrind,
+just a simple gdb connected to the gdbstub.
+
+thanks
+-- PMM
+
+
+On 6 December 2016 at 12:34, Peter Maydell <email address hidden> wrote:
+> I saw something similar the other day as well, not involving valgrind,
+> just a simple gdb connected to the gdbstub.
+
+http://people.linaro.org/~peter.maydell/gdbstub-bug.tgz
+is a repro case for this (with an aarch64 kernel guest).
+Segfaults every time when the guest hits the breakpoint.
+
+thanks
+-- PMM
+
+
+A bug (1647683) was reported showing a crash when removing
+breakpoints. The reproducer was bisected to 3359baad when tb_flush was
+finally made thread safe. While in MTTCG the locking in
+breakpoint_invalidate should have prevented any problems currently
+tb_lock() is a NOP for system emulation.
+
+On top of it all the invalidation code was special-cased between user
+and system emulation which was ugly. As we now have a safe tb_flush()
+we might as well use that when breakpoints and added and removed. This
+is the same thing we do for single-stepping and as this is during
+debugging it isn't a major performance concern.
+
+Reported-by: Julian Brown
+Signed-off-by: Alex Bennée <email address hidden>
+---
+ exec.c | 31 ++++---------------------------
+ 1 file changed, 4 insertions(+), 27 deletions(-)
+
+diff --git a/exec.c b/exec.c
+index 3d867f1..e991221 100644
+--- a/exec.c
++++ b/exec.c
+@@ -685,29 +685,6 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
+ }
+ 
+ #if defined(CONFIG_USER_ONLY)
+-static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
+-{
+-    mmap_lock();
+-    tb_lock();
+-    tb_invalidate_phys_page_range(pc, pc + 1, 0);
+-    tb_unlock();
+-    mmap_unlock();
+-}
+-#else
+-static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
+-{
+-    MemTxAttrs attrs;
+-    hwaddr phys = cpu_get_phys_page_attrs_debug(cpu, pc, &attrs);
+-    int asidx = cpu_asidx_from_attrs(cpu, attrs);
+-    if (phys != -1) {
+-        /* Locks grabbed by tb_invalidate_phys_addr */
+-        tb_invalidate_phys_addr(cpu->cpu_ases[asidx].as,
+-                                phys | (pc & ~TARGET_PAGE_MASK));
+-    }
+-}
+-#endif
+-
+-#if defined(CONFIG_USER_ONLY)
+ void cpu_watchpoint_remove_all(CPUState *cpu, int mask)
+ 
+ {
+@@ -839,7 +816,7 @@ int cpu_breakpoint_insert(CPUState *cpu, vaddr pc, int flags,
+         QTAILQ_INSERT_TAIL(&cpu->breakpoints, bp, entry);
+     }
+ 
+-    breakpoint_invalidate(cpu, pc);
++    tb_flush(cpu);
+ 
+     if (breakpoint) {
+         *breakpoint = bp;
+@@ -855,6 +832,7 @@ int cpu_breakpoint_remove(CPUState *cpu, vaddr pc, int flags)
+     QTAILQ_FOREACH(bp, &cpu->breakpoints, entry) {
+         if (bp->pc == pc && bp->flags == flags) {
+             cpu_breakpoint_remove_by_ref(cpu, bp);
++            tb_flush(cpu);
+             return 0;
+         }
+     }
+@@ -865,9 +843,6 @@ int cpu_breakpoint_remove(CPUState *cpu, vaddr pc, int flags)
+ void cpu_breakpoint_remove_by_ref(CPUState *cpu, CPUBreakpoint *breakpoint)
+ {
+     QTAILQ_REMOVE(&cpu->breakpoints, breakpoint, entry);
+-
+-    breakpoint_invalidate(cpu, breakpoint->pc);
+-
+     g_free(breakpoint);
+ }
+ 
+@@ -881,6 +856,8 @@ void cpu_breakpoint_remove_all(CPUState *cpu, int mask)
+             cpu_breakpoint_remove_by_ref(cpu, bp);
+         }
+     }
++
++    tb_flush(cpu);
+ }
+ 
+ /* enable or disable single step mode. EXCP_DEBUG is returned by the
+-- 
+2.10.2
+
+
+
+
+Julian Brown <email address hidden> writes:
+
+> (FAOD, the crashes happen without Valgrind too, and the above backtrace
+> may be a red herring.)
+
+I've sent a patch that should fix this. Could you please test?
+
+--
+Alex Bennée
+
+
+On 6 December 2016 at 14:51, Alex Bennée <email address hidden> wrote:
+> A bug (1647683) was reported showing a crash when removing
+> breakpoints. The reproducer was bisected to 3359baad when tb_flush was
+> finally made thread safe. While in MTTCG the locking in
+> breakpoint_invalidate should have prevented any problems currently
+> tb_lock() is a NOP for system emulation.
+>
+> On top of it all the invalidation code was special-cased between user
+> and system emulation which was ugly. As we now have a safe tb_flush()
+> we might as well use that when breakpoints and added and removed. This
+> is the same thing we do for single-stepping and as this is during
+> debugging it isn't a major performance concern.
+>
+> Reported-by: Julian Brown
+> Signed-off-by: Alex Bennée <email address hidden>
+> ---
+>  exec.c | 31 ++++---------------------------
+>  1 file changed, 4 insertions(+), 27 deletions(-)
+>
+> diff --git a/exec.c b/exec.c
+> index 3d867f1..e991221 100644
+> --- a/exec.c
+> +++ b/exec.c
+> @@ -685,29 +685,6 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
+>  }
+>
+>  #if defined(CONFIG_USER_ONLY)
+> -static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
+> -{
+> -    mmap_lock();
+> -    tb_lock();
+> -    tb_invalidate_phys_page_range(pc, pc + 1, 0);
+> -    tb_unlock();
+> -    mmap_unlock();
+> -}
+> -#else
+> -static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
+> -{
+> -    MemTxAttrs attrs;
+> -    hwaddr phys = cpu_get_phys_page_attrs_debug(cpu, pc, &attrs);
+> -    int asidx = cpu_asidx_from_attrs(cpu, attrs);
+> -    if (phys != -1) {
+> -        /* Locks grabbed by tb_invalidate_phys_addr */
+> -        tb_invalidate_phys_addr(cpu->cpu_ases[asidx].as,
+> -                                phys | (pc & ~TARGET_PAGE_MASK));
+> -    }
+> -}
+> -#endif
+> -
+> -#if defined(CONFIG_USER_ONLY)
+>  void cpu_watchpoint_remove_all(CPUState *cpu, int mask)
+>
+>  {
+> @@ -839,7 +816,7 @@ int cpu_breakpoint_insert(CPUState *cpu, vaddr pc, int flags,
+>          QTAILQ_INSERT_TAIL(&cpu->breakpoints, bp, entry);
+>      }
+>
+> -    breakpoint_invalidate(cpu, pc);
+> +    tb_flush(cpu);
+>
+>      if (breakpoint) {
+>          *breakpoint = bp;
+> @@ -855,6 +832,7 @@ int cpu_breakpoint_remove(CPUState *cpu, vaddr pc, int flags)
+>      QTAILQ_FOREACH(bp, &cpu->breakpoints, entry) {
+>          if (bp->pc == pc && bp->flags == flags) {
+>              cpu_breakpoint_remove_by_ref(cpu, bp);
+> +            tb_flush(cpu);
+>              return 0;
+>          }
+>      }
+> @@ -865,9 +843,6 @@ int cpu_breakpoint_remove(CPUState *cpu, vaddr pc, int flags)
+>  void cpu_breakpoint_remove_by_ref(CPUState *cpu, CPUBreakpoint *breakpoint)
+>  {
+>      QTAILQ_REMOVE(&cpu->breakpoints, breakpoint, entry);
+> -
+> -    breakpoint_invalidate(cpu, breakpoint->pc);
+> -
+>      g_free(breakpoint);
+>  }
+>
+> @@ -881,6 +856,8 @@ void cpu_breakpoint_remove_all(CPUState *cpu, int mask)
+>              cpu_breakpoint_remove_by_ref(cpu, bp);
+>          }
+>      }
+> +
+> +    tb_flush(cpu);
+>  }
+>
+>  /* enable or disable single step mode. EXCP_DEBUG is returned by the
+
+I think this is the wrong direction. We only need to invalidate
+the TBs for the specific location the breakpoint is at.
+Even if we for the moment want to apply a big hammer of doing
+a full tb flush on breakpoint to fix this issue for this release,
+we shouldn't drop the indirection through breakpoint_invalidate(),
+because then we can't go back to invalidating just the parts we need
+easily later.
+
+thanks
+-- PMM
+
+
+
+Peter Maydell <email address hidden> writes:
+
+> On 6 December 2016 at 14:51, Alex Bennée <email address hidden> wrote:
+>> A bug (1647683) was reported showing a crash when removing
+>> breakpoints. The reproducer was bisected to 3359baad when tb_flush was
+>> finally made thread safe. While in MTTCG the locking in
+>> breakpoint_invalidate should have prevented any problems currently
+>> tb_lock() is a NOP for system emulation.
+>>
+>> On top of it all the invalidation code was special-cased between user
+>> and system emulation which was ugly. As we now have a safe tb_flush()
+>> we might as well use that when breakpoints and added and removed. This
+>> is the same thing we do for single-stepping and as this is during
+>> debugging it isn't a major performance concern.
+>>
+>> Reported-by: Julian Brown
+>> Signed-off-by: Alex Bennée <email address hidden>
+>> ---
+>>  exec.c | 31 ++++---------------------------
+>>  1 file changed, 4 insertions(+), 27 deletions(-)
+>>
+>> diff --git a/exec.c b/exec.c
+>> index 3d867f1..e991221 100644
+>> --- a/exec.c
+>> +++ b/exec.c
+>> @@ -685,29 +685,6 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
+>>  }
+>>
+>>  #if defined(CONFIG_USER_ONLY)
+>> -static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
+>> -{
+>> -    mmap_lock();
+>> -    tb_lock();
+>> -    tb_invalidate_phys_page_range(pc, pc + 1, 0);
+>> -    tb_unlock();
+>> -    mmap_unlock();
+>> -}
+>> -#else
+>> -static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
+>> -{
+>> -    MemTxAttrs attrs;
+>> -    hwaddr phys = cpu_get_phys_page_attrs_debug(cpu, pc, &attrs);
+>> -    int asidx = cpu_asidx_from_attrs(cpu, attrs);
+>> -    if (phys != -1) {
+>> -        /* Locks grabbed by tb_invalidate_phys_addr */
+>> -        tb_invalidate_phys_addr(cpu->cpu_ases[asidx].as,
+>> -                                phys | (pc & ~TARGET_PAGE_MASK));
+>> -    }
+>> -}
+>> -#endif
+>> -
+>> -#if defined(CONFIG_USER_ONLY)
+>>  void cpu_watchpoint_remove_all(CPUState *cpu, int mask)
+>>
+>>  {
+>> @@ -839,7 +816,7 @@ int cpu_breakpoint_insert(CPUState *cpu, vaddr pc, int flags,
+>>          QTAILQ_INSERT_TAIL(&cpu->breakpoints, bp, entry);
+>>      }
+>>
+>> -    breakpoint_invalidate(cpu, pc);
+>> +    tb_flush(cpu);
+>>
+>>      if (breakpoint) {
+>>          *breakpoint = bp;
+>> @@ -855,6 +832,7 @@ int cpu_breakpoint_remove(CPUState *cpu, vaddr pc, int flags)
+>>      QTAILQ_FOREACH(bp, &cpu->breakpoints, entry) {
+>>          if (bp->pc == pc && bp->flags == flags) {
+>>              cpu_breakpoint_remove_by_ref(cpu, bp);
+>> +            tb_flush(cpu);
+>>              return 0;
+>>          }
+>>      }
+>> @@ -865,9 +843,6 @@ int cpu_breakpoint_remove(CPUState *cpu, vaddr pc, int flags)
+>>  void cpu_breakpoint_remove_by_ref(CPUState *cpu, CPUBreakpoint *breakpoint)
+>>  {
+>>      QTAILQ_REMOVE(&cpu->breakpoints, breakpoint, entry);
+>> -
+>> -    breakpoint_invalidate(cpu, breakpoint->pc);
+>> -
+>>      g_free(breakpoint);
+>>  }
+>>
+>> @@ -881,6 +856,8 @@ void cpu_breakpoint_remove_all(CPUState *cpu, int mask)
+>>              cpu_breakpoint_remove_by_ref(cpu, bp);
+>>          }
+>>      }
+>> +
+>> +    tb_flush(cpu);
+>>  }
+>>
+>>  /* enable or disable single step mode. EXCP_DEBUG is returned by the
+>
+> I think this is the wrong direction. We only need to invalidate
+> the TBs for the specific location the breakpoint is at.
+> Even if we for the moment want to apply a big hammer of doing
+> a full tb flush on breakpoint to fix this issue for this release,
+> we shouldn't drop the indirection through breakpoint_invalidate(),
+> because then we can't go back to invalidating just the parts we need
+> easily later.
+
+Why would we? It's not like fresh code generation is particularly slow,
+especially in a debugging situation when you've just stopped the
+machine.
+
+The self-modifying-code paths make more sense to be partial in their
+invalidations although I've never really measured quite how pathalogical
+running a JIT inside QEMU is.
+
+--
+Alex Bennée
+
+
+I'm testing the patch now, thank you! I'll report back on how it goes.
+
+The patch works for me! Thank you!
+
+On 6 December 2016 at 15:14, Alex Bennée <email address hidden> wrote:
+> Peter Maydell <email address hidden> writes:
+>> I think this is the wrong direction. We only need to invalidate
+>> the TBs for the specific location the breakpoint is at.
+>> Even if we for the moment want to apply a big hammer of doing
+>> a full tb flush on breakpoint to fix this issue for this release,
+>> we shouldn't drop the indirection through breakpoint_invalidate(),
+>> because then we can't go back to invalidating just the parts we need
+>> easily later.
+>
+> Why would we? It's not like fresh code generation is particularly slow,
+> especially in a debugging situation when you've just stopped the
+> machine.
+
+Because a wholescale tb_flush() is a "this is probably wrong"
+flag, and a great way to hide bugs. It also makes the gdbstub
+more invasive than it strictly has to be.
+
+We have less than 10 calls to tb_flush() in the whole system
+and I think they could all use examination to see whether
+they're really correct.
+
+thanks
+-- PMM
+
+
+Fix had been included here:
+https://git.qemu.org/?p=qemu.git;a=commitdiff;h=a9353fe897ca2687e5b338
+
author	Christian Krinitsin <mail@krinitsin.com>	2025-06-03 12:04:13 +0000
committer	Christian Krinitsin <mail@krinitsin.com>	2025-06-03 12:04:13 +0000
commit	256709d2eb3fd80d768a99964be5caa61effa2a0 (patch)
tree	05b2352fba70923126836a64b6a0de43902e976a /results/classifier/105/other/1647683
parent	2ab14fa96a6c5484b5e4ba8337551bb8dcc79cc5 (diff)
download	qemu-analysis-256709d2eb3fd80d768a99964be5caa61effa2a0.tar.gz qemu-analysis-256709d2eb3fd80d768a99964be5caa61effa2a0.zip