Diffstat (limited to 'results/classifier/118/risc-v')
26 files changed, 5193 insertions, 0 deletions
diff --git a/results/classifier/118/risc-v/1093 b/results/classifier/118/risc-v/1093 new file mode 100644 index 000000000..c4926ea59 --- /dev/null +++ b/results/classifier/118/risc-v/1093 @@ -0,0 +1,63 @@ +risc-v: 0.941 +graphic: 0.780 +device: 0.675 +performance: 0.656 +files: 0.642 +ppc: 0.545 +architecture: 0.508 +user-level: 0.423 +vnc: 0.406 +network: 0.394 +semantic: 0.378 +socket: 0.365 +permissions: 0.306 +arm: 0.297 +PID: 0.291 +boot: 0.281 +mistranslation: 0.211 +debug: 0.210 +peripherals: 0.201 +TCG: 0.198 +register: 0.191 +kernel: 0.183 +hypervisor: 0.173 +VMM: 0.154 +i386: 0.123 +KVM: 0.113 +virtual: 0.109 +x86: 0.098 +assembly: 0.081 + +RISC-V: signal frame is misaligned in signal handlers +Description of problem: +`qemu-user` misaligns the signal frame (to 4 bytes rather than 16 bytes) on RISC-V 64, e.g causing pointer misalignment diagnostics to be triggered by UBSan. +Steps to reproduce: +1. Create a C file with the following contents: +```c +#include <signal.h> +#include <stdio.h> + +void handler(int sig, siginfo_t *info, void *context) { + printf("signal occurred, info: %p, context: %p\n", info, context); +} + +int main() { + struct sigaction act; + act.sa_flags = SA_SIGINFO; + act.sa_sigaction = handler; + sigaction(SIGINT, &act, NULL); + + // Deliberately misalign the stack + asm volatile ("addi sp, sp, -4"); + + while(1); + // Unreachable +} +``` +2. Compile with an appropriate RISC-V toolchain and run with `qemu-riscv64 ./a.out`. +3. Send a `SIGINT` (e.g by hitting Ctrl-C), and observe that the signal frame will be misaligned: +``` +signal occurred, info: 0x400080025c, context: 0x40008002dc +``` +Additional information: +This issue is alluded to in the source code, see https://gitlab.com/qemu-project/qemu/-/blob/master/linux-user/riscv/signal.c#L68-69. It should be sufficient to change that constant to 15. diff --git a/results/classifier/118/risc-v/1155 b/results/classifier/118/risc-v/1155 new file mode 100644 index 000000000..7bbf9cb9a --- /dev/null +++ b/results/classifier/118/risc-v/1155 @@ -0,0 +1,57 @@ +risc-v: 0.966 +debug: 0.874 +KVM: 0.861 +graphic: 0.794 +device: 0.681 +virtual: 0.644 +architecture: 0.638 +permissions: 0.630 +kernel: 0.492 +arm: 0.492 +hypervisor: 0.461 +ppc: 0.442 +performance: 0.399 +semantic: 0.398 +vnc: 0.374 +PID: 0.370 +user-level: 0.354 +x86: 0.311 +boot: 0.304 +mistranslation: 0.286 +network: 0.277 +socket: 0.243 +peripherals: 0.226 +i386: 0.223 +VMM: 0.201 +TCG: 0.165 +register: 0.157 +assembly: 0.152 +files: 0.125 + +RISC-V: Instruction fetch exceptions can have invalid tval/epc combination +Description of problem: +Instruction page fault / guest-page fault / access fault exceptions can have invalid `epc`/`tval` combinations, for example as shown in the debug log: + +``` +riscv_cpu_do_interrupt: hart:0, async:0, cause:0000000000000014, epc:0xffffffff802fec76, tval:0xffffffff802ff000, desc=guest_exec_page_fault +riscv_cpu_do_interrupt: hart:0, async:0, cause:0000000000000014, epc:0xffffffff80243fe6, tval:0xffffffff80244000, desc=guest_exec_page_fault +``` + +From the privileged spec: + +> If `mtval` is written with a nonzero value when an instruction access-fault or page-fault exception occurs on a system with variable-length instructions, then `mtval` will contain the virtual address of the portion of the instruction that caused the fault, while `mepc` will point to the beginning of the instruction. + +Currently RISC-V only has 32-bit and 16-bit instructions, so the difference `tval - epc` should be either `0` or `2`. 
In the examples above the differences are `906` and `26` respectively. + +Possibly notable: all occurrences of these invalid combinations to have `tval` aligned to a page-boundary. +Steps to reproduce: +This one only gives invalid `tval`/`epc` combinations with instruction guest-page faults, but I've found it to be the easiest reproducer to describe, since presumably running KVM in RISC-V QEMU is a standard setup. I have not otherwise been able to find a more minimal case. + +1. Start a QEMU-based `riscv64` machine +2. Start a KVM-based virtual machine with QEMU inside it +3. Do some stuff in the KVM-based virtual machine to increase the chance of page faults +4. Look in the debug log of the outer QEMU for `guest_exec_page_fault` exceptions with `tval` ending in `000`, but `epc` ending in neither `000` nor `ffe` + +Everything in both layers of guests should otherwise work without issue, but other/future software that relies on the spec-mandated relationship of `epc`/`tval` may break. +Additional information: + diff --git a/results/classifier/118/risc-v/1160 b/results/classifier/118/risc-v/1160 new file mode 100644 index 000000000..075de0f59 --- /dev/null +++ b/results/classifier/118/risc-v/1160 @@ -0,0 +1,31 @@ +risc-v: 0.886 +device: 0.854 +performance: 0.739 +peripherals: 0.718 +network: 0.670 +architecture: 0.644 +graphic: 0.574 +vnc: 0.574 +virtual: 0.486 +register: 0.428 +boot: 0.423 +debug: 0.406 +kernel: 0.393 +files: 0.390 +arm: 0.358 +mistranslation: 0.319 +ppc: 0.271 +hypervisor: 0.259 +VMM: 0.241 +socket: 0.236 +user-level: 0.199 +semantic: 0.196 +assembly: 0.177 +TCG: 0.112 +permissions: 0.102 +KVM: 0.094 +PID: 0.088 +i386: 0.056 +x86: 0.032 + +hw/riscv reset vector improvement diff --git a/results/classifier/118/risc-v/1233 b/results/classifier/118/risc-v/1233 new file mode 100644 index 000000000..931d42647 --- /dev/null +++ b/results/classifier/118/risc-v/1233 @@ -0,0 +1,31 @@ +risc-v: 0.865 +device: 0.653 +architecture: 0.652 +virtual: 0.495 +hypervisor: 0.392 +graphic: 0.334 +mistranslation: 0.251 +performance: 0.212 +vnc: 0.210 +semantic: 0.209 +network: 0.198 +arm: 0.188 +i386: 0.165 +permissions: 0.155 +x86: 0.144 +VMM: 0.142 +boot: 0.141 +register: 0.140 +ppc: 0.121 +debug: 0.120 +kernel: 0.115 +TCG: 0.085 +assembly: 0.073 +files: 0.064 +user-level: 0.060 +PID: 0.040 +KVM: 0.040 +socket: 0.034 +peripherals: 0.033 + +is there a roadmap about when riscv-v extension will be implemented?? 
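For the signal-frame report (1093) near the top of this diff, which says it should be sufficient to change the alignment constant in linux-user/riscv/signal.c to 15, here is a minimal standalone sketch of what that 16-byte rounding means. It is illustrative only: the helper name, signature, and surrounding `main()` are hypothetical and are not the actual QEMU code, they just demonstrate the mask the report proposes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper illustrating the change proposed in report 1093:
 * after reserving room for the signal frame, round the stack pointer
 * down with a 16-byte mask (~15) rather than a 4-byte one, so the
 * siginfo/ucontext pointers handed to the handler are ABI-aligned. */
static uint64_t get_sigframe_sp(uint64_t sp, uint64_t frame_size)
{
    sp -= frame_size;
    return sp & ~UINT64_C(15);   /* 15 is the constant the report suggests */
}

int main(void)
{
    uint64_t misaligned_sp = 0x40008003ffULL;   /* deliberately unaligned */
    uint64_t frame = get_sigframe_sp(misaligned_sp, 0x200);
    assert((frame & 15) == 0);                  /* always 16-byte aligned */
    printf("frame placed at 0x%llx\n", (unsigned long long)frame);
    return 0;
}
```

The sketch only shows the masking step; whether the real fix needs any further adjustment of the ucontext layout is not something the report states.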
diff --git a/results/classifier/118/risc-v/1259 b/results/classifier/118/risc-v/1259 new file mode 100644 index 000000000..b3f9761e5 --- /dev/null +++ b/results/classifier/118/risc-v/1259 @@ -0,0 +1,31 @@ +risc-v: 0.988 +device: 0.851 +mistranslation: 0.795 +performance: 0.775 +virtual: 0.583 +register: 0.560 +architecture: 0.521 +semantic: 0.460 +graphic: 0.448 +network: 0.386 +permissions: 0.375 +vnc: 0.371 +kernel: 0.243 +assembly: 0.215 +debug: 0.156 +peripherals: 0.121 +hypervisor: 0.112 +socket: 0.077 +arm: 0.070 +boot: 0.067 +x86: 0.060 +ppc: 0.059 +user-level: 0.056 +files: 0.051 +i386: 0.020 +VMM: 0.016 +TCG: 0.014 +PID: 0.012 +KVM: 0.004 + +RISC-V csr diff --git a/results/classifier/118/risc-v/1395 b/results/classifier/118/risc-v/1395 new file mode 100644 index 000000000..7de938871 --- /dev/null +++ b/results/classifier/118/risc-v/1395 @@ -0,0 +1,186 @@ +risc-v: 0.871 +debug: 0.844 +peripherals: 0.832 +ppc: 0.796 +register: 0.786 +TCG: 0.786 +permissions: 0.773 +x86: 0.759 +KVM: 0.748 +vnc: 0.745 +device: 0.744 +architecture: 0.723 +performance: 0.720 +virtual: 0.717 +mistranslation: 0.711 +VMM: 0.697 +graphic: 0.695 +user-level: 0.671 +assembly: 0.670 +files: 0.669 +network: 0.662 +i386: 0.661 +kernel: 0.655 +hypervisor: 0.652 +socket: 0.651 +arm: 0.647 +semantic: 0.623 +PID: 0.618 +boot: 0.598 + +qemu-system-riscv32 cpu_transaction_failed cause Infinite loop when write mstatus ~"target: riscv" +Description of problem: +I wanna run FreeRTOS riscv, and use the FreeRTOS/Demo/RISC-V-Qemu-virt_GCC/Makefile to build elf.\ +When qemu execute to write mstatus as 0x1888(enable Interrupt, MIE:1, MIP:1, MPP:3), there is no response.\ +https://github.com/FreeRTOS/FreeRTOS-Kernel/blob/main/portable/GCC/RISC-V/portASM.S\ +line 274: csrrw x0, mstatus, x5 /* Interrupts enabled from here! 
*/\ +opcode is hex 30029073\n +I use pstack to trace qemu thread, there is only one thread is active, and cpu loading is 100%.\ +then I use gdb attatch <pid> to trace the active thread, and it has a loop\ +cpu_loop_exit call siglongjmp and back to sigsetjmp in cpu_exec (cpu=cpu@entry=0x55e2294e4070) at ../accel/tcg/cpu-exec.c:936 +Steps to reproduce: +1.download FreeRTOS and build FreeRTOS/Demo/RISC-V-Qemu-virt_GCC\ +2.run qemu with gdb\ +3.hang when writing mstatus +Additional information: +I find that my issue occur when mtvec is zero and timer interrupt occur when writing mstatus(riscv_cpu_do_interrupt)\ +Although it should jump to 0x0 rather then hanging in while loop.\ +expected flow :cpu_handle_interrupt->check_for_breakpoints->break\ +actually flow: cpu_handle_interrupt->check_for_breakpoints->infinite loop\ +Qemu build command: +``` +./configure --target-list=riscv32-softmmu && make +``` + +pstack for qemu (only need to debug Thread 3) +``` +Thread 3 (Thread 0x7f83af6d3640 (LWP 5093) "qemu-system-ris"): +#0 0x000055cb31b1769f in riscv_cpu_exec_interrupt () +#1 0x0000000000000000 in () +Thread 2 (Thread 0x7f83b0119640 (LWP 5092) "qemu-system-ris"): +#0 0x00007f83b0400a3d in syscall () at /lib/x86_64-linux-gnu/libc.so.6 +#1 0x000055cb31e0bd52 in qemu_event_wait () +#2 0x0000000000000000 in () +Thread 1 (Thread 0x7f83b011ac00 (LWP 5090) "qemu-system-ris"): +#0 0x00007f83b03fae7e in ppoll () at /lib/x86_64-linux-gnu/libc.so.6 +#1 0x00007f83b0752500 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0 +#2 0x000055cb33241b30 in () +#3 0x0000000000000005 in () +#4 0x0000000000000000 in () +``` +backtrace for the infinite loop +``` +(gdb) bt +#0 cpu_loop_exit (cpu=0x55e2294e4070) at ../accel/tcg/cpu-exec-common.c:65 +#1 0x000055e2274efde4 in cpu_loop_exit_restore (cpu=cpu@entry=0x55e2294e4070, pc=pc@entry=0) + at ../accel/tcg/cpu-exec-common.c:76 +#2 0x000055e22737fff1 in riscv_cpu_do_transaction_failed + (cs=0x55e2294e4070, physaddr=<optimized out>, addr=0, size=<optimized out>, access_type=MMU_INST_FETCH, mmu_idx=<optimized out>, attrs=..., response=2, retaddr=0) + at ../target/riscv/cpu_helper.c:1165 +#3 0x000055e2274fa4a7 in cpu_transaction_failed + (retaddr=0, response=2, attrs=..., mmu_idx=3, access_type=MMU_INST_FETCH, size=<optimized out>, addr=0, physaddr=<optimized out>, cpu=0x55e2294e4070) at ../accel/tcg/cputlb.c:1344 +#4 io_readx + (env=env@entry=0x55e2294e53d0, full=full@entry=0x7fd90c029410, mmu_idx=3, addr=addr@entry=0, retaddr=retaddr@entry=0, access_type=access_type@entry=MMU_INST_FETCH, op=MO_16) + at ../accel/tcg/cputlb.c:1380 +#5 0x000055e2274fba28 in load_helper + (full_load=<optimized out>, code_read=true, op=MO_16, retaddr=0, oi=19, addr=0, env=0x55e2294e53d0) at ../accel/tcg/cputlb.c:1970 +#6 full_lduw_code (env=env@entry=0x55e2294e53d0, addr=addr@entry=0, oi=19, retaddr=0) + at ../accel/tcg/cputlb.c:2606 +#7 0x000055e22750827b in cpu_lduw_code (env=env@entry=0x55e2294e53d0, addr=addr@entry=0) + at ../accel/tcg/cputlb.c:2612 +#8 0x000055e2274f87fa in translator_lduw + (env=env@entry=0x55e2294e53d0, db=db@entry=0x7fd913dfe5a0, pc=0) + at ../accel/tcg/translator.c:216 +#9 0x000055e2273e423a in riscv_tr_translate_insn (dcbase=0x7fd913dfe5a0, cpu=<optimized out>) + at ../target/riscv/translate.c:1158 +#10 0x000055e2274f83d3 in translator_loop + (cpu=cpu@entry=0x55e2294e4070, tb=tb@entry=0x7fd91c000240 <code_gen_buffer+531>, max_insns=<optim + ized out>, pc=pc@entry=0, host_pc=host_pc@entry=0x55e2274efe74 <tb_htable_lookup+84>, ops=ops@entry=0x55e227a75c80 <riscv_tr_ops>, 
db=0x7fd913dfe5a0) at ../accel/tcg/translator.c:96 +#11 0x000055e227411760 in gen_intermediate_code + (cs=cs@entry=0x55e2294e4070, tb=tb@entry=0x7fd91c000240 <code_gen_buffer+531>, max_insns=<optimized out>, pc=pc@entry=0, host_pc=host_pc@entry=0x55e2274efe74 <tb_htable_lookup+84>) + at ../target/riscv/translate.c:1240 +#12 0x000055e2274f6954 in setjmp_gen_code + (env=env@entry=0x55e2294e53d0, tb=tb@entry=0x7fd91c000240 <code_gen_buffer+531>, pc=pc@entry=0, host_pc=0x55e2274efe74 <tb_htable_lookup+84>, max_insns=max_insns@entry=0x7fd913dfe744, ti=<optimized out>) at ../accel/tcg/translate-all.c:761 +#13 0x000055e2274f7294 in tb_gen_code + (cpu=cpu@entry=0x55e2294e4070, pc=0, cs_base=0, flags=1085443, cflags=<optimized out>, + cflags@entry=-16777216) at ../accel/tcg/translate-all.c:841 +#14 0x000055e2274f10cf in cpu_exec (cpu=cpu@entry=0x55e2294e4070) at ../accel/tcg/cpu-exec.c:1006 +#15 0x000055e22750a904 in tcg_cpus_exec (cpu=cpu@entry=0x55e2294e4070) + at ../accel/tcg/tcg-accel-ops.c:69 +#16 0x000055e22750aa57 in mttcg_cpu_thread_fn (arg=arg@entry=0x55e2294e4070) + at ../accel/tcg/tcg-accel-ops-mttcg.c:95 +#17 0x000055e227674b21 in qemu_thread_start (args=<optimized out>) + at ../util/qemu-thread-posix.c:505 +#18 0x00007fd9611a9b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442 +#19 0x00007fd96123ba00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 +``` + +disassembly code +``` +80001ac6 <xPortStartFirstTask>: +80001ac6: 85c1a103 lw sp,-1956(gp) # 800809fc <pxCurrentTCB> +80001aca: 4102 lw sp,0(sp) +80001acc: 4082 lw ra,0(sp) +80001ace: 43c2 lw t2,16(sp) +80001ad0: 4452 lw s0,20(sp) +80001ad2: 44e2 lw s1,24(sp) +80001ad4: 4572 lw a0,28(sp) +80001ad6: 5582 lw a1,32(sp) +80001ad8: 5612 lw a2,36(sp) +80001ada: 56a2 lw a3,40(sp) +80001adc: 5732 lw a4,44(sp) +80001ade: 57c2 lw a5,48(sp) +80001ae0: 5852 lw a6,52(sp) +80001ae2: 58e2 lw a7,56(sp) +80001ae4: 5972 lw s2,60(sp) +80001ae6: 4986 lw s3,64(sp) +80001ae8: 4a16 lw s4,68(sp) +80001aea: 4aa6 lw s5,72(sp) +80001aec: 4b36 lw s6,76(sp) +80001aee: 4bc6 lw s7,80(sp) +80001af0: 4c56 lw s8,84(sp) +80001af2: 4ce6 lw s9,88(sp) +80001af4: 4d76 lw s10,92(sp) +80001af6: 5d86 lw s11,96(sp) +80001af8: 5e16 lw t3,100(sp) +80001afa: 5ea6 lw t4,104(sp) +80001afc: 5f36 lw t5,108(sp) +80001afe: 5fc6 lw t6,112(sp) +80001b00: 52d6 lw t0,116(sp) +80001b02: 0007f317 auipc t1,0x7f +80001b06: ea232303 lw t1,-350(t1) # 800809a4 <pxCriticalNesting> +80001b0a: 00532023 sw t0,0(t1) +80001b0e: 52e6 lw t0,120(sp) +80001b10: 02a1 addi t0,t0,8 +80001b12: 30029073 csrw mstatus,t0 <--- hang on this line +80001b16: 42a2 lw t0,8(sp) +80001b18: 4332 lw t1,12(sp) +80001b1a: 07c10113 addi sp,sp,124 +80001b1e: 8082 ret +``` + +``` +(gdb) bt +#0 cpu_loop_exit (cpu=cpu@entry=0x564cd884b070) at ../accel/tcg/cpu-exec-common.c:65 +#1 0x0000564cd6685631 in helper_lookup_tb_ptr (env=0x564cd884c3d0) at ../accel/tcg/cpu-exec.c:400 +#2 0x00007f55dc00014c in code_gen_buffer () +#3 0x0000564cd668521b in cpu_tb_exec + (cpu=cpu@entry=0x564cd884b070, itb=itb@entry=0x7f55dc000040 <code_gen_buffer+19>, tb_exit=tb_exit@entry=0x7f56235f67ec) at ../accel/tcg/cpu-exec.c:438 +#4 0x0000564cd6685cfb in cpu_loop_exec_tb + (tb_exit=0x7f56235f67ec, last_tb=<synthetic pointer>, pc=<optimized out>, tb=0x7f55dc000040 <code_gen_buffer+19>, cpu=0x564cd884b070) at ../accel/tcg/cpu-exec.c:868 +#5 cpu_exec (cpu=cpu@entry=0x564cd884b070) at ../accel/tcg/cpu-exec.c:1032 +#6 0x0000564cd669f904 in tcg_cpus_exec (cpu=cpu@entry=0x564cd884b070) + at ../accel/tcg/tcg-accel-ops.c:69 +#7 
0x0000564cd669fa57 in mttcg_cpu_thread_fn (arg=arg@entry=0x564cd884b070) + at ../accel/tcg/tcg-accel-ops-mttcg.c:95 +#8 0x0000564cd6809b21 in qemu_thread_start (args=<optimized out>) + at ../util/qemu-thread-posix.c:505 +#9 0x00007f562429ab43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442 +#10 0x00007f562432ca00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 +``` + +I also build a very simple elf for qemu-virt-platform, just contain boot-loader and write mstatus as 0x1888, it can't reproduce.\ +I also build different qemu version such v6.0.0, it still can reproduce.\ +I has modify the march to the most simple arch:rv32i, is still can reproduce. + +~"target: riscv" diff --git a/results/classifier/118/risc-v/1441 b/results/classifier/118/risc-v/1441 new file mode 100644 index 000000000..281d01323 --- /dev/null +++ b/results/classifier/118/risc-v/1441 @@ -0,0 +1,64 @@ +risc-v: 0.884 +vnc: 0.812 +graphic: 0.732 +device: 0.715 +architecture: 0.651 +network: 0.594 +PID: 0.548 +files: 0.540 +ppc: 0.537 +TCG: 0.526 +arm: 0.504 +register: 0.490 +socket: 0.471 +debug: 0.436 +permissions: 0.428 +semantic: 0.412 +performance: 0.403 +virtual: 0.401 +boot: 0.340 +VMM: 0.269 +kernel: 0.254 +user-level: 0.248 +peripherals: 0.221 +hypervisor: 0.188 +assembly: 0.166 +mistranslation: 0.152 +x86: 0.127 +KVM: 0.074 +i386: 0.073 + +Assertion failure when executing RISC-V vfncvt.rtz.x.f.w instruction +Description of problem: +When emulating the `vfncvt.rtz.x.f.w` instruction, QEMU crashes with an assertion failure at `target/riscv/translate.c:211`, complaining that ```decode_save_opc: Assertion `ctx->insn_start != NULL' failed.``` + +It appears this problem first emerged with https://gitlab.com/qemu-project/qemu/-/commit/a9814e3e08d2aacbd9018c36c77c2fb652537848 +Steps to reproduce: +The following C program triggers the assertion failure when built a sufficiently recent version of the Clang cross compiler (in my case 15.0.6): +``` +/* test.c */ +#include <riscv_vector.h> + +#define LEN 4 + +int main(int argc, char *argv[]) { + double in[LEN]; + int out[LEN]; + + vfloat64m1_t vf = vle64_v_f64m1(in, LEN); + vint32mf2_t vi = vfncvt_rtz_x_f_w_i32mf2(vf, LEN); + vse32_v_i32mf2(out, vi, LEN); + + return 0; +} +``` + +The above `test.c` can be compiled and run as follows: +``` +clang -O3 -march=rv64gcv -static test.c +qemu-riscv64 -cpu "rv64,zba=true,zbb=true,zbc=true,zbs=true,v=true,vlen=512,elen=64,vext_spec=v1.0" a.out +qemu-riscv64: ../target/riscv/translate.c:211: decode_save_opc: Assertion `ctx->insn_start != NULL' failed. 
+Segmentation fault (core dumped) +``` +Additional information: + diff --git a/results/classifier/118/risc-v/1673976 b/results/classifier/118/risc-v/1673976 new file mode 100644 index 000000000..fedffa8ff --- /dev/null +++ b/results/classifier/118/risc-v/1673976 @@ -0,0 +1,336 @@ +risc-v: 0.807 +hypervisor: 0.674 +user-level: 0.600 +virtual: 0.594 +graphic: 0.580 +peripherals: 0.556 +device: 0.532 +mistranslation: 0.532 +x86: 0.511 +semantic: 0.510 +VMM: 0.505 +ppc: 0.492 +permissions: 0.489 +assembly: 0.484 +vnc: 0.481 +register: 0.479 +TCG: 0.471 +architecture: 0.471 +debug: 0.457 +performance: 0.456 +PID: 0.450 +socket: 0.443 +arm: 0.442 +i386: 0.440 +network: 0.410 +kernel: 0.404 +KVM: 0.399 +boot: 0.383 +files: 0.324 + +linux-user clone() can't handle glibc posix_spawn() (causes locale-gen to assert) + +I'm running a command command (locale-gen) inside of an armv7h chroot mounted on my x86_64 desktop by putting qemu-arm-static into /usr/bin/ of the chroot file system and I get a core dump. + +locale-gen +Generating locales... + en_US.UTF-8...localedef: ../sysdeps/unix/sysv/linux/spawni.c:360: __spawnix: Assertion `ec >= 0' failed. +qemu: uncaught target signal 6 (Aborted) - core dumped +/usr/bin/locale-gen: line 41: 34 Aborted (core dumped) localedef -i $input -c -f $charset -A /usr/share/locale/locale.alias $locale + +I've done this same thing successfully for years, but this breakage has appeared some time in the last 3 or so months. Possibly with the update to qemu version 2.8. + +I can confirm this. The ninja build system is also affected. + +Could you please check whether the problem also occurs with QEMU v2.10? + +Hi, + +I can confirm it with QEMU 2.10.0 (running Gentoo Linux) + +Portage 2.3.10 (python 2.7.14-final-0, default/linux/amd64/17.0/no-multilib, gcc-7.2.0, glibc-2.25-r5, 4.13.4-gentoo x86_64) + + +# uname -a && locale-gen +Linux **** 4.13.4-gentoo #1 SMP PREEMPT Thu Sep 28 09:41:30 CEST 2017 armv7l Intel(R) Celeron(R) 2957U @ 1.40GHz GNU/Linux + * Generating 8 locales (this might take a while) with 2 jobs + * (2/8) Generating en_US.UTF-8 ... +localedef: ../sysdeps/unix/sysv/linux/spawni.c:366: __spawnix: Assertion `ec >= 0' failed. +qemu: uncaught target signal 6 (Aborted) - core dumped [ !! ] + * (1/8) Generating en_US.ISO-8859-1 ... +localedef: ../sysdeps/unix/sysv/linux/spawni.c:366: __spawnix: Assertion `ec >= 0' failed. +qemu: uncaught target signal 6 (Aborted) - core dumped [ !! ] + * (3/8) Generating fr_BE.ISO-8859-15@euro ... +localedef: ../sysdeps/unix/sysv/linux/spawni.c:366: __spawnix: Assertion `ec >= 0' failed. +qemu: uncaught target signal 6 (Aborted) - core dumped [ !! ] + * (4/8) Generating fr_BE.ISO-8859-1 ... +localedef: ../sysdeps/unix/sysv/linux/spawni.c:366: __spawnix: Assertion `ec >= 0' failed. +qemu: uncaught target signal 6 (Aborted) - core dumped [ !! ] + * (5/8) Generating fr_BE.UTF-8 ... +localedef: ../sysdeps/unix/sysv/linux/spawni.c:366: __spawnix: Assertion `ec >= 0' failed. +qemu: uncaught target signal 6 (Aborted) - core dumped [ !! ] + * (6/8) Generating fr_FR.ISO-8859-15@euro ... +localedef: ../sysdeps/unix/sysv/linux/spawni.c:366: __spawnix: Assertion `ec >= 0' failed. +qemu: uncaught target signal 6 (Aborted) - core dumped [ !! ] + * (7/8) Generating fr_FR.ISO-8859-1 ... +localedef: ../sysdeps/unix/sysv/linux/spawni.c:366: __spawnix: Assertion `ec >= 0' failed. +qemu: uncaught target signal 6 (Aborted) - core dumped [ !! ] + * (8/8) Generating fr_FR.UTF-8 ... 
+localedef: ../sysdeps/unix/sysv/linux/spawni.c:366: __spawnix: Assertion `ec >= 0' failed. +qemu: uncaught target signal 6 (Aborted) - core dumped [ !! ] + * Generation complete + * Adding locales to archive ... +incomplete set of locale files in "//usr/lib/locale/en_US" +incomplete set of locale files in "//usr/lib/locale/en_US.utf8" +incomplete set of locale files in "//usr/lib/locale/fr_BE" +incomplete set of locale files in "//usr/lib/locale/fr_BE@euro" +incomplete set of locale files in "//usr/lib/locale/fr_BE.utf8" +incomplete set of locale files in "//usr/lib/locale/fr_FR" +incomplete set of locale files in "//usr/lib/locale/fr_FR@euro" +incomplete set of locale files in "//usr/lib/locale/fr_FR.utf8" [ !! ] + + +Looks like the __clone() call is failing for some reason: + +https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/spawni.c;h=dea1650d08ded5fd848f263aebebe8748e703697;hb=HEAD#l362 + + + +Here is a workaround: + +cd /usr/share/i18n/charmaps +gunzip --keep UTF-8.gz +locale-gen en_US.UTF-8 + + + +It is possible to reproduce the issue with a simple clone example taken from + + http://man7.org/linux/man-pages/man2/clone.2.html + + +# qemu-aarch64-static -strace ./a.out testname +585 brk(NULL) = 0x0000004000013000 +585 uname(0x4000812d08) = 0 +585 faccessat(AT_FDCWD,"/etc/ld.so.nohwcap",F_OK,0x82e888) = -1 errno=2 (No such file or directory) +585 mmap(NULL,12288,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS,-1,0) = 0x0000004000843000 +585 faccessat(AT_FDCWD,"/etc/ld.so.preload",R_OK,AT_SYMLINK_NOFOLLOW|0x82d848) = -1 errno=2 (No such file or directory) +585 openat(AT_FDCWD,"/etc/ld.so.cache",O_RDONLY|O_CLOEXEC) = 3 +585 fstat(3,0x0000004000812680) = 0 +585 mmap(NULL,20645,PROT_READ,MAP_PRIVATE,3,0) = 0x0000004000846000 +585 close(3) = 0 +585 faccessat(AT_FDCWD,"/etc/ld.so.nohwcap",F_OK,0x82e888) = -1 errno=2 (No such file or directory) +585 openat(AT_FDCWD,"/lib/aarch64-linux-gnu/libc.so.6",O_RDONLY|O_CLOEXEC) = 3 +585 read(3,0x812830,832) = 832 +585 fstat(3,0x00000040008126d0) = 0 +585 mmap(NULL,1393456,PROT_EXEC|PROT_READ,MAP_PRIVATE|MAP_DENYWRITE,3,0) = 0x000000400084c000 +585 mprotect(0x0000004000987000,65536,PROT_NONE) = 0 +585 mmap(0x0000004000997000,24576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_DENYWRITE|MAP_FIXED,3,0x13b000) = 0x0000004000997000 +585 mmap(0x000000400099d000,13104,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED,-1,0) = 0x000000400099d000 +585 close(3) = 0 +585 mprotect(0x0000004000997000,16384,PROT_READ) = 0 +585 mprotect(0x0000004000011000,4096,PROT_READ) = 0 +585 mprotect(0x0000004000840000,4096,PROT_READ) = 0 +585 munmap(0x0000004000846000,20645) = 0 +585 brk(NULL) = 0x0000004000013000 +585 brk(0x0000004000034000) = 0x0000004000013000 +585 mmap(NULL,1048576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS,-1,0) = 0x00000040009a1000 +585 mmap(NULL,1052672,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS,-1,0) = 0x0000004000aa1000 +585 clone(CLONE_NEWUTS|0x11,child_stack=0x0000004000ba1010,parent_tidptr=0x0000004000aa1010,tls=0x0000000000000000,child_tidptr=0x0000000000000000) = -1 errno=22 (Invalid argument) +585 dup(2,4222427270,274886578000,22,0,0) = 3 +585 fcntl(3,F_GETFL) = 1026 +585 fstat(3,0x0000004000812628) = 0 +585 write(3,0x9a1490,24)clone: Invalid argument + = 24 +585 close(3) = 0 +585 exit_group(1) + + +# strace ./a.out testname +qemu: Unsupported syscall: 117 +qemu: Unsupported syscall: 117 +/usr/bin/strace: ptrace(PTRACE_TRACEME, ...): Function not implemented ++++ exited with 1 +++ + + +This happens because QEMU is 
now stricter about its checking of the flags passed to clone() -- previously we would silently allow flags we couldn't support and create new threads with the wrong behaviour. Now we check and fail the clone() syscall if the requested behaviour is something we can't implement. Unfortunately we don't have any way to distinguish "guest program is asking for something odd that it doesn't really need" from "guest program is asking for something odd that it does need". So we err on the safe side and tell the guest we can't do it. + +It's particularly unfortunate that the glibc implementation of posix_spawn() runs into this, though. + + +Is there a way I can ask QEMU to not do this strict checking so that my stuff stops breaking? + +Not without messing with the QEMU source, no. + + +OK, this can't be as simple as "posix_spawn() fails", because I've just tried the test program from the posix_spawn manpage (http://man7.org/linux/man-pages/man3/posix_spawn.3.html) and that works fine for x86-64 guest, aarch64 guest and armhf guest. In the x86 and armhf cases the libc I have seems to use the NR_vfork syscall, but for aarch64 it uses clone(CLONE_VM | CLONE_VFORK | SIGCHLD, ...) which is what the glibc sources linked in comment #5 do, and that all works fine. + +And locale-gen runs fine for my xenial-armhf chroot using current head-of-git QEMU: + +root@e104462:/# locale-gen +Generating locales (this might take a while)... + en_GB.UTF-8... done +Generation complete. + +So can I ask that people: (1) please try with current head of git (or with 2.11-rc1, which is almost the same thing); (2) if there's still a problem with localegen or with programs calling posix_spawn() or other real-world code, please provide full repro instructions so I can try to reproduce locally. + +I don't think we can make clone() in general work, so oddball demo code like the example program in the clone(2) manpage is out of scope, but there may well be specific cases we can address. + + +I can reproduce the bug in a mips64el chroot running current Debian unstable - the posix_spawn example you mention fails there. I have tested v2.11.0-rc2 and it fails there as well. I think you need glibc >= 2.25 to trigger the bug (artful / bionic chroot). I only noticed it due to Debian updating to a newer glibc recently. + +I think I see the problem. This glibc commit rewrote the posix_spawn implementation on Linux: +https://sourceware.org/git/?p=glibc.git;a=commit;h=9ff72da471a509a8c19791efe469f47fa6977410 + +It now relies on the exact behavior of clone(CLONE_VM | CLONE_VFORK) - ie: +- That the parent will wait for the child to exec before continuing. +- Writes to memory in the child are later visible in the parent + +QEMU emulates this clone using fork() which no longer works properly and causes the assertion failure. + +Sorry, this is probably the commit that broke things (not the one above). I was added in glibc 2.25: +https://sourceware.org/git/?p=glibc.git;a=commit;h=4b4d4056bb154603f36c6f8845757c1012758158 + +Thanks for tracking down the glibc change; I will try to set up a chroot with a more recent glibc to see whether we can do something that fixes that posix_spawn implementation... + +Interestingly, this also affects Microsoft Windows Services For Linux, i.e. Microsoft's Linux emulation layer. + +> https://github.com/Microsoft/WSL/issues/1878 + +I have verified that this patch [1] in glibc_2.25 and glibc_2.26 fixes the assert. 
+ +> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=22273 + +This should probably be put under 'glibc', since this is really an issue with that package, which is fixed by the way since Oct 2017. + +https://sourceware.org/git/?p=glibc.git;a=commit;h=fe05e1cb6d64dba6172249c79526f1e9af8f2bfd + +This should also be backported to 17.10 + + +That glibc change has caused the assert to go away, but QEMU's spawn(CLONE_VFORK) still does not have the "always waits for child" semantics that glibc has assumed since glibc commit 4b4d4056bb154. The child and the parent will end up racing each other, and the child will never be able to write to the parent's address space. I think that the effect of that race will be that if the child fails (for instance if a bad filename is passed and exec() fails) the parent will never notice and will return a success code from the spawn function when it should not. + +So there remains a QEMU bug here; though it is also the case that I can't see any way we can fix it. + + +Ok, thank you for clearing that up. + +I'm noticing in 4b4d4056bb154 this comment: + +"...we just make explicit use of the fact the the child and parent run in the same VM, so the child can write an error code to a field of the posix_spawn_args struct instead of sending it through a pipe. To ensure that this mechanism really works, the parent initializes the field to -1 and the child writes 0 before execing." + +So, if the child fail to execute, that error code field of the posix_spawn_args struct will remain -1. Would this ensure that QEMU return error in case of failing exec? + +Best Regards, +Eric + +Commit fe05e1cb6d64db changed that, so args.err is initialized to zero. + + +Ok, yes you are right... + +I have looked a bit more on the source code, and indeed, I think understand the issue with the VFORK with QEMU. Please correct me if I'm wrong... + +- In the syscall trap handler, it has to use the fork() function to emulate the vfork() due to restriction of the vfork() function (as QEMU must continue to control the flow of instruction past the call to vfork(), and do a lot more things in the child thread before ending up performing a execve() or _exit()) +- Also, it can not do a wait() for the emulated child, as this child will continue to exist even after it calls execve(), so the parent would stall. +- Then, I taught about doing condition signalling, like waiting for a pthread condition signal that the child would send once it come to the point of performing the _exit() or execve(), but the child would, for example, need to know if execve() was successful, or otherwise the child would continue and set an error flag and then call _exit(). We do need that error flag before continuing the execution on the parent. So we can not signal back to the parent that the 'emulated vfork' is OK before calling execve(), but we can not wait after execve() because if the call is successful, there is no return from that function, and code goes outside the control of QEMU. + +So, I taught of an idea... What if, in the TARGET_NR_clone syscall trap, when we are called upon a CLONE_VFORK, we do: +- Do a regular fork, as it's currently done, with CLONE_VM flag (so the child share the same memory as the parent). However, we also set a state flag that we are in this 'vfork emulation' mode just before the fork (more on that bellow...). +- Let the parent wait for the child to terminate (again, more on that bellow...). +- Let the child return normally and continue execution, as if the parent was waiting. 
+ +Then, eventually the child will eventually either end up in the TARGET_NR_execve or __NR_exit_group syscall trap. At which point: +- The child check if it is in 'vfork emulation' mode. If not, then there's nothing special, just continue the way the code is currently written. If the flag is set, then follow on with the steps bellow... +- The child set a flag that tell where it is (TARGET_NR_execve or __NR_exit_group, and the arguments passed to that syscall), and that everything is ok (it has not simply died meanwhile). +- The child terminate, which resume the parent's execution. + +The parent then: +- Clear the 'vfork emulation' flag. +- Look at where the child left (was it performing TARGET_NR_execve or __NR_exit_group syscall? What was the arguments passed to the syscall?). This is pretty easy since the child was writing to the parent's memory space the whole time (CLONE_VM). The parent could even use a flag allocated on it's stack before the fork(), since the child will have run with it's own stack during that time (so the parent stack is still intact). +- Now that we know what the child wanted to do (what syscall and which parameters), the parent (which at his point has no more 'leftover' child), can then do a *real* vfork, or otherwise return the proper error code. + +It's a bit far fetched, and I'm far from implying that I know much about QEMU, but this is an idea :-) Sound like it's pretty straightforward though. Basically we just wait for the code between the _clone() function and the _execve/_exit function to complete, at which point we take action and we are in measure to assess the status code (and do the real vfork). + +Regards, +Eric + + +Unfortunately that won't work, because if we do a clone(CLONE_VM) in QEMU that will mean that parent and child share not just the guest address space, but also all the QEMU data structures for the emulated CPUs and also the host libc data structures. Then actions done by the child will update those data structures and break execution of the parent when it resumes. + + +Ok, I taught that could be an issue, but as I said, I don't really know all the internals of QEMU. + +Another idea would be to fork the child, without CLONE_VM, on the initial call to the clone syscall, like it's done right now, and then wait for that child until he call execve or exit syscall. Maybe using some shared memory or IPC to pass the relevant status when the child finally invoke those syscalls. + +When the child finally call one of those, then after signalling the parent about where it is (and the params to the syscall), the child could exit and the parent actually take action. + +Regards, +Eric + + +That way round the child doesn't have the shared memory with the parent, so it can't update the parent's status variable. There's no easy way to say "fork, and then share the guest memory mappings and only the guest memory mappings with the child", because QEMU doesn't currently track what memory the guest has mapped at all. + + +Hello + +Sorry for the delay... + +Actually, you only need the parent to get the status from the child, which can be passed in other way than through common memory. + +The idea is to use pipefd to actually wait for the child to either terminate or successfully call execve. As follow: + + +When the TARGET_NR_clone syscall is trapped, you do: +- Call do_fork(), as currently done +- In do_fork(), at the beginning, if CLONE_VFORK flag is set, keep track of it (i.e. 
do not clear the flag, just clear the CLONE_VM, as currently done, to do a normal fork, i.e. the child have it's own copy of the memory segments). +- Just before the call to fork(), create a pipefd. +- The parent branch and then (if CLONE_VFORK is set) close the write end of the pipe (it's own copy), and start looping (could be indefinitely, but preferably some sort of timeout logic could be set) on the read fd, waiting continuously for status updates from the child. +- The child branch close the read-end of the pipe (it's own forked copy), set the write-end fd flag FD_CLOEXEC (with fnctl()), and put the write fd into it's QEMU state variables (parent vfork fd). +- The child then move on. + +When the TARGET_NR_execve syscall is trapped (this is in child context), you do: +- Do everything as currently done, up to just before the safe_execve() call. +- Just before the call to safe_execve(), check if the QEMU state variable (parent vfork fd) is defined. If so, tell the the parent (through the pipe), that we are good so far, and about to call execve(). Note that the parent just update the child status, but keep looping endlessly. +- Call the execve(). +- If the above call return, an error occurred. If this occur, check if the QEMU state variable (parent vfork fd) is defined. If so, tell whatever error status you got to the parent (through the pipe). The parent update it's child status, but again, continue to loop endlessly. +- Continue normally. + +That's pretty much the bulk of the work done! What will happen: +- Either the child will eventually call execve, which will succeed, at which point the write end of the pipe will be closed (because we set the pipe to close on execve, with the FD_CLOEXEC flag). +- The child could be playing on us, and try to re-call execve() multiple times (possibly with different arguments, executables path, etc.), but every time, the parent will just receive status update through the pipe. And eventually, the above case will occur (success), and pipe will be closed. +- The child call _exit(), which will close the pipe again. +- The child get some horrible signal, get killed, or whatever else... Pipe still get closed. + +The parent, on it's side, just update the status endlessly, UNTIL the other end of the pipe get closed. At this point, the read() of the pipe will get a 'broken pipe' error. This signal the parent to move on, and return whatever status the child last provided. + +Note that this status could initially be set to an error state (in case the child die or call _exit() before calling execve()). + +The only thing that could make the parent hang is if the child hang (and never call execve() or _exit() or die...). But the beauty is that this is perfectly fine, because that is exactly the required behavior when CLONE_VFORK flag is set (parent wait for the child). + + +This is a lot of description, but should be relatively easy and straightforward to implement. Could this work? + +There are a few examples similar to this on the Web, using pipefd, fork and execve, for different applications. Here, we just pass the status. + +Regards, +Eric + + +> Actually, you only need the parent to get the status from the child, which can be passed in other way than through common memory. + +Certainly, it *can* be, but the glibc code we're trying to run in the guest here doesn't do it in some other way, it uses common memory. Having QEMU effectively pause the parent process until the child has done its execve is certainly possible along the lines you suggest. 
But that is only half the requirement -- the parent also has to be able to see in its memory space the updates to the status variable that the child has made. + +If you're willing to change the guest code the problem is easy (for instance you could just go back to the old glibc approach). But we need to run the code as it stands. + + +any solution? trying to emulate a closed source amd64 app on my raspberry and i'm getting this error with qemu 5.2.0-rc4 and glibc 2.27. + + +This is an automated cleanup. This bug report has been moved to QEMU's +new bug tracker on gitlab.com and thus gets marked as 'expired' now. +Please continue with the discussion here: + + https://gitlab.com/qemu-project/qemu/-/issues/140 + + diff --git a/results/classifier/118/risc-v/1721275 b/results/classifier/118/risc-v/1721275 new file mode 100644 index 000000000..cf9330bf7 --- /dev/null +++ b/results/classifier/118/risc-v/1721275 @@ -0,0 +1,117 @@ +risc-v: 0.824 +user-level: 0.743 +arm: 0.738 +register: 0.724 +PID: 0.720 +architecture: 0.712 +ppc: 0.710 +permissions: 0.707 +device: 0.695 +VMM: 0.686 +vnc: 0.673 +peripherals: 0.647 +hypervisor: 0.637 +performance: 0.628 +semantic: 0.609 +assembly: 0.600 +virtual: 0.577 +graphic: 0.575 +boot: 0.559 +files: 0.551 +debug: 0.538 +socket: 0.533 +mistranslation: 0.506 +x86: 0.489 +kernel: 0.474 +network: 0.449 +TCG: 0.442 +KVM: 0.415 +i386: 0.256 + +Support more ARM CPUs + +Hi, + +This is an enhancement request, rather than a bug report. + +After some discussions/presentations during the last Linaro Connect (SFO17), I understand that it may be easy to add support for more ARM CPUs in QEMU. I am interested in user-mode, if that matters. + +I'm primarily using QEMU for GCC validations, and I'd like to make sure that GCC doesn't generate instructions not supported by the CPU it's supposed to generate code for. + +I'd like to have: +cortex-m0 +cortex-m4 +cortex-m7 +cortex-m23 +cortex-m33 + +cortex-a35 +cortex-a53 +cortex-a57 + +Is it possible? +Is it the right place to ask? +Should I file separate requests for each? + +Thanks + +M0 is hard, because it's v6M which we don't support. M4 we already have (but only the no-fpu variant). M7 we don't currently have -- what would be the differences from M4? M33 is in the works (it's v8M). M23 is harder, because it's v8M-baseline which is the v8M equivalent to v6M. A53 and A57 we already have. How would A35 differ from A53/A57 ? + + +PS: in general I wouldn't unconditionally trust that QEMU emulating CPU X definitely does not implement any instructions that CPU X doesn't have -- no real world code will notice, and we don't have any mechanism to automatically verify that we didn't accidentally forget to conditionalize an instruction on an architecture feature. + + +Thanks for PS, I thought QEMU was stricter than that. + +Regarding M7 vs M$ or A35 vs A53/A57, even though there's no functional difference, it would be convenient in Makefiles to have CPU=cortex-a53, and use $(CPU) to expand GCC and QEMU options, without having to guess which CPU I should use for QEMU that would match the one I pass to GCC. + +Regarding v[68]M, are there any plans to add support for these variants to QEMU? + + +On 16 October 2017 at 14:16, Christophe Lyon +<email address hidden> wrote: +> Thanks for PS, I thought QEMU was stricter than that. + +Well, I hope we get it right, and we'll treat cases where +we get it wrong as bugs, but we're not testing... + +> Regarding v[68]M, are there any plans to add support for these variants +> to QEMU? 
+ +v8M we're working on. v6M we have no plans for currently. + +thankns +-- PMM + + +We now support Cortex-M0 (v6M) and Cortex-M33 (v8M mainline). We don't have Cortex-M7, Cortex-M23 (v8M baseline) or Cortex-A35. + +In general, adding an extra CPU to QEMU really requires us to have a decent use-case for it, probably including a board model for it, especially for the M-profile CPUs. There are a lot of CPUs out there and I'm not too keen on adding large numbers of them unless there's a real need. + +I'm going to close this bug report because I don't think it really adds anything to have it sitting open indefinitely. + + +Regarding Cortex-M7, I've noticed that unlike Cortex-M4, it supports double precision floating-point. Is DP supported by qemu? + + +Yes, QEMU supports DP (I actually had to go to some lengths to disable the DP support for the M4 :-)) + + +How do I activate it since --cpu cortex-m7 isn't supported? + +You can't for an M-profile CPU. It would work without any further coding beyond getting the ID register values right if we had a Cortex-M7 model, though. + + +It seemed "easy" to add cortex-m7 based on cortex-m4 (copy m4 description, update ID register values), but I realized that QEMU does not support FPv5 which not only supports DP, but also adds new instructions that QEMU does not handle yet (see section A2.5 of the ARMv7-M ARM). + +* Are there plans to implement them? +* If not, how difficult is it? (for a developer not very familiar with the QEMU code base) + + +They are implemented, because they also appear in A-profile. +We just need to set the corresponding MVFR field to enable them. + +Setting MVFR2.FPMISC = 4 will do the job, I believe. + +Good news, I thought at least some of them were not implemented because for instance I couldn't find where VRINTA is handled (I noticed code for NEON_2RM_VRINTA, but I thought there should be something in vfp.decode for VRINT[ANPM]) + diff --git a/results/classifier/118/risc-v/1779120 b/results/classifier/118/risc-v/1779120 new file mode 100644 index 000000000..159c1271e --- /dev/null +++ b/results/classifier/118/risc-v/1779120 @@ -0,0 +1,132 @@ +risc-v: 0.861 +user-level: 0.794 +mistranslation: 0.785 +peripherals: 0.742 +device: 0.733 +permissions: 0.732 +KVM: 0.726 +register: 0.722 +debug: 0.716 +kernel: 0.713 +performance: 0.712 +socket: 0.712 +boot: 0.711 +vnc: 0.708 +network: 0.707 +arm: 0.707 +architecture: 0.706 +assembly: 0.701 +virtual: 0.696 +VMM: 0.696 +PID: 0.693 +ppc: 0.682 +files: 0.679 +semantic: 0.677 +graphic: 0.673 +x86: 0.666 +TCG: 0.663 +hypervisor: 0.622 +i386: 0.604 + +disk missing in the guest contingently when hotplug several virtio scsi disks consecutively + +Hi, I found a bug that disk missing (not all disks missing ) in the guest contingently when hotplug several virtio scsi disks consecutively. After rebooting the guest,the missing disks appear again. + +The guest is centos7.3 running on a centos7.3 host and the scsi controllers are configed with iothread. The scsi controller xml is below: + + <controller type='scsi' index='0' model='virtio-scsi'> + <driver iothread='26'/> + <alias name='scsi0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> + </controller> + + +If the scsi controllers are configed without iothread, disks are all can be seen in the guest when hotplug several virtio scsi disks consecutively. 
+ +I think the biggest difference between them is that scsi controllers with iothread call virtio_notify_irqfd to notify guest and scsi controllers without + +iothread call virtio_notify instead. What make it difference? Will interrupts are lost when call virtio_notify_irqfd due to race condition for some unknow reasons? Maybe guys more familiar with scsi dataplane can help. Thanks for your reply! + +Please post the following information: +(host)# rpm -qa | grep qemu-kvm +(guest)# uname -r + +What are the exact steps to reproduce this issue (virsh command-lines and XML)? + +I also met this bug + +Hi, Stefan. +(host)# rpm -qa | grep qemu-kvm +qemu-kvm-2.8.1-25.142.x86_64 +(guest)# uname -r +3.10.0-514.el7.x86_64 + +I also tried the newest version of qemu-kvm, but it also met this issue. +The steps to reproduce this issue are below: + +1)attach four virtio-scsi controller with dataplane to vm. +    <controller type='scsi' index='0' model='virtio-scsi'> +      <driver iothread='1'/> +      <alias name='scsi0'/> +      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' +function='0x0'/> +    </controller> +    <controller type='scsi' index='1' model='virtio-scsi'> +      <driver iothread='2'/> +      <alias name='scsi1'/> +      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' +function='0x0'/> +    </controller> +    <controller type='scsi' index='2' model='virtio-scsi'> +      <driver iothread='3'/> +      <alias name='scsi2'/> +      <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' +function='0x0'/> +    </controller> +    <controller type='scsi' index='3' model='virtio-scsi'> +      <driver iothread='4'/> +      <alias name='scsi3'/> +      <address type='pci' domain='0x0000' bus='0x00' slot='0x0c' +function='0x0'/> +    </controller> + +2)attach 35 virtio-scsi disks(sda - sdai) to vm consecutively. One +controller has 15 scsi disks. +A example of disk xml is below: +    <disk type='block' device='lun' rawio='yes'> +      <driver name='qemu' type='raw' cache='none' io='native'/> +      <source dev='/dev/mapper/360022a11000c1e0a0787bb2500000139'/> +      <backingStore/> +      <target dev='sda' bus='scsi'/> +      <shareable/> +      <alias name='scsi0-0-0-0'/> +      <address type='drive' controller='0' bus='0' target='0' unit='0'/> +    </disk> + +   You can write a shell script like this: +      for((i=1;i++;i<=35)) +      do +          virsh attach-device centos7.3_64_server scsi_disk_$i.xml +--config --live +      done + +This issue is a probabilistic event. If it does not appear, repeat the +above steps several more times. +Thank you! + +On 2018/6/28 21:01, Stefan Hajnoczi wrote: +> Please post the following information: +> (host)# rpm -qa | grep qemu-kvm +> (guest)# uname -r +> +> What are the exact steps to reproduce this issue (virsh command-lines +> and XML)? +> + + + +The QEMU project is currently considering to move its bug tracking to another system. For this we need to know which bugs are still valid and which could be closed already. Thus we are setting older bugs to "Incomplete" now. +If you still think this bug report here is valid, then please switch the state back to "New" within the next 60 days, otherwise this report will be marked as "Expired". Or mark it as "Fix Released" if the problem has been solved with a newer version of QEMU already. Thank you and sorry for the inconvenience. + +[Expired for QEMU because there has been no activity for 60 days.] 
+ diff --git a/results/classifier/118/risc-v/1805256 b/results/classifier/118/risc-v/1805256 new file mode 100644 index 000000000..725656016 --- /dev/null +++ b/results/classifier/118/risc-v/1805256 @@ -0,0 +1,2838 @@ +risc-v: 0.827 +peripherals: 0.784 +device: 0.752 +permissions: 0.751 +register: 0.743 +ppc: 0.711 +graphic: 0.696 +TCG: 0.691 +vnc: 0.689 +performance: 0.659 +VMM: 0.658 +network: 0.633 +arm: 0.621 +virtual: 0.615 +x86: 0.610 +semantic: 0.604 +user-level: 0.603 +KVM: 0.597 +mistranslation: 0.588 +hypervisor: 0.577 +socket: 0.572 +PID: 0.571 +debug: 0.564 +architecture: 0.558 +i386: 0.554 +assembly: 0.508 +kernel: 0.505 +boot: 0.442 +files: 0.380 + +qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images + +On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command: + +qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2 + +Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images. + +Once hung, attaching gdb gives the following backtrace: + +(gdb) bt +#0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760, + timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950) + at ../sysdeps/unix/sysv/linux/ppoll.c:39 +#1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, + __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77 +#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, + timeout=timeout@entry=-1) at util/qemu-timer.c:322 +#3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1) + at util/main-loop.c:233 +#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497 +#5 0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at qemu-img.c:1980 +#6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456 +#7 0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975 + +Reproduced w/ latest QEMU git (@ 53744e0a182) + +Hi, can you do a `thread apply all bt` instead? If I were to bet, we're probably waiting for some slow call like lseek to return in another thread. + +What filesystem/blockdevice is involved here? 
+ +ext4 filesystem, SATA drive: + +(gdb) thread apply all bt + +Thread 3 (Thread 0xffff9bffc9a0 (LWP 9015)): +#0 0x0000ffffaaa462cc in __GI___sigtimedwait (set=<optimized out>, + set@entry=0xaaaae725c070, info=info@entry=0xffff9bffbf18, + timeout=0x3ff0000000000001, timeout@entry=0x0) + at ../sysdeps/unix/sysv/linux/sigtimedwait.c:42 +#1 0x0000ffffaab7dfac in __sigwait (set=set@entry=0xaaaae725c070, + sig=sig@entry=0xffff9bffbff4) at ../sysdeps/unix/sysv/linux/sigwait.c:28 +#2 0x0000aaaad998a628 in sigwait_compat (opaque=0xaaaae725c070) + at util/compatfd.c:36 +#3 0x0000aaaad998bce0 in qemu_thread_start (args=<optimized out>) + at util/qemu-thread-posix.c:498 +#4 0x0000ffffaab73088 in start_thread (arg=0xffffc528531f) + at pthread_create.c:463 +#5 0x0000ffffaaae34ec in thread_start () + at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78 + +Thread 2 (Thread 0xffffa0e779a0 (LWP 9014)): +#0 syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38 +#1 0x0000aaaad998c9e8 in qemu_futex_wait (val=<optimized out>, f=<optimized out>) + at /home/ubuntu/qemu/include/qemu/futex.h:29 +#2 qemu_event_wait (ev=ev@entry=0xaaaad9a091c0 <rcu_call_ready_event>) + at util/qemu-thread-posix.c:442 +#3 0x0000aaaad99a6834 in call_rcu_thread (opaque=<optimized out>) + at util/rcu.c:261 +#4 0x0000aaaad998bce0 in qemu_thread_start (args=<optimized out>) + at util/qemu-thread-posix.c:498 +#5 0x0000ffffaab73088 in start_thread (arg=0xffffc528542f) + at pthread_create.c:463 +#6 0x0000ffffaaae34ec in thread_start () + at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78 + +Thread 1 (Thread 0xffffa0fa8010 (LWP 9013)): +#0 0x0000ffffaaada154 in __GI_ppoll (fds=0xaaaae7291dc0, nfds=187650771816320, + timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc52852e0) + at ../sysdeps/unix/sysv/linux/ppoll.c:39 +#1 0x0000aaaad9987f00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, + __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77 +#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, + timeout=timeout@entry=-1) at util/qemu-timer.c:322 +#3 0x0000aaaad9988f80 in os_host_main_loop_wait (timeout=-1) + at util/main-loop.c:233 +#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497 +#5 0x0000aaaad98b7a30 in convert_do_copy (s=0xffffc52854e8) at qemu-img.c:1980 +#6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456 +#7 0x0000aaaad98b033c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975 + +Hi, I also found a problem that qemu-img convert hands in ARM. + +The convert command line is "qemu-img convert -f qcow2 -O raw disk.qcow2 disk.raw ". 
+ +The bt is below: + +Thread 2 (Thread 0x40000b776e50 (LWP 27215)): +#0 0x000040000a3f2994 in sigtimedwait () from /lib64/libc.so.6 +#1 0x000040000a39c60c in sigwait () from /lib64/libpthread.so.0 +#2 0x0000aaaaaae82610 in sigwait_compat (opaque=0xaaaac5163b00) at util/compatfd.c:37 +#3 0x0000aaaaaae85038 in qemu_thread_start (args=args@entry=0xaaaac5163b90) at util/qemu_thread_posix.c:496 +#4 0x000040000a3918bc in start_thread () from /lib64/libpthread.so.0 +#5 0x000040000a492b2c in thread_start () from /lib64/libc.so.6 + +Thread 1 (Thread 0x40000b573370 (LWP 27214)): +#0 0x000040000a489020 in ppoll () from /lib64/libc.so.6 +#1 0x0000aaaaaadaefc0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77 +#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at qemu_timer.c:391 +#3 0x0000aaaaaadae014 in os_host_main_loop_wait (timeout=<optimized out>) at main_loop.c:272 +#4 0x0000aaaaaadae190 in main_loop_wait (nonblocking=<optimized out>) at main_loop.c:534 +#5 0x0000aaaaaad97be0 in convert_do_copy (s=0xffffdc32eb48) at qemu-img.c:1923 +#6 0x0000aaaaaada2d70 in img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2414 +#7 0x0000aaaaaad99ac4 in main (argc=7, argv=<optimized out>) at qemu-img.c:5305 + + +Do you find the cause of the problem and fix it? Thanks for your reply! + +sorry, I make a spelling mistake here("Hi, I also found a problem that qemu-img convert hands in ARM.").The right is "I also found a problem that qemu-img convert hangs in ARM". + +No, sorry - this bugs still persists w/ latest upstream (@ afccfc0). I found a report of similar symptoms: + + https://patchwork.kernel.org/patch/10047341/ + https://bugzilla.redhat.com/show_bug.cgi?id=1524770#c13 + +To be clear, ^ is already fixed upstream, so it is not the *same* issue - but perhaps related. + + +Do you have any good ideas about it? Maybe somewhere lack of memeory barriers that cause it? + + +frazier, Do you find the conditions that necessarily make this problem appear? + +I can reproduce this problem with qemu.git/matser. It still exists in qemu.git/matser. I found that when an IO return in worker threads and want to call aio_notify to wake up main_loop, but it found that ctx->notify_me is cleared to 0 by main_loop in aio_ctx_check by calling atomic_and(&ctx->notify_me, ~1) . So worker thread won't write enventfd to notify main_loop.If such a scene happens, the main_loop will hang: + main loop worker thread1 worker thread2 +----------------------------------------------------------------------------------------------- + qemu_poll_ns aio_worker + qemu_bh_schedule(pool->completion_bh) + glib_pollfds_poll + g_main_context_check + aio_ctx_check + atomic_and(&ctx->notify_me, ~1) aio_worker + qemu_bh_schedule(pool->completion_bh) + /* do something for event */ + qemu_poll_ns + /* hangs !!!*/ + +As we known, ctx->notify_me will be visited by worker thread and main loop. I thank we should add a lock protection for ctx->notify_me to avoid this happend.what do you thank so? + +Hello Liz, + +I'll try to reproduce this issue in a Cortex-A53 aarch64 real environment (w/ 24 HW threads) AND in a virtual environment w/ lots of vCPUs... 
but, if it's a barrier missing - or the lack of atomicity and/or ordering in a primitive - then, I'm afraid the context switch in between vCPUs might not be the same as in real CPUs (IPIs are sent and handled differently and the host kernel delays IPI delivery because of its own callbacks, before scheduling, etc...) and I could need a qemu dump from your environment.

Would that be feasible ? Can you reproduce this nowadays ? This bug has aged a little, so I'm not sure!

Could you provide me the dump caused by the latest package available for your Ubuntu version ? This way I have the debug symbols to work with.

Meanwhile, I'll be trying to reproduce on my side.

OOhh, never mind the virtual environment test, as I just remembered we don't have KVM on 2nd level for aarch64 yet (at least in ARMv8 implementing the virt extension). I'll try to reproduce in the real env only.

Alright, I couldn't reproduce this yet. I'm running the same test case in a 24-core box and causing lots of context switches and CPU migrations in parallel (trying to exhaust the logic).

Will let this run for some time to check.

Unfortunately this can be related to QEMU AIO BH locking/primitives and to cache coherency in the HW in question (which I got specs from: https://en.wikichip.org/wiki/hisilicon/kunpeng/hi1616):

l1$ size 8 MiB
l1d$ size 4 MiB
l1i$ size 4 MiB
l2$ size 32 MiB
l3$ size 64 MiB

like for example when having 2 threads in different NUMA domains, or some other situation.

I can't simulate the same since I have a SOC with:

Cortex-A53 MPCore 24cores,

L1 I/D=32KB/32KB
L2 =256KB
L3 =4MB

and I'm not even close to the L1/L2/L3 cache numbers from the D06 =o).

Just got a note that I'll be able to reproduce this in the real HW, will get back soon with real gdb debugging.

Alright, with a d06 aarch64 machine I was able to reproduce it after 8 attempts. I'll debug it today and provide feedback on my findings.

(gdb) bt full
#0 0x0000ffffb0b2181c in __GI_ppoll (fds=0xaaaace5ab770, nfds=4, timeout=<optimized out>, timeout@entry=0x0,
 sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
 _x3tmp = 0
 _x0tmp = 187650583213936
 _x0 = 187650583213936
 _x3 = 0
 _x4tmp = 8
 _x1tmp = 4
 _x1 = 4
 _x4 = 8
 _x2tmp = <optimized out>
 _x2 = 0
 _x8 = 73
 _sys_result = <optimized out>
 _sys_result = <optimized out>
 sc_cancel_oldtype = 0
 sc_ret = <optimized out>
 tval = {tv_sec = 0, tv_nsec = 187650583137792}
#1 0x0000aaaacd2a773c in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>)
 at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
No locals.
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at ./util/qemu-timer.c:322
No locals.
+#3 0x0000aaaacd2a8764 in os_host_main_loop_wait (timeout=-1) at ./util/main-loop.c:233 + context = 0xaaaace599d90 + ret = <optimized out> + context = <optimized out> + ret = <optimized out> +#4 main_loop_wait (nonblocking=<optimized out>) at ./util/main-loop.c:497 + ret = <optimized out> + timeout = 4294967295 + timeout_ns = <optimized out> +#5 0x0000aaaacd1df454 in convert_do_copy (s=0xfffff9b2b1d8) at ./qemu-img.c:1981 + ret = <optimized out> + i = <optimized out> + n = <optimized out> + sector_num = <optimized out> + ret = <optimized out> + i = <optimized out> + n = <optimized out> + sector_num = <optimized out> +#6 img_convert (argc=<optimized out>, argv=<optimized out>) at ./qemu-img.c:2457 + c = <optimized out> + bs_i = <optimized out> + flags = 16898 + src_flags = 0 + fmt = 0xfffff9b2bad1 "qcow2" + out_fmt = <optimized out> + cache = 0xaaaacd2cb1c8 "unsafe" + src_cache = 0xaaaacd2ca9c0 "writeback" + out_baseimg = <optimized out> + out_filename = <optimized out> + out_baseimg_param = <optimized out> + snapshot_name = 0x0 + drv = <optimized out> + proto_drv = <optimized out> + bdi = {cluster_size = 65536, vm_state_offset = 32212254720, is_dirty = false, unallocated_blocks_are_zero = true, + needs_compressed_writes = false} + out_bs = <optimized out> + opts = 0xaaaace5ab390 + sn_opts = 0x0 + create_opts = 0xaaaace5ab0c0 + open_opts = <optimized out> + options = 0x0 + local_err = 0x0 + writethrough = false + src_writethrough = false + quiet = <optimized out> + image_opts = false + skip_create = false + progress = <optimized out> + tgt_image_opts = false + ret = <optimized out> + force_share = false + explict_min_sparse = false + s = {src = 0xaaaace577240, src_sectors = 0xaaaace577300, src_num = 1, total_sectors = 62914560,allocated_sectors = 9572096, allocated_done = 6541440, sector_num = 8863744, wr_offs = 8859776, status = BLK_DATA, sector_next_status = 8863744, target = 0xaaaace5bd2a0, has_zero_init = true,compressed = false, unallocated_blocks_are_zero = true, target_has_backing = false, target_backing_sectors = -1, wr_in_order = true, copy_range = false, min_sparse = 8, alignment = 8,cluster_sectors = 128, buf_sectors = 4096, num_coroutines = 8, running_coroutines = 8, co = {0xaaaace5ceda0,0xaaaace5cef50, 0xaaaace5cf100, 0xaaaace5cf2b0, 0xaaaace5cf460, 0xaaaace5cf610, 0xaaaace5cf7c0,0xaaaace5cf970, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, wait_sector_num = {-1, 8859904, 8860928, 8863360,8861952, 8862976, 8862592, 8861440, 0, 0, 0, 0, 0, 0, 0, 0}, lock = {locked = 0, ctx = 0x0, from_push = {slh_first = 0x0}, to_pop = {slh_first = 0x0}, handoff = 0, sequence = 0, holder = 0x0}, ret = -115} + __PRETTY_FUNCTION__ = "img_convert" +#7 0x0000aaaacd1d8400 in main (argc=7, argv=<optimized out>) at ./qemu-img.c:4976 + cmd = 0xaaaacd34ad78 <img_cmds+80> + cmdname = <optimized out> + local_error = 0x0 + trace_file = 0x0 + c = <optimized out> + long_options = {{name = 0xaaaacd2cbbb0 "help", has_arg = 0, flag = 0x0, val = 104}, { + name = 0xaaaacd2cbc78 "version", has_arg = 0, flag = 0x0, val = 86}, {name = 0xaaaacd2cbc80 "trace", + has_arg = 1, flag = 0x0, val = 84}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}} + +Alright, + +I'm still investigating this but wanted to share some findings... I haven't got a kernel dump yet after the task is frozen, I have analyzed only the userland part of it (although I have checked if code was running inside kernel with perf cycles:u/cycles:k at some point). 
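
(Side note for readers following the backtraces: the code path the RCU thread keeps parking in, qemu_event_wait() -> qemu_futex_wait(), is QEMU's futex-backed QemuEvent. A simplified sketch of that wait path, reconstructed from the qemu_event_wait()/EV_* snippets quoted further down in this thread - not the verbatim util/qemu-thread-posix.c source - looks like this:)

/* Simplified sketch of QEMU's futex-backed event wait (Linux-only path).
 * Reconstructed from the snippets quoted later in this thread; the real
 * implementation lives in util/qemu-thread-posix.c. */
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

#define EV_SET   0          /* event has been signalled          */
#define EV_FREE  1          /* no waiter, event not signalled    */
#define EV_BUSY  -1         /* at least one waiter may be asleep */

static void futex_wait_sketch(unsigned *addr, unsigned expected)
{
    /* Sleeps only if *addr still equals 'expected' when the kernel checks. */
    syscall(__NR_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

static void event_wait_sketch(unsigned *value)
{
    unsigned v = __atomic_load_n(value, __ATOMIC_ACQUIRE);

    if (v != EV_SET) {
        if (v == EV_FREE) {
            /* Advertise a waiter; if the event got set meanwhile, return. */
            unsigned expected = EV_FREE;
            if (!__atomic_compare_exchange_n(value, &expected, (unsigned)EV_BUSY,
                                             false, __ATOMIC_SEQ_CST,
                                             __ATOMIC_SEQ_CST)
                && expected == EV_SET) {
                return;
            }
        }
        futex_wait_sketch(value, (unsigned)EV_BUSY);  /* thread #2 sleeps here */
    }
}

(If the wake-up side never runs, or its store is never observed, the waiter above stays in futex() indefinitely, which matches the stuck thread #2 stacks shown in this report.)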
+ +The big picture is this: Whenever qemu-img hangs, we have 3 hung tasks basically with these stacks: + +---- + +TRHREAD #1 +__GI_ppoll (../sysdeps/unix/sysv/linux/ppoll.c:39) +ppoll (/usr/include/aarch64-linux-gnu/bits/poll2.h:77) +qemu_poll_ns (./util/qemu-timer.c:322) +os_host_main_loop_wait (./util/main-loop.c:233) +main_loop_wait (./util/main-loop.c:497) +convert_do_copy (./qemu-img.c:1981) +img_convert (./qemu-img.c:2457) +main (./qemu-img.c:4976) + +got stack traces: + +./33293/stack ./33293/stack +[<0>] __switch_to+0xc0/0x218 [<0>] __switch_to+0xc0/0x218 +[<0>] ptrace_stop+0x148/0x2b0 [<0>] do_sys_poll+0x508/0x5c0 +[<0>] get_signal+0x5a4/0x730 [<0>] __arm64_sys_ppoll+0xc0/0x118 +[<0>] do_notify_resume+0x158/0x358 [<0>] el0_svc_common+0xa0/0x168 +[<0>] work_pending+0x8/0x10 [<0>] el0_svc_handler+0x38/0x78 + [<0>] el0_svc+0x8/0xc + +root@d06-1:~$ perf record -F 9999 -e cycles:u -p 33293 -- sleep 10 +[ perf record: Woken up 6 times to write data ] +[ perf record: Captured and wrote 1.871 MB perf.data (48730 samples) ] + +root@d06-1:~$ perf report --stdio +# Overhead Command Shared Object Symbol +# ........ ........ .................. ...................... +# + 37.82% qemu-img libc-2.29.so [.] 0x00000000000df710 + 21.81% qemu-img [unknown] [k] 0xffff000010099504 + 14.23% qemu-img [unknown] [k] 0xffff000010085dc0 + 9.13% qemu-img [unknown] [k] 0xffff00001008fff8 + 6.47% qemu-img libc-2.29.so [.] 0x00000000000df708 + 5.69% qemu-img qemu-img [.] qemu_event_reset + 2.57% qemu-img libc-2.29.so [.] 0x00000000000df678 + 0.63% qemu-img libc-2.29.so [.] 0x00000000000df700 + 0.49% qemu-img libc-2.29.so [.] __sigtimedwait + 0.42% qemu-img libpthread-2.29.so [.] __libc_sigwait + +---- + +TRHREAD #3 +__GI___sigtimedwait (../sysdeps/unix/sysv/linux/sigtimedwait.c:29) +__sigwait (linux/sigwait.c:28) +qemu_thread_start (./util/qemu-thread-posix.c:498) +start_thread (pthread_create.c:486) +thread_start (linux/aarch64/clone.S:78) + + +./33303/stack ./33303/stack +[<0>] __switch_to+0xc0/0x218 [<0>] __switch_to+0xc0/0x218 +[<0>] ptrace_stop+0x148/0x2b0 [<0>] do_sigtimedwait.isra.9+0x194/0x288 +[<0>] get_signal+0x5a4/0x730 [<0>] __arm64_sys_rt_sigtimedwait+0xac/0x110 +[<0>] do_notify_resume+0x158/0x358 [<0>] el0_svc_common+0xa0/0x168 +[<0>] work_pending+0x8/0x10 [<0>] el0_svc_handler+0x38/0x78 + [<0>] el0_svc+0x8/0xc + +root@d06-1:~$ perf record -F 9999 -e cycles:u -p 33303 -- sleep 10 +[ perf record: Woken up 6 times to write data ] +[ perf record: Captured and wrote 1.905 MB perf.data (49647 samples) ] + +root@d06-1:~$ perf report --stdio +# Overhead Command Shared Object Symbol +# ........ ........ .................. ...................... +# + 45.37% qemu-img libc-2.29.so [.] 0x00000000000df710 + 23.52% qemu-img [unknown] [k] 0xffff000010099504 + 9.08% qemu-img [unknown] [k] 0xffff00001008fff8 + 8.89% qemu-img [unknown] [k] 0xffff000010085dc0 + 5.56% qemu-img libc-2.29.so [.] 0x00000000000df708 + 3.66% qemu-img libc-2.29.so [.] 0x00000000000df678 + 1.01% qemu-img libc-2.29.so [.] __sigtimedwait + 0.80% qemu-img libc-2.29.so [.] 0x00000000000df700 + 0.64% qemu-img qemu-img [.] qemu_event_reset + 0.55% qemu-img libc-2.29.so [.] 0x00000000000df718 + 0.52% qemu-img libpthread-2.29.so [.] 
__libc_sigwait + +---- + +TRHREAD #2 +syscall (linux/aarch64/syscall.S:38) +qemu_futex_wait (./util/qemu-thread-posix.c:438) +qemu_event_wait (./util/qemu-thread-posix.c:442) +call_rcu_thread (./util/rcu.c:261) +qemu_thread_start (./util/qemu-thread-posix.c:498) +start_thread (pthread_create.c:486) +thread_start (linux/aarch64/clone.S:78) + +./33302/stack ./33302/stack +[<0>] __switch_to+0xc0/0x218 [<0>] __switch_to+0xc0/0x218 +[<0>] ptrace_stop+0x148/0x2b0 [<0>] ptrace_stop+0x148/0x2b0 +[<0>] get_signal+0x5a4/0x730 [<0>] get_signal+0x5a4/0x730 +[<0>] do_notify_resume+0x1c4/0x358 [<0>] do_notify_resume+0x1c4/0x358 +[<0>] work_pending+0x8/0x10 [<0>] work_pending+0x8/0x10 + +<stack does not change at all> + +root@d06-1:~$ perf report --stdio +# Overhead Command Shared Object Symbol +# ........ ........ .................. ...................... +# + 50.30% qemu-img libc-2.29.so [.] 0x00000000000df710 + 26.44% qemu-img [unknown] [k] 0xffff000010099504 + 5.88% qemu-img libc-2.29.so [.] 0x00000000000df708 + 5.26% qemu-img [unknown] [k] 0xffff000010085dc0 + 5.25% qemu-img [unknown] [k] 0xffff00001008fff8 + 4.25% qemu-img libc-2.29.so [.] 0x00000000000df678 + 0.93% qemu-img libc-2.29.so [.] __sigtimedwait + 0.51% qemu-img libc-2.29.so [.] 0x00000000000df700 + 0.35% qemu-img libpthread-2.29.so [.] __libc_sigwait + +Their stack show those tasks are pretty much "stuck" in same userland program logic, while one of them is stuck at the same program counter address. Profiling those tasks give no much information without more debugging data and less optimizations. + +Although all the 0x000000dfXXX addresses seem broken as we get where libc was mapped (mid heap) and we have: + +(gdb) print __libc_sigwait +$25 = {int (const sigset_t *, int *)} 0xffffbf128080 <__GI___sigwait> + +---- + +Anyway, continuing.... I investigated the qemu_event_{set,reset,xxx} logic. In non Linux OSes it uses pthread primitives, but, for Linux, it uses a futex() implementation with a struct QemuEvent (rcu_call_ready_event) being the one holding values (busy, set, free, etc). + +I got 2 hung situations: + +(gdb) print (struct QemuEvent) *(0xaaaacd35fce8) +$16 = { + value = 4294967295, + initialized = true +} + +value = 4294967295 -> THIS IS A 32-bit 0xFFFF (casting vs overflow issue ?) + +AND + +a situation where value was either 0 or 1 (like expected). 
In this last situation I changed things by hand to make program to continue its execution: + +void qemu_event_wait(QemuEvent *ev) +{ + unsigned value; + + assert(ev->initialized); + value = atomic_read(&ev->value); + smp_mb_acquire(); + if (value != EV_SET) { + if (value == EV_FREE) { + + if (atomic_cmpxchg(&ev->value, + EV_FREE, EV_BUSY) == EV_SET) { + return; + } + } + qemu_futex_wait(ev, EV_BUSY); + } +} + +438 in ./util/qemu-thread-posix.c + 0x0000aaaaaabd4174 <+44>: mov w1, #0xffffffff // #-1 + 0x0000aaaaaabd4178 <+48>: ldaxr w0, [x19] + 0x0000aaaaaabd417c <+52>: cmp w0, #0x1 + 0x0000aaaaaabd4180 <+56>: b.ne 0xaaaaaabd418c <qemu_event_wait+68> // b.any +=> 0x0000aaaaaabd4184 <+60>: stlxr w2, w1, [x19] + 0x0000aaaaaabd4188 <+64>: cbnz w2, 0xaaaaaabd4178 <qemu_event_wait+48> + 0x0000aaaaaabd418c <+68>: cbz w0, 0xaaaaaabd41cc <qemu_event_wait+132> + 0x0000aaaaaabd4190 <+72>: mov w6, #0x0 // #0 + 0x0000aaaaaabd4194 <+76>: mov x5, #0x0 // #0 + 0x0000aaaaaabd4198 <+80>: mov x4, #0x0 // #0 + 0x0000aaaaaabd419c <+84>: mov w3, #0xffffffff // #-1 + 0x0000aaaaaabd41a0 <+88>: mov w2, #0x0 // #0 + 0x0000aaaaaabd41a4 <+92>: mov x1, x19 + 0x0000aaaaaabd41a8 <+96>: mov x0, #0x62 // #98 + 0x0000aaaaaabd41ac <+100>: bl 0xaaaaaaaff380 <syscall@plt> + +I unblocked it by hand, setting the program counter register to outside that logic: + +(gdb) print qemu_event_wait+132 +$15 = (void (*)(QemuEvent *)) 0xaaaaaabd41cc <qemu_event_wait+132> +(gdb) print rcu_call_ready_event +$16 = {value = 1, initialized = true} +(gdb) set rcu_call_ready_event->value=0 +(gdb) set $pc=0xaaaaaabd41cc + +And it got stuck again with program counter in other STLXR instruction: + +(gdb) thread 2 + +[Switching to thread 2 (Thread 0xffffbec61d90 (LWP 33302))] +#0 0x0000aaaaaabd4110 in qemu_event_reset (ev=0xaaaaaac87ce8 <rcu_call_ready_event>) at ./util/qemu-thread-posix.c:414 +414 ./util/qemu-thread-posix.c: No such file or directory. +(gdb) bt +#0 0x0000aaaaaabd4110 in qemu_event_reset (ev=0xaaaaaac87ce8 <rcu_call_ready_event>) at ./util/qemu-thread-posix.c:414 +#1 0x0000aaaaaabedff8 in call_rcu_thread (opaque=opaque@entry=0x0) at ./util/rcu.c:255 +#2 0x0000aaaaaabd34c8 in qemu_thread_start (args=<optimized out>) at ./util/qemu-thread-posix.c:498 +#3 0x0000ffffbf26a880 in start_thread (arg=0xfffffffff5bf) at pthread_create.c:486 +#4 0x0000ffffbf1c4b9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78 + +(gdb) print rcu_call_ready_event +$20 = {value = 1, initialized = true} + +(gdb) disassemble qemu_event_reset +Dump of assembler code for function qemu_event_reset: + 0x0000aaaaaabd40f0 <+0>: ldrb w1, [x0, #4] + 0x0000aaaaaabd40f4 <+4>: cbz w1, 0xaaaaaabd411c <qemu_event_reset+44> + 0x0000aaaaaabd40f8 <+8>: ldr w1, [x0] + 0x0000aaaaaabd40fc <+12>: dmb ishld + 0x0000aaaaaabd4100 <+16>: cbz w1, 0xaaaaaabd4108 <qemu_event_reset+24> + 0x0000aaaaaabd4104 <+20>: ret + 0x0000aaaaaabd4108 <+24>: ldaxr w1, [x0] + 0x0000aaaaaabd410c <+28>: orr w1, w1, #0x1 +=> 0x0000aaaaaabd4110 <+32>: stlxr w2, w1, [x0] + 0x0000aaaaaabd4114 <+36>: cbnz w2, 0xaaaaaabd4108 <qemu_event_reset+24> + 0x0000aaaaaabd4118 <+40>: ret + +And it does not matter if I continue, CPU keeps stuck in that program counter (again in a STLXR instruction) + +---- + +So, initially I was afraid that the lack barriers (or not so strong ones being used) could have caused a race condition that would make one thread to depend on the other thread logic. 
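
(For reference, the setter that is supposed to release the waiter in qemu_event_wait() above is qemu_event_set(); per the summary later in this thread it xchg's ev->value to EV_SET and issues a FUTEX_WAKE if a waiter had advertised itself as EV_BUSY. A rough, self-contained sketch of that assumed behaviour - not the verbatim QEMU source:)

/* Rough sketch of the setter side pairing with qemu_event_wait() above.
 * Assumed from the atomic_xchg(&ev->value, EV_SET) / qemu_futex_wake()
 * behaviour described later in this thread; details may differ. */
#include <limits.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

#define EV_SET   0
#define EV_FREE  1
#define EV_BUSY  -1

static void event_set_sketch(unsigned *value)
{
    if (__atomic_load_n(value, __ATOMIC_SEQ_CST) != EV_SET) {
        /* Publish EV_SET; a waiter that already switched the value to
         * EV_BUSY may be sleeping in futex(FUTEX_WAIT) and must be woken. */
        if (__atomic_exchange_n(value, EV_SET, __ATOMIC_SEQ_CST)
            == (unsigned)EV_BUSY) {
            syscall(__NR_futex, value, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
        }
    }
}
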
+ +Unfortunately it looks that instruction STLXR might not be behaving appropriately for this CPU/architecture as program counter seem to be stuck in the same instruction (which is super weird, by not throwing a general exception for some microcode issue, for example). + +But this was just an initial overview, I still have to revisit this in order interpret results better (and recompile qemu with debugging data, and possible with other GCC version). + +Any comments are appreciated. + +Alright, here is what is happening: + +Whenever program is stuck, thread #2 backtrace is this: + +(gdb) bt +#0 syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38 +#1 0x0000aaaaaabd41b0 in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at ./util/qemu-thread-posix.c:438 +#2 qemu_event_wait (ev=ev@entry=0xaaaaaac87ce8 <rcu_call_ready_event>) at ./util/qemu-thread-posix.c:442 +#3 0x0000aaaaaabee03c in call_rcu_thread (opaque=opaque@entry=0x0) at ./util/rcu.c:261 +#4 0x0000aaaaaabd34c8 in qemu_thread_start (args=<optimized out>) at ./util/qemu-thread-posix.c:498 +#5 0x0000ffffbf26a880 in start_thread (arg=0xfffffffff5bf) at pthread_create.c:486 +#6 0x0000ffffbf1c4b9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78 + +Meaning that code is waiting for a futex inside kernel. + +(gdb) print rcu_call_ready_event +$4 = {value = 4294967295, initialized = true} + +The QemuEvent "rcu_call_ready_event->value" is set to INT_MAX and I don't know why yet. + +rcu_call_ready_event->value is only touched by: + +qemu_event_init() -> bool init ? EV_SET : EV_FREE +qemu_event_reset() -> atomic_or(&ev->value, EV_FREE) +qemu_event_set() -> atomic_xchg(&ev->value, EV_SET) +qemu_event_wait() -> atomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY)' + +And there should be no 0x7fff value for "ev->value". + +qemu_event_init() is the one initializing the global: + + static QemuEvent rcu_call_ready_event; + +and it is called by "rcu_init_complete()" which is called by "rcu_init()": + + static void __attribute__((__constructor__)) rcu_init(void) + +a constructor function. + +So, "fixing" this issue by: + + (gdb) print rcu_call_ready_event + $8 = {value = 4294967295, initialized = true} + + (gdb) watch rcu_call_ready_event + Hardware watchpoint 1: rcu_call_ready_event + + (gdb) set rcu_call_ready_event.initialized = 1 + + (gdb) set rcu_call_ready_event.value = 0 + +and note that I added a watchpoint to rcu_call_ready_event global: + +<HANG> + +Thread 1 "qemu-img" received signal SIGINT, Interrupt. +(gdb) thread 2 +[Switching to thread 2 (Thread 0xffffbec61d90 (LWP 33625))] + +(gdb) bt +#0 0x0000aaaaaabd4110 in qemu_event_reset (ev=ev@entry=0xaaaaaac87ce8 <rcu_call_ready_event>) +#1 0x0000aaaaaabedff8 in call_rcu_thread (opaque=opaque@entry=0x0) at ./util/rcu.c:255 +#2 0x0000aaaaaabd34c8 in qemu_thread_start (args=<optimized out>) at ./util/qemu-thread-posix.c:498 +#3 0x0000ffffbf26a880 in start_thread (arg=0xfffffffff5bf) at pthread_create.c:486 +#4 0x0000ffffbf1c4b9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78 +(gdb) print rcu_call_ready_event +$9 = {value = 0, initialized = true} + +You can see I advanced in the qemu_event_{reset,set,wait} logic. 
+ + (gdb) disassemble /m 0x0000aaaaaabd4110 + Dump of assembler code for function qemu_event_reset: + 408 in ./util/qemu-thread-posix.c + + 409 in ./util/qemu-thread-posix.c + + 410 in ./util/qemu-thread-posix.c + 411 in ./util/qemu-thread-posix.c + 0x0000aaaaaabd40f0 <+0>: ldrb w1, [x0, #4] + 0x0000aaaaaabd40f4 <+4>: cbz w1, 0xaaaaaabd411c <qemu_event_reset+44> + 0x0000aaaaaabd411c <+44>: stp x29, x30, [sp, #-16]! + 0x0000aaaaaabd4120 <+48>: adrp x3, 0xaaaaaac20000 + 0x0000aaaaaabd4124 <+52>: add x3, x3, #0x908 + 0x0000aaaaaabd4128 <+56>: mov x29, sp + 0x0000aaaaaabd412c <+60>: adrp x1, 0xaaaaaac20000 + 0x0000aaaaaabd4130 <+64>: adrp x0, 0xaaaaaac20000 + 0x0000aaaaaabd4134 <+68>: add x3, x3, #0x290 + 0x0000aaaaaabd4138 <+72>: add x1, x1, #0xc00 + 0x0000aaaaaabd413c <+76>: add x0, x0, #0xd40 + 0x0000aaaaaabd4140 <+80>: mov w2, #0x19b // #411 + 0x0000aaaaaabd4144 <+84>: bl 0xaaaaaaaff190 <__assert_fail@plt> + + 412 in ./util/qemu-thread-posix.c + 0x0000aaaaaabd40f8 <+8>: ldr w1, [x0] + + 413 in ./util/qemu-thread-posix.c + 0x0000aaaaaabd40fc <+12>: dmb ishld + + 414 in ./util/qemu-thread-posix.c + 0x0000aaaaaabd4100 <+16>: cbz w1, 0xaaaaaabd4108 <qemu_event_reset+24> + 0x0000aaaaaabd4104 <+20>: ret + 0x0000aaaaaabd4108 <+24>: ldaxr w1, [x0] + 0x0000aaaaaabd410c <+28>: orr w1, w1, #0x1 + => 0x0000aaaaaabd4110 <+32>: stlxr w2, w1, [x0] + 0x0000aaaaaabd4114 <+36>: cbnz w2, 0xaaaaaabd4108 <qemu_event_reset+24> + 0x0000aaaaaabd4118 <+40>: ret + +And I'm currently inside the STLXR and LDAXR logic. To make sure my program counter is advancing, I added a breakpoint at 0x0000aaaaaabd4108, so CBNZ instruction would branch indefinitely into LDXAR instruction again, until the +LDAXR<->STLXR logic is satisfied (inside qemu_event_wait()). + +(gdb) break *(0x0000aaaaaabd4108) +Breakpoint 2 at 0xaaaaaabd4108: file ./util/qemu-thread-posix.c, line 414. + +which is basically this: + + if (value == EV_SET) { EV_SET == 0 + atomic_or(&ev->value, EV_FREE); EV_FREE = 1 + } + +and we can see that this logic being called one time after another: + +(gdb) c +Thread 2 "qemu-img" hit Breakpoint 3, 0x0000aaaaaabd4108 in qemu_event_reset ( + ev=ev@entry=0xaaaaaac87ce8 <rcu_call_ready_event>) at ./util/qemu-thread-posix.c:414 + +(gdb) c +Thread 2 "qemu-img" hit Breakpoint 3, 0x0000aaaaaabd4108 in qemu_event_reset ( + ev=ev@entry=0xaaaaaac87ce8 <rcu_call_ready_event>) at ./util/qemu-thread-posix.c:414 + +(gdb) c +Thread 2 "qemu-img" hit Breakpoint 3, 0x0000aaaaaabd4108 in qemu_event_reset ( + ev=ev@entry=0xaaaaaac87ce8 <rcu_call_ready_event>) at ./util/qemu-thread-posix.c:414 + +EVEN when rcu_call_ready_event->value is already EV_SET (0): + +(gdb) print rcu_call_ready_event +$11 = {value = 0, initialized = true} + +(gdb) info break +Num Type Disp Enb Address What +1 hw watchpoint keep y rcu_call_ready_event +3 breakpoint keep n 0x0000aaaaaabd4108 qemu-thread-posix.c:414 + breakpoint already hit 23 times +4 breakpoint keep y 0x0000aaaaaabd4148 qemu-thread-posix.c:424 + +IF I enable only rcu_call_ready_event HW watchpoint, nothing is triggered. + +(gdb) watch *(rcu_call_ready_event->value) +Hardware watchpoint 6: *(rcu_call_ready_event->value) + +not if I set it directly to QemuEvent->value... + + assert(ev->initialized); + value = atomic_read(&ev->value); + smp_mb_acquire(); + if (value == EV_SET) { + atomic_or(&ev->value, EV_FREE); + } + +meaning that "value" and "ev->value" might have a diff value... is that so ? + +(gdb) print value +$14 = <optimized out> + +can't say.. 
checking registers AND stack: + + 0x0000aaaaaabd4100 <+16>: cbz w1, 0xaaaaaabd4108 <qemu_event_reset+24> + 0x0000aaaaaabd4104 <+20>: ret + 0x0000aaaaaabd4108 <+24>: ldaxr w1, [x0] + 0x0000aaaaaabd410c <+28>: orr w1, w1, #0x1 + => 0x0000aaaaaabd4110 <+32>: stlxr w2, w1, [x0] + 0x0000aaaaaabd4114 <+36>: cbnz w2, 0xaaaaaabd4108 + + +x0 0xaaaaaac87ce8 187649986428136 +x1 0x1 1 +x2 0x1 1 +x3 0x0 0 +x4 0xffffbec61e98 281473882398360 +x5 0xffffbec61c90 281473882397840 +x6 0xffffbec61c90 281473882397840 +x7 0x1 1 +x8 0x65 101 +x9 0x0 0 +x10 0x0 0 +x11 0x0 0 +x12 0xffffbec61d90 281473882398096 +x13 0x0 0 +x14 0x0 0 +x15 0x2 2 +x16 0xffffbf67ccf0 281473892994288 +x17 0xffffbf274938 281473888766264 +x18 0x23f 575 +x19 0x0 0 +x20 0xaaaaaac87ce8 187649986428136 +x21 0x0 0 +x22 0xfffffffff5bf 281474976708031 +x23 0xaaaaaac87ce0 187649986428128 +x24 0xaaaaaac29000 187649986039808 +x25 0xfffffffff658 281474976708184 +x26 0x1000 4096 +x27 0xffffbf28c000 281473888862208 +x28 0xffffbec61d90 281473882398096 +x29 0xffffbec61420 281473882395680 +x30 0xaaaaaabedff8 187649985798136 +sp 0xffffbec61420 0xffffbec61420 +pc 0xaaaaaabd4110 0xaaaaaabd4110 <qemu_event_reset+32> +cpsr 0x0 [ EL=0 ] +fpsr 0x0 0 +fpcr 0x0 0 + +AND the ORR instruction is ALWAYS being executed against 0x1 (not 0x0, which is what I just changed by changing .value): + +(gdb) print value +$14 = <optimized out> + + 0x0000aaaaaabd410c <+28>: orr w1, w1, #0x1 + +#0x1 is being used instead of contents of "value" local variable (volatile). + +I'll recompile QEMU flagging all those local "unsigned value" variables as being volatile and check if optimization changes. Or even try to disable optimizations. + + +QEMU BUG: #1 + +Alright, one of the issues is (according to comment #14): + +""" +Meaning that code is waiting for a futex inside kernel. + +(gdb) print rcu_call_ready_event +$4 = {value = 4294967295, initialized = true} + +The QemuEvent "rcu_call_ready_event->value" is set to INT_MAX and I don't know why yet. + +rcu_call_ready_event->value is only touched by: + +qemu_event_init() -> bool init ? EV_SET : EV_FREE +qemu_event_reset() -> atomic_or(&ev->value, EV_FREE) +qemu_event_set() -> atomic_xchg(&ev->value, EV_SET) +qemu_event_wait() -> atomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY)' +""" + +Now I know why rcu_call_ready_event->value is set to INT_MAX. That is because in the following declaration: + +struct QemuEvent { +#ifndef __linux__ + pthread_mutex_t lock; + pthread_cond_t cond; +#endif + unsigned value; + bool initialized; +}; + +#define EV_SET 0 +#define EV_FREE 1 +#define EV_BUSY -1 + +"value" is declared as unsigned, but EV_BUSY sets it to -1, and, according to the Two's Complement Operation (https://en.wikipedia.org/wiki/Two%27s_complement), it will be INT_MAX (4294967295). + +So this is the "first bug" found AND it is definitely funny that this hasn't been seen in other architectures at all... I can reproduce it at will. + +With that said, it seems that there is still another issue causing (less frequently): + +(gdb) thread 2 +[Switching to thread 2 (Thread 0xffffbec5ad90 (LWP 17459))] +#0 syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38 +38 ../sysdeps/unix/sysv/linux/aarch64/syscall.S: No such file or directory. 
+(gdb) bt +#0 syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38 +#1 0x0000aaaaaabd41cc in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at ./util/qemu-thread-posix.c:438 +#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>) at ./util/qemu-thread-posix.c:442 +#3 0x0000aaaaaabed05c in call_rcu_thread (opaque=opaque@entry=0x0) at ./util/rcu.c:261 +#4 0x0000aaaaaabd34c8 in qemu_thread_start (args=<optimized out>) at ./util/qemu-thread-posix.c:498 +#5 0x0000ffffbf25c880 in start_thread (arg=0xfffffffff5bf) at pthread_create.c:486 +#6 0x0000ffffbf1b6b9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78 + +Thread 2 to be stuck at "futex()" kernel syscall (like the FUTEX_WAKE never happened and/or wasn't atomic for this arch/binary). Need to investigate this also. + +Paolo, + +While debugging hungs in ARM64 while doing a simple: + +qemu-img convert -f qcow2 -O qcow2 file.qcow2 output.qcow2 + +I might have found 2 issues which I'd like you to review, if possible. + +ISSUE #1 +======== + +I've caught the following stack trace after an HUNG in qemu-img convert: + +(gdb) bt +#0 syscall () +#1 0x0000aaaaaabd41cc in qemu_futex_wait +#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>) +#3 0x0000aaaaaabed05c in call_rcu_thread +#4 0x0000aaaaaabd34c8 in qemu_thread_start +#5 0x0000ffffbf25c880 in start_thread +#6 0x0000ffffbf1b6b9c in thread_start () + +(gdb) print rcu_call_ready_event +$4 = {value = 4294967295, initialized = true} + +value INT_MAX (4294967295) seems WRONG for qemu_futex_wait(): + +- EV_BUSY, being -1, and passed as an argument qemu_futex_wait(void *, +unsigned), is a two's complement, making argument into a INT_MAX when +that's not what is expected (unless I missed something). + +*** If that is the case, unsure if you, Paolo, prefer declaring +*(QemuEvent)->value as an integer or changing EV_BUSY to "2" would okay +here *** + +BUG: description: +https://bugs.launchpad.net/qemu/+bug/1805256/comments/15 + +======== +ISSUE #2 +======== + +I found this when debugging lockups while in futex() in a specific ARM64 +server - https://bugs.launchpad.net/qemu/+bug/1805256 - which I'm still +investigating. + +After fixing the issue above, I'm still getting stuck into: + +qemu_event_wait() -> qemu_futex_wait() + +*** +As if qemu_event_set() has ran before qemu_futex_wait() ever started running +*** + +The Other threads are waiting for poll() on a PIPE coming from this +stuck thread (thread #1), and in sigwait(): + +(gdb) thread 1 +... +(gdb) bt +#0 0x0000ffffbf1ad81c in __GI_ppoll +#1 0x0000aaaaaabcf73c in ppoll +#2 qemu_poll_ns +#3 0x0000aaaaaabd0764 in os_host_main_loop_wait +#4 main_loop_wait +... + +(gdb) thread 2 +... +(gdb) bt +#0 syscall () +#1 0x0000aaaaaabd41cc in qemu_futex_wait +#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>) +#3 0x0000aaaaaabed05c in call_rcu_thread +#4 0x0000aaaaaabd34c8 in qemu_thread_start +#5 0x0000ffffbf25c880 in start_thread +#6 0x0000ffffbf1b6b9c in thread_start () + +(gdb) thread 3 +... +(gdb) bt +#0 0x0000ffffbf11aa20 in __GI___sigtimedwait +#1 0x0000ffffbf2671b4 in __sigwait +#2 0x0000aaaaaabd1ddc in sigwait_compat +#3 0x0000aaaaaabd34c8 in qemu_thread_start +#4 0x0000ffffbf25c880 in start_thread +#5 0x0000ffffbf1b6b9c in thread_start + +QUESTION: + +- Should qemu_event_set() check return code from +qemu_futex_wake()->qemu_futex()->syscall() in order to know if ANY +waiter was ever woken up ? Maybe even loop until at least 1 is awaken ? 
+ +Tks in advance, + +Rafael D. Tinoco + + +In comment #14, please disregard the second half of the issue, related to: + + 0x0000aaaaaabd4100 <+16>: cbz w1, 0xaaaaaabd4108 <qemu_event_reset+24> + 0x0000aaaaaabd4104 <+20>: ret + 0x0000aaaaaabd4108 <+24>: ldaxr w1, [x0] + 0x0000aaaaaabd410c <+28>: orr w1, w1, #0x1 + => 0x0000aaaaaabd4110 <+32>: stlxr w2, w1, [x0] + 0x0000aaaaaabd4114 <+36>: cbnz w2, 0xaaaaaabd4108 + +Duh! This is just a regular load/xor/store logic for atomic_or() inside qemu_event_reset(). + +Quick update... + +> value INT_MAX (4294967295) seems WRONG for qemu_futex_wait(): +> +> - EV_BUSY, being -1, and passed as an argument qemu_futex_wait(void *, +> unsigned), is a two's complement, making argument into a INT_MAX when +> that's not what is expected (unless I missed something). +> +> *** If that is the case, unsure if you, Paolo, prefer declaring +> *(QemuEvent)->value as an integer or changing EV_BUSY to "2" would okay +> here *** +> +> BUG: description: +> https://bugs.launchpad.net/qemu/+bug/1805256/comments/15 + +I realized this might be intentional, but, still, I tried: + + https://pastebin.ubuntu.com/p/6rkkY6fJdm/ + +looking for anything that could have misbehaved in arm64 (specially +concerned on casting and type conversions between the functions). + +> QUESTION: +> +> - Should qemu_event_set() check return code from +> qemu_futex_wake()->qemu_futex()->syscall() in order to know if ANY +> waiter was ever woken up ? Maybe even loop until at least 1 is awaken ? + +And I also tried: + +- qemu_futex(f, FUTEX_WAKE, n, NULL, NULL, 0); ++ while(qemu_futex(pval, FUTEX_WAKE, val, NULL, NULL, 0) == 0) ++ continue; + +and it made little difference (took way more time for me to reproduce +the issue though): + +""" +(gdb) run +Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2 +./disk01.ext4.qcow2 ./output.qcow2 + +[New Thread 0xffffbec5ad90 (LWP 72839)] +[New Thread 0xffffbe459d90 (LWP 72840)] +[New Thread 0xffffbdb57d90 (LWP 72841)] +[New Thread 0xffffacac9d90 (LWP 72859)] +[New Thread 0xffffa7ffed90 (LWP 72860)] +[New Thread 0xffffa77fdd90 (LWP 72861)] +[New Thread 0xffffa6ffcd90 (LWP 72862)] +[New Thread 0xffffa67fbd90 (LWP 72863)] +[New Thread 0xffffa5ffad90 (LWP 72864)] + +[Thread 0xffffa5ffad90 (LWP 72864) exited] +[Thread 0xffffa6ffcd90 (LWP 72862) exited] +[Thread 0xffffa77fdd90 (LWP 72861) exited] +[Thread 0xffffbdb57d90 (LWP 72841) exited] +[Thread 0xffffa67fbd90 (LWP 72863) exited] +[Thread 0xffffacac9d90 (LWP 72859) exited] +[Thread 0xffffa7ffed90 (LWP 72860) exited] + +<HUNG w/ 3 threads in the stack trace showed before> +""" + +All the tasks left are blocked in a system call, so no task left to call +qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock +thread #1 (doing poll() in a pipe with thread #2). + +Those 7 threads exit before disk conversion is complete (sometimes in +the beginning, sometimes at the end). + +I'll try to check why those tasks exited. + +Any thoughts ? + +Tks + + +> Zhengui's theory that notify_me doesn't work properly on ARM is more +> promising, but he couldn't provide a clear explanation of why he thought +> notify_me is involved. 
In particular, I would have expected notify_me to +> be wrong if the qemu_poll_ns call came from aio_ctx_dispatch, for example: +> +> +> glib_pollfds_fill +> g_main_context_prepare +> aio_ctx_prepare +> atomic_or(&ctx->notify_me, 1) +> qemu_poll_ns +> glib_pollfds_poll +> g_main_context_check +> aio_ctx_check +> atomic_and(&ctx->notify_me, ~1) +> g_main_context_dispatch +> aio_ctx_dispatch +> /* do something for event */ +> qemu_poll_ns +> + +Paolo, + +I tried confining execution in a single NUMA domain (cpu & mem) and +still faced the issue, then, I added a mutex "ctx->notify_me_lcktest" +into context to protect "ctx->notify_me", like showed bellow, and it +seems to have either fixed or mitigated it. + +I was able to cause the hung once every 3 or 4 runs. I have already ran +qemu-img convert more than 30 times now and couldn't reproduce it again. + +Next step is to play with the barriers and check why existing ones +aren't enough for ordering access to ctx->notify_me ... or should I +try/do something else in your opinion ? + +This arch/machine (Huawei D06): + +$ lscpu +Architecture: aarch64 +Byte Order: Little Endian +CPU(s): 96 +On-line CPU(s) list: 0-95 +Thread(s) per core: 1 +Core(s) per socket: 48 +Socket(s): 2 +NUMA node(s): 4 +Vendor ID: 0x48 +Model: 0 +Stepping: 0x0 +CPU max MHz: 2000.0000 +CPU min MHz: 200.0000 +BogoMIPS: 200.00 +L1d cache: 64K +L1i cache: 64K +L2 cache: 512K +L3 cache: 32768K +NUMA node0 CPU(s): 0-23 +NUMA node1 CPU(s): 24-47 +NUMA node2 CPU(s): 48-71 +NUMA node3 CPU(s): 72-95 +Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics +cpuid asimdrdm dcpop + +---- + +diff --git a/include/block/aio.h b/include/block/aio.h +index 0ca25dfec6..0724086d91 100644 +--- a/include/block/aio.h ++++ b/include/block/aio.h +@@ -84,6 +84,7 @@ struct AioContext { + * dispatch phase, hence a simple counter is enough for them. + */ + uint32_t notify_me; ++ QemuMutex notify_me_lcktest; + + /* A lock to protect between QEMUBH and AioHandler adders and deleter, + * and to ensure that no callbacks are removed while we're walking and +diff --git a/util/aio-posix.c b/util/aio-posix.c +index 51c41ed3c9..031d6e2997 100644 +--- a/util/aio-posix.c ++++ b/util/aio-posix.c +@@ -529,7 +529,9 @@ static bool run_poll_handlers(AioContext *ctx, +int64_t max_ns, int64_t *timeout) + bool progress; + int64_t start_time, elapsed_time; + ++ qemu_mutex_lock(&ctx->notify_me_lcktest); + assert(ctx->notify_me); ++ qemu_mutex_unlock(&ctx->notify_me_lcktest); + assert(qemu_lockcnt_count(&ctx->list_lock) > 0); + + trace_run_poll_handlers_begin(ctx, max_ns, *timeout); +@@ -601,8 +603,10 @@ bool aio_poll(AioContext *ctx, bool blocking) + * so disable the optimization now. 
+ */ + if (blocking) { ++ qemu_mutex_lock(&ctx->notify_me_lcktest); + assert(in_aio_context_home_thread(ctx)); + atomic_add(&ctx->notify_me, 2); ++ qemu_mutex_unlock(&ctx->notify_me_lcktest); + } + + qemu_lockcnt_inc(&ctx->list_lock); +@@ -647,8 +651,10 @@ bool aio_poll(AioContext *ctx, bool blocking) + } + + if (blocking) { ++ qemu_mutex_lock(&ctx->notify_me_lcktest); + atomic_sub(&ctx->notify_me, 2); + aio_notify_accept(ctx); ++ qemu_mutex_unlock(&ctx->notify_me_lcktest); + } + + /* Adjust polling time */ +diff --git a/util/async.c b/util/async.c +index c10642a385..140e1e86f5 100644 +--- a/util/async.c ++++ b/util/async.c +@@ -221,7 +221,9 @@ aio_ctx_prepare(GSource *source, gint *timeout) + { + AioContext *ctx = (AioContext *) source; + ++ qemu_mutex_lock(&ctx->notify_me_lcktest); + atomic_or(&ctx->notify_me, 1); ++ qemu_mutex_unlock(&ctx->notify_me_lcktest); + + /* We assume there is no timeout already supplied */ + *timeout = qemu_timeout_ns_to_ms(aio_compute_timeout(ctx)); +@@ -239,8 +241,10 @@ aio_ctx_check(GSource *source) + AioContext *ctx = (AioContext *) source; + QEMUBH *bh; + ++ qemu_mutex_lock(&ctx->notify_me_lcktest); + atomic_and(&ctx->notify_me, ~1); + aio_notify_accept(ctx); ++ qemu_mutex_unlock(&ctx->notify_me_lcktest); + + for (bh = ctx->first_bh; bh; bh = bh->next) { + if (bh->scheduled) { +@@ -346,11 +350,13 @@ void aio_notify(AioContext *ctx) + /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs + * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll. + */ +- smp_mb(); ++ //smp_mb(); ++ qemu_mutex_lock(&ctx->notify_me_lcktest); + if (ctx->notify_me) { + event_notifier_set(&ctx->notifier); + atomic_mb_set(&ctx->notified, true); + } ++ qemu_mutex_unlock(&ctx->notify_me_lcktest); + } + + void aio_notify_accept(AioContext *ctx) +@@ -424,6 +430,8 @@ AioContext *aio_context_new(Error **errp) + ctx->co_schedule_bh = aio_bh_new(ctx, co_schedule_bh_cb, ctx); + QSLIST_INIT(&ctx->scheduled_coroutines); + ++ qemu_rec_mutex_init(&ctx->notify_me_lcktest); ++ + aio_set_event_notifier(ctx, &ctx->notifier, + false, + (EventNotifierHandler *) + + +On Wed, Sep 11, 2019 at 04:09:25PM -0300, Rafael David Tinoco wrote: +> > Zhengui's theory that notify_me doesn't work properly on ARM is more +> > promising, but he couldn't provide a clear explanation of why he thought +> > notify_me is involved. In particular, I would have expected notify_me to +> > be wrong if the qemu_poll_ns call came from aio_ctx_dispatch, for example: +> > +> > +> > glib_pollfds_fill +> > g_main_context_prepare +> > aio_ctx_prepare +> > atomic_or(&ctx->notify_me, 1) +> > qemu_poll_ns +> > glib_pollfds_poll +> > g_main_context_check +> > aio_ctx_check +> > atomic_and(&ctx->notify_me, ~1) +> > g_main_context_dispatch +> > aio_ctx_dispatch +> > /* do something for event */ +> > qemu_poll_ns +> > +> +> Paolo, +> +> I tried confining execution in a single NUMA domain (cpu & mem) and +> still faced the issue, then, I added a mutex "ctx->notify_me_lcktest" +> into context to protect "ctx->notify_me", like showed bellow, and it +> seems to have either fixed or mitigated it. +> +> I was able to cause the hung once every 3 or 4 runs. I have already ran +> qemu-img convert more than 30 times now and couldn't reproduce it again. +> +> Next step is to play with the barriers and check why existing ones +> aren't enough for ordering access to ctx->notify_me ... or should I +> try/do something else in your opinion ? 
+> +> This arch/machine (Huawei D06): +> +> $ lscpu +> Architecture: aarch64 +> Byte Order: Little Endian +> CPU(s): 96 +> On-line CPU(s) list: 0-95 +> Thread(s) per core: 1 +> Core(s) per socket: 48 +> Socket(s): 2 +> NUMA node(s): 4 +> Vendor ID: 0x48 +> Model: 0 +> Stepping: 0x0 +> CPU max MHz: 2000.0000 +> CPU min MHz: 200.0000 +> BogoMIPS: 200.00 +> L1d cache: 64K +> L1i cache: 64K +> L2 cache: 512K +> L3 cache: 32768K +> NUMA node0 CPU(s): 0-23 +> NUMA node1 CPU(s): 24-47 +> NUMA node2 CPU(s): 48-71 +> NUMA node3 CPU(s): 72-95 +> Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics +> cpuid asimdrdm dcpop + +Note that I'm also seeing this on a ThunderX2 (same calltrace): + +$ lscpu +Architecture: aarch64 +Byte Order: Little Endian +CPU(s): 224 +On-line CPU(s) list: 0-223 +Thread(s) per core: 4 +Core(s) per socket: 28 +Socket(s): 2 +NUMA node(s): 2 +Vendor ID: Cavium +Model: 1 +Model name: ThunderX2 99xx +Stepping: 0x1 +BogoMIPS: 400.00 +L1d cache: 32K +L1i cache: 32K +L2 cache: 256K +L3 cache: 32768K +NUMA node0 CPU(s): 0-111 +NUMA node1 CPU(s): 112-223 +Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid asimdrdm + + -dann + +> ---- +> +> diff --git a/include/block/aio.h b/include/block/aio.h +> index 0ca25dfec6..0724086d91 100644 +> --- a/include/block/aio.h +> +++ b/include/block/aio.h +> @@ -84,6 +84,7 @@ struct AioContext { +> * dispatch phase, hence a simple counter is enough for them. +> */ +> uint32_t notify_me; +> + QemuMutex notify_me_lcktest; +> +> /* A lock to protect between QEMUBH and AioHandler adders and deleter, +> * and to ensure that no callbacks are removed while we're walking and +> diff --git a/util/aio-posix.c b/util/aio-posix.c +> index 51c41ed3c9..031d6e2997 100644 +> --- a/util/aio-posix.c +> +++ b/util/aio-posix.c +> @@ -529,7 +529,9 @@ static bool run_poll_handlers(AioContext *ctx, +> int64_t max_ns, int64_t *timeout) +> bool progress; +> int64_t start_time, elapsed_time; +> +> + qemu_mutex_lock(&ctx->notify_me_lcktest); +> assert(ctx->notify_me); +> + qemu_mutex_unlock(&ctx->notify_me_lcktest); +> assert(qemu_lockcnt_count(&ctx->list_lock) > 0); +> +> trace_run_poll_handlers_begin(ctx, max_ns, *timeout); +> @@ -601,8 +603,10 @@ bool aio_poll(AioContext *ctx, bool blocking) +> * so disable the optimization now. 
+> */ +> if (blocking) { +> + qemu_mutex_lock(&ctx->notify_me_lcktest); +> assert(in_aio_context_home_thread(ctx)); +> atomic_add(&ctx->notify_me, 2); +> + qemu_mutex_unlock(&ctx->notify_me_lcktest); +> } +> +> qemu_lockcnt_inc(&ctx->list_lock); +> @@ -647,8 +651,10 @@ bool aio_poll(AioContext *ctx, bool blocking) +> } +> +> if (blocking) { +> + qemu_mutex_lock(&ctx->notify_me_lcktest); +> atomic_sub(&ctx->notify_me, 2); +> aio_notify_accept(ctx); +> + qemu_mutex_unlock(&ctx->notify_me_lcktest); +> } +> +> /* Adjust polling time */ +> diff --git a/util/async.c b/util/async.c +> index c10642a385..140e1e86f5 100644 +> --- a/util/async.c +> +++ b/util/async.c +> @@ -221,7 +221,9 @@ aio_ctx_prepare(GSource *source, gint *timeout) +> { +> AioContext *ctx = (AioContext *) source; +> +> + qemu_mutex_lock(&ctx->notify_me_lcktest); +> atomic_or(&ctx->notify_me, 1); +> + qemu_mutex_unlock(&ctx->notify_me_lcktest); +> +> /* We assume there is no timeout already supplied */ +> *timeout = qemu_timeout_ns_to_ms(aio_compute_timeout(ctx)); +> @@ -239,8 +241,10 @@ aio_ctx_check(GSource *source) +> AioContext *ctx = (AioContext *) source; +> QEMUBH *bh; +> +> + qemu_mutex_lock(&ctx->notify_me_lcktest); +> atomic_and(&ctx->notify_me, ~1); +> aio_notify_accept(ctx); +> + qemu_mutex_unlock(&ctx->notify_me_lcktest); +> +> for (bh = ctx->first_bh; bh; bh = bh->next) { +> if (bh->scheduled) { +> @@ -346,11 +350,13 @@ void aio_notify(AioContext *ctx) +> /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs +> * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll. +> */ +> - smp_mb(); +> + //smp_mb(); +> + qemu_mutex_lock(&ctx->notify_me_lcktest); +> if (ctx->notify_me) { +> event_notifier_set(&ctx->notifier); +> atomic_mb_set(&ctx->notified, true); +> } +> + qemu_mutex_unlock(&ctx->notify_me_lcktest); +> } +> +> void aio_notify_accept(AioContext *ctx) +> @@ -424,6 +430,8 @@ AioContext *aio_context_new(Error **errp) +> ctx->co_schedule_bh = aio_bh_new(ctx, co_schedule_bh_cb, ctx); +> QSLIST_INIT(&ctx->scheduled_coroutines); +> +> + qemu_rec_mutex_init(&ctx->notify_me_lcktest); +> + +> aio_set_event_notifier(ctx, &ctx->notifier, +> false, +> (EventNotifierHandler *) +> + + +I've looked into this on ThunderX2. The arm64 code generated for the +atomic_[add|sub] accesses of ctx->notify_me doesn't contain any +memory barriers. It is just plain ldaxr/stlxr. + +From my understanding this is not sufficient for SMP sync. + +If I read this comment correct: + + void aio_notify(AioContext *ctx) + { + /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs + * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll. + */ + smp_mb(); + if (ctx->notify_me) { + +it points out that the smp_mb() should be paired. But as +I said the used atomics don't generate any barriers at all. + +I've tried to verify me theory with this patch and didn't run into the +issue for ~500 iterations (usually I would trigger the issue ~20 iterations). 
+ +--Jan + +diff --git a/util/aio-posix.c b/util/aio-posix.c +index d8f0cb4af8dd..d07dcd4e9993 100644 +--- a/util/aio-posix.c ++++ b/util/aio-posix.c +@@ -591,6 +591,7 @@ bool aio_poll(AioContext *ctx, bool blocking) + */ + if (blocking) { + atomic_add(&ctx->notify_me, 2); ++ smp_mb(); + } + + qemu_lockcnt_inc(&ctx->list_lock); +@@ -632,6 +633,7 @@ bool aio_poll(AioContext *ctx, bool blocking) + + if (blocking) { + atomic_sub(&ctx->notify_me, 2); ++ smp_mb(); + } + + /* Adjust polling time */ +diff --git a/util/async.c b/util/async.c +index 4dd9d95a9e73..92ac209c4615 100644 +--- a/util/async.c ++++ b/util/async.c +@@ -222,6 +222,7 @@ aio_ctx_prepare(GSource *source, gint *timeout) + AioContext *ctx = (AioContext *) source; + + atomic_or(&ctx->notify_me, 1); ++ smp_mb(); + + /* We assume there is no timeout already supplied */ + *timeout = qemu_timeout_ns_to_ms(aio_compute_timeout(ctx)); +@@ -240,6 +241,7 @@ aio_ctx_check(GSource *source) + QEMUBH *bh; + + atomic_and(&ctx->notify_me, ~1); ++ smp_mb(); + aio_notify_accept(ctx); + + for (bh = ctx->first_bh; bh; bh = bh->next) { + + +Debug files for aio-posix generated on 18.04 on ThunderX2. + +Compiler: +gcc version 7.4.0 (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) + +Distro: +Ubuntu 18.04.3 LTS + +On Wed, Oct 02, 2019 at 11:45:19AM +0200, Paolo Bonzini wrote: +> On 02/10/19 11:23, Jan Glauber wrote: +> > I've tried to verify me theory with this patch and didn't run into the +> > issue for ~500 iterations (usually I would trigger the issue ~20 iterations). +> +> Awesome! That would be a compiler bug though, as atomic_add and atomic_sub +> are defined as sequentially consistent: +> +> #define atomic_add(ptr, n) ((void) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST)) +> #define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST)) + +Compiler bug sounds kind of unlikely... + +> What compiler are you using and what distro? Can you compile util/aio-posix.c +> with "-fdump-rtl-all -fdump-tree-all", zip the boatload of debugging files and +> send them my way? + +This is on Ubuntu 18.04.3, +gcc version 7.4.0 (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) + +I've uploaded the debug files to: +https://bugs.launchpad.net/qemu/+bug/1805256/+attachment/5293619/+files/aio-posix.tar.xz + +Thanks, +Jan + +> Thanks, +> +> Paolo + + +Documenting this here as bug# was dropped from the mail thread: + +On 02/10/19 13:05, Jan Glauber wrote: +> The arm64 code generated for the +> atomic_[add|sub] accesses of ctx->notify_me doesn't contain any +> memory barriers. It is just plain ldaxr/stlxr. +> +> From my understanding this is not sufficient for SMP sync. +> +>>> If I read this comment correct: +>>> +>>> void aio_notify(AioContext *ctx) +>>> { +>>> /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs +>>> * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll. +>>> */ +>>> smp_mb(); +>>> if (ctx->notify_me) { +>>> +>>> it points out that the smp_mb() should be paired. But as +>>> I said the used atomics don't generate any barriers at all. +>> +>> Awesome! That would be a compiler bug though, as atomic_add and atomic_sub +>> are defined as sequentially consistent: +>> +>> #define atomic_add(ptr, n) ((void) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST)) +>> #define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST)) +> +> Compiler bug sounds kind of unlikely... +Indeed the assembly produced by the compiler matches for example the +mappings at https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html. 
A +small testcase is as follows: + + int ctx_notify_me; + int bh_scheduled; + + int x() + { + int one = 1; + int ret; + __atomic_store(&bh_scheduled, &one, __ATOMIC_RELEASE); // x1 + __atomic_thread_fence(__ATOMIC_SEQ_CST); // x2 + __atomic_load(&ctx_notify_me, &ret, __ATOMIC_RELAXED); // x3 + return ret; + } + + int y() + { + int ret; + __atomic_fetch_add(&ctx_notify_me, 2, __ATOMIC_SEQ_CST); // y1 + __atomic_load(&bh_scheduled, &ret, __ATOMIC_RELAXED); // y2 + return ret; + } + +Here y (which is aio_poll) wants to order the write to ctx->notify_me +before reads of bh->scheduled. However, the processor can speculate the +load of bh->scheduled between the load-acquire and store-release of +ctx->notify_me. So you can have something like: + + thread 0 (y) thread 1 (x) + ----------------------------------- ----------------------------- + y1: load-acq ctx->notify_me + y2: load-rlx bh->scheduled + x1: store-rel bh->scheduled <-- 1 + x2: memory barrier + x3: load-rlx ctx->notify_me + y1: store-rel ctx->notify_me <-- 2 + +Being very puzzled, I tried to put this into cppmem: + + int main() { + atomic_int ctx_notify_me = 0; + atomic_int bh_scheduled = 0; + {{{ { + bh_scheduled.store(1, mo_release); + atomic_thread_fence(mo_seq_cst); + // must be zero since the bug report shows no notification + ctx_notify_me.load(mo_relaxed).readsvalue(0); + } + ||| { + ctx_notify_me.store(2, mo_seq_cst); + r2=bh_scheduled.load(mo_relaxed); + } + }}}; + return 0; + } + +and much to my surprise, the tool said r2 *can* be 0. Same if I put a +CAS like + + cas_strong_explicit(ctx_notify_me.readsvalue(0), 0, 2, + mo_seq_cst, mo_seq_cst); + +which resembles the code in the test case a bit more. + +I then found a discussion about using the C11 memory model in Linux +(https://gcc.gnu.org/ml/gcc/2014-02/msg00058.html) which contains the +following statement, which is a bit disheartening even though it is +about a different test: + + My first gut feeling was that the assertion should never fire, but + that was wrong because (as I seem to usually forget) the seq-cst + total order is just a constraint but doesn't itself contribute + to synchronizes-with -- but this is different for seq-cst fences. + +and later in the thread: + + Use of C11 atomics to implement Linux kernel atomic operations + requires knowledge of the underlying architecture and the compiler's + implementation, as was noted earlier in this thread. + +Indeed if I add an atomic_thread_fence I get only one valid execution, +where r2 must be 1. This is similar to GCC's bug +https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697, and we can fix it in +QEMU by using __sync_fetch_and_add; in fact cppmem also shows one valid +execution if the store is replaced with something like GCC's assembly +for __sync_fetch_and_add (or Linux's assembly for atomic_add_return): + + cas_strong_explicit(ctx_notify_me.readsvalue(0), 0, 2, + mo_release, mo_release); + atomic_thread_fence(mo_seq_cst); + +So we should: + +1) understand why ATOMIC_SEQ_CST is not enough in this case. QEMU code +seems to be making the same assumptions as Linux about the memory model, +and this is wrong because QEMU uses C11 atomics if available. +Fortunately, this kind of synchronization in QEMU is relatively rare and +only this particular bit seems affected. If there is a fix which stays +within the C11 memory model, and does not pessimize code on x86, we can +use it[1] and document the pitfall. + +2) if there's no way to fix the bug, qemu/atomic.h needs to switch to +__sync_fetch_and_add and friends. 
And again, in this case the +difference between the C11 and Linux/QEMU memory models must be documented. + +Torvald, Will, help me please... :(( + +Paolo + +[1] as would be the case if fetch_add was implemented as +fetch_add(RELEASE)+thread_fence(SEQ_CST). + + + + + +On Wed, 2019-10-02 at 15:20 +0200, Paolo Bonzini wrote: +> On 02/10/19 13:05, Jan Glauber wrote: +>> The arm64 code generated for the +>> atomic_[add|sub] accesses of ctx->notify_me doesn't contain any +>> memory barriers. It is just plain ldaxr/stlxr. +>> +>> From my understanding this is not sufficient for SMP sync. +>> +>>>> If I read this comment correct: +>>>> +>>>> void aio_notify(AioContext *ctx) +>>>> { +>>>> /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs +>>>> * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll. +>>>> */ +>>>> smp_mb(); +>>>> if (ctx->notify_me) { +>>>> +>>>> it points out that the smp_mb() should be paired. But as +>>>> I said the used atomics don't generate any barriers at all. +>>> +>>> Awesome! That would be a compiler bug though, as atomic_add and atomic_sub +>>> are defined as sequentially consistent: +>>> +>>> #define atomic_add(ptr, n) ((void) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST)) +>>> #define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST)) +>> +>> Compiler bug sounds kind of unlikely... +> +> Indeed the assembly produced by the compiler matches for example the +> mappings at https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html. A +> small testcase is as follows: +> +> int ctx_notify_me; +> int bh_scheduled; +> +> int x() +> { +> int one = 1; +> int ret; +> __atomic_store(&bh_scheduled, &one, __ATOMIC_RELEASE); // x1 +> __atomic_thread_fence(__ATOMIC_SEQ_CST); // x2 +> __atomic_load(&ctx_notify_me, &ret, __ATOMIC_RELAXED); // x3 +> return ret; +> } +> +> int y() +> { +> int ret; +> __atomic_fetch_add(&ctx_notify_me, 2, __ATOMIC_SEQ_CST); // y1 +> __atomic_load(&bh_scheduled, &ret, __ATOMIC_RELAXED); // y2 +> return ret; +> } +> +> Here y (which is aio_poll) wants to order the write to ctx->notify_me +> before reads of bh->scheduled. However, the processor can speculate the +> load of bh->scheduled between the load-acquire and store-release of +> ctx->notify_me. So you can have something like: +> +> thread 0 (y) thread 1 (x) +> ----------------------------------- ----------------------------- +> y1: load-acq ctx->notify_me +> y2: load-rlx bh->scheduled +> x1: store-rel bh->scheduled <-- 1 +> x2: memory barrier +> x3: load-rlx ctx->notify_me +> y1: store-rel ctx->notify_me <-- 2 +> +> Being very puzzled, I tried to put this into cppmem: +> +> int main() { +> atomic_int ctx_notify_me = 0; +> atomic_int bh_scheduled = 0; +> {{{ { +> bh_scheduled.store(1, mo_release); +> atomic_thread_fence(mo_seq_cst); +> // must be zero since the bug report shows no notification +> ctx_notify_me.load(mo_relaxed).readsvalue(0); +> } +> ||| { +> ctx_notify_me.store(2, mo_seq_cst); +> r2=bh_scheduled.load(mo_relaxed); +> } +> }}}; +> return 0; +> } +> +> and much to my surprise, the tool said r2 *can* be 0. Same if I put a +> CAS like +> +> cas_strong_explicit(ctx_notify_me.readsvalue(0), 0, 2, +> mo_seq_cst, mo_seq_cst); +> +> which resembles the code in the test case a bit more. + +This example looks like Dekker synchronization (if I get the intent right). + +Two possible implementations of this are either (1) with all memory +accesses having seq-cst MO, or (2) with relaxed-MO accesses and seq-cst +fences on between the store and load on both ends. 
It's possible to mix +both, but that get's trickier I think. I'd prefer the one with just +fences, just because it's easiest, conceptually. + +> I then found a discussion about using the C11 memory model in Linux +> (https://gcc.gnu.org/ml/gcc/2014-02/msg00058.html) which contains the +> following statement, which is a bit disheartening even though it is +> about a different test: +> +> My first gut feeling was that the assertion should never fire, but +> that was wrong because (as I seem to usually forget) the seq-cst +> total order is just a constraint but doesn't itself contribute +> to synchronizes-with -- but this is different for seq-cst fences. + +It works if you use (1) or (2) consistently. cppmem and the Batty et al. +tech report should give you the gory details. +My comment is just about seq-cst working differently on memory accesses vs. +fences (in the way it's specified in the memory model). + +> and later in the thread: +> +> Use of C11 atomics to implement Linux kernel atomic operations +> requires knowledge of the underlying architecture and the compiler's +> implementation, as was noted earlier in this thread. +> +> Indeed if I add an atomic_thread_fence I get only one valid execution, +> where r2 must be 1. This is similar to GCC's bug +> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697, and we can fix it in +> QEMU by using __sync_fetch_and_add; in fact cppmem also shows one valid +> execution if the store is replaced with something like GCC's assembly +> for __sync_fetch_and_add (or Linux's assembly for atomic_add_return): +> +> cas_strong_explicit(ctx_notify_me.readsvalue(0), 0, 2, +> mo_release, mo_release); +> atomic_thread_fence(mo_seq_cst); +> +> So we should: +> +> 1) understand why ATOMIC_SEQ_CST is not enough in this case. QEMU code +> seems to be making the same assumptions as Linux about the memory model, +> and this is wrong because QEMU uses C11 atomics if available. +> Fortunately, this kind of synchronization in QEMU is relatively rare and +> only this particular bit seems affected. If there is a fix which stays +> within the C11 memory model, and does not pessimize code on x86, we can +> use it[1] and document the pitfall. + +Using the fences between the store/load pairs in Dekker-like +synchronization should do that, right? It's also relatively easy to deal +with. + +> 2) if there's no way to fix the bug, qemu/atomic.h needs to switch to +> __sync_fetch_and_add and friends. And again, in this case the +> difference between the C11 and Linux/QEMU memory models must be documented. + +I surely not aware of all the constraints here, but I'd be surprised if the +C11 memory model isn't good enough for portable synchronization code (with +the exception of the consume MO minefield, perhaps). + + + + +On 02/10/19 16:58, Torvald Riegel wrote: +> This example looks like Dekker synchronization (if I get the intent right). + +It is the same pattern. However, one of the two synchronized variables +is a counter rather than just a flag. + +> Two possible implementations of this are either (1) with all memory +> accesses having seq-cst MO, or (2) with relaxed-MO accesses and seq-cst +> fences on between the store and load on both ends. It's possible to mix +> both, but that get's trickier I think. I'd prefer the one with just +> fences, just because it's easiest, conceptually. + +Got it. 
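
(To make option (2) concrete for the ctx_notify_me/bh_scheduled pair from the litmus test above, the fence-based variant would look roughly like the following C11 sketch - illustrative only, not the eventual QEMU patch:)

/* Illustrative C11 sketch of option (2): relaxed accesses with a seq-cst
 * fence between the store and the load on BOTH sides of the Dekker-style
 * pair discussed in this thread. Names follow the litmus test above. */
#include <stdatomic.h>

atomic_int ctx_notify_me;
atomic_int bh_scheduled;

/* aio_poll() side: advertise that we are about to block, then look for
 * work that may have been scheduled concurrently. */
int poller_side(void)
{
    atomic_fetch_add_explicit(&ctx_notify_me, 2, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&bh_scheduled, memory_order_relaxed);
}

/* aio_notify() side: publish the scheduled work, then check whether the
 * poller must be kicked out of its blocking poll. */
int notifier_side(void)
{
    atomic_store_explicit(&bh_scheduled, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&ctx_notify_me, memory_order_relaxed);
}

(With a seq-cst fence between the store and the load on both sides, at least one of the two threads is guaranteed to observe the other's store, so a scheduled bottom half can no longer slip past a poller that is about to block.)
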
+ +I'd also prefer the one with just fences, because we only really control +one side of the synchronization primitive (ctx_notify_me in my litmus +test) and I don't like the idea of forcing seq-cst MO on the other side +(bh_scheduled). The performance issue that I mentioned is that x86 +doesn't have relaxed fetch and add, so you'd have a redundant fence like +this: + + lock xaddl $2, mem1 + mfence + ... + movl mem1, %r8 + +(Gory QEMU details however allow us to use relaxed load and store here, +because there's only one writer). + +> It works if you use (1) or (2) consistently. cppmem and the Batty et al. +> tech report should give you the gory details. +> +>> 1) understand why ATOMIC_SEQ_CST is not enough in this case. QEMU code +>> seems to be making the same assumptions as Linux about the memory model, +>> and this is wrong because QEMU uses C11 atomics if available. +>> Fortunately, this kind of synchronization in QEMU is relatively rare and +>> only this particular bit seems affected. If there is a fix which stays +>> within the C11 memory model, and does not pessimize code on x86, we can +>> use it[1] and document the pitfall. +> +> Using the fences between the store/load pairs in Dekker-like +> synchronization should do that, right? It's also relatively easy to deal +> with. +> +>> 2) if there's no way to fix the bug, qemu/atomic.h needs to switch to +>> __sync_fetch_and_add and friends. And again, in this case the +>> difference between the C11 and Linux/QEMU memory models must be documented. +> +> I surely not aware of all the constraints here, but I'd be surprised if the +> C11 memory model isn't good enough for portable synchronization code (with +> the exception of the consume MO minefield, perhaps). + +This helps a lot already; I'll work on a documentation and code patch. +Thanks very much. + +Paolo + +>> int main() { +>> atomic_int ctx_notify_me = 0; +>> atomic_int bh_scheduled = 0; +>> {{{ { +>> bh_scheduled.store(1, mo_release); +>> atomic_thread_fence(mo_seq_cst); +>> // must be zero since the bug report shows no notification +>> ctx_notify_me.load(mo_relaxed).readsvalue(0); +>> } +>> ||| { +>> ctx_notify_me.store(2, mo_seq_cst); +>> r2=bh_scheduled.load(mo_relaxed); +>> } +>> }}}; +>> return 0; +>> } + + + + +On Mon, Oct 07, 2019 at 01:06:20PM +0200, Paolo Bonzini wrote: +> On 02/10/19 11:23, Jan Glauber wrote: +> > I've looked into this on ThunderX2. The arm64 code generated for the +> > atomic_[add|sub] accesses of ctx->notify_me doesn't contain any +> > memory barriers. It is just plain ldaxr/stlxr. +> > +> > From my understanding this is not sufficient for SMP sync. +> > +> > If I read this comment correct: +> > +> > void aio_notify(AioContext *ctx) +> > { +> > /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs +> > * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll. +> > */ +> > smp_mb(); +> > if (ctx->notify_me) { +> > +> > it points out that the smp_mb() should be paired. But as +> > I said the used atomics don't generate any barriers at all. 
+> +> Based on the rest of the thread, this patch should also fix the bug: +> +> diff --git a/util/async.c b/util/async.c +> index 47dcbfa..721ea53 100644 +> --- a/util/async.c +> +++ b/util/async.c +> @@ -249,7 +249,7 @@ aio_ctx_check(GSource *source) +> aio_notify_accept(ctx); +> +> for (bh = ctx->first_bh; bh; bh = bh->next) { +> - if (bh->scheduled) { +> + if (atomic_mb_read(&bh->scheduled)) { +> return true; +> } +> } +> +> +> And also the memory barrier in aio_notify can actually be replaced +> with a SEQ_CST load: +> +> diff --git a/util/async.c b/util/async.c +> index 47dcbfa..721ea53 100644 +> --- a/util/async.c +> +++ b/util/async.c +> @@ -349,11 +349,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx) +> +> void aio_notify(AioContext *ctx) +> { +> - /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs +> - * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll. +> + /* Using atomic_mb_read ensures that e.g. bh->scheduled is written before +> + * ctx->notify_me is read. Pairs with atomic_or in aio_ctx_prepare or +> + * atomic_add in aio_poll. +> */ +> - smp_mb(); +> - if (ctx->notify_me) { +> + if (atomic_mb_read(&ctx->notify_me)) { +> event_notifier_set(&ctx->notifier); +> atomic_mb_set(&ctx->notified, true); +> } +> +> +> Would you be able to test these (one by one possibly)? + +Sure. + +> > I've tried to verify me theory with this patch and didn't run into the +> > issue for ~500 iterations (usually I would trigger the issue ~20 iterations). +> +> Sorry for asking the obvious---500 iterations of what? + +The testcase mentioned in the Canonical issue: +https://bugs.launchpad.net/qemu/+bug/1805256 + +It's a simple image convert: +qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2 + +Usually it got stuck after 3-20 iterations. + +--Jan + + +On Mon, Oct 07, 2019 at 01:06:20PM +0200, Paolo Bonzini wrote: +> On 02/10/19 11:23, Jan Glauber wrote: +> > I've looked into this on ThunderX2. The arm64 code generated for the +> > atomic_[add|sub] accesses of ctx->notify_me doesn't contain any +> > memory barriers. It is just plain ldaxr/stlxr. +> > +> > From my understanding this is not sufficient for SMP sync. +> > +> > If I read this comment correct: +> > +> > void aio_notify(AioContext *ctx) +> > { +> > /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs +> > * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll. +> > */ +> > smp_mb(); +> > if (ctx->notify_me) { +> > +> > it points out that the smp_mb() should be paired. But as +> > I said the used atomics don't generate any barriers at all. +> +> Based on the rest of the thread, this patch should also fix the bug: +> +> diff --git a/util/async.c b/util/async.c +> index 47dcbfa..721ea53 100644 +> --- a/util/async.c +> +++ b/util/async.c +> @@ -249,7 +249,7 @@ aio_ctx_check(GSource *source) +> aio_notify_accept(ctx); +> +> for (bh = ctx->first_bh; bh; bh = bh->next) { +> - if (bh->scheduled) { +> + if (atomic_mb_read(&bh->scheduled)) { +> return true; +> } +> } +> +> +> And also the memory barrier in aio_notify can actually be replaced +> with a SEQ_CST load: +> +> diff --git a/util/async.c b/util/async.c +> index 47dcbfa..721ea53 100644 +> --- a/util/async.c +> +++ b/util/async.c +> @@ -349,11 +349,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx) +> +> void aio_notify(AioContext *ctx) +> { +> - /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs +> - * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll. +> + /* Using atomic_mb_read ensures that e.g. 
bh->scheduled is written before +> + * ctx->notify_me is read. Pairs with atomic_or in aio_ctx_prepare or +> + * atomic_add in aio_poll. +> */ +> - smp_mb(); +> - if (ctx->notify_me) { +> + if (atomic_mb_read(&ctx->notify_me)) { +> event_notifier_set(&ctx->notifier); +> atomic_mb_set(&ctx->notified, true); +> } +> +> +> Would you be able to test these (one by one possibly)? + +Paolo, + I tried them both separately and together on a Hi1620 system, each +time it hung in the first iteration. Here's a backtrace of a run with +both patches applied: + +(gdb) thread apply all bt + +Thread 3 (Thread 0xffff8154b820 (LWP 63900)): +#0 0x0000ffff8b9402cc in __GI___sigtimedwait (set=<optimized out>, set@entry=0xaaaaf1e08070, + info=info@entry=0xffff8154ad98, timeout=timeout@entry=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:42 +#1 0x0000ffff8ba77fac in __sigwait (set=set@entry=0xaaaaf1e08070, sig=sig@entry=0xffff8154ae74) + at ../sysdeps/unix/sysv/linux/sigwait.c:28 +#2 0x0000aaaab7dc1610 in sigwait_compat (opaque=0xaaaaf1e08070) at util/compatfd.c:35 +#3 0x0000aaaab7dc3e80 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519 +#4 0x0000ffff8ba6d088 in start_thread (arg=0xffffceefbf4f) at pthread_create.c:463 +#5 0x0000ffff8b9dd4ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78 + +Thread 2 (Thread 0xffff81d4c820 (LWP 63899)): +#0 syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38 +#1 0x0000aaaab7dc4cd8 in qemu_futex_wait (val=<optimized out>, f=<optimized out>) + at /home/ubuntu/qemu/include/qemu/futex.h:29 +#2 qemu_event_wait (ev=ev@entry=0xaaaab7e48708 <rcu_call_ready_event>) at util/qemu-thread-posix.c:459 +#3 0x0000aaaab7ddf44c in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:260 +#4 0x0000aaaab7dc3e80 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519 +#5 0x0000ffff8ba6d088 in start_thread (arg=0xffffceefc05f) at pthread_create.c:463 +#6 0x0000ffff8b9dd4ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78 + +Thread 1 (Thread 0xffff81e83010 (LWP 63898)): +#0 0x0000ffff8b9d4154 in __GI_ppoll (fds=0xaaaaf1e0dbc0, nfds=187650205809964, timeout=<optimized out>, + timeout@entry=0x0, sigmask=0xffffceefbef0) at ../sysdeps/unix/sysv/linux/ppoll.c:39 +#1 0x0000aaaab7dbedb0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) + at /usr/include/aarch64-linux-gnu/bits/poll2.h:77 +#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:340 +#3 0x0000aaaab7dbfd2c in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:236 +#4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:517 +#5 0x0000aaaab7ce86e8 in convert_do_copy (s=0xffffceefc068) at qemu-img.c:2028 +#6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2520 +#7 0x0000aaaab7ce1e54 in main (argc=8, argv=<optimized out>) at qemu-img.c:5097 + +> > I've tried to verify me theory with this patch and didn't run into the +> > issue for ~500 iterations (usually I would trigger the issue ~20 iterations). +> +> Sorry for asking the obvious---500 iterations of what? 
+ +$ for i in $(seq 1 500); do echo "==$i=="; ./qemu/qemu-img convert -p -f qcow2 -O qcow2 bionic-server-cloudimg-arm64.img out.img; done +==1== + (37.19/100%) + + -dann + + +On Mon, Oct 07, 2019 at 04:58:30PM +0200, Paolo Bonzini wrote: +> On 07/10/19 16:44, dann frazier wrote: +> > On Mon, Oct 07, 2019 at 01:06:20PM +0200, Paolo Bonzini wrote: +> >> On 02/10/19 11:23, Jan Glauber wrote: +> >>> I've looked into this on ThunderX2. The arm64 code generated for the +> >>> atomic_[add|sub] accesses of ctx->notify_me doesn't contain any +> >>> memory barriers. It is just plain ldaxr/stlxr. +> >>> +> >>> From my understanding this is not sufficient for SMP sync. +> >>> +> >>> If I read this comment correct: +> >>> +> >>> void aio_notify(AioContext *ctx) +> >>> { +> >>> /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs +> >>> * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll. +> >>> */ +> >>> smp_mb(); +> >>> if (ctx->notify_me) { +> >>> +> >>> it points out that the smp_mb() should be paired. But as +> >>> I said the used atomics don't generate any barriers at all. +> >> +> >> Based on the rest of the thread, this patch should also fix the bug: +> >> +> >> diff --git a/util/async.c b/util/async.c +> >> index 47dcbfa..721ea53 100644 +> >> --- a/util/async.c +> >> +++ b/util/async.c +> >> @@ -249,7 +249,7 @@ aio_ctx_check(GSource *source) +> >> aio_notify_accept(ctx); +> >> +> >> for (bh = ctx->first_bh; bh; bh = bh->next) { +> >> - if (bh->scheduled) { +> >> + if (atomic_mb_read(&bh->scheduled)) { +> >> return true; +> >> } +> >> } +> >> +> >> +> >> And also the memory barrier in aio_notify can actually be replaced +> >> with a SEQ_CST load: +> >> +> >> diff --git a/util/async.c b/util/async.c +> >> index 47dcbfa..721ea53 100644 +> >> --- a/util/async.c +> >> +++ b/util/async.c +> >> @@ -349,11 +349,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx) +> >> +> >> void aio_notify(AioContext *ctx) +> >> { +> >> - /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs +> >> - * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll. +> >> + /* Using atomic_mb_read ensures that e.g. bh->scheduled is written before +> >> + * ctx->notify_me is read. Pairs with atomic_or in aio_ctx_prepare or +> >> + * atomic_add in aio_poll. +> >> */ +> >> - smp_mb(); +> >> - if (ctx->notify_me) { +> >> + if (atomic_mb_read(&ctx->notify_me)) { +> >> event_notifier_set(&ctx->notifier); +> >> atomic_mb_set(&ctx->notified, true); +> >> } +> >> +> >> +> >> Would you be able to test these (one by one possibly)? +> > +> > Paolo, +> > I tried them both separately and together on a Hi1620 system, each +> > time it hung in the first iteration. Here's a backtrace of a run with +> > both patches applied: +> +> Ok, not a great start... I'll find myself an aarch64 machine and look +> at it more closely. I'd like the patch to be something we can +> understand and document, since this is probably the second most-used +> memory barrier idiom in QEMU. +> +> Paolo + +I'm still not sure what the actual issue is here, but could it be some bad +interaction between the notify_me and the list_lock? 
The are both 4 byte +and side-by-side: + +address notify_me: 0xaaaadb528aa0 sizeof notify_me: 4 +address list_lock: 0xaaaadb528aa4 sizeof list_lock: 4 + +AFAICS the generated code looks OK (all load/store exclusive done +with 32 bit size): + + e6c: 885ffc01 ldaxr w1, [x0] + e70: 11000821 add w1, w1, #0x2 + e74: 8802fc01 stlxr w2, w1, [x0] + +...but if I bump notify_me size to uint64_t the issue goes away. + +BTW, the image file I convert in the testcase is ~20 GB. + +--Jan + +diff --git a/include/block/aio.h b/include/block/aio.h +index a1d6b9e24939..e8a5ea3860bb 100644 +--- a/include/block/aio.h ++++ b/include/block/aio.h +@@ -83,7 +83,7 @@ struct AioContext { + * Instead, the aio_poll calls include both the prepare and the + * dispatch phase, hence a simple counter is enough for them. + */ +- uint32_t notify_me; ++ uint64_t notify_me; + + /* A lock to protect between QEMUBH and AioHandler adders and deleter, + * and to ensure that no callbacks are removed while we're walking and + + +On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote: +> On 09/10/19 10:02, Jan Glauber wrote: + +> > I'm still not sure what the actual issue is here, but could it be some bad +> > interaction between the notify_me and the list_lock? The are both 4 byte +> > and side-by-side: +> > +> > address notify_me: 0xaaaadb528aa0 sizeof notify_me: 4 +> > address list_lock: 0xaaaadb528aa4 sizeof list_lock: 4 +> > +> > AFAICS the generated code looks OK (all load/store exclusive done +> > with 32 bit size): +> > +> > e6c: 885ffc01 ldaxr w1, [x0] +> > e70: 11000821 add w1, w1, #0x2 +> > e74: 8802fc01 stlxr w2, w1, [x0] +> > +> > ...but if I bump notify_me size to uint64_t the issue goes away. +> +> Ouch. :) Is this with or without my patch(es)? +> +> Also, what if you just add a dummy uint32_t after notify_me? + +With the dummy the testcase also runs fine for 500 iterations. + +Dann, can you try if this works on the Hi1620 too? + +--Jan + + +On Fri, Oct 11, 2019 at 10:18:18AM +0200, Paolo Bonzini wrote: +> On 11/10/19 08:05, Jan Glauber wrote: +> > On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote: +> >>> ...but if I bump notify_me size to uint64_t the issue goes away. +> >> +> >> Ouch. :) Is this with or without my patch(es)? +> +> You didn't answer this question. + +Oh, sorry... I did but the mail probably didn't make it out. +I have both of your changes applied (as I think they make sense). + +> >> Also, what if you just add a dummy uint32_t after notify_me? +> > +> > With the dummy the testcase also runs fine for 500 iterations. +> +> You might be lucky and causing list_lock to be in another cache line. +> What if you add __attribute__((aligned(16)) to notify_me (and keep the +> dummy)? + +Good point. I'll try to force both into the same cacheline. + +--Jan + +> Paolo +> +> > Dann, can you try if this works on the Hi1620 too? + + +On Fri, Oct 11, 2019 at 06:05:25AM +0000, Jan Glauber wrote: +> On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote: +> > On 09/10/19 10:02, Jan Glauber wrote: +> +> > > I'm still not sure what the actual issue is here, but could it be some bad +> > > interaction between the notify_me and the list_lock? 
The are both 4 byte +> > > and side-by-side: +> > > +> > > address notify_me: 0xaaaadb528aa0 sizeof notify_me: 4 +> > > address list_lock: 0xaaaadb528aa4 sizeof list_lock: 4 +> > > +> > > AFAICS the generated code looks OK (all load/store exclusive done +> > > with 32 bit size): +> > > +> > > e6c: 885ffc01 ldaxr w1, [x0] +> > > e70: 11000821 add w1, w1, #0x2 +> > > e74: 8802fc01 stlxr w2, w1, [x0] +> > > +> > > ...but if I bump notify_me size to uint64_t the issue goes away. +> > +> > Ouch. :) Is this with or without my patch(es)? +> > +> > Also, what if you just add a dummy uint32_t after notify_me? +> +> With the dummy the testcase also runs fine for 500 iterations. +> +> Dann, can you try if this works on the Hi1620 too? + +On Hi1620, it hung on the first iteration. Here's the complete patch +I'm running with: + +diff --git a/include/block/aio.h b/include/block/aio.h +index 6b0d52f732..e6fd6f1a1a 100644 +--- a/include/block/aio.h ++++ b/include/block/aio.h +@@ -82,7 +82,7 @@ struct AioContext { + * Instead, the aio_poll calls include both the prepare and the + * dispatch phase, hence a simple counter is enough for them. + */ +- uint32_t notify_me; ++ uint64_t notify_me; + + /* A lock to protect between QEMUBH and AioHandler adders and deleter, + * and to ensure that no callbacks are removed while we're walking and +diff --git a/util/async.c b/util/async.c +index ca83e32c7f..024c4c567d 100644 +--- a/util/async.c ++++ b/util/async.c +@@ -242,7 +242,7 @@ aio_ctx_check(GSource *source) + aio_notify_accept(ctx); + + for (bh = ctx->first_bh; bh; bh = bh->next) { +- if (bh->scheduled) { ++ if (atomic_mb_read(&bh->scheduled)) { + return true; + } + } +@@ -342,12 +342,12 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx) + + void aio_notify(AioContext *ctx) + { +- /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs +- * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll. ++ /* Using atomic_mb_read ensures that e.g. bh->scheduled is written before ++ * ctx->notify_me is read. Pairs with atomic_or in aio_ctx_prepare or ++ * atomic_add in aio_poll. + */ +- smp_mb(); +- if (ctx->notify_me) { +- event_notifier_set(&ctx->notifier); ++ if (atomic_mb_read(&ctx->notify_me)) { ++ event_notifier_set(&ctx->notifier); + atomic_mb_set(&ctx->notified, true); + } + } + + +On Fri, Oct 11, 2019 at 08:30:02AM +0000, Jan Glauber wrote: +> On Fri, Oct 11, 2019 at 10:18:18AM +0200, Paolo Bonzini wrote: +> > On 11/10/19 08:05, Jan Glauber wrote: +> > > On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote: +> > >>> ...but if I bump notify_me size to uint64_t the issue goes away. +> > >> +> > >> Ouch. :) Is this with or without my patch(es)? +> > +> > You didn't answer this question. +> +> Oh, sorry... I did but the mail probably didn't make it out. +> I have both of your changes applied (as I think they make sense). +> +> > >> Also, what if you just add a dummy uint32_t after notify_me? +> > > +> > > With the dummy the testcase also runs fine for 500 iterations. +> > +> > You might be lucky and causing list_lock to be in another cache line. +> > What if you add __attribute__((aligned(16)) to notify_me (and keep the +> > dummy)? +> +> Good point. I'll try to force both into the same cacheline. 
+ +On the Hi1620, this still hangs in the first iteration: + +diff --git a/include/block/aio.h b/include/block/aio.h +index 6b0d52f732..00e56a5412 100644 +--- a/include/block/aio.h ++++ b/include/block/aio.h +@@ -82,7 +82,7 @@ struct AioContext { + * Instead, the aio_poll calls include both the prepare and the + * dispatch phase, hence a simple counter is enough for them. + */ +- uint32_t notify_me; ++ __attribute__((aligned(16))) uint64_t notify_me; + + /* A lock to protect between QEMUBH and AioHandler adders and deleter, + * and to ensure that no callbacks are removed while we're walking and +diff --git a/util/async.c b/util/async.c +index ca83e32c7f..024c4c567d 100644 +--- a/util/async.c ++++ b/util/async.c +@@ -242,7 +242,7 @@ aio_ctx_check(GSource *source) + aio_notify_accept(ctx); + + for (bh = ctx->first_bh; bh; bh = bh->next) { +- if (bh->scheduled) { ++ if (atomic_mb_read(&bh->scheduled)) { + return true; + } + } +@@ -342,12 +342,12 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx) + + void aio_notify(AioContext *ctx) + { +- /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs +- * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll. ++ /* Using atomic_mb_read ensures that e.g. bh->scheduled is written before ++ * ctx->notify_me is read. Pairs with atomic_or in aio_ctx_prepare or ++ * atomic_add in aio_poll. + */ +- smp_mb(); +- if (ctx->notify_me) { +- event_notifier_set(&ctx->notifier); ++ if (atomic_mb_read(&ctx->notify_me)) { ++ event_notifier_set(&ctx->notifier); + atomic_mb_set(&ctx->notified, true); + } + } + + + include/block/aio.h | 3 +++ + qemu-img.c | 4 ++++ + util/async.c | 5 +---- + 3 files changed, 8 insertions(+), 4 deletions(-) + +diff --git a/include/block/aio.h b/include/block/aio.h +index e9bc04c..9153d87 100644 +--- a/include/block/aio.h ++++ b/include/block/aio.h +@@ -89,6 +89,9 @@ struct AioContext { + */ + uint32_t notify_me; + ++ /* force to notify for qemu-img convert */ ++ bool notify_for_convert; ++ + /* lock to protect between bh's adders and deleter */ + QemuMutex bh_lock; + +diff --git a/qemu-img.c b/qemu-img.c +index 60a2be3..cf037aa 100644 +--- a/qemu-img.c ++++ b/qemu-img.c +@@ -2411,6 +2411,10 @@ static int img_convert(int argc, char **argv) + .wr_in_order = wr_in_order, + .num_coroutines = num_coroutines, + }; ++ ++ AioContext *ctx = qemu_get_aio_context(); ++ ctx->notify_for_convert = 1; ++ + ret = convert_do_copy(&state); + + out: +diff --git a/util/async.c b/util/async.c +index 042bf8a..af235fc 100644 +--- a/util/async.c ++++ b/util/async.c +@@ -336,12 +336,9 @@ void aio_notify(AioContext *ctx) + * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll. + */ + smp_mb(); +- if (ctx->notify_me) { ++ if (ctx->notify_me || ctx->notify_for_convert) { + event_notifier_set(&ctx->notifier); + atomic_mb_set(&ctx->notified, true); +-#if defined(__aarch64__) +- kill(getpid(), SIGIO); +-#endif + } + } + +Can you try this aboving patchset to slove it? + + + +I tested the patch in Comment #34, and it was able to pass 500 iterations. + +Hello Fred, + +Based on Dann's feedback on testing, I'm failing to see where your patch fixes the "root" cause (despite being able to mitigate the issue by changing the aio notification mechanism). 
+ +I think the root cause is best described in this 2 emails from the thread: + +https://lore.kernel.org/qemu-devel/20191009080220.GA2905@hc/ + +and + +https://<email address hidden>/ + +So, by adding ctx->notify_for_convert, it is very likely you workarounded the issue by doing what Jan already said: removing both variables (ctx->list_lock and, in old case, ctx->notify_me, in your case, ctx->notify_for_convert) from the same cacheline and making the issue to "disappear" (as we would eventually do in a workaround patch). + +What about aarch64 issue with both, ctx->list_lock and ctx->notify_for_convert, being synchronized by qemu used primitives, and being in the same cache line ? + +Any "workaround" here would try to dodge the same cacheline situation, but, for upstream, I suppose Paolo wants to have something else regarding aarch64 ATOMIC_SEQ_CST. + +like describe in this part of the discussion: + +https://<email address hidden>/ + +Unless I'm missing something, am I ? + +Thank you! + + + + + +I tested the patch in Comment #34, and it was also failed to pass 5 iterations. +Copyright (C) 2018 Free Software Foundation, Inc. +License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> +This is free software: you are free to change and redistribute it. +There is NO WARRANTY, to the extent permitted by law. Type "show copying" +and "show warranty" for details. +This GDB was configured as "aarch64-linux-gnu". +Type "show configuration" for configuration details. +For bug reporting instructions, please see: +<http://www.gnu.org/software/gdb/bugs/>. +Find the GDB manual and other documentation resources online at: +<http://www.gnu.org/software/gdb/documentation/>. +For help, type "help". +Type "apropos word" to search for commands related to "word". +Attaching to process 3987 +[New LWP 3988] +[Thread debugging using libthread_db enabled] +Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1". +0x0000ffffbd2b3154 in __GI_ppoll (fds=0xaaaae80ef080, nfds=187650381360708, + timeout=<optimized out>, sigmask=0xffffc31815f0) + at ../sysdeps/unix/sysv/linux/ppoll.c:39 +39 ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory. +(gdb) + +fyi, what I tested in Comment #35 was upstream QEMU (@ aceeaa69d2) with a port of the patch in Comment #34 applied. I've attached that patch here. While it did avoid the issue in my testing, I agree with Rafael's Comment #36 that it does not appear to address the root cause (as I understand it), and is therefore unlikely something we'd ship in Ubuntu. + +Could HiSilicon respond to Dann & Rafael's comments #36 and #38? +Is there an upstream acceptable patch that addresses this issue? + +=》Could HiSilicon respond to Dann & Rafael's comments #36 and #38? +=》Is there an upstream acceptable patch that addresses this issue? + +No upstream patchset, I Only provide a private solution and do not know this root cause. + +PPA created with temporarily workaround in comment #34. + +https://launchpad.net/~ikepanhc/+archive/ubuntu/lp1805256 + +This PPA can solve temporarily but is not acceptable for offical release. + +Take several CPUs offline and re-test. Even only 32 threads left, I still can reproduce this issue easily. + +ubuntu@kreiken:~$ lscpu | grep list;for i in `seq 1 10`;do echo ;rm -f out.img;timeout 30 qemu-img convert -f qcow2 -O qcow2 ./bionic-server-cloudimg-arm64.img out.img -p; done +On-line CPU(s) list: 0-31 +Off-line CPU(s) list: 32-127 + + (100.00/100%) + + (43.20/100%) + (0.00/100%) + (1.00/100%) + + +Hi, Ike. 
+ +I think this tricky bug was fixed by Paolo last month. +Please try patch https://git.qemu.org/?p=qemu.git;a=commitdiff;h=5710a3e09f9b85801e5ce70797a4a511e5fc9e2c. + +Thanks. I will test it. + +The test deb has been pushed to https://launchpad.net/~ikepanhc/+archive/ubuntu/lp1805256 + +40 run with patch mentioned in #43 and all passed. + +Thanks. + + +Hello Ike, + +Please, let me know if you want me to go after the needed SRUs for this fix or if you will. + +I'll wait for the final feedback from tests with your PPA. + +Cheers! + + +fyi, I backported that fix also to focal/groovy and eoan, and with those builds. On my test systems the hang reliable occurs within 20 iterations. After the fix, they have survived > 500 iterations thus far. I'll leave running overnight just to be sure. + +Isn't this fixed by commit 5710a3e09f9? + +commit 5710a3e09f9b85801e5ce70797a4a511e5fc9e2c +Author: Paolo Bonzini <email address hidden> +Date: Tue Apr 7 10:07:46 2020 -0400 + + async: use explicit memory barriers + + When using C11 atomics, non-seqcst reads and writes do not participate + in the total order of seqcst operations. In util/async.c and util/aio-posix.c, + in particular, the pattern that we use + + write ctx->notify_me write bh->scheduled + read bh->scheduled read ctx->notify_me + if !bh->scheduled, sleep if ctx->notify_me, notify + + needs to use seqcst operations for both the write and the read. In + general this is something that we do not want, because there can be + many sources that are polled in addition to bottom halves. The + alternative is to place a seqcst memory barrier between the write + and the read. This also comes with a disadvantage, in that the + memory barrier is implicit on strongly-ordered architectures and + it wastes a few dozen clock cycles. + + Fortunately, ctx->notify_me is never written concurrently by two + threads, so we can assert that and relax the writes to ctx->notify_me. + The resulting solution works and performs well on both aarch64 and x86. + + Note that the atomic_set/atomic_read combination is not an atomic + read-modify-write, and therefore it is even weaker than C11 ATOMIC_RELAXED; + on x86, ATOMIC_RELAXED compiles to a locked operation. + +On Wed, May 6, 2020 at 1:20 PM Philippe Mathieu-Daudé +<email address hidden> wrote: +> +> Isn't this fixed by commit 5710a3e09f9? + +See comment #43. The discussions hence are about testing/integration +of that fix. + + -dann + + +FYIO, from now on all the "merge" work will be done in the merge requests being linked to this BUG (at the top). @paelzer will be verifying those. + +Tested debs in ppa:rafaeldtinoco/lp1805256 for focal and eoan and 1000 qemu-img convert passed. + +Ike's backport in https://launchpad.net/~ikepanhc/+archive/ubuntu/lp1805256 tests well for me on Cavium Sabre. One minor note is that the function in_aio_context_home_thread() is being called in aio-win32.c, but that function didn't exist in 2.11. We probably want to change that to aio_context_in_iothread(). 
It was renamed in https://git.qemu.org/?p=qemu.git;a=commitdiff;h=d2b63ba8dd20c1091b3f1033e6a95ef95b18149d + +FYI: sponsored into groovy + +This bug was fixed in the package qemu - 1:4.2-3ubuntu8 + +--------------- +qemu (1:4.2-3ubuntu8) groovy; urgency=medium + + * d/p/ubuntu/lp-1805256*: Fixes for QEMU on aarch64 ARM hosts + - async: use explicit memory barriers (LP: #1805256) + - aio-wait: delegate polling of main AioContext if BQL not held + + -- Rafael David Tinoco <email address hidden> Wed, 27 May 2020 21:47:21 +0000 + +Migrated right now, sponsoring the related SRU portions into B/E/F ... for consideration by the SRU Team. + +Hello dann, or anyone else affected, + +Accepted qemu into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:4.2-3ubuntu6.2 in a few hours, and then in the -proposed repository. + +Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users. + +If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed. + +Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping! + +N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days. + +Hello dann, or anyone else affected, + +Accepted qemu into eoan-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:4.0+dfsg-0ubuntu9.7 in a few hours, and then in the -proposed repository. + +Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users. + +If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-eoan to verification-done-eoan. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-eoan. In either case, without details of your testing we will not be able to proceed. + +Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping! + +N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days. + +Hello dann, or anyone else affected, + +Accepted qemu into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:2.11+dfsg-1ubuntu7.27 in a few hours, and then in the -proposed repository. + +Please help us by testing this new package. 
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users. + +If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed. + +Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping! + +N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days. + +All autopkgtests for the newly accepted qemu (1:4.0+dfsg-0ubuntu9.7) for eoan have finished running. +The following regressions have been reported in tests triggered by the package: + +edk2/0~20190606.20d2e5a1-2ubuntu1.1 (amd64, armhf) + + +Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1]. + +https://people.canonical.com/~ubuntu-archive/proposed-migration/eoan/update_excuses.html#qemu + +[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions + +Thank you! + + +100 run on bionic/eoan/focal -proposed `qemu-img convert` all successful. No hang occurs. Thanks a lot. + +All autopkgtests for the newly accepted qemu (1:4.2-3ubuntu6.2) for focal have finished running. +The following regressions have been reported in tests triggered by the package: + +systemd/245.4-4ubuntu3.1 (arm64) + + +Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1]. + +https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#qemu + +[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions + +Thank you! + + +I've looked and retried the tests - all green now. +Let us give it a few extra days in proposed as planned, but other than that it looks ok to be released. + +We had the 14 (instead f 7) days in -proposed for some extended maturing. Nothing came up in regard to this and all validations were good. +Dropping block-proposed to be released once the SRU Team gets to it. + +This bug was fixed in the package qemu - 1:4.2-3ubuntu6.2 + +--------------- +qemu (1:4.2-3ubuntu6.2) focal; urgency=medium + + * d/p/ubuntu/lp-1805256*: Fixes for QEMU on aarch64 ARM hosts + - async: use explicit memory barriers (LP: #1805256) + - aio-wait: delegate polling of main AioContext if BQL not held + + -- Rafael David Tinoco <email address hidden> Wed, 27 May 2020 21:19:20 +0000 + +The verification of the Stable Release Update for qemu has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions. 
+ +This bug was fixed in the package qemu - 1:4.0+dfsg-0ubuntu9.7 + +--------------- +qemu (1:4.0+dfsg-0ubuntu9.7) eoan; urgency=medium + + * d/p/ubuntu/lp-1805256*: Fixes for QEMU on aarch64 ARM hosts + - async: use explicit memory barriers (LP: #1805256) + - aio-wait: delegate polling of main AioContext if BQL not held + + -- Rafael David Tinoco <email address hidden> Wed, 27 May 2020 20:07:57 +0000 + +This bug was fixed in the package qemu - 1:2.11+dfsg-1ubuntu7.27 + +--------------- +qemu (1:2.11+dfsg-1ubuntu7.27) bionic; urgency=medium + + * d/p/ubuntu/lp-1805256*: Fixes for QEMU on aarch64 ARM hosts + - aio: rename aio_context_in_iothread() to in_aio_context_home_thread() + - aio: Do aio_notify_accept only during blocking aio_poll + - aio-posix: Assert that aio_poll() is always called in home thread + - async: use explicit memory barriers (LP: #1805256) + - aio-wait: delegate polling of main AioContext if BQL not held + - aio-posix: Don't count ctx->notifier as progress when polling + + -- Rafael David Tinoco <email address hidden> Tue, 26 May 2020 17:39:21 +0000 + +This will re-open again for Bionic due to bug 1885419 forcing a revert of the former backports. +After a deeper evaluation if the assert is wrong in the backport or just flagging a problem formerly already existing in Bionic this will be re-fixed. + +Re-open for bionic due to regression found + +Started working on this again... + +Worked being done for the Bionic SRU: + +BUG: https://bugs.launchpad.net/qemu/+bug/1805256 +(fix for the bionic regression demonstrated at LP: #1885419) +PPA: https://launchpad.net/~rafaeldtinoco/+archive/ubuntu/lp1805256-bionic +MERGE: https://tinyurl.com/y8sucs6x + +Merge proposal currently going under review, tests and discussions. + +I ran the new PPA build (1:2.11+dfsg-1ubuntu7.29~ppa01) on both a ThunderX2 system and a Hi1620 system overnight, and both survived (6574 & 12919 iterations, respectively). + +Thanks @dannf! I spoke to Christian and him and I agreed to confine this change into ARM builds only (as SRU for Bionic). Preparing it... + +Status from old attempts to solve same nature issues: + +---- + +Older (2018) merge request from @raharper: + +https://github.com/koverstreet/bcache-tools/pull/1 + +addressing the fact that kernel uevents would not always emit +CACHED_UUID parameters, making udev to delete (whenever that happens) +/dev/bcache/{by-uuid,by-label} symlinks. + +This last MR pointed to previous related bugs: + +https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=890446 +https://bugs.launchpad.net/curtin/+bug/1728742 + +And to an upstream kernel patch: + +https://lore.kernel.org/patchwork/patch/921298/ + +to + +https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1729145 + +that wasn't accepted upstream. + +Even not being accepted upstream, the SRU was attempted: + +LP: #1729145 + +https://lists.ubuntu.com/archives/kernel-team/2017-December/088680.html +https://lists.ubuntu.com/archives/kernel-team/2017-December/088679.html + +Both were NACKED. + +Attempted again: + +https://lists.ubuntu.com/archives/kernel-team/2017-December/088682.html +https://lists.ubuntu.com/archives/kernel-team/2017-December/088683.html + +NACKED again. 
+ +And a v2 was sent: + +https://lists.ubuntu.com/archives/kernel-team/2017-December/088751.html +https://lists.ubuntu.com/archives/kernel-team/2017-December/088750.html +https://lists.ubuntu.com/archives/kernel-team/2017-December/088749.html + +and acked in January 2018 by Coling: + +https://lists.ubuntu.com/archives/kernel-team/2018-January/089492.html + +but not upstreamed. + +BIONIC contains the fix: + +commit ed9333e1b583 +Author: Ryan Harper <email address hidden> +Date: Mon Dec 11 12:12:01 2017 + + UBUNTU: SAUCE: (no-up) bcache: decouple emitting a cached_dev CHANGE uevent + + BugLink: http://bugs.launchpad.net/bugs/1729145 + + - decouple emitting a cached_dev CHANGE uevent which includes dev.uuid + and dev.label from bch_cached_dev_run() which only happens when a + bcacheX device is bound to the actual backing block device (bcache0 -> vdb) + + - update bch_cached_dev_run() to invoke bch_cached_dev_emit_change() as + needed; no functional code path changes here + + - Modify register_bcache to detect a re-registering of a bcache + cached_dev, and in that case call bcache_cached_dev_emit_change() to + + Signed-off-by: Ryan Harper <email address hidden> + Signed-off-by: Joseph Salisbury <email address hidden> + Acked-by: Colin Ian King <email address hidden> + Acked-by: Stefan Bader <email address hidden> + Signed-off-by: Khalid Elmously <email address hidden> + [ saf: fix incorrect indentation ] + Signed-off-by: Seth Forshee <email address hidden> + +FOCAL contains the fix: + +commit 67553dcd7905 +Author: Ryan Harper <email address hidden> +Date: Mon Dec 11 12:12:01 2017 + + UBUNTU: SAUCE: (no-up) bcache: decouple emitting a cached_dev CHANGE uevent + +GROOVY contains the fix: + +commit 67553dcd7905 +Author: Ryan Harper <email address hidden> +Date: Mon Dec 11 12:12:01 2017 + + UBUNTU: SAUCE: (no-up) bcache: decouple emitting a cached_dev CHANGE uevent + +---- + +So, the kernel patch wasn't accepted, nor bcache-tools patch by +@raharper, the bcache-export-cached. + +---- + +New Upstream summary from @raharper: + +https://github.com/systemd/systemd/pull/16317#issuecomment-655647313 + +in the upstream merge request made by @rbalint. + + diff --git a/results/classifier/118/risc-v/1820686 b/results/classifier/118/risc-v/1820686 new file mode 100644 index 000000000..a264e8c01 --- /dev/null +++ b/results/classifier/118/risc-v/1820686 @@ -0,0 +1,42 @@ +risc-v: 0.992 +device: 0.824 +network: 0.809 +socket: 0.724 +virtual: 0.719 +architecture: 0.717 +semantic: 0.691 +vnc: 0.683 +mistranslation: 0.678 +kernel: 0.658 +performance: 0.598 +graphic: 0.598 +x86: 0.595 +i386: 0.580 +debug: 0.553 +ppc: 0.532 +boot: 0.498 +VMM: 0.495 +peripherals: 0.494 +arm: 0.490 +hypervisor: 0.467 +register: 0.457 +files: 0.428 +TCG: 0.378 +PID: 0.366 +KVM: 0.321 +assembly: 0.305 +user-level: 0.296 +permissions: 0.246 + +risc-v: 'c.unimp' instruction decoded as 'c.addi4spn fp, 0' + +QEMU 3.1 incorrectly decodes the "c.unimp" instruction (opcode 0x0000) as an "addi4spn fp, 0" when either of the two following bytes are non-zero. This is because the ctx->opcode value used when decoding the instruction is actually filled with a 32-bit load (to handle normal uncompressed instructions) but when a compressed instruction is found only the low 16 bits are valid. Other reserved/illegal bit patterns with the addi4spn opcode are also incorrectly decoded. + +I believe that the switch to decodetree on master happened to fix this issue, but hopefully it is helpful to have this recorded somewhere. 
I've included a simple one line patch if anyone wants to backport this. + + + +Thanks. If you spin a full patch (ie, "git commit -s" and then "git show") I can drop it on riscv-qemu-3.1, our backports branch. Otherwise hopefully we got the bug via the decodetree conversion. + +Since this bug isn't present in the decodetree version of the riscv decoder, we might as well just close this as fix-released; we won't be doing more point-releases of QEMU versions as old as 3.1. + diff --git a/results/classifier/118/risc-v/1835827 b/results/classifier/118/risc-v/1835827 new file mode 100644 index 000000000..3f04c2576 --- /dev/null +++ b/results/classifier/118/risc-v/1835827 @@ -0,0 +1,45 @@ +risc-v: 0.899 +device: 0.833 +graphic: 0.829 +architecture: 0.758 +kernel: 0.724 +network: 0.711 +socket: 0.591 +vnc: 0.580 +mistranslation: 0.547 +ppc: 0.547 +register: 0.547 +semantic: 0.531 +files: 0.459 +arm: 0.426 +permissions: 0.388 +VMM: 0.344 +performance: 0.340 +boot: 0.328 +TCG: 0.327 +PID: 0.306 +debug: 0.237 +virtual: 0.217 +hypervisor: 0.186 +user-level: 0.170 +KVM: 0.166 +i386: 0.165 +peripherals: 0.154 +x86: 0.095 +assembly: 0.093 + +HTIF symbols no longer recognized by RISC-V spike board + +Tested commit: f34edbc760b0f689deddd175fc08732ecb46665f + +I belive this was introduced in 0ac24d56c5e7d32423ea78ac58a06b444d1df04d when the spike's load_kernel() was moved to riscv_load_kernel() which no longer included htif_symbol_callback(). + +I think you are right. Would you like to write a patch to fix the issue? + +No, patch sign-off requires a legal name. + +Ok, I'll add it to my to do list then + +Patch has been included in QEMU v4.2: +https://git.qemu.org/?p=qemu.git;a=commitdiff;h=6478dd745dca49d632 + diff --git a/results/classifier/118/risc-v/1850378 b/results/classifier/118/risc-v/1850378 new file mode 100644 index 000000000..529ae75b5 --- /dev/null +++ b/results/classifier/118/risc-v/1850378 @@ -0,0 +1,70 @@ +risc-v: 0.832 +user-level: 0.800 +kernel: 0.799 +graphic: 0.759 +device: 0.751 +architecture: 0.732 +performance: 0.713 +virtual: 0.662 +semantic: 0.653 +debug: 0.612 +mistranslation: 0.556 +peripherals: 0.551 +network: 0.513 +PID: 0.488 +assembly: 0.478 +hypervisor: 0.466 +socket: 0.463 +ppc: 0.462 +permissions: 0.455 +register: 0.402 +x86: 0.401 +i386: 0.393 +boot: 0.385 +VMM: 0.374 +vnc: 0.343 +TCG: 0.323 +KVM: 0.268 +arm: 0.266 +files: 0.191 + +RISC-V unreliable IPIs + +I am working on a project with custom inter processor interrupts (IPIs) on the RISC-V virt machine. +After upgrading from version 3.1.0 to 4.1.0 which fixes a related issue (https://github.com/riscv/riscv-qemu/issues/132) I am able to use the CPU hotplug feature. + +However, if I try to use IPIs for communication between two cores, the wfi instruction behaves strangely. Either it does not return, or it returns on timer interrupts, even though they are disabled. The code, I use on one core to wait for an interrupt is the following. + + csr_clear(sie, SIE_SEIE | SIE_STIE); + do { + wait_for_interrupt(); + sipval = csr_read(sip); + sieval = csr_read(sie); + scauseval = csr_read(scause) & 0xFF; + /* only break if wfi returns for an software interrupt */ + } while ((sipval & sieval) == 0 && scauseval != 1); + csr_set(sie, SIE_SEIE | SIE_STIE); + +Since the resulting sequence does not seem to be deterministic, my guess is, that it has something to do with the communication of qemu's threads for the different cores. + +Can you post a whole program that reproduces this? 
freedom-e-sdk <https://github.com/sifive/freedom-e-sdk> will run bare-metal code on QEMU if you don't want to post the rest of the surrounding infrastructure. + +I created a minimal example from my setup. I'm running a kernel 4.19.57 with a custom firmware based on bbl (https://github.com/riscv/riscv-pk). +An ioctl device from a kernel module is used to execute the code above in kernel space. +In the example, the userspace application proceeds after a couple of seconds without receiving the custom IPI. + +The QEMU project is currently considering to move its bug tracking to +another system. For this we need to know which bugs are still valid +and which could be closed already. Thus we are setting older bugs to +"Incomplete" now. + +If you still think this bug report here is valid, then please switch +the state back to "New" within the next 60 days, otherwise this report +will be marked as "Expired". Or please mark it as "Fix Released" if +the problem has been solved with a newer version of QEMU already. + +Thank you and sorry for the inconvenience. + + +[Expired for QEMU because there has been no activity for 60 days.] + diff --git a/results/classifier/118/risc-v/1910826 b/results/classifier/118/risc-v/1910826 new file mode 100644 index 000000000..a715d76d8 --- /dev/null +++ b/results/classifier/118/risc-v/1910826 @@ -0,0 +1,141 @@ +risc-v: 0.832 +device: 0.821 +hypervisor: 0.797 +user-level: 0.797 +virtual: 0.797 +TCG: 0.793 +peripherals: 0.787 +graphic: 0.780 +register: 0.777 +KVM: 0.763 +performance: 0.761 +ppc: 0.752 +mistranslation: 0.743 +permissions: 0.736 +x86: 0.718 +VMM: 0.709 +vnc: 0.708 +i386: 0.704 +architecture: 0.696 +semantic: 0.692 +arm: 0.668 +debug: 0.665 +boot: 0.654 +assembly: 0.632 +files: 0.632 +network: 0.630 +PID: 0.572 +kernel: 0.559 +socket: 0.519 + +[OSS-Fuzz] Issue 29224 rtl8139: Stack-overflow in rtlNUMBER_transmit_one + +=== Reproducer === +cat << EOF | ../build/qemu-system-i386 -machine q35 \ +-nodefaults -device rtl8139,netdev=net0 \ +-netdev user,id=net0 -display none -qtest stdio +outl 0xcf8 0x80000804 +outb 0xcfc 0x26 +outl 0xcf8 0x80000817 +outb 0xcfc 0xff +write 0x1 0x1 0x42 +write 0x5 0x1 0x42 +write 0x9 0x1 0x42 +write 0xd 0x1 0x42 +write 0xff000044 0x4 0x11 +write 0xff000037 0x1 0x1c +writel 0xff000030 0xff000000 +write 0xff000040 0x4 0x100006 +write 0xff000010 0x4 0x01020 +EOF + +=== Stack Trace === +==2819215==ERROR: AddressSanitizer: stack-overflow on address 0x7ffd2c714040 (pc 0x5639b3a933d9 bp 0x7ffd2c716210 sp 0x7ffd2c714040 T0) +#0 rtl8139_transmit_one /src/qemu/hw/net/rtl8139.c:1815 +#1 rtl8139_transmit /src/qemu/hw/net/rtl8139.c:2388:9 +#2 rtl8139_TxStatus_write /src/qemu/hw/net/rtl8139.c:2442:5 +#3 rtl8139_io_writel /src/qemu/hw/net/rtl8139.c:2865:13 +#4 rtl8139_ioport_write /src/qemu/hw/net/rtl8139.c:3290:9 +#5 memory_region_write_accessor /src/qemu/softmmu/memory.c:491:5 +#6 access_with_adjusted_size /src/qemu/softmmu/memory.c:552:18 +#7 memory_region_dispatch_write /src/qemu/softmmu/memory.c:0:13 +#8 flatview_write_continue /src/qemu/softmmu/physmem.c:2759:23 +#9 flatview_write /src/qemu/softmmu/physmem.c:2799:14 +#10 address_space_write /src/qemu/softmmu/physmem.c:2891:18 +#11 address_space_rw /src/qemu/softmmu/physmem.c:2901:16 +#12 dma_memory_rw_relaxed /src/qemu/include/sysemu/dma.h:88:12 +#13 dma_memory_rw /src/qemu/include/sysemu/dma.h:127:12 +#14 pci_dma_rw /src/qemu/include/hw/pci/pci.h:801:12 +#15 pci_dma_write /src/qemu/include/hw/pci/pci.h:837:12 +#16 rtl8139_write_buffer /src/qemu/hw/net/rtl8139.c:778:5 +#17 
rtl8139_do_receive /src/qemu/hw/net/rtl8139.c:1172:9 +#18 rtl8139_transfer_frame /src/qemu/hw/net/rtl8139.c:1798:9 +#19 rtl8139_transmit_one /src/qemu/hw/net/rtl8139.c:1845:5 +#20 rtl8139_transmit /src/qemu/hw/net/rtl8139.c:2388:9 +#21 rtl8139_TxStatus_write /src/qemu/hw/net/rtl8139.c:2442:5 +#22 rtl8139_io_writel /src/qemu/hw/net/rtl8139.c:2865:13 +#23 rtl8139_ioport_write /src/qemu/hw/net/rtl8139.c:3290:9 +#24 memory_region_write_accessor /src/qemu/softmmu/memory.c:491:5 +#25 access_with_adjusted_size /src/qemu/softmmu/memory.c:552:18 +#26 memory_region_dispatch_write /src/qemu/softmmu/memory.c:0:13 +#27 flatview_write_continue /src/qemu/softmmu/physmem.c:2759:23 +#28 flatview_write /src/qemu/softmmu/physmem.c:2799:14 +#29 address_space_write /src/qemu/softmmu/physmem.c:2891:18 +#30 address_space_rw /src/qemu/softmmu/physmem.c:2901:16 +#31 dma_memory_rw_relaxed /src/qemu/include/sysemu/dma.h:88:12 +#32 dma_memory_rw /src/qemu/include/sysemu/dma.h:127:12 +#33 pci_dma_rw /src/qemu/include/hw/pci/pci.h:801:12 +#34 pci_dma_write /src/qemu/include/hw/pci/pci.h:837:12 +#35 rtl8139_write_buffer /src/qemu/hw/net/rtl8139.c:778:5 +#36 rtl8139_do_receive /src/qemu/hw/net/rtl8139.c:1172:9 +#37 rtl8139_transfer_frame /src/qemu/hw/net/rtl8139.c:1798:9 +Repeat until we run out of stack + +https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=29224 + +A more concise version and corresponding notes. Might help :) + +-- [ Reproducer + +cat << EOF | ../build/qemu-system-i386 -machine q35 \ +-nodefaults -device rtl8139,netdev=net0 \ +-netdev user,id=net0 -display none -qtest stdio +outl 0xcf8 0x80000804 +outb 0xcfc 0x06 +outl 0xcf8 0x80000817 +outb 0xcfc 0xff +write 0xff000037 0x1 0x0c +writel 0xff000030 0xff000010 +write 0xff000040 0x4 0x100006 +write 0xff000044 0x4 0x01 +write 0xff000010 0x4 0x01 +EOF + +-- [ Notes + +/* Make the MMIO region start from 0xff000000 */ +outl 0xcf8 0x80000817 +outb 0xcfc 0xff + +/*Command Register: enable receiver and transmitter*/ +write 0xff000037 0x1 0x0c + +/* set Receive (Rx) Buffer Start Address at 0xff000010 */ +/* Note: 0xff000010 - 0xff000000 = 0x10 is the offset of TSD0*/ +writel 0xff000030 0xff000010 + +/* TXRR, Tx Retry Count = 1 */ +/* set transmit mode into the loopback */ +write 0xff000040 0x4 0x100006 + +/* Receive Configuration Register: Accept All Packets */ +write 0xff000044 0x4 0x01 + +/* TSD0: set Descriptor Size to 1 and trigger a tranfer*/ +write 0xff000010 0x4 0x01 + + +OSS-Fuzz says this issue has been fixed. 
+ +https://gitlab.com/qemu-project/qemu/-/commit/5311fb805a4403bba + diff --git a/results/classifier/118/risc-v/1913915 b/results/classifier/118/risc-v/1913915 new file mode 100644 index 000000000..a90484379 --- /dev/null +++ b/results/classifier/118/risc-v/1913915 @@ -0,0 +1,81 @@ +risc-v: 0.817 +x86: 0.785 +KVM: 0.780 +TCG: 0.776 +peripherals: 0.773 +register: 0.769 +VMM: 0.759 +graphic: 0.757 +vnc: 0.731 +arm: 0.722 +device: 0.712 +permissions: 0.709 +performance: 0.707 +semantic: 0.706 +virtual: 0.704 +hypervisor: 0.700 +architecture: 0.690 +i386: 0.689 +ppc: 0.687 +assembly: 0.686 +debug: 0.685 +PID: 0.652 +kernel: 0.640 +boot: 0.632 +files: 0.619 +socket: 0.586 +network: 0.526 +mistranslation: 0.491 +user-level: 0.480 + +aarc64-virt: Null-ptr dereference through virtio_write_config + +Reproducer: +cat << EOF | ./qemu-system-aarch64 \ +-machine virt,accel=qtest -qtest stdio +writel 0x8000f00 0x81818191 +write 0x4010008004 0x1 0x06 +EOF + +Stacktrace: +../hw/intc/arm_gic.c:1498:13: runtime error: index 401 out of bounds for type 'uint8_t [16][8]' +SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../hw/intc/arm_gic.c:1498:13 in +OK +[S +0.048579] OK +[R +0.048593] write 0x4010008004 0x1 0x06 +../softmmu/memory.c:834:35: runtime error: member access within null pointer of type 'MemoryRegionIoeventfd' (aka 'struct MemoryRegionIoeventfd') +SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../softmmu/memory.c:834:35 in +AddressSanitizer:DEADLYSIGNAL +================================================================= +==637204==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55b2560417c1 bp 0x7ffefc928270 sp 0x7ffefc928020 T0) +==637204==The signal is caused by a READ memory access. +==637204==Hint: address points to the zero page. 
+ #0 0x55b2560417c1 in addrrange_shift /home/alxndr/Development/qemu/build/../softmmu/memory.c:80:44 + #1 0x55b2560417c1 in address_space_update_ioeventfds /home/alxndr/Development/qemu/build/../softmmu/memory.c:834:19 + #2 0x55b2560408c7 in memory_region_transaction_commit /home/alxndr/Development/qemu/build/../softmmu/memory.c:1100:17 + #3 0x55b25481e065 in pci_update_mappings /home/alxndr/Development/qemu/build/../hw/pci/pci.c:1363:13 + #4 0x55b25481cec7 in pci_default_write_config /home/alxndr/Development/qemu/build/../hw/pci/pci.c:1423:9 + #5 0x55b254806227 in virtio_write_config /home/alxndr/Development/qemu/build/../hw/virtio/virtio-pci.c:608:5 + #6 0x55b2551f6e65 in pci_host_config_write_common /home/alxndr/Development/qemu/build/../hw/pci/pci_host.c:83:5 + #7 0x55b2560481fe in memory_region_write_accessor /home/alxndr/Development/qemu/build/../softmmu/memory.c:491:5 + #8 0x55b256047bfb in access_with_adjusted_size /home/alxndr/Development/qemu/build/../softmmu/memory.c:552:18 + #9 0x55b256047467 in memory_region_dispatch_write /home/alxndr/Development/qemu/build/../softmmu/memory.c + #10 0x55b2563d7ffb in flatview_write_continue /home/alxndr/Development/qemu/build/../softmmu/physmem.c:2759:23 + #11 0x55b2563cd71b in flatview_write /home/alxndr/Development/qemu/build/../softmmu/physmem.c:2799:14 + #12 0x55b2563cd71b in address_space_write /home/alxndr/Development/qemu/build/../softmmu/physmem.c:2891:18 + #13 0x55b256039d35 in qtest_process_command /home/alxndr/Development/qemu/build/../softmmu/qtest.c:654:9 + #14 0x55b256032b97 in qtest_process_inbuf /home/alxndr/Development/qemu/build/../softmmu/qtest.c:797:9 + #15 0x55b256883286 in fd_chr_read /home/alxndr/Development/qemu/build/../chardev/char-fd.c:68:9 + #16 0x7f8d8faf5aae in g_main_context_dispatch (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x51aae) + #17 0x55b256ede363 in glib_pollfds_poll /home/alxndr/Development/qemu/build/../util/main-loop.c:232:9 + #18 0x55b256ede363 in os_host_main_loop_wait /home/alxndr/Development/qemu/build/../util/main-loop.c:255:5 + #19 0x55b256ede363 in main_loop_wait /home/alxndr/Development/qemu/build/../util/main-loop.c:531:11 + #20 0x55b255f99599 in qemu_main_loop /home/alxndr/Development/qemu/build/../softmmu/runstate.c:721:9 + #21 0x55b2542261fd in main /home/alxndr/Development/qemu/build/../softmmu/main.c:50:5 + #22 0x7f8d8f59acc9 in __libc_start_main csu/../csu/libc-start.c:308:16 + #23 0x55b254179bc9 in _start (/home/alxndr/Development/qemu/build/qemu-system-aarch64+0x3350bc9) + +AddressSanitizer can not provide additional info. +SUMMARY: AddressSanitizer: SEGV /home/alxndr/Development/qemu/build/../softmmu/memory.c:80:44 in addrrange_shift +==637204==ABORTING + diff --git a/results/classifier/118/risc-v/2041 b/results/classifier/118/risc-v/2041 new file mode 100644 index 000000000..980b4af14 --- /dev/null +++ b/results/classifier/118/risc-v/2041 @@ -0,0 +1,57 @@ +risc-v: 0.877 +KVM: 0.864 +architecture: 0.743 +graphic: 0.739 +network: 0.704 +PID: 0.675 +device: 0.653 +performance: 0.626 +mistranslation: 0.530 +virtual: 0.511 +socket: 0.474 +kernel: 0.456 +permissions: 0.426 +boot: 0.423 +user-level: 0.416 +VMM: 0.406 +TCG: 0.393 +debug: 0.353 +arm: 0.352 +ppc: 0.342 +semantic: 0.342 +files: 0.316 +register: 0.313 +vnc: 0.288 +peripherals: 0.286 +hypervisor: 0.253 +assembly: 0.169 +x86: 0.142 +i386: 0.125 + +RISC-V KVM build error with Alpine Linux +Description of problem: +Native build of qemu fails on alpine linux riscv64. +Steps to reproduce: +1. 
install alpine on riscv or set up a container with qemu-riscv64 +2. build qemu 8.1.3 from source +3. +Additional information: +``` +kvm.c:(.text+0xc50): undefined reference to `strerrorname_np' +/usr/lib/gcc/riscv64-alpine-linux-musl/13.2.1/../../../../riscv64-alpine-linux-musl/bin/ld: libqemu-riscv64-softmmu.fa.p/target_riscv_kvm.c.o: in function `.L0 ': +kvm.c:(.text+0xcda): undefined reference to `strerrorname_np' +/usr/lib/gcc/riscv64-alpine-linux-musl/13.2.1/../../../../riscv64-alpine-linux-musl/bin/ld: libqemu-riscv64-softmmu.fa.p/target_riscv_kvm.c.o: in function `.L111': +kvm.c:(.text+0xd02): undefined reference to `strerrorname_np' +``` + +The `strerrorname_np` is a GNU specific non-portable function (that what _np stands for). This is the only place where it is use in the entire qemu codebase: +``` +$ rg strerrorname_np +target/riscv/kvm/kvm-cpu.c +837: strerrorname_np(errno)); +899: strerrorname_np(errno)); +909: strerrorname_np(errno)); +932: strerrorname_np(errno)); +``` + +Seems like other places uses `strerror(errno)`. diff --git a/results/classifier/118/risc-v/2137 b/results/classifier/118/risc-v/2137 new file mode 100644 index 000000000..cb8215b84 --- /dev/null +++ b/results/classifier/118/risc-v/2137 @@ -0,0 +1,31 @@ +risc-v: 0.986 +device: 0.775 +architecture: 0.739 +performance: 0.734 +boot: 0.414 +virtual: 0.413 +vnc: 0.333 +network: 0.285 +graphic: 0.227 +semantic: 0.213 +mistranslation: 0.205 +register: 0.163 +ppc: 0.161 +arm: 0.124 +debug: 0.115 +VMM: 0.076 +peripherals: 0.072 +x86: 0.056 +user-level: 0.054 +hypervisor: 0.053 +socket: 0.053 +permissions: 0.045 +files: 0.042 +assembly: 0.037 +kernel: 0.035 +i386: 0.018 +PID: 0.017 +TCG: 0.016 +KVM: 0.005 + +RISC-V Vector Slowdowns diff --git a/results/classifier/118/risc-v/2245 b/results/classifier/118/risc-v/2245 new file mode 100644 index 000000000..95086c475 --- /dev/null +++ b/results/classifier/118/risc-v/2245 @@ -0,0 +1,31 @@ +risc-v: 0.933 +architecture: 0.790 +device: 0.728 +performance: 0.549 +arm: 0.352 +register: 0.327 +boot: 0.323 +vnc: 0.264 +semantic: 0.263 +permissions: 0.242 +graphic: 0.230 +virtual: 0.226 +ppc: 0.220 +debug: 0.218 +network: 0.213 +hypervisor: 0.204 +mistranslation: 0.197 +socket: 0.172 +PID: 0.170 +VMM: 0.149 +peripherals: 0.103 +TCG: 0.083 +x86: 0.082 +files: 0.076 +i386: 0.063 +user-level: 0.055 +assembly: 0.053 +kernel: 0.031 +KVM: 0.006 + +RISC-V Extensions query for QEMU System diff --git a/results/classifier/118/risc-v/2488 b/results/classifier/118/risc-v/2488 new file mode 100644 index 000000000..8c2407deb --- /dev/null +++ b/results/classifier/118/risc-v/2488 @@ -0,0 +1,95 @@ +risc-v: 0.818 +VMM: 0.774 +permissions: 0.773 +device: 0.749 +kernel: 0.742 +ppc: 0.737 +arm: 0.732 +socket: 0.723 +network: 0.718 +peripherals: 0.697 +architecture: 0.692 +performance: 0.686 +debug: 0.679 +semantic: 0.669 +PID: 0.644 +mistranslation: 0.627 +TCG: 0.625 +assembly: 0.609 +graphic: 0.593 +register: 0.581 +files: 0.550 +virtual: 0.524 +hypervisor: 0.522 +KVM: 0.491 +x86: 0.463 +vnc: 0.440 +boot: 0.432 +user-level: 0.402 +i386: 0.278 + +m68k: 68030 (?): fmove.p doesn't work (6888[1|2] emulation isn't implemented??) +Description of problem: +The following code should be executing a move to the fpu and then a move from it and then branching. 
+ +``` + ff813590 f2 10 4f 00 fmove.p (A0),FP6 + ff813594 f2 11 6f 7f fmove.p FP6,(A1) {#0x7f} + ff813598 61 00 fe 52 bsr.w SUB_ff8133ec +``` + +However, hitting the instruction at `0xff813590` causes the `PC` to go off into the weeds and then the emulation gets stuck and never proceeds. + +Before executing the instruction the CPU state looks like this + +``` +(qemu) info registers + +CPU#0 +D0 = ffffffff A0 = ff813584 F0 = c004 cc00000000000000 ( -51) +D1 = 0000ffff A1 = 0000335e F1 = c00d a866000000000000 ( -21555) +D2 = 00000002 A2 = ff8138a2 F2 = 401b 91a2b3c000000000 ( 3.0542e+08) +D3 = 00000003 A3 = ff824008 F3 = 3fb4 ab3c4d0000000000 ( 3.54107e-23) +D4 = 00000004 A4 = ff81dbb6 F4 = 3d12 919a22ab33bc4000 (3.84141e-226) +D5 = 00000000 A5 = 00000400 F5 = 1020 8060708090a0b0c0 ( 0) +D6 = 0000000c A6 = 00003790 F6 = 7fff ffffffffffffffff ( nan) +D7 = 00000000 A7 = 0000316e F7 = 7fff ffffffffffffffff ( nan) +PC = ff813590 SR = 2708 T:0 I:7 SI -N--- +FPSR = 00000000 ---- + FPCR = 0000 X RN + A7(MSP) = 00000000 A7(USP) = 80000000 ->A7(ISP) = 00003796 +VBR = 0x0000338e +SFC = 3 DFC 0 +SSW 00000000 TCR 00000000 URP 00000000 SRP 00000000 +DTTR0/1: 00000000/00000000 ITTR0/1: 00000000/00000000 +MMUSR 00000000, fault at 00000000 +``` + +After single stepping: + +``` +(qemu) info registers + +CPU#0 +D0 = ffffffff A0 = ff813584 F0 = c004 cc00000000000000 ( -51) +D1 = 0000ffff A1 = 0000335e F1 = c00d a866000000000000 ( -21555) +D2 = 00000002 A2 = ff8138a2 F2 = 401b 91a2b3c000000000 ( 3.0542e+08) +D3 = 00000003 A3 = ff824008 F3 = 3fb4 ab3c4d0000000000 ( 3.54107e-23) +D4 = 00000004 A4 = ff81dbb6 F4 = 3d12 919a22ab33bc4000 (3.84141e-226) +D5 = 00000000 A5 = 00000400 F5 = 1020 8060708090a0b0c0 ( 0) +D6 = 0000000c A6 = 00003790 F6 = 7fff ffffffffffffffff ( nan) +D7 = 00000000 A7 = 00003166 F7 = 7fff ffffffffffffffff ( nan) +PC = ff8138a2 SR = 2708 T:0 I:7 SI -N--- +FPSR = 00000000 ---- + FPCR = 0000 X RN + A7(MSP) = 00000000 A7(USP) = 80000000 ->A7(ISP) = 0000316e +VBR = 0x0000338e +SFC = 3 DFC 0 +SSW 00000000 TCR 00000000 URP 00000000 SRP 00000000 +DTTR0/1: 00000000/00000000 ITTR0/1: 00000000/00000000 +MMUSR 00000000, fault at 00000000 +``` + +With this code the `VBR` doesn't point at an actual vector table from what I can tell and it is pointing at some memory that contains `0xff8138a2` so I guess it hits the instruction, the FPU isn't implemented so it tries to do an `F-line exception` instead but the vector table doesn't actually contain a handler and it's trying to execute garbage that causes the lock up. + +Basically, I guess I need to implement the 6888[1|2] for this code to work. 
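As an aside on the lock-up mechanism described in the report above (rather than on the missing 6888[1|2] emulation itself): a guest that wants to probe for, or survive, unimplemented FPU instructions can point the Line-F vector at a real handler before touching the FPU. The sketch below is hypothetical guest-side code, not QEMU code; `fline_handler` and `install_fline_vector` are invented names, and it assumes supervisor mode, a writable vector table behind VBR, and an m68k GCC toolchain. A real handler would also have to emulate or skip the faulting instruction before returning.

```c
/*
 * Hypothetical guest-side sketch (not QEMU code, names invented): hang a
 * real handler on the "Line 1111 emulator" exception -- vector 11, offset
 * 0x2C from VBR -- so an unimplemented FPU instruction such as fmove.p
 * traps into known code instead of whatever the current VBR points at.
 * Assumes supervisor mode, a writable vector table and an m68k GCC.
 */
#include <stdint.h>

static volatile uint32_t fline_hits;

__attribute__((interrupt_handler))
static void fline_handler(void)
{
    /* A real handler would emulate or skip the faulting instruction;
     * returning as-is would simply re-execute it and loop. */
    fline_hits++;
}

void install_fline_vector(void)
{
    uint32_t vbr;

    /* MOVEC is supervisor-only; this reads the vector base register. */
    __asm__ volatile("movec %%vbr,%0" : "=d"(vbr));
    ((volatile uint32_t *)vbr)[11] = (uint32_t)(uintptr_t)fline_handler;
}
```

The underlying fix in QEMU would still be teaching the FPU emulation about the 68881/68882 packed-decimal operand format used by `fmove.p`.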
diff --git a/results/classifier/118/risc-v/2627 b/results/classifier/118/risc-v/2627 new file mode 100644 index 000000000..c862dce29 --- /dev/null +++ b/results/classifier/118/risc-v/2627 @@ -0,0 +1,31 @@ +risc-v: 0.935 +device: 0.625 +graphic: 0.498 +architecture: 0.432 +vnc: 0.345 +ppc: 0.338 +debug: 0.328 +mistranslation: 0.308 +arm: 0.283 +semantic: 0.222 +performance: 0.212 +boot: 0.182 +socket: 0.139 +register: 0.138 +virtual: 0.078 +kernel: 0.065 +permissions: 0.059 +network: 0.051 +peripherals: 0.050 +hypervisor: 0.050 +PID: 0.049 +VMM: 0.032 +files: 0.029 +i386: 0.018 +TCG: 0.015 +user-level: 0.012 +assembly: 0.010 +x86: 0.008 +KVM: 0.003 + +Possible incorrect exception order in RISC-V diff --git a/results/classifier/118/risc-v/2717 b/results/classifier/118/risc-v/2717 new file mode 100644 index 000000000..59d81b305 --- /dev/null +++ b/results/classifier/118/risc-v/2717 @@ -0,0 +1,42 @@ +risc-v: 0.983 +graphic: 0.803 +device: 0.744 +semantic: 0.725 +files: 0.666 +mistranslation: 0.625 +network: 0.522 +socket: 0.514 +vnc: 0.495 +register: 0.453 +arm: 0.334 +debug: 0.262 +VMM: 0.242 +TCG: 0.241 +boot: 0.234 +permissions: 0.213 +ppc: 0.196 +PID: 0.187 +kernel: 0.183 +user-level: 0.179 +performance: 0.166 +architecture: 0.165 +i386: 0.148 +KVM: 0.133 +hypervisor: 0.127 +x86: 0.123 +virtual: 0.115 +peripherals: 0.106 +assembly: 0.040 + +semihosting link to risc-v details in document is changed +Description of problem: + +Steps to reproduce: +1. Open https://gitlab.com/qemu-project/qemu/-/blob/master/docs/about/emulation.rst +2. Goto Supported Targets section +3. Click RISC-V link in the table +4. Got 404 + +New url looks like https://github.com/riscv-non-isa/riscv-semihosting/blob/main/riscv-semihosting.adoc +Additional information: + diff --git a/results/classifier/118/risc-v/2774 b/results/classifier/118/risc-v/2774 new file mode 100644 index 000000000..3719e5eaa --- /dev/null +++ b/results/classifier/118/risc-v/2774 @@ -0,0 +1,33 @@ +risc-v: 0.851 +graphic: 0.440 +boot: 0.348 +device: 0.295 +semantic: 0.286 +vnc: 0.223 +VMM: 0.113 +mistranslation: 0.100 +ppc: 0.092 +debug: 0.074 +x86: 0.072 +register: 0.055 +i386: 0.054 +performance: 0.050 +arm: 0.046 +TCG: 0.043 +socket: 0.032 +architecture: 0.029 +files: 0.026 +permissions: 0.020 +network: 0.020 +PID: 0.016 +kernel: 0.012 +user-level: 0.005 +assembly: 0.003 +peripherals: 0.002 +KVM: 0.002 +virtual: 0.002 +hypervisor: 0.001 + +Consider adding an `aliases` node to RISC-V DTB that includes `serial0` alias +Additional information: +Example of an [aliases section for physical SoC](https://github.com/torvalds/linux/blob/b62cef9a5c673f1b8083159f5dc03c1c5daced2f/arch/riscv/boot/dts/sophgo/cv1800b-milkv-duo.dts#L14-L20). diff --git a/results/classifier/118/risc-v/2778 b/results/classifier/118/risc-v/2778 new file mode 100644 index 000000000..3211c9a3d --- /dev/null +++ b/results/classifier/118/risc-v/2778 @@ -0,0 +1,129 @@ +risc-v: 0.830 +graphic: 0.765 +mistranslation: 0.758 +device: 0.738 +permissions: 0.738 +semantic: 0.729 +architecture: 0.715 +virtual: 0.698 +user-level: 0.680 +PID: 0.678 +ppc: 0.675 +register: 0.670 +debug: 0.670 +assembly: 0.665 +arm: 0.664 +hypervisor: 0.644 +kernel: 0.625 +peripherals: 0.612 +performance: 0.588 +boot: 0.582 +KVM: 0.581 +TCG: 0.577 +network: 0.564 +VMM: 0.554 +socket: 0.543 +x86: 0.542 +vnc: 0.535 +files: 0.528 +i386: 0.400 + +Null Dereference in ahci-hd device +Description of problem: +Issue was found by fuzzing. With some qtest commands we can crash qemu-system-x86_64 because of Null dereference. 
+Steps to reproduce: +Command: + +``` +cat << EOF | ./qemu-system-x86_64 -display none -machine accel=qtest -m 512M -machine q35 -nodefaults -drive file=null-co://,if=none,format=raw,id=disk0 -device ide-hd,drive=disk0 -qtest stdio +outl 0xcf8 0x8000fa24 +outl 0xcfc 0xe0000000 +outl 0xcf8 0x8000fa04 +outw 0xcfc 0x06 +write 0xe00003b8 0x1 0x01 +write 0x0 0x1 0x27 +write 0x1 0x1 0x80 +write 0x2 0x1 0x20 +write 0x7 0x1 0x01 +write 0xe0000398 0x1 0x01 +write 0xe0000398 0x1 0x00 +write 0xe0000398 0x1 0x01 +EOF +``` + +Results in + +``` +[I 0.000001] OPENED +[R +0.082978] outl 0xcf8 0x8000fa24 +[S +0.083040] OK +OK +[R +0.083070] outl 0xcfc 0xe0000000 +[S +0.083115] OK +OK +[R +0.083132] outl 0xcf8 0x8000fa04 +[S +0.083152] OK +OK +[R +0.083180] outw 0xcfc 0x06 +[S +0.084233] OK +OK +[R +0.084291] write 0xe00003b8 0x1 0x01 +[S +0.084344] OK +OK +[R +0.084384] write 0x0 0x1 0x27 +[S +0.085007] OK +OK +[R +0.085041] write 0x1 0x1 0x80 +[S +0.085055] OK +OK +[R +0.085071] write 0x2 0x1 0x20 +[S +0.085084] OK +OK +[R +0.085096] write 0x7 0x1 0x01 +[S +0.085110] OK +OK +[R +0.085123] write 0xe0000398 0x1 0x01 +[S +0.085254] OK +OK +[R +0.085294] write 0xe0000398 0x1 0x00 +[S +0.085324] OK +OK +[R +0.085349] write 0xe0000398 0x1 0x01 +[S +0.085408] OK +OK +../hw/ide/ahci.c:1377:46: runtime error: member access within null pointer of type 'AHCICmdHdr' (aka 'struct AHCICmdHdr') +SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../hw/ide/ahci.c:1377:46 in +../hw/ide/ahci.c:1377:46: runtime error: load of null pointer of type 'uint16_t' (aka 'unsigned short') +SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../hw/ide/ahci.c:1377:46 in +AddressSanitizer:DEADLYSIGNAL +================================================================= +==2547739==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55abf3a79f9c bp 0x7ffc213000d0 sp 0x7ffc212fffa0 T0) +==2547739==The signal is caused by a READ memory access. +==2547739==Hint: address points to the zero page. 
+ #0 0x55abf3a79f9c in ahci_pio_transfer /home/artemiin/Work/original_qemu/build/../hw/ide/ahci.c:1377:46 + #1 0x55abf3a8a396 in ide_transfer_start_norecurse /home/artemiin/Work/original_qemu/build/../hw/ide/core.c:581:5 + #2 0x55abf3aab79e in ide_transfer_start /home/artemiin/Work/original_qemu/build/../hw/ide/core.c:588:9 + #3 0x55abf3aab79e in ide_sector_read_cb /home/artemiin/Work/original_qemu/build/../hw/ide/core.c:789:5 + #4 0x55abf3a8d6e2 in ide_buffered_readv_cb /home/artemiin/Work/original_qemu/build/../hw/ide/core.c:684:9 + #5 0x55abf4f31d33 in blk_aio_complete /home/artemiin/Work/original_qemu/build/../block/block-backend.c:1552:9 + #6 0x55abf545010b in aio_bh_call /home/artemiin/Work/original_qemu/build/../util/async.c:172:5 + #7 0x55abf545089f in aio_bh_poll /home/artemiin/Work/original_qemu/build/../util/async.c:219:13 + #8 0x55abf53e746a in aio_dispatch /home/artemiin/Work/original_qemu/build/../util/aio-posix.c:424:5 + #9 0x55abf545469a in aio_ctx_dispatch /home/artemiin/Work/original_qemu/build/../util/async.c:361:5 + #10 0x7f358845b7a8 in g_main_context_dispatch (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x547a8) (BuildId: 9f90bd7bbfcf84a1f1c5a6102f70e6264837b9d4) + #11 0x55abf5455787 in glib_pollfds_poll /home/artemiin/Work/original_qemu/build/../util/main-loop.c:287:9 + #12 0x55abf5455787 in os_host_main_loop_wait /home/artemiin/Work/original_qemu/build/../util/main-loop.c:310:5 + #13 0x55abf5455787 in main_loop_wait /home/artemiin/Work/original_qemu/build/../util/main-loop.c:589:11 + #14 0x55abf425c296 in qemu_main_loop /home/artemiin/Work/original_qemu/build/../system/runstate.c:835:9 + #15 0x55abf51df1c6 in qemu_default_main /home/artemiin/Work/original_qemu/build/../system/main.c:48:14 + #16 0x55abf51df1a1 in main /home/artemiin/Work/original_qemu/build/../system/main.c:76:9 + #17 0x7f3587219249 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 + #18 0x7f3587219304 in __libc_start_main csu/../csu/libc-start.c:360:3 + #19 0x55abf353be60 in _start (/home/artemiin/Work/original_qemu/build/qemu-system-x86_64+0x1828e60) (BuildId: f91712a3af40a999ce35e39809ce00f92c35ae25) + +AddressSanitizer can not provide additional info. +SUMMARY: AddressSanitizer: SEGV /home/artemiin/Work/original_qemu/build/../hw/ide/ahci.c:1377:46 in ahci_pio_transfer +==2547739==ABORTING +``` +Additional information: +This issue may need a complicated patch so I ask developers to take a look at this issue. diff --git a/results/classifier/118/risc-v/2852 b/results/classifier/118/risc-v/2852 new file mode 100644 index 000000000..3670fd4dc --- /dev/null +++ b/results/classifier/118/risc-v/2852 @@ -0,0 +1,110 @@ +risc-v: 0.806 +peripherals: 0.798 +x86: 0.781 +register: 0.769 +graphic: 0.753 +user-level: 0.750 +performance: 0.739 +hypervisor: 0.738 +mistranslation: 0.712 +device: 0.710 +arm: 0.703 +TCG: 0.695 +permissions: 0.687 +ppc: 0.684 +VMM: 0.681 +architecture: 0.676 +socket: 0.672 +semantic: 0.668 +virtual: 0.665 +vnc: 0.664 +debug: 0.654 +network: 0.642 +i386: 0.633 +KVM: 0.617 +files: 0.610 +PID: 0.597 +assembly: 0.590 +kernel: 0.581 +boot: 0.567 + +heap-use-after-free in timer_pending() +Description of problem: +In the QED block driver, the need_check_timer timer is freed in +bdrv_qed_detach_aio_context, but the pointer to the timer is not +set to NULL. This can lead to a use-after-free scenario +in bdrv_qed_drain_begin(). +Steps to reproduce: +1. [test.qed](/uploads/c8820345bfcd562308da99d9f83df3cf/test.qed) +2. 
./qemu-img snapshot -q -a test test.qed +Additional information: +<details> +<pre> +./qemu-img snapshot -q -a test test.qed +==21083==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! +================================================================= +==21083==ERROR: AddressSanitizer: heap-use-after-free on address 0x60400004ca50 at pc 0x56050d1462b6 bp 0x7fff14d0d870 sp 0x7fff14d0d868 +READ of size 8 at 0x60400004ca50 thread T0 + #0 0x56050d1462b5 in timer_pending /home/gerben/qemu-img_fuzz/build/../util/qemu-timer.c:483:16 + #1 0x56050cddf82e in bdrv_qed_drain_begin /home/gerben/qemu-img_fuzz/build/../block/qed.c:378:32 + #2 0x56050cb9bb65 in bdrv_do_drained_begin /home/gerben/qemu-img_fuzz/build/../block/io.c:364:13 + #3 0x56050cb9ca03 in bdrv_drain_all_begin_nopoll /home/gerben/qemu-img_fuzz/build/../block/io.c:506:9 + #4 0x56050cb96318 in bdrv_graph_wrlock /home/gerben/qemu-img_fuzz/build/../block/graph-lock.c:116:5 + #5 0x56050cd0cbc4 in bdrv_snapshot_goto /home/gerben/qemu-img_fuzz/build/../block/snapshot.c:294:9 + #6 0x56050cf95dd2 in img_snapshot /home/gerben/qemu-img_fuzz/build/../qemu-img.c:3500:15 + #7 0x7f4adeddbefc in __libc_start_main (/lib64/libc.so.6+0x27efc) + #8 0x56050c96a9f9 in _start /usr/src/RPM/BUILD/glibc-2.32-alt5.p10.3/csu/../sysdeps/x86_64/start.S:120 + +0x60400004ca50 is located 0 bytes inside of 48-byte region [0x60400004ca50,0x60400004ca80) +freed by thread T0 here: + #0 0x56050ca0daef in free /usr/src/RPM/BUILD/llvm-11.0.1.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:123:3 + #1 0x56050cde6b86 in bdrv_qed_do_close /home/gerben/qemu-img_fuzz/build/../block/qed.c:619:5 + #2 0x56050cddbe85 in bdrv_qed_close /home/gerben/qemu-img_fuzz/build/../block/qed.c:639:5 + #3 0x56050cd0cbb2 in bdrv_snapshot_goto /home/gerben/qemu-img_fuzz/build/../block/snapshot.c:290:13 + #4 0x56050cf95dd2 in img_snapshot /home/gerben/qemu-img_fuzz/build/../qemu-img.c:3500:15 + #5 0x7f4adeddbefc in __libc_start_main (/lib64/libc.so.6+0x27efc) + +previously allocated by thread T0 here: + #0 0x56050ca0dfa7 in calloc /usr/src/RPM/BUILD/llvm-11.0.1.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3 + #1 0x7f4adf359670 in g_malloc0 (/lib64/libglib-2.0.so.0+0x5c670) + #2 0x56050cde4bd0 in bdrv_qed_do_open /home/gerben/qemu-img_fuzz/build/../block/qed.c:543:5 + #3 0x56050cde21a2 in bdrv_qed_open_entry /home/gerben/qemu-img_fuzz/build/../block/qed.c:569:16 + #4 0x56050d137706 in coroutine_trampoline /home/gerben/qemu-img_fuzz/build/../util/coroutine-ucontext.c:175:9 + #5 0x7f4adee066cf (/lib64/libc.so.6+0x526cf) + +SUMMARY: AddressSanitizer: heap-use-after-free /home/gerben/qemu-img_fuzz/build/../util/qemu-timer.c:483:16 in timer_pending +Shadow bytes around the buggy address: + 0x0c08800018f0: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa + 0x0c0880001900: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa + 0x0c0880001910: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa + 0x0c0880001920: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fd + 0x0c0880001930: fa fa 00 00 00 00 01 fa fa fa 00 00 00 00 00 fa +=>0x0c0880001940: fa fa 00 00 00 00 00 fa fa fa[fd]fd fd fd fd fd + 0x0c0880001950: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa + 0x0c0880001960: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa + 0x0c0880001970: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa + 0x0c0880001980: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa + 0x0c0880001990: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa +Shadow byte legend 
(one shadow byte represents 8 application bytes): + Addressable: 00 + Partially addressable: 01 02 03 04 05 06 07 + Heap left redzone: fa + Freed heap region: fd + Stack left redzone: f1 + Stack mid redzone: f2 + Stack right redzone: f3 + Stack after return: f5 + Stack use after scope: f8 + Global redzone: f9 + Global init order: f6 + Poisoned by user: f7 + Container overflow: fc + Array cookie: ac + Intra object redzone: bb + ASan internal: fe + Left alloca redzone: ca + Right alloca redzone: cb + Shadow gap: cc +==21083==ABORTING +</pre> +</details> diff --git a/results/classifier/118/risc-v/74545755 b/results/classifier/118/risc-v/74545755 new file mode 100644 index 000000000..61ac1e632 --- /dev/null +++ b/results/classifier/118/risc-v/74545755 @@ -0,0 +1,369 @@ +risc-v: 0.845 +user-level: 0.790 +register: 0.778 +permissions: 0.770 +mistranslation: 0.752 +debug: 0.740 +TCG: 0.722 +performance: 0.721 +device: 0.720 +semantic: 0.669 +virtual: 0.667 +arm: 0.662 +KVM: 0.661 +graphic: 0.660 +ppc: 0.659 +vnc: 0.650 +assembly: 0.648 +architecture: 0.636 +boot: 0.607 +VMM: 0.602 +files: 0.577 +peripherals: 0.566 +hypervisor: 0.563 +network: 0.550 +socket: 0.549 +x86: 0.545 +PID: 0.479 +kernel: 0.452 +i386: 0.376 + +[Bug Report][RFC PATCH 0/1] block: fix failing assert on paused VM migration + +There's a bug (failing assert) which is reproduced during migration of +a paused VM. I am able to reproduce it on a stand with 2 nodes and a common +NFS share, with VM's disk on that share. + +root@fedora40-1-vm:~# virsh domblklist alma8-vm + Target Source +------------------------------------------ + sda /mnt/shared/images/alma8.qcow2 + +root@fedora40-1-vm:~# df -Th /mnt/shared +Filesystem Type Size Used Avail Use% Mounted on +127.0.0.1:/srv/nfsd nfs4 63G 16G 48G 25% /mnt/shared + +On the 1st node: + +root@fedora40-1-vm:~# virsh start alma8-vm ; virsh suspend alma8-vm +root@fedora40-1-vm:~# virsh migrate --compressed --p2p --persistent +--undefinesource --live alma8-vm qemu+ssh://fedora40-2-vm/system + +Then on the 2nd node: + +root@fedora40-2-vm:~# virsh migrate --compressed --p2p --persistent +--undefinesource --live alma8-vm qemu+ssh://fedora40-1-vm/system +error: operation failed: domain is not running + +root@fedora40-2-vm:~# tail -3 /var/log/libvirt/qemu/alma8-vm.log +2024-09-19 13:53:33.336+0000: initiating migration +qemu-system-x86_64: ../block.c:6976: int +bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & +BDRV_O_INACTIVE)' failed. 
+2024-09-19 13:53:42.991+0000: shutting down, reason=crashed + +Backtrace: + +(gdb) bt +#0 0x00007f7eaa2f1664 in __pthread_kill_implementation () at /lib64/libc.so.6 +#1 0x00007f7eaa298c4e in raise () at /lib64/libc.so.6 +#2 0x00007f7eaa280902 in abort () at /lib64/libc.so.6 +#3 0x00007f7eaa28081e in __assert_fail_base.cold () at /lib64/libc.so.6 +#4 0x00007f7eaa290d87 in __assert_fail () at /lib64/libc.so.6 +#5 0x0000563c38b95eb8 in bdrv_inactivate_recurse (bs=0x563c3b6c60c0) at +../block.c:6976 +#6 0x0000563c38b95aeb in bdrv_inactivate_all () at ../block.c:7038 +#7 0x0000563c3884d354 in qemu_savevm_state_complete_precopy_non_iterable +(f=0x563c3b700c20, in_postcopy=false, inactivate_disks=true) + at ../migration/savevm.c:1571 +#8 0x0000563c3884dc1a in qemu_savevm_state_complete_precopy (f=0x563c3b700c20, +iterable_only=false, inactivate_disks=true) at ../migration/savevm.c:1631 +#9 0x0000563c3883a340 in migration_completion_precopy (s=0x563c3b4d51f0, +current_active_state=<optimized out>) at ../migration/migration.c:2780 +#10 migration_completion (s=0x563c3b4d51f0) at ../migration/migration.c:2844 +#11 migration_iteration_run (s=0x563c3b4d51f0) at ../migration/migration.c:3270 +#12 migration_thread (opaque=0x563c3b4d51f0) at ../migration/migration.c:3536 +#13 0x0000563c38dbcf14 in qemu_thread_start (args=0x563c3c2d5bf0) at +../util/qemu-thread-posix.c:541 +#14 0x00007f7eaa2ef6d7 in start_thread () at /lib64/libc.so.6 +#15 0x00007f7eaa373414 in clone () at /lib64/libc.so.6 + +What happens here is that after 1st migration BDS related to HDD remains +inactive as VM is still paused. Then when we initiate 2nd migration, +bdrv_inactivate_all() leads to the attempt to set BDRV_O_INACTIVE flag +on that node which is already set, thus assert fails. + +Attached patch which simply skips setting flag if it's already set is more +of a kludge than a clean solution. Should we use more sophisticated logic +which allows some of the nodes be in inactive state prior to the migration, +and takes them into account during bdrv_inactivate_all()? Comments would +be appreciated. + +Andrey + +Andrey Drobyshev (1): + block: do not fail when inactivating node which is inactive + + block.c | 10 +++++++++- + 1 file changed, 9 insertions(+), 1 deletion(-) + +-- +2.39.3 + +Instead of throwing an assert let's just ignore that flag is already set +and return. We assume that it's going to be safe to ignore. Otherwise +this assert fails when migrating a paused VM back and forth. + +Ideally we'd like to have a more sophisticated solution, e.g. not even +scan the nodes which should be inactive at this point. + +Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> +--- + block.c | 10 +++++++++- + 1 file changed, 9 insertions(+), 1 deletion(-) + +diff --git a/block.c b/block.c +index 7d90007cae..c1dcf906d1 100644 +--- a/block.c ++++ b/block.c +@@ -6973,7 +6973,15 @@ static int GRAPH_RDLOCK +bdrv_inactivate_recurse(BlockDriverState *bs) + return 0; + } + +- assert(!(bs->open_flags & BDRV_O_INACTIVE)); ++ if (bs->open_flags & BDRV_O_INACTIVE) { ++ /* ++ * Return here instead of throwing assert as a workaround to ++ * prevent failure on migrating paused VM. ++ * Here we assume that if we're trying to inactivate BDS that's ++ * already inactive, it's safe to just ignore it. 
++ */ ++ return 0; ++ } + + /* Inactivate this node */ + if (bs->drv->bdrv_inactivate) { +-- +2.39.3 + +[add migration maintainers] + +On 24.09.24 15:56, Andrey Drobyshev wrote: +Instead of throwing an assert let's just ignore that flag is already set +and return. We assume that it's going to be safe to ignore. Otherwise +this assert fails when migrating a paused VM back and forth. + +Ideally we'd like to have a more sophisticated solution, e.g. not even +scan the nodes which should be inactive at this point. + +Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> +--- + block.c | 10 +++++++++- + 1 file changed, 9 insertions(+), 1 deletion(-) + +diff --git a/block.c b/block.c +index 7d90007cae..c1dcf906d1 100644 +--- a/block.c ++++ b/block.c +@@ -6973,7 +6973,15 @@ static int GRAPH_RDLOCK +bdrv_inactivate_recurse(BlockDriverState *bs) + return 0; + } +- assert(!(bs->open_flags & BDRV_O_INACTIVE)); ++ if (bs->open_flags & BDRV_O_INACTIVE) { ++ /* ++ * Return here instead of throwing assert as a workaround to ++ * prevent failure on migrating paused VM. ++ * Here we assume that if we're trying to inactivate BDS that's ++ * already inactive, it's safe to just ignore it. ++ */ ++ return 0; ++ } +/* Inactivate this node */ +if (bs->drv->bdrv_inactivate) { +I doubt that this a correct way to go. + +As far as I understand, "inactive" actually means that "storage is not belong to +qemu, but to someone else (another qemu process for example), and may be changed +transparently". In turn this means that Qemu should do nothing with inactive disks. So the +problem is that nobody called bdrv_activate_all on target, and we shouldn't ignore that. + +Hmm, I see in process_incoming_migration_bh() we do call bdrv_activate_all(), +but only in some scenarios. May be, the condition should be less strict here. + +Why we need any condition here at all? Don't we want to activate block-layer on +target after migration anyway? + +-- +Best regards, +Vladimir + +On 9/30/24 12:25 PM, Vladimir Sementsov-Ogievskiy wrote: +> +[add migration maintainers] +> +> +On 24.09.24 15:56, Andrey Drobyshev wrote: +> +> [...] +> +> +I doubt that this a correct way to go. +> +> +As far as I understand, "inactive" actually means that "storage is not +> +belong to qemu, but to someone else (another qemu process for example), +> +and may be changed transparently". In turn this means that Qemu should +> +do nothing with inactive disks. So the problem is that nobody called +> +bdrv_activate_all on target, and we shouldn't ignore that. +> +> +Hmm, I see in process_incoming_migration_bh() we do call +> +bdrv_activate_all(), but only in some scenarios. May be, the condition +> +should be less strict here. +> +> +Why we need any condition here at all? Don't we want to activate +> +block-layer on target after migration anyway? +> +Hmm I'm not sure about the unconditional activation, since we at least +have to honor LATE_BLOCK_ACTIVATE cap if it's set (and probably delay it +in such a case). In current libvirt upstream I see such code: + +> +/* Migration capabilities which should always be enabled as long as they +> +> +* are supported by QEMU. If the capability is supposed to be enabled on both +> +> +* sides of migration, it won't be enabled unless both sides support it. 
+> +> +*/ +> +> +static const qemuMigrationParamsAlwaysOnItem qemuMigrationParamsAlwaysOn[] = +> +{ +> +> +{QEMU_MIGRATION_CAP_PAUSE_BEFORE_SWITCHOVER, +> +> +QEMU_MIGRATION_SOURCE}, +> +> +> +> +{QEMU_MIGRATION_CAP_LATE_BLOCK_ACTIVATE, +> +> +QEMU_MIGRATION_DESTINATION}, +> +> +}; +which means that libvirt always wants LATE_BLOCK_ACTIVATE to be set. + +The code from process_incoming_migration_bh() you're referring to: + +> +/* If capability late_block_activate is set: +> +> +* Only fire up the block code now if we're going to restart the +> +> +* VM, else 'cont' will do it. +> +> +* This causes file locking to happen; so we don't want it to happen +> +> +* unless we really are starting the VM. +> +> +*/ +> +> +if (!migrate_late_block_activate() || +> +> +(autostart && (!global_state_received() || +> +> +runstate_is_live(global_state_get_runstate())))) { +> +> +/* Make sure all file formats throw away their mutable metadata. +> +> +> +* If we get an error here, just don't restart the VM yet. */ +> +> +bdrv_activate_all(&local_err); +> +> +if (local_err) { +> +> +error_report_err(local_err); +> +> +local_err = NULL; +> +> +autostart = false; +> +> +} +> +> +} +It states explicitly that we're either going to start VM right at this +point if (autostart == true), or we wait till "cont" command happens. +None of this is going to happen if we start another migration while +still being in PAUSED state. So I think it seems reasonable to take +such case into account. For instance, this patch does prevent the crash: + +> +diff --git a/migration/migration.c b/migration/migration.c +> +index ae2be31557..3222f6745b 100644 +> +--- a/migration/migration.c +> ++++ b/migration/migration.c +> +@@ -733,7 +733,8 @@ static void process_incoming_migration_bh(void *opaque) +> +*/ +> +if (!migrate_late_block_activate() || +> +(autostart && (!global_state_received() || +> +- runstate_is_live(global_state_get_runstate())))) { +> ++ runstate_is_live(global_state_get_runstate()))) || +> ++ (!autostart && global_state_get_runstate() == RUN_STATE_PAUSED)) { +> +/* Make sure all file formats throw away their mutable metadata. +> +* If we get an error here, just don't restart the VM yet. */ +> +bdrv_activate_all(&local_err); +What are your thoughts on it? + +Andrey + |
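To keep the condition being debated in this thread readable, here is the activation decision from `process_incoming_migration_bh()` restated as a standalone predicate. This is an illustration only: the helper does not exist in QEMU, the functions it calls are the ones quoted above, and the final `RUN_STATE_PAUSED` clause is the proposal from the last message, not merged behaviour.

```c
/*
 * Illustration only -- this helper does not exist in QEMU. It restates the
 * decision taken in process_incoming_migration_bh(): should the destination
 * call bdrv_activate_all() right after an incoming migration completes?
 */
static bool should_activate_block_layer(bool autostart)
{
    /* No late-block-activate capability: always activate immediately. */
    if (!migrate_late_block_activate()) {
        return true;
    }

    /* We are about to restart the guest here, so take ownership now. */
    if (autostart && (!global_state_received() ||
                      runstate_is_live(global_state_get_runstate()))) {
        return true;
    }

    /*
     * Proposed addition from the thread: a guest that arrives paused will
     * never see "cont", so activate now; otherwise its nodes keep
     * BDRV_O_INACTIVE set and the next outgoing migration trips the
     * assert in bdrv_inactivate_recurse().
     */
    if (!autostart && global_state_get_runstate() == RUN_STATE_PAUSED) {
        return true;
    }

    return false;
}
```

Read this way, the open question in the thread is whether the third clause is the right scope for "the destination will never receive cont", or whether, as Vladimir suggests, activation on the target should not be conditional at all.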