Using setreuid / setegid crashes x86_64 user-mode target When setreuid() or setegid() are called from x86_64 target code in user mode, qemu crashes inside the NPTL signal handlers. x86 targets do not directly use a syscall to handle setreuid() / setegid(); instead the x86 NPTL implementation sets up a temporary data region in memory (__xidcmd) and issues a signal (SIGRT1) to all threads, allowing the handler for that signal to issue the syscall. Under qemu, __xidcmd remains null (see variable display below backtrace). Backtrace: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x3fff85c74fc0 (LWP 74517)] 0x000000006017491c in sighandler_setxid (sig=33, si=0x3fff85c72d08, ctx=0x3fff85c71f90) at nptl-init.c:263 263 nptl-init.c: No such file or directory. (gdb) thread apply all bt Thread 3 (Thread 0x3fff87e8efc0 (LWP 74515)): #0 0x00000000601cc430 in syscall () #1 0x0000000060109080 in futex_wait (val=, ev=) at /build/qemu/util/qemu-thread-posix.c:292 #2 qemu_event_wait (ev=0x62367bb0 ) at /build/qemu/util/qemu-thread-posix.c:399 #3 0x000000006010f73c in call_rcu_thread (opaque=) at /build/qemu/util/rcu.c:250 #4 0x0000000060176f8c in start_thread (arg=0x3fff87e8efc0) at pthread_create.c:336 #5 0x00000000601cebf4 in clone () Thread 2 (Thread 0x3fff85c74fc0 (LWP 74517)): #0 0x000000006017491c in sighandler_setxid (sig=33, si=0x3fff85c72d08, ctx=0x3fff85c71f90) at nptl-init.c:263 #1 #2 0x00000000601cc42c in syscall () #3 0x0000000060044b08 in safe_futex (val3=, uaddr2=0x0, timeout=, val=, op=128, uaddr=) at /build/qemu/linux-user/syscall.c:748 #4 do_futex (val3=, uaddr2=275186650880, timeout=0, val=1129, op=128, uaddr=275186651116) at /build/qemu/linux-user/syscall.c:6201 #5 do_syscall (cpu_env=0x1000abfd350, num=, arg1=275186651116, arg2=, arg3=1129, arg4=0, arg5=275186650880, arg6=, arg7=0, arg8=0) at /build/qemu/linux-user/syscall.c:10651 #6 0x00000000600347b8 in cpu_loop (env=0x1000abfd350) at /build/qemu/linux-user/main.c:317 #7 0x0000000060036ae0 in clone_func (arg=0x3fffc4c2ca38) at /build/qemu/linux-user/syscall.c:5445 #8 0x0000000060176f8c in start_thread (arg=0x3fff85c74fc0) at pthread_create.c:336 #9 0x00000000601cebf4 in clone () Thread 1 (Thread 0x1000aa05000 (LWP 74511)): #0 0x00000000601cc430 in syscall () #1 0x0000000060044b08 in safe_futex (val3=, uaddr2=0x0, timeout=, val=, op=128, uaddr=) at /build/qemu/linux-user/syscall.c:748 #2 do_futex (val3=, uaddr2=1, timeout=0, val=1, op=128, uaddr=275078324992) at /build/qemu/linux-user/syscall.c:6201 #3 do_syscall (cpu_env=0x1000aa23890, num=, arg1=275078324992, arg2=, arg3=1, arg4=0, arg5=1, arg6=, arg7=0, arg8=0) at /build/qemu/linux-user/syscall.c:10651 #4 0x00000000600347b8 in cpu_loop (env=0x1000aa23890) at /build/qemu/linux-user/main.c:317 #5 0x00000000600020e4 in main (argc=, argv=, envp=) at /build/qemu/linux-user/main.c:4779 (gdb) p __xidcmd $1 = (struct xid_command *) 0x0 https://patches.linaro.org/patch/63313/ may be relevant here. Whoops, I meant http://patchwork.ozlabs.org/patch/590640/. Sounds very relevant, yes. Thanks for the link! What's the status of this? This is causing aptitude in Debian chroots to reliably segfault under qemu-arm-user. And to confirm, while somewhat of a hack, that patch does indeed fix aptitude as expected (as well as my minimal test case I wrote when debugging this before stumbling upon this bug report). I guess this is happening to my attempts to use pbuilder for cross-building armel / mipsel packages Did anything ever happen here? Trying to upgrade Ubuntu ARM container images using qemu-user on x86-64 from bionic to focal.. Steve, could you describe your problem with more details? What is the version of qemu you are using? Sorry, lost your reply in amongst the chaos of my life! OK, quick test case (type at command line, don't run as script!), host arch is x86-64, you need qemu-user-static installed.. wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-armhf-root.tar.xz sudo -s mkdir armcont cd armcont tar xf ../bionic-server-cloudimg-armhf-root.tar.xz cp /usr/bin/qemu-arm-static armcont/usr/bin/ rm armcont/etc/resolv.conf; cp /etc/resolv.conf armcont/etc/ systemd-nspawn -D armcont/ do-release-upgrade -d # may need to drop the "-d" once 20.04.1 is released Yields: qemu:handle_cpu_signal received signal outside vCPU context @ pc=0x601540af (Need a "cd .." after the tar, doh.) Actually, this is possibly not the same bug. I will add to the list to investigate further.. Sorry, heat is definitely getting to me - just realized this is an upstream bug, not an Ubuntu one! Ubuntu package version is 1:2.11+dfsg-1ubuntu7.26. At some point I will look at building qemu from source, but won't be an immediate thing. Apologies for noise .. though if anyone knows off-hand that a fix for this did or didn't get merged that would useful .. OK, messing around with https://hub.docker.com/r/multiarch/ubuntu-core for quick testing, it looks like modern versions of qemu do not have this bug - I used the test case from Bug #1815911 (which I think is probably a duplicate) and cannot reproduce with: REPOSITORY TAG IMAGE ID CREATED SIZE docker.io/multiarch/ubuntu-core armhf-bionic 78064abebdab 41 minutes ago 52 MB Which seems to use "qemu-arm version 5.0.0 (qemu-5.0.0-2.fc33)" If copy in the ancient qemu-arm-static that ships with Bionic by default the error returns. So I guess this can be closed, but would be nice to know which commit fixed this, for cherrypicking .. Possibly https://github.com/qemu/qemu/commit/6bc024e713fd35eb5fddbe16acd8dc92d27872a9#diff-6389d258c2de2f974953be12cab45851 ? Bionic's QEMU is 2.11. There were a lot of fixes to the linux-user handling and in particular to various race conditions in its handling of multi-threaded guest programs and also to the guest signal handling code -- I'm not sure it'd be feasible to identify and cherry-pick them all at this point... My stock advice for "any linux-user guest bug" plus "QEMU prior to 4.0" is "try again with a newer QEMU". The QEMU project is currently considering to move its bug tracking to another system. For this we need to know which bugs are still valid and which could be closed already. Thus we are setting older bugs to "Incomplete" now. If you still think this bug report here is valid, then please switch the state back to "New" within the next 60 days, otherwise this report will be marked as "Expired". Or please mark it as "Fix Released" if the problem has been solved with a newer version of QEMU already. Thank you and sorry for the inconvenience. [Expired for QEMU because there has been no activity for 60 days.]