QEMU mishandling of SO_PEERSEC forces systemd into tight loop While building Debian images for embedded ARM target systems I detected that QEMU seems to force newer systemd daemons into a tight loop. My setup is the following: Host machine: Ubuntu 18.04, amd64 LXD container: Debian Buster, arm64, systemd 241 QEMU: qemu-aarch64-static, 4.0.0-rc2 (custom build) and 3.1.0 (Debian 1:3.1+dfsg-7) To easily reproduce the issue I have created the following repository: https://github.com/lueschem/edi-qemu The call where systemd gets looping is the following: 2837 getsockopt(3,1,31,274891889456,274887218756,274888927920) = -1 errno=34 (Numerical result out of range) Furthermore I also verified that the issue is not related to LXD. The same behavior can be reproduced using systemd-nspawn. This issue reported against systemd seems to be related: https://github.com/systemd/systemd/issues/11557 As described on the systemd issue, the syscall we're getting wrong here is getsockopt(fd, SOL_SOCKET, SO_PEERSEC, ...). Our linux-user/syscall.c:do_getsockopt() doesn't have any special case code for the payload on this function, so we treat it as if it were just an integer payload, which is not correct here. Unfortunately I can't find any documentation on exactly what SO_PEERSEC does or what the payload format is, which makes it pretty hard to fix this bug :-( It's not listed in the socket(7) manpage -- https://linux.die.net/man/7/socket -- which is where I'd expect to find it, and the kernel source code is pretty confusing in this area. This is probably the tight loop that gets triggered: https://github.com/systemd/systemd/commit/217d89678269334f461e9abeeffed57077b21454 It looks like the previous implementation was just a bit more "tolerant". I have just studied a bit the systemd code and this brought me to the following idea/temporary workaround: What about returning -1 (error) and setting errno when getsockopt(fd, SOL_SOCKET, SO_PEERSEC, ...) gets called? This would then let systemd know that SO_PEERSEC is not (yet) implemented. I filed the duplicate #1840252 of this bug. I think that the options SO_PEERCRED and SO_PEERSEC belong into the context of SELINUX. So maybe the format of the paylod can be found in the sources of libselinux? I'd like to compile qemu with a local hack to work around my current problem. Something like Matthias Lüscher suggested. @Peter Maydell: could you point me to the location in the qemu source where I could apply such a hack? I patched linux-user/syscall.c (see below, branch stable-2.11) which works around my problem. So far so good, but the qemu-arm that i compiled is terribly slow compared to the one that came with Ubuntu 18.04. Any hints? I configured as this: ./configure --static --enable-kvm --target-list=arm-linux-user diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 74d56e2ee6..4fa9a09b12 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -3185,6 +3185,8 @@ static abi_long do_getsockopt(int sockfd, int level, int optname, case TARGET_SO_SNDTIMEO: case TARGET_SO_PEERNAME: goto unimplemented; + case TARGET_SO_PEERSEC: /* added to escape infinite loop */ + goto unimplemented; case TARGET_SO_PEERCRED: { struct ucred cr; socklen_t crlen; I'm a bit surprised that this bug doesn't get more attention, as it makes it very hard to run qemu-emulated containers of Bionic hosted on Bionic. Running such containers is a common way to cross-compile packages for foreign architectures in the absence of sufficiently powerful target HW. The documentation on SO_PEERSEC is indeed sparse, but I do want to second Fritz in his approach. I don't see a reason, why treating the payload as incorrect and throwing it back at the application is better than handling it and saying it is not implemented (which is the case). Arguably, applications should be fixed to handle the error correctly, but I'm afraid that is a can of worms. I have encountered the same problem with systemd, apt and getent. Would the maintainers be open to an SRU request on QEMU for this? Could you test the attached patch? Thanks, Laurent! I'll get back to you, asap. > Could you test the attached patch? > Works great! This is my test setup: Host machine: Ubuntu 18.04, amd64 LXD container: Debian Buster, arm64, systemd 241 QEMU: qemu-aarch64(-static), compiled from source (4.2.0), patched with your patch. Many thanks! Matthias I carried out the following test: * fetched the QEMU coming with 18.04, * added this patch, * built an LXD container with arch arm64 and the patched qemu-aarch64-static inside, * launched it on amd64 Previously various systemd services would not come up properly, now they are running like a charm. The only grief I have is that network configuration does not work, but that is due to # ip addr Unsupported setsockopt level=270 optname=11 which is a different story. Well, it's kind of irrelevant but I am trying that on archlinux and this does not work for me. Using systemd-244.2-1 and qemu-user-static-4.2 that I built with Laurent's patch. May be I have done something wrong ? I still get that error that leads me here: Failed to enqueue loopback interface start request: Operation not supported Caught , dumped core as pid 3. Exiting PID 1... I am trying to boot with systemd-nspawn an archlinux-arm built for a rpi0. That's fine if I don't boot it. Laurent's patch worked for me as well. I grabbed the source for the Debian 10 qemu-user-static package, qemu_3.1+dfsg-8+deb10u3, applied the patch and re-built the qemu-arm-static binary. Copying the new binary into a Docker image based on arm32v7/debian:10-slim allowed /sbin/init to bring up the container with a responsive systemctl command. Prior to the patch, systemd did not start any services inside the container and systemctl would hang when executed directly. Thanks! -Charlie This seems to be the error reported in https://bugs.launchpad.net/qemu/+bug/1857811 Fixed here: https://git.qemu.org/?p=qemu.git;a=commitdiff;h=6d485a55d0cd Bisect worked and once you find it it seems obvious that this is exactly our case: commit 65b261a63a48fbb3b11193361d4ea0c38a3c3dfd Author: Laurent Vivier