1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
|
assembly: 0.986
socket: 0.986
semantic: 0.985
device: 0.985
graphic: 0.985
instruction: 0.984
other: 0.978
vnc: 0.973
KVM: 0.969
network: 0.968
mistranslation: 0.950
boot: 0.933
QEMU mishandling of SO_PEERSEC forces systemd into tight loop
While building Debian images for embedded ARM target systems I detected that QEMU seems to force newer systemd daemons into a tight loop.
My setup is the following:
Host machine: Ubuntu 18.04, amd64
LXD container: Debian Buster, arm64, systemd 241
QEMU: qemu-aarch64-static, 4.0.0-rc2 (custom build) and 3.1.0 (Debian 1:3.1+dfsg-7)
To easily reproduce the issue I have created the following repository:
https://github.com/lueschem/edi-qemu
The call where systemd gets looping is the following:
2837 getsockopt(3,1,31,274891889456,274887218756,274888927920) = -1 errno=34 (Numerical result out of range)
Furthermore I also verified that the issue is not related to LXD.
The same behavior can be reproduced using systemd-nspawn.
This issue reported against systemd seems to be related:
https://github.com/systemd/systemd/issues/11557
As described on the systemd issue, the syscall we're getting wrong here is getsockopt(fd, SOL_SOCKET, SO_PEERSEC, ...). Our linux-user/syscall.c:do_getsockopt() doesn't have any special case code for the payload on this function, so we treat it as if it were just an integer payload, which is not correct here.
Unfortunately I can't find any documentation on exactly what SO_PEERSEC does or what the payload format is, which makes it pretty hard to fix this bug :-( It's not listed in the socket(7) manpage -- https://linux.die.net/man/7/socket -- which is where I'd expect to find it, and the kernel source code is pretty confusing in this area.
This is probably the tight loop that gets triggered:
https://github.com/systemd/systemd/commit/217d89678269334f461e9abeeffed57077b21454
It looks like the previous implementation was just a bit more "tolerant".
I have just studied a bit the systemd code and this brought me to the following idea/temporary workaround: What about returning -1 (error) and setting errno when getsockopt(fd, SOL_SOCKET, SO_PEERSEC, ...) gets called? This would then let systemd know that SO_PEERSEC is not (yet) implemented.
I filed the duplicate #1840252 of this bug.
I think that the options SO_PEERCRED and SO_PEERSEC belong into the context of SELINUX. So maybe the format of the paylod can be found in the sources of libselinux?
I'd like to compile qemu with a local hack to work around my current problem. Something like Matthias Lüscher suggested.
@Peter Maydell: could you point me to the location in the qemu source where I could apply such a hack?
I patched linux-user/syscall.c (see below, branch stable-2.11) which works around my problem.
So far so good, but the qemu-arm that i compiled is terribly slow compared to the one that came with Ubuntu 18.04. Any hints?
I configured as this:
./configure --static --enable-kvm --target-list=arm-linux-user
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 74d56e2ee6..4fa9a09b12 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -3185,6 +3185,8 @@ static abi_long do_getsockopt(int sockfd, int level, int optname,
case TARGET_SO_SNDTIMEO:
case TARGET_SO_PEERNAME:
goto unimplemented;
+ case TARGET_SO_PEERSEC: /* added to escape infinite loop */
+ goto unimplemented;
case TARGET_SO_PEERCRED: {
struct ucred cr;
socklen_t crlen;
I'm a bit surprised that this bug doesn't get more attention, as it makes it very hard to run qemu-emulated containers of Bionic hosted on Bionic. Running such containers is a common way to cross-compile packages for foreign architectures in the absence of sufficiently powerful target HW.
The documentation on SO_PEERSEC is indeed sparse, but I do want to second Fritz in his approach. I don't see a reason, why treating the payload as incorrect and throwing it back at the application is better than handling it and saying it is not implemented (which is the case).
Arguably, applications should be fixed to handle the error correctly, but I'm afraid that is a can of worms. I have encountered the same problem with systemd, apt and getent. Would the maintainers be open to an SRU request on QEMU for this?
Could you test the attached patch?
Thanks, Laurent! I'll get back to you, asap.
> Could you test the attached patch?
>
Works great!
This is my test setup:
Host machine: Ubuntu 18.04, amd64
LXD container: Debian Buster, arm64, systemd 241
QEMU: qemu-aarch64(-static), compiled from source (4.2.0), patched with
your patch.
Many thanks!
Matthias
I carried out the following test:
* fetched the QEMU coming with 18.04,
* added this patch,
* built an LXD container with arch arm64 and the patched qemu-aarch64-static inside,
* launched it on amd64
Previously various systemd services would not come up properly, now they are running like a charm. The only grief I have is that network configuration does not work, but that is due to
# ip addr
Unsupported setsockopt level=270 optname=11
which is a different story.
Well, it's kind of irrelevant but I am trying that on archlinux and this does not work for me.
Using systemd-244.2-1 and qemu-user-static-4.2 that I built with Laurent's patch. May be I have done something wrong ?
I still get that error that leads me here:
Failed to enqueue loopback interface start request: Operation not supported
Caught <SEGV>, dumped core as pid 3.
Exiting PID 1...
I am trying to boot with systemd-nspawn an archlinux-arm built for a rpi0. That's fine if I don't boot it.
Laurent's patch worked for me as well.
I grabbed the source for the Debian 10 qemu-user-static package, qemu_3.1+dfsg-8+deb10u3, applied the patch and re-built the qemu-arm-static binary. Copying the new binary into a Docker image based on arm32v7/debian:10-slim allowed /sbin/init to bring up the container with a responsive systemctl command.
Prior to the patch, systemd did not start any services inside the container and systemctl would hang when executed directly.
Thanks!
-Charlie
This seems to be the error reported in https://bugs.launchpad.net/qemu/+bug/1857811
Fixed here:
https://git.qemu.org/?p=qemu.git;a=commitdiff;h=6d485a55d0cd
Bisect worked and once you find it it seems obvious that this is exactly our case:
commit 65b261a63a48fbb3b11193361d4ea0c38a3c3dfd
Author: Laurent Vivier <email address hidden>
Date: Thu Jul 9 09:23:32 2020 +0200
linux-user: add netlink RTM_SETLINK command
This command is needed to be able to boot systemd in a container.
$ sudo systemd-nspawn -D /chroot/armhf/sid/ -b
Spawning container sid on /chroot/armhf/sid.
Press ^] three times within 1s to kill container.
systemd 245.6-2 running in system mode.
Detected virtualization systemd-nspawn.
Detected architecture arm.
Welcome to Debian GNU/Linux bullseye/sid!
Set hostname to <virt-arm>.
Failed to enqueue loopback interface start request: Operation not supported
Caught <SEGV>, dumped core as pid 3.
Exiting PID 1...
Container sid failed with error code 255.
Signed-off-by: Laurent Vivier <email address hidden>
Message-Id: <email address hidden>
linux-user/fd-trans.c | 1 +
1 file changed, 1 insertion(+)
Sorry, posted this on the wrong bug :-/
I beg your pardon for the noise.
Hmm, I'm using qemu-5.1.0 and I'm still seeing this (host is bionic, container image is focal)
[..]
getsockopt(31, SOL_SOCKET, SO_PEERGROUPS, 0x7ffdbd58941c, [4]) = -1 ERANGE (Numerical result out of range)
getsockopt(31, SOL_SOCKET, SO_PEERGROUPS, 0x7ffdbd58941c, [4]) = -1 ERANGE (Numerical result out of range)
getsockopt(31, SOL_SOCKET, SO_PEERGROUPS, 0x7ffdbd58941c, [4]) = -1 ERANGE (Numerical result out of range)
[..]
Regression?
I can confirm that I see the same issue when trying to run Ubuntu 20.04 aarch64 in a container, strace shows a tight loop on:
getsockopt(31, SOL_SOCKET, SO_PEERGROUPS, 0x7ffdbd58941c, [4]) = -1 ERANGE
Re-building qemu-user-static with the patch has fixed Debian 10 armhf for me, but the same re-build does not seem to help with this Ubuntu issue.
SO_PEERGROUPS is not implemented and processed as an "int" (this explains the ERANGE)
Could you try the attached patch?
In my case the issue with using Ubuntu 20.04 as a container host appears to have come down to the use of the F, or "fix binary", flag by binfmnt_misc:
# cat /proc/sys/fs/binfmt_misc/qemu-aarch64
enabled
interpreter /usr/bin/qemu-aarch64-static
flags: OCF
offset 0
magic 7f454c460201010000000000000000000200b700
mask ffffffffffffff00fffffffffffffffffeffffff
This appears to be new in Ubuntu 20.04 relative to 18.04 and results in the qemu-user-static binaries being loaded and locked into memory _when the package is installed_.
Therefore, what I observed as a regression of the issue was actually just the packaged binaries now taking precedence over patched versions that I had built and copied into place.
Ubuntu 20.04 ships QEMU 4.2. Using the QEMU 5.1 packages from Debian Bullseye allows SystemD to run in a foreign architecture container without spinning in a loop:
https://packages.debian.org/bullseye/qemu-user-static
|