|
socket: 0.802
graphic: 0.771
device: 0.754
kernel: 0.742
performance: 0.724
architecture: 0.689
debug: 0.678
x86: 0.664
network: 0.646
PID: 0.565
vnc: 0.558
ppc: 0.518
hypervisor: 0.510
permissions: 0.499
VMM: 0.490
risc-v: 0.489
semantic: 0.474
user-level: 0.444
assembly: 0.430
mistranslation: 0.423
files: 0.417
i386: 0.414
boot: 0.381
peripherals: 0.361
arm: 0.334
TCG: 0.295
register: 0.283
KVM: 0.258
virtual: 0.172
qemu8-user on Linux: SIGSEGV because brk(NULL) does not exist
Description of problem:
On Linux, the return value of the system call brk(NULL) need not point to a page that exists.
If it does not, then qemu8-user generates SIGSEGV at the next call to brk() with a higher value,
because qemu8 believes that it should maintain a contiguous .bss filled with zero bytes.
To do so, qemu8-user calls `memset(g2h_untagged(target_brk), 0, brk_page - target_brk);`
in do_brk() at ../linux-user/syscall.c:867, and this generates SIGSEGV at
the non-existent page that covers brk(NULL).
Instead, the safest thing to do is nothing at all.
Linux deliberately returns a random value for brk(NULL), subject to the conditions
that the value be at least as large as the maximum over all PT_LOAD of (.p_vaddr + .p_memsz),
and "somewhat near" that maximum. The randomness is meant
to hinder malware and to expose application coding errors
regarding brk() and sbrk(). If qemu-user wants to preserve contiguous .bss,
then qemu-user should call memset() only if the first page of the range exists.
(As explained in the next paragraph, "contiguous .bss" is a murky concept.)
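A minimal sketch of the "memset() only if the first page exists" idea, written as standalone C rather than as a patch to do_brk(). The helper names `page_is_mapped` and `zero_if_mapped` are hypothetical and are not QEMU functions. The sketch assumes that Linux mincore() failing (with ENOMEM for an unmapped range) is an acceptable existence test. It checks only that the page is mapped, not that it is writable, so a PROT_NONE page would still fault.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: true if the page containing addr is mapped.
 * On Linux, mincore() fails with ENOMEM when the range is unmapped;
 * any failure is treated as "do not touch this page". */
static bool page_is_mapped(const void *addr)
{
    long page_size = sysconf(_SC_PAGESIZE);
    unsigned char vec;   /* one status byte per page queried */
    uintptr_t page;

    assert(page_size > 0);
    /* mincore() requires a page-aligned start address. */
    page = (uintptr_t)addr & ~((uintptr_t)page_size - 1);
    return mincore((void *)page, (size_t)page_size, &vec) == 0;
}

/* Hypothetical helper: zero [start, end) only if the first page of
 * the range exists; otherwise do nothing and report false. */
static bool zero_if_mapped(void *start, void *end)
{
    char *s = start;
    char *e = end;

    if (e <= s || !page_is_mapped(s)) {
        return false;
    }
    memset(s, 0, (size_t)(e - s));
    return true;
}
```

In do_brk() terms, the guarded call would correspond to something like `zero_if_mapped(g2h_untagged(target_brk), g2h_untagged(brk_page))`; that mapping onto QEMU's variables is an assumption, not tested against the QEMU tree.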
Linux itself is partly to blame, because it computes the maximum (.p_vaddr + .p_memsz)
over all the PT_LOAD of the most recent execve(). The most recent execve() seen by
Linux might have no relationship to the state of the address space at the time of
_either_ call to brk(). The app can do arbitrary mmap, munmap, mprotect at any time.
In particular, the run-time de-compressor of UPX does exactly that for a compressed
main program. The maximum computed by Linux is for the compressed program,
which has a different layout than the de-compressed program.
There is a Linux system call prctl(PR_SET_MM, PR_SET_MM_BRK, new_value, 0, 0) which sets a value
for "the brk", but that syscall tries to validate the new_value based on
the most recent execve(). Once again, that has no relationship to the current
layout of the address space produced by the UPX de-compressor.
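For reference, a small sketch of how the current break can be read and how the prctl() request is issued. In its full form the call takes PR_SET_MM as the option and PR_SET_MM_BRK as the sub-operation. The helper names `current_brk` and `try_set_brk` are hypothetical. The call normally needs CAP_SYS_RESOURCE, so an unprivileged process should expect EPERM, and the kernel may reject the value with EINVAL.

```c
#include <errno.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: read the current break with the raw syscall.
 * The kernel's brk(0) returns the current break; the glibc brk()
 * wrapper returns only 0 or -1, so it cannot be used for this. */
static unsigned long current_brk(void)
{
    return (unsigned long)syscall(SYS_brk, 0UL);
}

/* Hypothetical helper: ask the kernel to set "the brk" to new_brk.
 * Returns 0 on success, or the errno value on failure. */
static int try_set_brk(unsigned long new_brk)
{
    if (prctl(PR_SET_MM, PR_SET_MM_BRK, new_brk, 0UL, 0UL) == 0) {
        return 0;
    }
    return errno;
}
```

Calling `try_set_brk(current_brk())` shows whether the process is allowed to use the interface at all; it does not work around the validation against the most recent execve() described above.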
Steps to reproduce:
1. build qemu8-x86_64 from
```
commit fcb237e64f9d026c03d635579c7b288d0008a6e5 (HEAD -> master, origin/master, origin/HEAD)
Merge: 2ff49e96ac c00aac6f14
Date: Mon Jul 10 09:17:06 2023 +0100
```
2. run `build/qemu-x86_64 -strace upx-4.0.2-amd64_linux/upx --version` where the upx
is from https://github.com/upx/upx/releases/download/v4.0.2/upx-4.0.2-amd64_linux.tar.xz
3. output ends with
```
372621 close(3) = 0
372621 munmap(0x0000004000803000,3055) = 0
|