Diffstat (limited to 'results/classifier/105/other/1727737')
| -rw-r--r-- | results/classifier/105/other/1727737 | 191 |
1 files changed, 191 insertions, 0 deletions
diff --git a/results/classifier/105/other/1727737 b/results/classifier/105/other/1727737
new file mode 100644
index 00000000..961a853b
--- /dev/null
+++ b/results/classifier/105/other/1727737
@@ -0,0 +1,191 @@
+other: 0.917
+graphic: 0.907
+mistranslation: 0.880
+semantic: 0.875
+assembly: 0.867
+instruction: 0.867
+socket: 0.830
+device: 0.829
+vnc: 0.822
+KVM: 0.818
+boot: 0.811
+network: 0.804
+
+qemu-arm stalls on a GCC sanitizer test since qemu-2.7
+
+Hi,
+
+I have noticed that several GCC/sanitizer tests fail with timeout when executed under QEMU.
+
+After a bit of investigation, I have noticed that this worked with qemu-2.7, and started failing with qemu-2.8, and still fails with qemu-2.10.1
+
+I'm attaching a tarball containing:
+alloca_instruments_all_paddings.exe : the testcase, and the needed libs:
+lib/librt.so.1
+lib/libdl.so.2
+lib/ld-linux-armhf.so.3
+lib/libasan.so.5
+lib/libc.so.6
+lib/libgcc_s.so.1
+lib/libpthread.so.0
+lib/libm.so.6
+
+To reproduce the problem:
+$ qemu-arm -cpu any -R 0 -L $PWD $PWD/alloca_instruments_all_paddings.exe
+returns in less than a second with qemu-2.7, and never with qemu-2.8
+
+Using -d in_asm suggests that the program "almost" completes and qemu seems to stall on:
+0x40b6eb44: e08f4004 add r4, pc, r4
+
+
+
+Hi. Your test case doesn't run for me:
+
+qemu-arm -cpu any -R 0 -L $PWD $PWD/alloca_instruments_all_paddings.exe
+/tmp/bug1727737/alloca_instruments_all_paddings.exe: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory
+
+Did you forget to include one of the needed libs in the tarball?
+
+
+Right, it worked for me because of the encoded rpath.
+Here is the missing libstdc++.so.6
+
+
+Thanks.
With that extra library, if I run with QEMU_STRACE=1 the following looks very suspicious:
+
+28865 getpid() = 28865
+28865 mmap2(NULL,2101248,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS,-1,0) = 0x43234000
+28865 mprotect(0x43234000,4096,PROT_NONE) = 0
+28865 rt_sigprocmask(SIG_BLOCK,0x40e077bc,0x40e0783c) = 0
+28865 clone(CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_UNTRACED,child_stack=0x43434ff8,parent_tidptr=0x00000000,tl
+s=0x00000000,child_tidptr=0x00000000) = -1 errno=22 (Invalid argument)
+28865 rt_sigprocmask(SIG_SETMASK,0x40e0783c,NULL) = 0
+28865 getpid() = 28865
+28865 sched_yield(1082131140,0,0,0,1084256812,1084256808) = 0
+28865 sched_yield(0,0,0,0,1084256812,1084256808) = 0
+28865 sched_yield(0,0,0,0,1084256812,1084256808) = 0
+28865 sched_yield(0,0,0,0,1084256812,1084256808) = 0
+
+
+It looks like the test case is (a) calling clone() with non-standard flags and (b) not checking whether it failed (presumably it then hangs forever waiting for the non-existent second thread to do something).
+
+This has started failing because we tightened up the handling of flags in our clone() syscall implementation: instead of blithely accepting any combination of flags but only giving you the behaviour that glibc pthread_create() gives, we now fail the clone() syscall if you ask for some behaviour we can't implement with pthread_create() or fork(). In this case you've asked for CLONE_VM|CLONE_FS|CLONE_FILES, which is very nearly a pthread thread but you also need CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM. Also you ask for CLONE_UNTRACED, which we can't support.
+
+It's unfortunate that this tightening up of the checks means that some programs which ask for things we can't do but don't actually care about them will no longer run, but I think this is overall better than behaving wrongly for guest programs which do care, since we can't tell which is which.
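The flag check described in the comment above can be sketched roughly as follows. This is a minimal illustration, not QEMU's actual code: the `SK_` names and the helper `sk_check_thread_flags` are hypothetical, and the flag values are copied from the Linux `<sched.h>` definitions only so that the sketch builds on any host. It assumes that a thread-like clone() must carry the full pthread flag set and must not carry CLONE_UNTRACED, as stated in the comment.

```c
#include <errno.h>

/* Hypothetical names; values mirror the Linux <sched.h> CLONE_* flags. */
enum {
    SK_CLONE_VM       = 0x00000100,
    SK_CLONE_FS       = 0x00000200,
    SK_CLONE_FILES    = 0x00000400,
    SK_CLONE_SIGHAND  = 0x00000800,
    SK_CLONE_THREAD   = 0x00010000,
    SK_CLONE_SYSVSEM  = 0x00040000,
    SK_CLONE_UNTRACED = 0x00800000
};

/* Flags that together describe a pthread-style thread. */
#define SK_THREAD_FLAGS \
    (SK_CLONE_VM | SK_CLONE_FS | SK_CLONE_FILES | \
     SK_CLONE_SIGHAND | SK_CLONE_THREAD | SK_CLONE_SYSVSEM)

/* Flags the emulator is assumed unable to honour. */
#define SK_UNSUPPORTED_FLAGS (SK_CLONE_UNTRACED)

/*
 * Hypothetical helper: returns 0 if the request can be emulated as a
 * thread, or a negative errno value otherwise (kernel-style return).
 */
static int sk_check_thread_flags(unsigned long flags)
{
    if ((flags & SK_THREAD_FLAGS) != SK_THREAD_FLAGS) {
        return -EINVAL;   /* some required thread flag is missing */
    }
    if (flags & SK_UNSUPPORTED_FLAGS) {
        return -EINVAL;   /* asks for behaviour that cannot be provided */
    }
    return 0;
}
```

With the flags from the strace output (CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_UNTRACED) this helper rejects the request with -EINVAL, which corresponds to the errno=22 seen in the log.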
+
+
+I suspect this happens when the sanitizer library calls StopTheWorld() (in libsanitizer/sanitizer_common/sanitizer_stoptheworld_linux_libcdep.cc in GCC sources).
+
+It does:
+  uptr tracer_pid = internal_clone(
+      TracerThread, tracer_stack.Bottom(),
+      CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_UNTRACED,
+      &tracer_thread_argument, nullptr /* parent_tidptr */,
+      nullptr /* newtls */, nullptr /* child_tidptr */);
+
+See: https://gcc.gnu.org/viewcvs/gcc/trunk/libsanitizer/sanitizer_common/sanitizer_stoptheworld_linux_libcdep.cc?revision=253887&view=markup#l383
+
+The recent merge with the upstream libsanitizer means that stoptheworld is now enabled on arm as well, leading to this call to internal_clone().
+
+This matches the comment I received on the gcc-patches list: https://gcc.gnu.org/ml/gcc-patches/2017-10/msg02215.html
+"LSan sets atexit handler that calls internal_clone function that's not supported in QEMU"
+
+I'm wondering why this works on aarch64? (I am also using QEMU for validations of aarch64 gcc). I mean the validations do not timeout. That being said, on aarch64 the test exits with 4 as return-code (like it did on arm with qemu-2.7)
+
+It also seems to me that the sanitizer lib is trying to handle the error (see if (internal_iserror(tracer_pid, &local_errno)) line 427).
+
+As a side note, doing
+$ qemu-arm -E ASAN_OPTIONS=detect_leaks=0 blah
+does not affect the execution, while
+$ env ASAN_OPTIONS=detect_leaks=0 qemu-arm blah
+does
+(my question here being: why doesn't -E do what I want?)
+
+
+
+My bad: on aarch64 it does not "work", the test actually exits with a LeakSanitizer error message ("fatal error").
+
+Using QEMU_STRACE=1 shows that clone() fails in the same way as for arm (which is expected), but apparently this error is handled better on aarch64, maybe because the internal_clone implementation is different.
+
+
+I looked a bit more at the sanitizers source code, to understand the differences between arm and aarch64.
And it turns out that on aarch64, we have:
+
+sanitizer_common/sanitizer_syscall_linux_aarch64.inc:
+133 // Helper function used to avoid cobbler errno.
+134 bool internal_iserror(uptr retval, int *rverrno) {
+135   if (retval >= (uptr)-4095) {
+
+but on arm, in the GCC version, we use:
+
+sanitizer_common/sanitizer_syscall_generic.inc:
+54 bool internal_iserror(uptr retval, int *rverrno) {
+55   if (retval == (uptr)-1) {
+
+But recently (Nov 8th), the upstream sanitizer repo got a new file:
+
+sanitizer_common/sanitizer_syscall_linux_arm.inc
+133 // Helper function used to avoid cobbler errno.
+134 bool internal_iserror(uptr retval, int *rverrno) {
+135   if (retval >= (uptr)-4095) {
+
+With that change, I now observe the same behaviour with qemu-aarch64 and qemu-arm.
+
+
+I also looked at QEMU's code, and I am surprised that do_syscall() returns the value of errno rather than the return code from the syscall. So for instance, if clone() fails, do_syscall() returns get_errno(do_fork(...)) instead of -1. I thought the target code expects -1 in case of failure, but I'm not familiar with QEMU sources, so I'm probably missing something.
+
+Looking at QEMU's linux-user/syscall.c:do_fork(), I noticed several places with return -TARGET_EXXXX: should this be:
+errno = TARGET_EXXX;
+return -1;
+instead?
+But given that most (if not all) syscalls in do_syscall actually use 'ret = get_errno(xxxx)' I must be wrong :-)
+
+
+Hmm, the do_fork() code is a bit inconsistent there. Generally in linux-user/ functions should either:
+(1) return -1 with host errno set to a host errno; the caller then must use get_errno() to convert to the negative-target-errno that we need to return from do_syscall()
+(2) return negative-target-errno; the caller then need do nothing
+
+In this case do_fork() is supposed to be using approach 2, but some code paths are using approach 1 and the callers are all using get_errno().
This hybrid approach works OK as long as none of the negative-target-errno values returned are -1 (which happens to be TARGET_EPERM for all architectures, and which we only use once in linux-user, in the sigaltstack handling). In an ideal world we'd clean this up to consistently use approach 2, but I don't think the code as it stands is actually buggy.
+
+
+Thanks for the clarification.
+
+But how does the target get the actual syscall return code, if do_syscall() is supposed to return negative-target-errno?
+
+I mean, in general the target code will check if the syscall returned -1, and only then query errno?
+But if QEMU's do_syscall returns -errno, and puts this value in r0 (for arm), how is the target code supposed to work?
+
+
+The kernel syscall ABI is "returns negative-errno". In the target code, if the libc ABI says "return -1 with errno set", it's the target libc code's job to move the return value into the TLS errno variable and return -1 from the library function. (Some target architectures have slightly weird ABIs like SPARC's "sets the carry flag on syscall failure" one; QEMU handles that kind of detail in the linux-user/main.c code which calls do_syscall().)
+
+
+Thanks for fixing my ignorance :-)
+
+So it really seems this is a feature, not a bug here.
+
+This was a bit off-topic, but I have a pending question in comment #5:
+As a side note, doing
+$ qemu-arm -E ASAN_OPTIONS=detect_leaks=0 blah
+does not affect the execution, while
+$ env ASAN_OPTIONS=detect_leaks=0 qemu-arm blah
+does
+(my question here being: why doesn't -E propagate ASAN_OPTIONS to the target code?)
+
+No idea about the environment variable thing -- it seems to work for me. In a chroot:
+# qemu-arm-static -E ASAN_OPTIONS=bar=baz /usr/bin/env
+ASAN_OPTIONS=bar=baz
+[... other things ...]
+
+shows that -E is being passed into the child process's environment as would be expected.
+
+
+Ha! I think I found the problem....
the sanitizer reads /proc/self/environ, which is not where QEMU wrote the target environment...
+
+Thanks a lot for your support, I think you can close this report as: "it's a feature, not a bug".
+
+
+
+It would be nice if we got /proc/self/environ right, though...
+
+
+[Expired for QEMU because there has been no activity for 60 days.]
+
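The two internal_iserror() variants quoted in the thread behave differently when a raw syscall returns a negative errno value such as -22 (EINVAL), which would explain why the failed clone() was noticed on aarch64 but not on arm. The sketch below only illustrates the two comparisons; the function names `iserror_range` and `iserror_generic` are hypothetical, `uptr` is assumed to be a pointer-sized unsigned integer, and the errno handling of the generic variant is left out.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t uptr;   /* assumption: pointer-sized unsigned integer */

/*
 * Range check, as in the aarch64 and the newer arm files quoted in the
 * thread: any value in [-4095, -1] is treated as a negative errno.
 */
static bool iserror_range(uptr retval, int *rverrno)
{
    if (retval >= (uptr)-4095) {
        if (rverrno) {
            *rverrno = (int)-(intptr_t)retval;   /* recover positive errno */
        }
        return true;
    }
    return false;
}

/*
 * Equality check, as in the generic file quoted in the thread: only -1
 * counts as an error (simplified, errno lookup omitted).
 */
static bool iserror_generic(uptr retval)
{
    return retval == (uptr)-1;
}
```

For a return value of (uptr)-22, the range check reports an error with errno 22, while the equality check reports success, so a caller relying on the latter would treat -22 as a valid thread id.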