diff options
Diffstat (limited to 'results/classifier/118/none/1899082')
| -rw-r--r-- | results/classifier/118/none/1899082 | 165 |
1 files changed, 165 insertions, 0 deletions
diff --git a/results/classifier/118/none/1899082 b/results/classifier/118/none/1899082 new file mode 100644 index 000000000..0382c54bc --- /dev/null +++ b/results/classifier/118/none/1899082 @@ -0,0 +1,165 @@ +user-level: 0.681 +risc-v: 0.626 +mistranslation: 0.552 +virtual: 0.498 +permissions: 0.484 +register: 0.479 +semantic: 0.475 +debug: 0.474 +boot: 0.474 +PID: 0.463 +KVM: 0.456 +x86: 0.455 +assembly: 0.453 +architecture: 0.452 +graphic: 0.444 +arm: 0.437 +device: 0.425 +ppc: 0.425 +performance: 0.422 +kernel: 0.422 +peripherals: 0.420 +socket: 0.388 +TCG: 0.378 +VMM: 0.368 +vnc: 0.342 +hypervisor: 0.341 +files: 0.301 +i386: 0.246 +network: 0.236 + +ReplayKernel.test_x86_64_pc fails intermittently + +Even though this acceptance test is already skipped on GitLab CI, the intermittent failures can be seen on other environments too. + +The record phase works fine, but during the replay phase fail to finish booting the kernel (until the expected place): + +16:34:47 DEBUG| [ 0.034498] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0 +16:34:47 DEBUG| [ 0.034790] Spectre V2 : Spectre mitigation: LFENCE not serializing, switching to generic retpoline +16:34:47 DEBUG| [ 0.035093] Spectre V2 : Mitigation: Full generic retpoline +16:34:47 DEBUG| [ 0.035347] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch +16:34:47 DEBUG| [ 0.035667] +16:36:02 ERROR| +16:36:02 ERROR| Reproduced traceback from: /home/cleber/src/avocado/avocado/avocado/core/test.py:767 +16:36:02 ERROR| Traceback (most recent call last): +16:36:02 ERROR| File "/var/lib/users/cleber/build/qemu/tests/acceptance/replay_kernel.py", line 92, in test_x86_64_pc +16:36:02 ERROR| self.run_rr(kernel_path, kernel_command_line, console_pattern, shift=5) +16:36:02 ERROR| File "/var/lib/users/cleber/build/qemu/tests/acceptance/replay_kernel.py", line 73, in run_rr +16:36:02 ERROR| False, shift, args, replay_path) +16:36:02 ERROR| File "/var/lib/users/cleber/build/qemu/tests/acceptance/replay_kernel.py", line 55, in run_vm +16:36:02 ERROR| self.wait_for_console_pattern(console_pattern, vm) +16:36:02 ERROR| File "/var/lib/users/cleber/build/qemu/tests/acceptance/boot_linux_console.py", line 53, in wait_for_console_pattern +16:36:02 ERROR| vm=vm) +16:36:02 ERROR| File "/var/lib/users/cleber/build/qemu/tests/acceptance/avocado_qemu/__init__.py", line 130, in wait_for_console_pattern +16:36:02 ERROR| _console_interaction(test, success_message, failure_message, None, vm=vm) +16:36:02 ERROR| File "/var/lib/users/cleber/build/qemu/tests/acceptance/avocado_qemu/__init__.py", line 82, in _console_interaction +16:36:02 ERROR| msg = console.readline().strip() +16:36:02 ERROR| File "/usr/lib64/python3.7/socket.py", line 575, in readinto +16:36:02 ERROR| def readinto(self, b): +16:36:02 ERROR| File "/home/cleber/src/avocado/avocado/avocado/plugins/runner.py", line 77, in sigterm_handler +16:36:02 ERROR| raise RuntimeError("Test interrupted by SIGTERM") +16:36:02 ERROR| RuntimeError: Test interrupted by SIGTERM +16:36:02 ERROR| + +On my workstation, I can replicate the failure roughly once every 50 runs. + +I'm actually able to increase the reproducibility to ~ 90% when running 8 of those tests simultaneously (on an 8 core system). + +I can 100% reproduce it with the following command line: +taskset 1 tests/venv/bin/avocado --show=app,console,replay run -t arch:x86_64 ../tests/acceptance/replay_kernel.py + +But I can't reproduce it outside the avocado toolchain, by running qemu directly. + +I traced this bug to hw/char/serial.c/serial_ioport_read + +Bug disappears when I add qemu_log("serial_ioport_read %x %x\n", (int)addr, ret); into the end of this function. + +I suppose that there is avocado (or socket) io synchronization problem, because running the same test without avocado works normally. + +I could reproduce this without Avocado: + +-- +#!/bin/bash + +SOCKET="/tmp/qemu.sock" +VMLINUZ_PATH="/tmp/vmlinuz" +REPLAY_FILE="/tmp/replay.bin" + +function run_and_wait() { + /usr/bin/qemu-system-x86_64 -display none \ + -vga none \ + -machine pc \ + -chardev socket,id=console,path=${SOCKET},server=on,wait=off \ + -serial chardev:console \ + -icount shift=5,rr=$1,rrfile=${REPLAY_FILE} \ + -kernel ${VMLINUZ_PATH} \ + -append "printk.time=1 panic=-1 console=ttyS0" -net none -no-reboot & + # Wait a little for the socket creation + sleep 1 + socat - UNIX-CONNECT:${SOCKET} + echo $? +} + + +run_and_wait "record" +echo "Was this (record) finished?" + +run_and_wait "replay" +echo "Was this (replay) finished?" +-- + +The second echo is never displayed and my console stops here: + +--- +[ 0.036667] Speculative Store Bypass: Vulnerable +[ 0.256061] random: fast init done +[ 0.308652] Freeing SMP alternatives memory: 36K +--- + +I was able to run the reproducer from Beraldo Leal, and achieved the same results. + +Additionally, I got the following output from QEMU: + + qemu-system-x86_64: Missing character write event in the replay log + +Which seems to come from replay/replay-char.c:158. + +I then tested the record and replay separately, and found that, while the above message is given and QEMU exits at the replay phase, the amount of CPUs given to the *record* stage actually make the difference. When the recording is done with a single CPU, the replay log seems to be written with the "missing character write event". + +Beraldo, thanks for the script. +However, I can't reproduce the bug using it. I've got the newest QEMU from the repository, and it never hangs in this scenario. + +But there are some problems in other runs with more complex tasks. + +The QEMU project is currently moving its bug tracking to another system. +For this we need to know which bugs are still valid and which could be +closed already. Thus we are setting the bug state to "Incomplete" now. + +If the bug has already been fixed in the latest upstream version of QEMU, +then please close this ticket as "Fix released". + +If it is not fixed yet and you think that this bug report here is still +valid, then you have two options: + +1) If you already have an account on gitlab.com, please open a new ticket +for this problem in our new tracker here: + + https://gitlab.com/qemu-project/qemu/-/issues + +and then close this ticket here on Launchpad (or let it expire auto- +matically after 60 days). Please mention the URL of this bug ticket on +Launchpad in the new ticket on GitLab. + +2) If you don't have an account on gitlab.com and don't intend to get +one, but still would like to keep this ticket opened, then please switch +the state back to "New" or "Confirmed" within the next 60 days (other- +wise it will get closed as "Expired"). We will then eventually migrate +the ticket automatically to the new system (but you won't be the reporter +of the bug in the new system and thus you won't get notified on changes +anymore). + +Thank you and sorry for the inconvenience. + + +[Expired for QEMU because there has been no activity for 60 days.] + |