results/classifier/118/architecture/1023



architecture: 0.828
kernel: 0.757
virtual: 0.754
performance: 0.739
graphic: 0.725
PID: 0.718
device: 0.712
boot: 0.690
network: 0.672
debug: 0.671
permissions: 0.670
semantic: 0.641
ppc: 0.632
x86: 0.620
hypervisor: 0.618
socket: 0.617
files: 0.614
arm: 0.590
KVM: 0.576
user-level: 0.563
assembly: 0.562
register: 0.552
risc-v: 0.508
TCG: 0.499
vnc: 0.441
peripherals: 0.413
i386: 0.377
VMM: 0.346
mistranslation: 0.303
--------------------
x86: 0.968
debug: 0.965
kernel: 0.932
KVM: 0.928
TCG: 0.914
virtual: 0.913
hypervisor: 0.884
user-level: 0.240
performance: 0.202
files: 0.125
architecture: 0.089
VMM: 0.059
register: 0.048
boot: 0.043
PID: 0.035
risc-v: 0.025
semantic: 0.023
device: 0.020
ppc: 0.017
assembly: 0.009
socket: 0.007
network: 0.004
graphic: 0.004
permissions: 0.003
peripherals: 0.003
i386: 0.002
vnc: 0.001
mistranslation: 0.001
arm: 0.000

TCG & LA57 (5-level page tables) causes intermittent triple fault when setting %CR3
Description of problem:
Enabling LA57 (5-level page tables) under TCG causes an intermittent triple fault when the kernel loads %cr3 in preparation for jumping to protected mode.  It is fairly rare, happening on perhaps 1 in 20 runs.
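
As a rough guide to how many runs a reproducer needs before the failure is likely to show up, here is a small sketch.  It assumes that runs are independent and that the failure rate really is about 1 in 20 (the rate is only an estimate).  The function name `runs_needed` is hypothetical and not part of qemu or of the original report.

```bash
#!/bin/bash -

# Hypothetical helper, not part of the original report.
# Prints the smallest n such that 1 - (1 - p)^n >= c, assuming each
# run fails independently with probability p.
runs_needed() {
    local p=$1 c=$2
    awk -v p="$p" -v c="$c" 'BEGIN {
        n = log(1 - c) / log(1 - p)
        r = int(n)
        if (r < n) r++          # round up to a whole number of runs
        print r
    }'
}

# Failure rate of 1 in 20, 95% chance of seeing at least one failure.
runs_needed 0.05 0.95
```

Under those assumptions this prints 59, so a loop that has survived a few dozen runs has not yet shown that the bug is absent.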

The behaviour most users observe is that SeaBIOS messages appear, no kernel messages follow, and qemu exits.  (A triple fault in TCG code causes qemu to reset the virtual CPU, and because we are using `-no-reboot`, that reset causes qemu to exit.)

There's a simple reproducer below.  I enabled qemu -d options to capture the full instruction traces, which can be found here:

http://oirase.annexia.org/tmp/fullexec-failed (error case)
http://oirase.annexia.org/tmp/fullexec-good (successful run)

I also added an `abort()` into qemu after the triple fault message in order to capture a stack trace, which can be found here: https://bugzilla.redhat.com/show_bug.cgi?id=2082806#c8
Steps to reproduce:
1. Save the following script into a file, adjusting the two variables at the top as appropriate:

```
#!/bin/bash -

# Point this to any kernel in /boot:
kernel=/boot/vmlinuz-4.18.0-387.el8.x86_64

# Point this to qemu:
qemu=/usr/libexec/qemu-kvm
#qemu=/home/rjones/d/qemu/build/qemu-system-x86_64

log=/tmp/log

cpu=max
#cpu=max,la57=off

# Keep running qemu until a run fails to reach the kernel, that is,
# until qemu exits non-zero or the log lacks "Linux version".
while "$qemu" \
    -global virtio-blk-pci.scsi=off \
    -no-user-config \
    -nodefaults \
    -display none \
    -machine accel=tcg,graphics=off \
    -cpu "$cpu" \
    -m 2048 \
    -no-reboot \
    -rtc driftfix=slew \
    -no-hpet \
    -global kvm-pit.lost_tick_policy=discard \
    -kernel "$kernel" \
    -object rng-random,filename=/dev/urandom,id=rng0 \
    -device virtio-rng-pci,rng=rng0 \
    -device virtio-serial-pci \
    -serial stdio \
    -append "panic=1 console=ttyS0" >& "$log" &&
    grep -sq "Linux version" "$log"; do
    echo -n .
done
```

2. Run the script.  It will run qemu many times, checking that it reaches the kernel.
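3. Eventually the script may exit, which means a run did not reach the kernel.
4. Check `/tmp/log`; in the failing case it contains only SeaBIOS messages.
5. Modify the script to set `cpu=max,la57=off` (so that qemu runs with `-cpu max,la57=off`) and the error will stop happening.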
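
The check in step 4 can be automated.  This is a minimal sketch, assuming the failing log contains the string "SeaBIOS" (as in the SeaBIOS banner) and a successful log contains "Linux version", the same marker the reproducer greps for.  The function name `classify_log` and its output strings are hypothetical.

```bash
# Hypothetical helper, not part of the original reproducer.
# Classifies one captured serial log using the same "Linux version"
# marker that the loop in step 1 greps for.
classify_log() {
    local log=$1
    if grep -sq "Linux version" "$log"; then
        echo "reached-kernel"
    elif grep -sq "SeaBIOS" "$log"; then
        echo "seabios-only"      # matches the failure described above
    else
        echo "unknown"           # empty, missing, or unexpected log
    fi
}
```

For example, `classify_log /tmp/log` after the script exits should print `seabios-only` if the run hit this bug, assuming the log looks as described.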
Additional information:
Downstream bug report: https://bugzilla.redhat.com/show_bug.cgi?id=2082806
LA57 was enabled here: https://gitlab.com/qemu-project/qemu/-/issues/661