summary refs log tree commit diff stats
path: root/results/classifier/semantic-bugs-usermode/test/1748296
blob: cab7dadc586c05eb79b852e12b87fe8ce9df5e0c (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
boot: 0.673
device: 0.671
semantic: 0.667
other: 0.644
instruction: 0.641
mistranslation: 0.640
network: 0.584
graphic: 0.581
assembly: 0.566
KVM: 0.560
socket: 0.552
vnc: 0.462

TCG throws Invalid Opcode when executing x86 BMI shlx instruction

I am unable to use BMI in my project when running under TCG. I narrowed the problem down to incorrect instruction decoding for BMI instructions (which have a 2 byte VEX prefix). The gen_sse function in translate.c reaches the goto label do_0f_38_fx, but b does not equal 0x1f7, 0x2f7, or 0x3f7, so the switch takes the default path and raises an invalid opcode exception.

The code executes correctly and passes the test under KVM.

I have created a complete repro here: https://github.com/doug65536/qemu-bmibug

The makefile has the following utility targets:

debug-kvm: Build and run the VM using KVM and wait for gdbstub attach

run: Run the test case with TCG, make fails if the test fails. (It will fail)

run-kvm: Run the test case with KVM, make fails if the test fails. (It will succeed)

debug: Build and run the VM with TCG and wait for GDB attach

attach-gdb: Run GDB and attach to KVM gdbstub

The VM runs with -cpu max. CPUID reports support for BMI, BMI2, and ABM.

You can quickly verify the issue by executing `make run-kvm` to confirm that KVM passes, then `make run` to confirm that TCG fails.

I believe the bug affects other BMI, BMI2, and ABM instructions, but I have only completely verified incorrect execution of SHLX.

I hit this today on QEMU head. The problem appears to crop up when:

  1. Decoding a VEX instruction (see [1]) that uses the 0x66 mandatory
     prefix; and

  2. The OSFXSR bit in CR4 is clear (that is, SSE is disabled)

This means that x86_64 instructions such as:

     c4 e2 f9 f7 c0                shlxq   %rax, %rax, %rax

fail. Similar instructions the use a different mandatory prefix
(such as `shrxq`, which uses prefix 0xf2) work fine.

Most operating systems presumably set the OSFXSR bit fairly early on, which I
guess is why this problem isn't likely to be seen except in low-level or early
boot code.

The culprit appears to be the block of code in `gen_sse` [2]:

    if (is_xmm
        && !(s->flags & HF_OSFXSR_MASK)
        && ((b != 0x38 && b != 0x3a) || (s->prefix & PREFIX_DATA))) {
        goto unknown_op;
    }

Removing the check `... || (s->prefix & DATA_DATA)` causes QEMU to correctly
translate the instruction, and allows doug16k's test above to pass.

I must confess, I'm not clear what this clause was testing for. My best guess
is that early code (e.g. 4242b1bd8ac) required it to avoid accessing invalid
opcode tables, but we seem to be handling that more gracefully today (e.g.
[3]), so I suspect it is no longer needed.

[1]: https://wiki.osdev.org/X86-64_Instruction_Encoding#VEX.2FXOP_opcodes
[2]: https://github.com/qemu/qemu/blob/6b63d126121a9535784003924fcb67f574a6afc0/target/i386/tcg/translate.c#L3078
[3]: https://github.com/qemu/qemu/blob/6b63d126121a9535784003924fcb67f574a6afc0/target/i386/tcg/translate.c#L3696-L3700


The QEMU project is currently considering to move its bug tracking to
another system. For this we need to know which bugs are still valid
and which could be closed already. Thus we are setting older bugs to
"Incomplete" now.

If you still think this bug report here is valid, then please switch
the state back to "New" within the next 60 days, otherwise this report
will be marked as "Expired". Or please mark it as "Fix Released" if
the problem has been solved with a newer version of QEMU already.

Thank you and sorry for the inconvenience.

Ah, never mind, posted the text before seeing that it still affects people in 2021 ... so I'm not changing this bug to "Incomplete". Sorry for the noise.

Fix has been included here:
https://gitlab.com/qemu-project/qemu/-/commit/51909241d26fe6fe18a