results/classifier/deepseek-r1:32b/output/instruction/1245543


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26

Wrong implementation of SSE4.1 pmovzxbw and similar instructions

QEMU 1.5.0 (and git version, as far as I can tell from the source code) has incorrect implementation of pmovzxbw and similar SSE4.1 instructions. The instruction zero-extends the first 8 8-bit elements of a vector to 16bit vector and puts them to another vector. The current implementation applies this operation only to the first element and zeros out the rest.

To verify, compile the attached program for SSE4.1 (g++ -msse4.1 cvtint.cc). On real hardware, it produces the following output:

$ ./a.out
1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0

On QEMU, the output is as follows:

$ ./a.out
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

QEMU is invoked as:

qemu-system-x86_64 \
    -M pc -cpu Haswell,+sse4.1,+avx,+avx2,+fma,enforce -m 512 \
    -serial stdio -no-reboot \
    -kernel vmlinuz -initrd initrd.img \
    -netdev user,id=user.0 -device rtl8139,netdev=user.0  -redir tcp:2222::22 \
    -hda ubuntu-amd64.ext3 \
    --append "rw console=tty root=/dev/sda"