instruction: 0.966
other: 0.798
device: 0.770
semantic: 0.629
socket: 0.619
network: 0.548
vnc: 0.529
graphic: 0.455
boot: 0.437
mistranslation: 0.417
assembly: 0.269
KVM: 0.262

Wrong implementation of SSE4.1 pmovzxbw and similar instructions

QEMU 1.5.0 (and git version, as far as I can tell from the source code) has incorrect implementation of pmovzxbw and similar SSE4.1 instructions. The instruction zero-extends the first 8 8-bit elements of a vector to 16bit vector and puts them to another vector. The current implementation applies this operation only to the first element and zeros out the rest.

To verify, compile the attached program for SSE4.1 (g++ -msse4.1 cvtint.cc). On real hardware, it produces the following output:

$ ./a.out
1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0

On QEMU, the output is as follows:

$ ./a.out
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

QEMU is invoked as:

qemu-system-x86_64 \
    -M pc -cpu Haswell,+sse4.1,+avx,+avx2,+fma,enforce -m 512 \
    -serial stdio -no-reboot \
    -kernel vmlinuz -initrd initrd.img \
    -netdev user,id=user.0 -device rtl8139,netdev=user.0  -redir tcp:2222::22 \
    -hda ubuntu-amd64.ext3 \
    --append "rw console=tty root=/dev/sda"


Looking through old bug tickets... is this still an issue with the latest version of QEMU? Or could we close this ticket nowadays?

[Expired for QEMU because there has been no activity for 60 days.]