instruction: 0.966 other: 0.798 device: 0.770 semantic: 0.629 socket: 0.619 network: 0.548 vnc: 0.529 graphic: 0.455 boot: 0.437 mistranslation: 0.417 assembly: 0.269 KVM: 0.262 Wrong implementation of SSE4.1 pmovzxbw and similar instructions QEMU 1.5.0 (and git version, as far as I can tell from the source code) has incorrect implementation of pmovzxbw and similar SSE4.1 instructions. The instruction zero-extends the first 8 8-bit elements of a vector to 16bit vector and puts them to another vector. The current implementation applies this operation only to the first element and zeros out the rest. To verify, compile the attached program for SSE4.1 (g++ -msse4.1 cvtint.cc). On real hardware, it produces the following output: $ ./a.out 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 On QEMU, the output is as follows: $ ./a.out 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 QEMU is invoked as: qemu-system-x86_64 \ -M pc -cpu Haswell,+sse4.1,+avx,+avx2,+fma,enforce -m 512 \ -serial stdio -no-reboot \ -kernel vmlinuz -initrd initrd.img \ -netdev user,id=user.0 -device rtl8139,netdev=user.0 -redir tcp:2222::22 \ -hda ubuntu-amd64.ext3 \ --append "rw console=tty root=/dev/sda" Looking through old bug tickets... is this still an issue with the latest version of QEMU? Or could we close this ticket nowadays? [Expired for QEMU because there has been no activity for 60 days.]