x86: 0.950 register: 0.943 architecture: 0.933 graphic: 0.904 ppc: 0.886 TCG: 0.883 performance: 0.839 mistranslation: 0.822 semantic: 0.810 files: 0.792 device: 0.773 user-level: 0.750 peripherals: 0.748 arm: 0.681 kernel: 0.666 permissions: 0.645 PID: 0.638 debug: 0.634 assembly: 0.619 hypervisor: 0.611 network: 0.607 vnc: 0.570 VMM: 0.512 risc-v: 0.511 socket: 0.508 virtual: 0.492 i386: 0.438 boot: 0.419 KVM: 0.401 -------------------- x86: 0.981 TCG: 0.059 hypervisor: 0.052 debug: 0.044 files: 0.034 virtual: 0.013 assembly: 0.010 PID: 0.009 register: 0.007 semantic: 0.006 kernel: 0.005 user-level: 0.005 risc-v: 0.005 socket: 0.004 device: 0.004 performance: 0.004 network: 0.004 architecture: 0.003 vnc: 0.003 boot: 0.003 permissions: 0.002 graphic: 0.002 peripherals: 0.001 ppc: 0.001 mistranslation: 0.001 VMM: 0.001 KVM: 0.000 i386: 0.000 arm: 0.000 x86_64 cmpxchg behavior in qemu tcg does not match the real CPU QEMU version: 1214d55d1c (HEAD, origin/master, origin/HEAD) Merge remote-tracking branch 'remotes/nvme/tags/nvme-next-pull-request' into staging Consider the following little program: $ cat 1.c #include int main() { int mem = 0x12345678; register long rax asm("rax") = 0x1234567812345678; register int edi asm("edi") = 0x77777777; asm("cmpxchg %[edi],%[mem]" : [ mem ] "+m"(mem), [ rax ] "+r"(rax) : [ edi ] "r"(edi)); long rax2 = rax; printf("rax2 = %lx\n", rax2); } According to the Intel Manual, cmpxchg should not touch the accumulator in case the values are equal, which is indeed the case on the real CPU: $ gcc 1.c $ ./a.out rax2 = 1234567812345678 However, QEMU appears to zero extend EAX to RAX: $ qemu-x86_64 ./a.out rax2 = 12345678 This is also the case for lock cmpxchg. Found in BPF development context: https://lore