semantic: 0.926 graphic: 0.883 instruction: 0.851 assembly: 0.836 device: 0.825 mistranslation: 0.805 vnc: 0.803 socket: 0.794 other: 0.750 network: 0.677 KVM: 0.582 boot: 0.506

Cache Layout wrong on many Zen Arch CPUs

AMD CPUs have an L3 cache per 2, 3 or 4 cores. Currently, TOPOEXT seems to always map the cache as if it were a 4-core-per-CCX CPU, which is incorrect and costs up to 30% performance (more realistically 10%) in applications that are aware of the L3 cache layout.

Example on a 4-CCX CPU (1950X w/ 8 cores and no SMT):

EPYC-IBPB AMD

In Windows, coreinfo reports correctly:

****---- Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64
----**** Unified Cache 6, Level 3, 8 MB, Assoc 16, LineSize 64

On a 3-CCX CPU (3960X w/ 6 cores and no SMT):

EPYC-IBPB AMD

In Windows, coreinfo reports incorrectly:

****-- Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64
----** Unified Cache 6, Level 3, 8 MB, Assoc 16, LineSize 64

Validated against the 3.0, 3.1, 4.1 and 4.2 versions of qemu-kvm.

With newer QEMU there is a fix (that does behave correctly) in using the dies parameter:

The problem is that the dies are exposed differently from how AMD does it natively: they are exposed to Windows as sockets, which means you can never have a machine with more than two CCXs (6 cores), as Windows only supports two sockets. (Should this be reported as a separate bug?)

Hi,

I've since confirmed that this bug also exists (as expected) on Linux guests, as well as on Zen1 EPYC 7401 CPUs, to make sure this wasn't a problem with the detection of the newer consumer platform.

Basically it seems (looking at the code with layman's eyes) that as long as you have a topology that is divisible by 4 or 8, it will always result in the wrong topology being exposed to the guest, even when the correct option can be built (12- and 24-core CPUs). It would also be great if we could support 9-core VM CPUs, as that is a reasonable use case for VMs (3 CCXs of 3 cores for a total of 9 cores, or 18 SMT threads).
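To make the expected and observed layouts easier to compare, here is a minimal Python sketch that prints coreinfo-style L3 sharing masks for a given core count and cores-per-CCX value. It is not QEMU code and does not touch CPUID; the function name `l3_masks` and its parameters are made up for illustration, and it only models the grouping of cores into L3 domains described above.

```python
def l3_masks(total_cores, cores_per_ccx):
    """Return one coreinfo-style mask per L3 cache.

    '*' marks a core that shares the cache, '-' a core that does not.
    Hypothetical helper: it only models grouping, not QEMU's CPUID logic.
    """
    masks = []
    for start in range(0, total_cores, cores_per_ccx):
        # The last group may be short if total_cores is not a multiple
        # of cores_per_ccx (this is what the 6-core guest ends up with).
        end = min(start + cores_per_ccx, total_cores)
        masks.append(
            "".join("*" if start <= core < end else "-"
                    for core in range(total_cores))
        )
    return masks


if __name__ == "__main__":
    # (total cores, cores assumed to share one L3)
    for total, per_ccx in [(8, 4), (6, 3), (6, 4)]:
        print(f"{total} cores, {per_ccx} per L3:")
        for mask in l3_masks(total, per_ccx):
            print("  " + mask)
```

With 6 cores, grouping by 3 gives the layout the host actually has, while grouping by 4 reproduces the masks coreinfo shows in the incorrect case above.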
Pinging the author and committer of the TopoEXT feature / EPYC CPU model, as they should probably know best how to solve this issue.

This is the commit I am referencing: https://git.qemu.org/?p=qemu.git;a=commitdiff;h=8f4202fb1080f86958782b1fca0bf0279f67d136

Damir,

We normally test Linux guests here. Can you please give me the exact QEMU command line? Even just the SMP parameters (sockets, cores, threads, dies) will work. I will try to recreate it locally first. Give me an example of what works and what does not work.

I have recently sent a few more patches to fix another bug. Please check if they make any difference.
https://patchwork.kernel.org/cover/11272063/
https://lore.kernel