Cache Layout wrong on many Zen Arch CPUs

AMD Zen CPUs share one L3 cache among 2, 3 or 4 cores (one CCX). Currently, TOPOEXT seems to always map the cache as if the CPU had 4 cores per CCX, which is incorrect and costs up to 30% performance (more realistically 10%) in applications that are aware of the L3 cache layout.

Example on a CPU with 4 cores per CCX (1950X with 8 cores and no SMT):

  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-IBPB</model>
    <vendor>AMD</vendor>
    <topology sockets='1' cores='8' threads='1'/>

In Windows, coreinfo reports the cache layout correctly:

****----  Unified Cache 1, Level 3,    8 MB, Assoc  16, LineSize  64
----****  Unified Cache 6, Level 3,    8 MB, Assoc  16, LineSize  64

On a CPU with 3 cores per CCX (3960X with 6 cores and no SMT):

 <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-IBPB</model>
    <vendor>AMD</vendor>
    <topology sockets='1' cores='6' threads='1'/>

In Windows, coreinfo reports the cache layout incorrectly:

****--  Unified Cache  1, Level 3,    8 MB, Assoc  16, LineSize  64
----**  Unified Cache  6, Level 3,    8 MB, Assoc  16, LineSize  64
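
To make the mismatch concrete, here is a minimal sketch that renders coreinfo-style L3 masks for a given grouping. The function name `l3_masks` and its parameters are made up for illustration and are not part of QEMU or coreinfo. It only assumes that cores are assigned to L3 caches in consecutive blocks.

```python
def l3_masks(total_cores, cores_per_l3):
    """Return one coreinfo-style mask per L3 cache ('*' = core shares that cache).

    Hypothetical helper: assumes cores are grouped into consecutive blocks.
    """
    masks = []
    for start in range(0, total_cores, cores_per_l3):
        end = min(start + cores_per_l3, total_cores)
        masks.append("-" * start + "*" * (end - start) + "-" * (total_cores - end))
    return masks


# What the 6-core guest currently sees: blocks of 4 cores per L3.
print(l3_masks(6, 4))  # ['****--', '----**']

# What a host with 3 cores per CCX would need the guest to see.
print(l3_masks(6, 3))  # ['***---', '---***']
```

With blocks of 4 the sketch reproduces the two masks reported above. With blocks of 3 it gives the layout that would match the 3960X host.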


Validated against 3.0, 3.1, 4.1 and 4.2 versions of qemu-kvm. 

With newer QEMU there is a fix (one that does behave correctly) using the dies parameter:
 <qemu:arg value='cores=3,threads=1,dies=2,sockets=1'/>

The problem is that the dies are exposed differently from how AMD does it natively: they are exposed to Windows as sockets. This means you can never have a machine with more than two CCXs (6 cores), as Windows only supports two sockets. (Should this be reported as a separate bug?)