|
Cache Layout wrong on many Zen Arch CPUs
AMD Zen CPUs share an L3 cache among the 2, 3 or 4 cores of a CCX, depending on the model. Currently, TOPOEXT seems to always map the cache as if the CPU had 4 cores per CCX, which is incorrect and costs up to 30% performance (more realistically 10%) in applications that are aware of the L3 cache layout.
Example on a CPU with 4 cores per CCX (1950X, guest with 8 cores and no SMT):
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>EPYC-IBPB</model>
<vendor>AMD</vendor>
<topology sockets='1' cores='8' threads='1'/>
In Windows, Coreinfo reports the L3 layout correctly:
****---- Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64
----**** Unified Cache 6, Level 3, 8 MB, Assoc 16, LineSize 64
On a CPU with 3 cores per CCX (3960X, guest with 6 cores and no SMT):
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>EPYC-IBPB</model>
<vendor>AMD</vendor>
<topology sockets='1' cores='6' threads='1'/>
In Windows, Coreinfo reports the L3 layout incorrectly (the expected masks are ***--- and ---***):
****-- Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64
----** Unified Cache 6, Level 3, 8 MB, Assoc 16, LineSize 64
Validated against 3.0, 3.1, 4.1 and 4.2 versions of qemu-kvm.
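To make the mismatch concrete, here is a minimal Python sketch that prints Coreinfo-style L3 masks for a given core count and cores-per-CCX value. The function name `l3_masks` and the constant `ASSUMED_GUEST_GROUPING` are made up for illustration. The fixed grouping of 4 is only a model of the behaviour described in this report, not something taken from the QEMU source.

```python
def l3_masks(total_cores, cores_per_ccx):
    """Return one Coreinfo-style mask per L3 cache, e.g. '***---'."""
    masks = []
    for start in range(0, total_cores, cores_per_ccx):
        end = min(start + cores_per_ccx, total_cores)
        # '*' marks a core that shares this L3, '-' a core that does not.
        mask = "".join("*" if start <= i < end else "-" for i in range(total_cores))
        masks.append(mask)
    return masks


# Assumption: the guest always groups cores in fours, as this report suggests.
ASSUMED_GUEST_GROUPING = 4

for cores, per_ccx in [(8, 4), (6, 3)]:
    expected = l3_masks(cores, per_ccx)
    observed = l3_masks(cores, ASSUMED_GUEST_GROUPING)
    status = "OK" if expected == observed else "MISMATCH"
    print(f"{cores} cores, {per_ccx} per CCX: expected {expected}, observed {observed} -> {status}")
```

For the 8-core case both lists agree. For the 6-core case the modelled guest layout is `****--` / `----**`, which matches the incorrect Coreinfo output above, while the host layout would be `***---` / `---***`.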
With newer QEMU, using the dies parameter fixes this (it does behave correctly):
<qemu:arg value='cores=3,threads=1,dies=2,sockets=1'/>
The problem is that the dies are exposed differently from how AMD exposes them natively: they appear to Windows as sockets. This means you can never have a machine with more than two CCXs (6 cores), because Windows only supports two sockets. (Should this be reported as a separate bug?)
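The arithmetic behind the dies workaround and its limit can be sketched in Python as follows. The helper names `smp_value` and `fits_socket_limit` are hypothetical. The two-socket limit is the figure stated in this report and is passed in as a parameter, because it may differ between Windows editions.

```python
def smp_value(total_cores, cores_per_ccx, threads=1, sockets=1):
    """Build the topology string used in the qemu:arg above, one die per CCX."""
    if total_cores % cores_per_ccx != 0:
        raise ValueError("total_cores must be a multiple of cores_per_ccx")
    dies = total_cores // cores_per_ccx
    return f"cores={cores_per_ccx},threads={threads},dies={dies},sockets={sockets}"


def fits_socket_limit(total_cores, cores_per_ccx, max_sockets=2):
    """True if the dies, seen by Windows as sockets, stay within max_sockets.

    Assumption from this report: each die is presented to Windows as a socket.
    """
    dies = total_cores // cores_per_ccx
    return dies <= max_sockets


print(smp_value(6, 3))           # same value as in the qemu:arg line above
print(fits_socket_limit(6, 3))   # two dies
print(fits_socket_limit(12, 3))  # four dies
```

Under these assumptions a 6-core guest with 3 cores per CCX needs two dies and fits, while a 12-core guest would need four dies and would exceed the two-socket limit described above.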
|