path: root/results/classifier/gemma3:12b/hypervisor/1856335
author Christian Krinitsin <mail@krinitsin.com> 2025-07-03 07:27:52 +0000
committer Christian Krinitsin <mail@krinitsin.com> 2025-07-03 07:27:52 +0000
commit d0c85e36e4de67af628d54e9ab577cc3fad7796a (patch)
tree f8f784b0f04343b90516a338d6df81df3a85dfa2 /results/classifier/gemma3:12b/hypervisor/1856335
parent 7f4364274750eb8cb39a3e7493132fca1c01232e (diff)
add deepseek and gemma results
Diffstat (limited to 'results/classifier/gemma3:12b/hypervisor/1856335')
-rw-r--r-- results/classifier/gemma3:12b/hypervisor/1856335 36
1 files changed, 36 insertions, 0 deletions
diff --git a/results/classifier/gemma3:12b/hypervisor/1856335 b/results/classifier/gemma3:12b/hypervisor/1856335
new file mode 100644
index 00000000..525d3b20
--- /dev/null
+++ b/results/classifier/gemma3:12b/hypervisor/1856335
@@ -0,0 +1,36 @@
+
+Cache Layout wrong on many Zen Arch CPUs
+
+AMD CPUs have one L3 cache per 2, 3 or 4 cores. Currently, TOPOEXT seems to always map the cache as if the CPU had 4 cores per CCX, which is incorrect and costs up to 30% performance (more realistically 10%) in applications that are aware of the L3 cache layout.
+
+Example on a CPU with 4 cores per CCX (1950X with 8 cores and no SMT):
+
+  <cpu mode='custom' match='exact' check='full'>
+    <model fallback='forbid'>EPYC-IBPB</model>
+    <vendor>AMD</vendor>
+    <topology sockets='1' cores='8' threads='1'/>
+
+In Windows, Coreinfo reports the layout correctly:
+
+****----  Unified Cache 1, Level 3,    8 MB, Assoc  16, LineSize  64
+----****  Unified Cache 6, Level 3,    8 MB, Assoc  16, LineSize  64
+
+On a CPU with 3 cores per CCX (3960X with 6 cores and no SMT):
+
+ <cpu mode='custom' match='exact' check='full'>
+    <model fallback='forbid'>EPYC-IBPB</model>
+    <vendor>AMD</vendor>
+    <topology sockets='1' cores='6' threads='1'/>
+
+In Windows, Coreinfo reports the layout incorrectly:
+
+****--  Unified Cache  1, Level 3,    8 MB, Assoc  16, LineSize  64
+----**  Unified Cache  6, Level 3,    8 MB, Assoc  16, LineSize  64
+
+
+Validated against 3.0, 3.1, 4.1 and 4.2 versions of qemu-kvm. 
+
+Newer QEMU versions offer a fix that does behave correctly, using the dies parameter:
+ <qemu:arg value='cores=3,threads=1,dies=2,sockets=1'/>
+
+The problem is that the dies are exposed differently from how AMD does it natively: they are exposed to Windows as sockets, which means you can never have a machine with more than two CCXs (6 cores), as Windows only supports two sockets. (Should this be reported as a separate bug?)
\ No newline at end of file