diff options
Diffstat (limited to 'results/classifier/118/all/53568181')
| -rw-r--r-- | results/classifier/118/all/53568181 | 103 |
1 files changed, 103 insertions, 0 deletions
diff --git a/results/classifier/118/all/53568181 b/results/classifier/118/all/53568181 new file mode 100644 index 00000000..935dd2b2 --- /dev/null +++ b/results/classifier/118/all/53568181 @@ -0,0 +1,103 @@ +debug: 0.968 +permissions: 0.965 +performance: 0.948 +peripherals: 0.947 +architecture: 0.945 +virtual: 0.944 +hypervisor: 0.943 +semantic: 0.943 +graphic: 0.940 +user-level: 0.939 +PID: 0.938 +assembly: 0.936 +device: 0.936 +vnc: 0.935 +x86: 0.932 +kernel: 0.931 +register: 0.930 +network: 0.925 +TCG: 0.921 +arm: 0.918 +KVM: 0.917 +risc-v: 0.907 +ppc: 0.903 +VMM: 0.901 +files: 0.890 +boot: 0.876 +socket: 0.875 +mistranslation: 0.854 +i386: 0.634 + +[BUG] x86/PAT handling severely crippled AMD-V SVM KVM performance + +Hi, I maintain an out-of-tree 3D APIs pass-through QEMU device models at +https://github.com/kjliew/qemu-3dfx +that provide 3D acceleration for legacy +32-bit Windows guests (Win98SE, WinME, Win2k and WinXP) with the focus on +playing old legacy games from 1996-2003. It currently supports the now-defunct +3Dfx propriety API called Glide and an alternative OpenGL pass-through based on +MESA implementation. + +The basic concept of both implementations create memory-mapped virtual +interfaces consist of host/guest shared memory with guest-push model instead of +a more common host-pull model for typical QEMU device model implementation. +Guest uses shared memory as FIFOs for drawing commands and data to bulk up the +operations until serialization event that flushes the FIFOs into host. This +achieves extremely good performance since virtual CPUs are fast with hardware +acceleration (Intel VT/AMD-V) and reduces the overhead of frequent VMEXITs to +service the device emulation. Both implementations work on Windows 10 with WHPX +and HAXM accelerators as well as KVM in Linux. + +On Windows 10, QEMU WHPX implementation does not sync MSR_IA32_PAT during +host/guest states sync. There is no visibility into the closed-source WHPX on +how things are managed behind the scene, but from measuring performance figures +I can conclude that it didn't handle the MSR_IA32_PAT correctly for both Intel +and AMD. Call this fair enough, if you will, it didn't flag any concerns, in +fact games such as Quake2 and Quake3 were still within playable frame rate of +40~60FPS on Win2k/XP guest. Until the same games were run on Win98/ME guest and +the frame rate blew off the roof (300~500FPS) on the same CPU and GPU. In fact, +the later seemed to be more inlined with runnng the games bare-metal with vsync +off. + +On Linux (at the time of writing kernel 5.6.7/Mesa 20.0), the difference +prevailed. Intel CPUs (and it so happened that I was on laptop with Intel GPU), +the VMX-based kvm_intel got it right while SVM-based kvm_amd did not. +To put this in simple exaggeration, an aging Core i3-4010U/HD Graphics 4400 +(Haswell GT2) exhibited an insane performance in Quake2/Quake3 timedemos that +totally crushed more recent AMD Ryzen 2500U APU/Vega 8 Graphics and AMD +FX8300/NVIDIA GT730 on desktop. Simply unbelievable! + +It turned out that there was something to do with AMD-V NPT. By loading kvm_amd +with npt=0, AMD Ryzen APU and FX8300 regained a huge performance leap. However, +AMD NPT issue with KVM was supposedly fixed in 2017 kernel commits. NPT=0 would +actually incur performance loss for VM due to intervention required by +hypervisors to maintain the shadow page tables. Finally, I was able to find the +pointer that pointed to MSR_IA32_PAT register. By updating the MSR_IA32_PAT to +0x0606xxxx0606xxxxULL, AMD CPUs now regain their rightful performance without +taking the hit of NPT=0 for Linux KVM. Taking the same solution into Windows, +both Intel and AMD CPUs no longer require Win98/ME guest to unleash the full +performance potentials and performance figures based on games measured on WHPX +were not very far behind Linux KVM. + +So I guess the problem lies in host/guest shared memory regions mapped as +uncacheable from virtual CPU perspective. As virtual CPUs now completely execute +in hardware context with x86 hardware virtualiztion extensions, the cacheability +of memory types would severely impact the performance on guests. WHPX didn't +handle it for both Intel EPT and AMD NPT, but KVM seems to do it right for Intel +EPT. I don't have the correct fix for QEMU. But what I can do for my 3D APIs +pass-through device models is to implement host-side hooks to reprogram and +restore MSR_IA32_PAT upon activation/deactivation of the 3D APIs. Perhaps there +is also a better solution of having the proper kernel drivers for virtual +interfaces to manage the memory types of host/guest shared memory in kernel +space, but to do that and the needs of Microsoft tools/DDKs, I will just forget +it. The guest stubs uses the same kernel drivers included in 3Dfx drivers for +memory mapping and the virtual interfaces remain driver-less from Windows OS +perspective. Considering the current state of halting progress for QEMU native +virgil3D to support Windows OS, I am just being pragmatic. I understand that +QEMU virgil3D will eventually bring 3D acceleration for Windows guests, but I do +not expect anything to support legacy 32-bit Windows OSes which have out-grown +their commercial usefulness. + +Regards, +KJ Liew + |