summary refs log tree commit diff stats
path: root/results/scraper/fex/3467
blob: 8fd72c862ddc8cc03a250dc12baa92df8328f0f0 (plain) (blame)
1
2
3
4
5
6
7
8
9
Optimize rep movs and rep stos
Decent number of games spent a decent amount of time in memcpy and memset.

Stray's render thread spends 3% in a single block `rep movsb` memcpy.
Metal Gear Rising Revengeance top block is a `rep movsd` for 32-bytes (8 elements).

`rep movsb` is the one that is worst off since we move byte-by-byte, achieving only 2.7GB/s instead of 17.5GB/s in my microbench.

Probably want to implement ASIMD and SVE optimized variants in the dispatcher that we can dynamically branch to. For FEAT_MOPS devices we can keep the copy/set inline whenever we implement that.