ARM64JIT: Optimize Memset operation.
When switching over to the new IR operation, the primary concern was the change in IR semantics rather than optimization.
With an inline constant on the IR operation, we can detect when zero is being stored and optimize to `DC ZVA`. We can also do the same optimization that compilers do and unroll the loop into 128-bit stores on the non-TSO variant. The aim is to get closer to native memset performance.
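As a rough illustration of the unrolled non-TSO path, here is a fill that writes 16 bytes per iteration and finishes with a byte-wise tail. This is a portable C++ model, not FEX code: `memset_128bit_loop` is a hypothetical name, `std::memcpy` of a 16-byte buffer stands in for a 128-bit `STR Q` store, and it assumes a forward fill of byte-sized values (element size and direction handling are left out).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: fill `count` bytes at `dst` with `value` using one
// 16-byte (128-bit) store per iteration, then finish the remainder byte by byte.
void memset_128bit_loop(std::uint8_t* dst, std::uint8_t value, std::size_t count) {
  assert(dst != nullptr || count == 0);

  // Splat the byte across a 128-bit pattern once, outside the loop.
  std::array<std::uint8_t, 16> pattern;
  pattern.fill(value);

  std::size_t i = 0;
  // Tight main loop: one 128-bit store per iteration.
  for (; i + pattern.size() <= count; i += pattern.size()) {
    std::memcpy(dst + i, pattern.data(), pattern.size());
  }
  // Tail: the remaining 0..15 bytes.
  for (; i < count; ++i) {
    dst[i] = value;
  }
}
```

On ARM64 the pattern would be splatted once into a vector register (for example with `DUP Vd.16B, Wn`) before the loop, so each iteration is just a store plus the pointer and counter updates.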
An additional future step will be an optimization that uses the new MOPS instructions (FEAT_MOPS) that ARM provides, whose SETP/SETM/SETE sequence implements memset directly.

[ ] - Loop unrolling (only needs a tight 128-bit store loop, which is enough to saturate the store pipelines)
[ ] - `DC ZVA` optimization
[ ] - ARM MOPS implementation
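For the `DC ZVA` item, the zero case has to be split around block alignment, because `DC ZVA` zeroes one naturally aligned block whose size the hardware reports in `DCZID_EL0`. A minimal sketch of that split follows, assuming the JIT reads `DCZID_EL0` and falls back to ordinary stores for the unaligned head and tail; `ZvaPlan`, `zva_block_bytes`, and `plan_zva` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical plan for a zeroing memset built around `DC ZVA`.
struct ZvaPlan {
  std::size_t head_bytes;  // ordinary stores until the address is block-aligned
  std::size_t blocks;      // whole blocks, one `DC ZVA` each
  std::size_t tail_bytes;  // ordinary stores after the last whole block
};

// DCZID_EL0: bits [3:0] (BS) hold log2 of the block size in 4-byte words;
// bit 4 (DZP) set means DC ZVA is prohibited. Returns 0 when DC ZVA is unusable.
std::size_t zva_block_bytes(std::uint64_t dczid_el0) {
  if (dczid_el0 & (std::uint64_t{1} << 4)) {
    return 0;
  }
  return std::size_t{4} << (dczid_el0 & 0xF);
}

// Split [dst, dst + count) into an unaligned head, whole blocks, and a tail.
ZvaPlan plan_zva(std::uintptr_t dst, std::size_t count, std::size_t block) {
  assert(block != 0 && (block & (block - 1)) == 0);  // must be a power of two
  const std::size_t misalign = dst & (block - 1);
  const std::size_t head = misalign ? block - misalign : 0;
  if (head >= count) {
    return {count, 0, 0};  // too short to reach an aligned block
  }
  const std::size_t rest = count - head;
  return {head, rest / block, rest % block};
}
```

With a 64-byte block, a 200-byte zero fill starting at address `0x1010` becomes 48 head bytes, two `DC ZVA` blocks, and 24 tail bytes; the head and tail could reuse the 128-bit store loop. A real lowering would likely also require a minimum size before taking this path.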