summary refs log tree commit diff stats
path: root/results/scraper/fex/2464
diff options
context:
space:
mode:
Diffstat (limited to 'results/scraper/fex/2464')
-rw-r--r--results/scraper/fex/246411
1 files changed, 11 insertions, 0 deletions
diff --git a/results/scraper/fex/2464 b/results/scraper/fex/2464
new file mode 100644
index 000000000..8334aaef0
--- /dev/null
+++ b/results/scraper/fex/2464
@@ -0,0 +1,11 @@
+ARM64JIT: Optimize Memset operation.
+When switching over to the new IR operation, the primary concern was about changing IR semantics rather than optimizing.

+

+With inline constant on the IR operation we can detect zero being stored and optimize to `DC ZVA`, but also we can do the same optimization that compilers do and unwind the loop to 128-bit stores on the non-TSO variant. Getting closer to native memset perf is ideal.

+

+And additional step in the future will be to have an additional optimization in order to use the new MOPS instructions that ARM provides.

+![cortex-x1c](https://user-images.githubusercontent.com/1018829/222933188-00817ec0-9b10-4029-a7d8-f11da96ac853.png)

+

+[ ] - Loop unwind (Only need to have a tight 128-bit loop. Enough to saturate the store pipelines)

+[ ] - DC ZVA optimization

+[ ] - ARM MOPS implementation