author Christian Krinitsin <mail@krinitsin.com> 2025-07-17 09:10:43 +0200
committer Christian Krinitsin <mail@krinitsin.com> 2025-07-17 09:10:43 +0200
commit f2ec263023649e596c5076df32c2d328bc9393d2
tree 5dd86caab46e552bd2e62bf9c4fb1a7504a44db4 /results/scraper/fex/2464
parent 63d2e9d409831aa8582787234cae4741847504b7
add downloaded fex bug-reports
Diffstat (limited to 'results/scraper/fex/2464')
-rw-r--r-- results/scraper/fex/2464 11
1 files changed, 11 insertions, 0 deletions
diff --git a/results/scraper/fex/2464 b/results/scraper/fex/2464
new file mode 100644
index 000000000..8334aaef0
--- /dev/null
+++ b/results/scraper/fex/2464
@@ -0,0 +1,11 @@
+ARM64JIT: Optimize Memset operation.
+When switching over to the new IR operation, the primary concern was changing IR semantics rather than optimizing.
+
+With an inline constant on the IR operation we can detect zero being stored and optimize to `DC ZVA`. We can also do the same optimization that compilers do and unwind the loop to 128-bit stores on the non-TSO variant. Getting closer to native memset perf is ideal.
+
+An additional step in the future will be a further optimization that uses the new MOPS instructions that ARM provides.
+![cortex-x1c](https://user-images.githubusercontent.com/1018829/222933188-00817ec0-9b10-4029-a7d8-f11da96ac853.png)
+
+[ ] - Loop unwind (Only need to have a tight 128-bit loop. Enough to saturate the store pipelines)
+[ ] - DC ZVA optimization
+[ ] - ARM MOPS implementation
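
The first two checklist items can be illustrated with a portable C++ sketch. This is not FEX's JIT code: every name here (`WideMemset`, `ZeroMemset`, `SketchMemset`, `kZeroBlockSize`) is hypothetical, the 64-byte block size is an assumption (real hardware reports the `DC ZVA` block size in `DCZID_EL0`), and `std::memcpy`/`std::memset` stand in for the 128-bit stores and the `DC ZVA` instruction the JIT would emit. MOPS is left out because it has no portable equivalent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed DC ZVA block size; real hardware reports it in DCZID_EL0.
constexpr std::size_t kZeroBlockSize = 64;

// Fill with 16-byte (128-bit) chunks, then finish the tail byte by byte.
// The 16-byte memcpy stands in for a single 128-bit vector store.
void* WideMemset(void* dst, std::uint8_t value, std::size_t size) {
  assert(dst != nullptr || size == 0);
  auto* p = static_cast<unsigned char*>(dst);
  unsigned char pattern[16];
  std::memset(pattern, value, sizeof(pattern));
  while (size >= sizeof(pattern)) {
    std::memcpy(p, pattern, sizeof(pattern));
    p += sizeof(pattern);
    size -= sizeof(pattern);
  }
  while (size > 0) {
    *p++ = value;
    --size;
  }
  return dst;
}

// Zero fill: handle the unaligned head, then whole aligned blocks, then the tail.
// DC ZVA zeroes a naturally aligned block, so the head must be done separately.
void* ZeroMemset(void* dst, std::size_t size) {
  assert(dst != nullptr || size == 0);
  auto* p = static_cast<unsigned char*>(dst);
  const auto addr = reinterpret_cast<std::uintptr_t>(p);
  std::size_t head = (kZeroBlockSize - (addr % kZeroBlockSize)) % kZeroBlockSize;
  if (head > size) {
    head = size;
  }
  std::memset(p, 0, head);
  p += head;
  size -= head;
  while (size >= kZeroBlockSize) {
    std::memset(p, 0, kZeroBlockSize);  // stand-in for one DC ZVA
    p += kZeroBlockSize;
    size -= kZeroBlockSize;
  }
  std::memset(p, 0, size);
  return dst;
}

// Dispatch on the fill value, mirroring the "detect zero being stored" idea.
void* SketchMemset(void* dst, std::uint8_t value, std::size_t size) {
  return value == 0 ? ZeroMemset(dst, size) : WideMemset(dst, value, size);
}
```

In a JIT the zero check would presumably happen at compile time, when the value is an inline constant on the IR operation, rather than at run time as in `SketchMemset`.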