| author | Christian Krinitsin <mail@krinitsin.com> | 2025-07-17 09:10:43 +0200 |
|---|---|---|
| committer | Christian Krinitsin <mail@krinitsin.com> | 2025-07-17 09:10:43 +0200 |
| commit | f2ec263023649e596c5076df32c2d328bc9393d2 (patch) | |
| tree | 5dd86caab46e552bd2e62bf9c4fb1a7504a44db4 /results/scraper/fex/2464 | |
| parent | 63d2e9d409831aa8582787234cae4741847504b7 (diff) | |
Diffstat (limited to 'results/scraper/fex/2464')
| -rw-r--r-- | results/scraper/fex/2464 | 11 |
1 files changed, 11 insertions, 0 deletions
diff --git a/results/scraper/fex/2464 b/results/scraper/fex/2464
new file mode 100644
index 000000000..8334aaef0
--- /dev/null
+++ b/results/scraper/fex/2464
@@ -0,0 +1,11 @@
+ARM64JIT: Optimize Memset operation.
+When switching over to the new IR operation, the primary concern was about changing IR semantics rather than optimizing.
+
+With inline constant on the IR operation we can detect zero being stored and optimize to `DC ZVA`, but also we can do the same optimization that compilers do and unwind the loop to 128-bit stores on the non-TSO variant. Getting closer to native memset perf is ideal.
+
+And additional step in the future will be to have an additional optimization in order to use the new MOPS instructions that ARM provides.
+
+
+[ ] - Loop unwind (Only need to have a tight 128-bit loop. Enough to saturate the store pipelines)
+[ ] - DC ZVA optimization
+[ ] - ARM MOPS implementation
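
The "tight 128-bit loop" item in the issue text can be illustrated with a small portable C++ sketch. This is not FEX code: `Fill128` is a hypothetical helper, and it only models the shape of the loop (16 bytes written per iteration, then a byte-wise tail). It assumes the compiler lowers the fixed-size 16-byte `memcpy` to a single 128-bit store, which is typical on AArch64 but not guaranteed. The `DC ZVA` and MOPS paths are not modelled here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper for illustration only; not part of FEX.
// Fills `size` bytes at `dst` with `value`, writing 16 bytes (128 bits)
// per iteration of the main loop.
void Fill128(void* dst, std::uint8_t value, std::size_t size) {
  auto* out = static_cast<unsigned char*>(dst);

  // Build one 16-byte chunk holding the repeated fill value.
  unsigned char pattern[16];
  std::memset(pattern, value, sizeof(pattern));

  // Main loop: one fixed-size 16-byte copy per iteration.
  while (size >= sizeof(pattern)) {
    std::memcpy(out, pattern, sizeof(pattern));
    out += sizeof(pattern);
    size -= sizeof(pattern);
  }

  // Tail: the remaining 0..15 bytes, one at a time.
  while (size > 0) {
    *out++ = value;
    --size;
  }
}
```

Calling `Fill128(buffer, 0, length)` zeroes `length` bytes. A JIT backend would emit the equivalent stores directly instead of calling a helper like this.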