author Christian Krinitsin <mail@krinitsin.com> 2025-07-17 09:10:43 +0200
committer Christian Krinitsin <mail@krinitsin.com> 2025-07-17 09:10:43 +0200
commit f2ec263023649e596c5076df32c2d328bc9393d2 (patch)
tree 5dd86caab46e552bd2e62bf9c4fb1a7504a44db4 /results/scraper/fex/4480
parent 63d2e9d409831aa8582787234cae4741847504b7 (diff)
add downloaded fex bug-reports
Diffstat (limited to 'results/scraper/fex/4480')
-rw-r--r-- results/scraper/fex/4480 11
1 files changed, 11 insertions, 0 deletions
diff --git a/results/scraper/fex/4480 b/results/scraper/fex/4480
new file mode 100644
index 000000000..632946eea
--- /dev/null
+++ b/results/scraper/fex/4480
@@ -0,0 +1,11 @@
+Optimize multiple stack push/pop into ldp/stp
+An optimization that we can do, since we disable TSO emulation on stack accesses, is to merge pushes and pops into stp and ldp respectively.
+This would allow performance improvements in almost every application since any function doing real work is going to push and pop the stack.
+
+An example that we already have in instrcountci is the `Sekiro spill block` function. It pushes 8 GPRs to the stack on entry to the function and pops 8 again on exit. This is a total of 16 instructions today, but can be reduced to 8. That is just enough to reduce the full block size from 126 ARM instructions to 118, one fewer than the original x86 instruction count of 119.
+
+This is fairly trivial to implement, and should give a decent performance improvement, since paired load/store instructions are just as fast as non-paired ones on any ARM CPU cores that matter.
+
+Creating this tracking issue so we don't forget about it.
+
+Additional note: FEAT_LRCPC3 adds pair-wise push/pop instructions with x86 TSO memory model semantics. We don't use TSO semantics on stack accesses today (today's hardware, aside from Apple's, isn't good enough), but once that extension is in any shipping hardware we should add support for it along with a TSO toggle for stack accesses. Note that the extension only added support for 32-bit and 64-bit pushes and pops, so 8-bit and 16-bit are left out.
\ No newline at end of file
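
The merge described in the report can be illustrated with a small peephole sketch. This is not FEX's implementation: the tuple-based op format, the `merge_stack_ops` name, and the `"rsp"` guard are all assumptions made for illustration, and only adjacent 64-bit pushes or pops are considered. Note the operand order: two x86 pushes leave the first register at the higher address, so it becomes the second operand of a pre-indexed `stp`, while two pops map onto a post-indexed `ldp` in program order.

```python
# Illustrative sketch only; the op encoding below is hypothetical, not FEX IR.
PAIRED = {"push": "stp", "pop": "ldp"}


def merge_stack_ops(ops):
    """Merge adjacent 64-bit push/push or pop/pop ops into one paired op.

    ops is a list of tuples such as ("push", "rbx") or ("pop", "r12").
    Any other tuple is passed through unchanged.
    """
    out = []
    i = 0
    while i < len(ops):
        op = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        mergeable = (
            nxt is not None
            and op[0] in PAIRED
            and nxt[0] == op[0]
            # Assumed guard: leave stack-pointer operands and repeated
            # registers alone, since a pair with identical destination
            # registers is not a valid ldp.
            and "rsp" not in (op[1], nxt[1])
            and op[1] != nxt[1]
        )
        if mergeable:
            if op[0] == "push":
                # push a; push b  ->  stp b, a, [sp, #-16]!
                out.append(("stp", nxt[1], op[1]))
            else:
                # pop a; pop b    ->  ldp a, b, [sp], #16
                out.append(("ldp", op[1], nxt[1]))
            i += 2
        else:
            out.append(op)
            i += 1
    return out


# Eight pushes on entry and eight pops on exit, as in the example block.
regs = ["rbx", "rbp", "rsi", "rdi", "r12", "r13", "r14", "r15"]
block = [("push", r) for r in regs] + [("pop", r) for r in reversed(regs)]
merged = merge_stack_ops(block)
print(len(block), "->", len(merged))
```

For the eight-push, eight-pop pattern this halves the stack instructions from 16 to 8, which matches the count given in the report.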