diff options
Diffstat (limited to 'results/scraper/fex/4480')
| -rw-r--r-- | results/scraper/fex/4480 | 11 |
1 files changed, 11 insertions, 0 deletions
diff --git a/results/scraper/fex/4480 b/results/scraper/fex/4480 new file mode 100644 index 000000000..632946eea --- /dev/null +++ b/results/scraper/fex/4480 @@ -0,0 +1,11 @@ +Optimize multiple stack push/pop in to ldp/stp +An optimization that we can do since we disable TSO emulation on stack accesses is to merge pushes and pops in to ldp and stp respectively. +This would allow performance improvements in almost every application since any function doing real work is going to push and pop the stack. + +An example that we already have in instrcountci is the `Sekiro spill block` function. This pushes and pops 8 GPRs to the stack coming in to the function and then eight again coming out. This is a total of 16 instructions today, but can be reduced to 8 instead. Just enough to reduce the full block size from 126 ARM instructions down to 118, one instruction less than the original x86 code count of 119. + +This is fairly trivial to implement, and should give a decent performance improvement since pair loadstore instructions are just as fast as non-paired on any ARM cpu cores that matter. + +Creating this tracking issue so we don't forget about it. + +Additional note, FEAT_LRCPC3 adds pair-wise push/pop instructions with x86 TSO memory model semantics. While we don't use TSO semantics on stack accesses today (Today's hardware aside from Apple's isn't good enough), once that extension is in any shipping hardware we should add support for it and a TSO toggle for stack accesses. Although this extension only added support for 32-bit and 64-bit pushes and pops, so 8-bit and 16-bit is left out. \ No newline at end of file |