| author | Christian Krinitsin <mail@krinitsin.com> | 2025-07-17 09:10:43 +0200 |
|---|---|---|
| committer | Christian Krinitsin <mail@krinitsin.com> | 2025-07-17 09:10:43 +0200 |
| commit | f2ec263023649e596c5076df32c2d328bc9393d2 (patch) | |
| tree | 5dd86caab46e552bd2e62bf9c4fb1a7504a44db4 /results/scraper/fex/3791 | |
| parent | 63d2e9d409831aa8582787234cae4741847504b7 (diff) | |
| download | qemu-analysis-main.tar.gz qemu-analysis-main.zip | |
Diffstat (limited to 'results/scraper/fex/3791')
| -rw-r--r-- | results/scraper/fex/3791 | 27 |
1 files changed, 27 insertions, 0 deletions
diff --git a/results/scraper/fex/3791 b/results/scraper/fex/3791
new file mode 100644
index 000000000..7eed39dba
--- /dev/null
+++ b/results/scraper/fex/3791
@@ -0,0 +1,27 @@
+AVX128: Optimize 256-bit contiguous masked loadstore with immediate offset
+Currently the SVE contiguous masked load splits the instruction in to two with an add in the middle for address generation.
+We can remove that add inbetween the two halves because the `ld1w` and `ld1d` instructions support a short immediate offset multiplied by VL.
+
+```json
+    "vmaskmovps ymm0, ymm1, [rax]": {
+      "ExpectedInstructionCount": 9,
+      "Comment": [
+        "Map 2 0b01 0x2c 256-bit"
+      ],
+      "ExpectedArm64ASM": [
+        "ldr q2, [x28, #32]",
+        "mrs x20, nzcv",
+        "cmplt p0.s, p6/z, z17.s, #0",
+        "ld1w {z16.s}, p0/z, [x4]",
+        "add x21, x4, #0x10 (16)", <------------- This add
+        "cmplt p0.s, p6/z, z2.s, #0",
+        "ld1w {z2.s}, p0/z, [x21]", <-------------- Can be rolled in to this instruction
+        "msr nzcv, x20",
+        "str q2, [x28, #16]"
+      ]
+    },
+```
+
+Only saves a single instruction for both 256-bit `vmaskmovps` and `vmaskmovpd` but is fairly trivial do do.
+
+Same thing can be improved on the store side as well.
\ No newline at end of file
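
The folding described in the added file can be sketched as a small check, which is not part of the commit. It assumes the scalar-plus-immediate form of `ld1w`/`ld1d`, whose immediate is a signed 4-bit vector index (-8 to 7) scaled by the vector length, and it assumes a 16-byte VL, inferred from the `add x21, x4, #0x10` in the expected output. The helper names `fold_offset` and `upper_half_load` are hypothetical and do not come from FEX.

```python
# Hypothetical sketch, not FEX code: decide whether a byte offset can be
# folded into the "#imm, mul vl" form of an SVE contiguous load.

# Range of the signed 4-bit immediate of ld1w/ld1d (scalar plus immediate).
SVE_IMM_MIN, SVE_IMM_MAX = -8, 7


def fold_offset(byte_offset, vl_bytes):
    """Return the MUL VL immediate encoding byte_offset, or None if it does not fit."""
    if vl_bytes <= 0:
        raise ValueError("vl_bytes must be positive")
    quotient, remainder = divmod(byte_offset, vl_bytes)
    # The offset must be a whole number of vectors and fit in the immediate.
    if remainder != 0 or not (SVE_IMM_MIN <= quotient <= SVE_IMM_MAX):
        return None
    return quotient


def upper_half_load(base_reg, vl_bytes=16):
    """Emit the load of the upper 128-bit half, 16 bytes above base_reg.

    vl_bytes=16 is an assumption (128-bit SVE vectors).
    """
    imm = fold_offset(16, vl_bytes)
    if imm is None:
        # Fall back to the explicit add shown in the expected output above.
        return [f"add x21, {base_reg}, #0x10 (16)", "ld1w {z2.s}, p0/z, [x21]"]
    return [f"ld1w {{z2.s}}, p0/z, [{base_reg}, #{imm}, mul vl]"]


print(upper_half_load("x4"))
```

Under the 16-byte VL assumption the upper half is exactly one vector above the base, so the immediate is 1 and the separate `add` is no longer needed. With a larger VL the 16-byte offset is not a whole number of vectors, and the sketch falls back to the two-instruction form.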