| author | Christian Krinitsin <mail@krinitsin.com> | 2025-07-17 09:10:43 +0200 |
|---|---|---|
| committer | Christian Krinitsin <mail@krinitsin.com> | 2025-07-17 09:10:43 +0200 |
| commit | f2ec263023649e596c5076df32c2d328bc9393d2 (patch) | |
| tree | 5dd86caab46e552bd2e62bf9c4fb1a7504a44db4 /results/scraper/fex/3788 | |
| parent | 63d2e9d409831aa8582787234cae4741847504b7 (diff) | |
Diffstat (limited to 'results/scraper/fex/3788')
| -rw-r--r-- | results/scraper/fex/3788 | 21 |
1 files changed, 21 insertions, 0 deletions
diff --git a/results/scraper/fex/3788 b/results/scraper/fex/3788
new file mode 100644
index 000000000..8f03370fd
--- /dev/null
+++ b/results/scraper/fex/3788
@@ -0,0 +1,21 @@
+AVX128: 256-bit stores to memory can be converted to store pairs
+Example:
+```json
+    "vmovdqa [rax], ymm0": {
+      "ExpectedInstructionCount": 3,
+      "Comment": [
+        "Map 1 0b01 0x7f 128-bit"
+      ],
+      "ExpectedArm64ASM": [
+        "ldr q2, [x28, #16]",
+        "str q16, [x4]",
+        "str q2, [x4, #16]"
+      ]
+    },
+  ```
+
+A little quirky since some care should be taken for the address calculation overhead, but today we currently implement these stores as two 128-bit stores. When vector TSO emulation is enabled, this turns in to dmb+str+dmb+str, if we were using pair stores this would turn in to a single dmb+stp pairing, eating a little bit of ALU work if we can't synthesize the SIB addressing directly.
+
+Additionally FEAT_LRCPC3 adds the STLUR and LDAPUR x86-TSO instructions, which we would probably want to choose to use in the backend instead when Vector TSO is enabled. For example the `dmb+stp` instruction pairing would turn in to `stlur+stlur` since that would be more efficient in that case. Similar story for the load side, but we don't have a good solution for returning a pair of registers atm.
+
+No hardware supports FEAT_LRCPC3 today so that particular edge case isn't very interesting.
\ No newline at end of file
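
The store sequences the issue text compares can be sketched as a small Python model. This is only an illustration, not FEX code: the function name `lower_256bit_store`, its parameters, and the `dmb ish` barrier variant are assumptions made for the example (the issue only says `dmb`). The load of the upper 128 bits (`ldr q2, [x28, #16]` in the JSON above) is left out so that only the store side is modelled.

```python
def lower_256bit_store(base, lo, hi, tso=False, use_pair=False, has_lrcpc3=False):
    """Return the AArch64 store sequence for one 256-bit store, as strings.

    Hypothetical helper for illustration only; it is not part of FEX.
    base: address register, lo/hi: registers holding the two 128-bit halves.
    """
    if tso and has_lrcpc3:
        # FEAT_LRCPC3 case from the issue: two release stores, no barrier.
        return [f"stlur {lo}, [{base}]", f"stlur {hi}, [{base}, #16]"]

    if use_pair:
        # Proposed lowering: one barrier (if TSO) followed by one pair store.
        seq = ["dmb ish"] if tso else []  # barrier variant is an assumption
        return seq + [f"stp {lo}, {hi}, [{base}]"]

    # Current lowering: two 128-bit stores, each preceded by a barrier under TSO.
    seq = []
    for reg, offset in ((lo, 0), (hi, 16)):
        if tso:
            seq.append("dmb ish")
        addr = f"[{base}]" if offset == 0 else f"[{base}, #{offset}]"
        seq.append(f"str {reg}, {addr}")
    return seq


if __name__ == "__main__":
    for kwargs in ({}, {"tso": True}, {"tso": True, "use_pair": True},
                   {"tso": True, "has_lrcpc3": True}):
        print(kwargs, lower_256bit_store("x4", "q16", "q2", **kwargs))
```

With TSO enabled, the model shows the pair-store lowering dropping from four instructions to two, which is the saving the issue describes. The extra ALU work for address calculation that the issue mentions is not modelled.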