diff options
| author | Christian Krinitsin <mail@krinitsin.com> | 2025-07-17 09:10:43 +0200 |
|---|---|---|
| committer | Christian Krinitsin <mail@krinitsin.com> | 2025-07-17 09:10:43 +0200 |
| commit | f2ec263023649e596c5076df32c2d328bc9393d2 (patch) | |
| tree | 5dd86caab46e552bd2e62bf9c4fb1a7504a44db4 /results/scraper/fex/3795 | |
| parent | 63d2e9d409831aa8582787234cae4741847504b7 (diff) | |
| download | qemu-analysis-main.tar.gz qemu-analysis-main.zip | |
Diffstat (limited to 'results/scraper/fex/3795')
| -rw-r--r-- | results/scraper/fex/3795 | 27 |
1 files changed, 27 insertions, 0 deletions
diff --git a/results/scraper/fex/3795 b/results/scraper/fex/3795 new file mode 100644 index 000000000..7554a08f8 --- /dev/null +++ b/results/scraper/fex/3795 @@ -0,0 +1,27 @@ +AVX128: VPERM{Q,PD} can be more optimized +We only hand optimize four selectors out of the total 256. Anything that falls out of those four turns in to a zero-register plus four inserts which is pretty bad. We don't have these sitting in instcountci at all right now. + +An example of the bad output. +```json + "vpermpd ymm0, ymm1, 01101011b": { + "ExpectedInstructionCount": 9, + "Comment": [ + "Map 3 0b01 0x01 256-bit" + ], + "ExpectedArm64ASM": [ + "ldr q2, [x28, #32]", + "movi v3.2d, #0x0", + "mov v4.16b, v3.16b", + "mov v4.d[0], v2.d[1]", + "mov v16.16b, v4.16b", + "mov v16.d[1], v2.d[0]", + "mov v3.d[0], v2.d[0]", + "mov v3.d[1], v17.d[1]", + "str q3, [x28, #16]" + ] + }, +``` + +Theoretically worst case should devolve in to a TBL2 instruction which would be better than that mess of inserts, and of course we should support solving more selectors with handwritten optimizations. + +VPERMQ and VPERMPD are aliases of each other so they behave the same. \ No newline at end of file |