diff options
Diffstat (limited to 'results/scraper/fex/3795')
| -rw-r--r-- | results/scraper/fex/3795 | 27 |
1 files changed, 27 insertions, 0 deletions
diff --git a/results/scraper/fex/3795 b/results/scraper/fex/3795 new file mode 100644 index 000000000..7554a08f8 --- /dev/null +++ b/results/scraper/fex/3795 @@ -0,0 +1,27 @@ +AVX128: VPERM{Q,PD} can be more optimized +We only hand optimize four selectors out of the total 256. Anything that falls out of those four turns in to a zero-register plus four inserts which is pretty bad. We don't have these sitting in instcountci at all right now. + +An example of the bad output. +```json + "vpermpd ymm0, ymm1, 01101011b": { + "ExpectedInstructionCount": 9, + "Comment": [ + "Map 3 0b01 0x01 256-bit" + ], + "ExpectedArm64ASM": [ + "ldr q2, [x28, #32]", + "movi v3.2d, #0x0", + "mov v4.16b, v3.16b", + "mov v4.d[0], v2.d[1]", + "mov v16.16b, v4.16b", + "mov v16.d[1], v2.d[0]", + "mov v3.d[0], v2.d[0]", + "mov v3.d[1], v17.d[1]", + "str q3, [x28, #16]" + ] + }, +``` + +Theoretically worst case should devolve in to a TBL2 instruction which would be better than that mess of inserts, and of course we should support solving more selectors with handwritten optimizations. + +VPERMQ and VPERMPD are aliases of each other so they behave the same. \ No newline at end of file |