Diffstat (limited to 'results/scraper/fex/3795')
-rw-r--r-- results/scraper/fex/3795 27
1 files changed, 27 insertions, 0 deletions
diff --git a/results/scraper/fex/3795 b/results/scraper/fex/3795
new file mode 100644
index 000000000..7554a08f8
--- /dev/null
+++ b/results/scraper/fex/3795
@@ -0,0 +1,27 @@
+AVX128: VPERM{Q,PD} can be more optimized
+We only hand-optimize four selectors out of the total 256. Anything that falls outside those four turns into a zero-register plus four inserts, which is pretty bad. We don't have these sitting in instcountci at all right now.
+
+An example of the bad output.
+```json
+    "vpermpd ymm0, ymm1, 01101011b": {
+      "ExpectedInstructionCount": 9,
+      "Comment": [
+        "Map 3 0b01 0x01 256-bit"
+      ],
+      "ExpectedArm64ASM": [
+        "ldr q2, [x28, #32]",
+        "movi v3.2d, #0x0",
+        "mov v4.16b, v3.16b",
+        "mov v4.d[0], v2.d[1]",
+        "mov v16.16b, v4.16b",
+        "mov v16.d[1], v2.d[0]",
+        "mov v3.d[0], v2.d[0]",
+        "mov v3.d[1], v17.d[1]",
+        "str q3, [x28, #16]"
+      ]
+    },
+```
+
+Theoretically, the worst case should devolve into a TBL2 instruction, which would be better than that mess of inserts, and of course we should support solving more selectors with handwritten optimizations.
+
+VPERMQ and VPERMPD are aliases of each other, so they behave the same.
\ No newline at end of file
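
For reference, here is a rough sketch of how the 8-bit selector maps to lanes and how a TBL-based fallback could derive its byte indices. It is a Python model for illustration only, not FEX code: the function names (`decode_selector`, `permute_reference`, `tbl_indices`, `tbl_lookup`) are hypothetical, and the sketch assumes the 256-bit source is held as two 128-bit registers that together form a 32-byte TBL table.

```python
def decode_selector(imm8: int) -> list[int]:
    """Source 64-bit lane chosen for each of the four destination lanes.

    Bits [1:0] pick the source for lane 0, bits [3:2] for lane 1, and so on.
    """
    return [(imm8 >> (2 * lane)) & 0b11 for lane in range(4)]


def permute_reference(src: list[int], imm8: int) -> list[int]:
    """Reference result of VPERMQ/VPERMPD on four 64-bit lanes."""
    return [src[sel] for sel in decode_selector(imm8)]


def tbl_indices(imm8: int) -> tuple[list[int], list[int]]:
    """Byte indices for two 16-byte lookups into a 32-byte table.

    Assumption: one lookup per 128-bit half of the destination, each using
    both source halves as a two-register table.
    """
    lanes = decode_selector(imm8)

    def lane_bytes(sel: int) -> list[int]:
        # A 64-bit lane covers eight consecutive bytes of the table.
        return [sel * 8 + b for b in range(8)]

    low = lane_bytes(lanes[0]) + lane_bytes(lanes[1])
    high = lane_bytes(lanes[2]) + lane_bytes(lanes[3])
    return low, high


def tbl_lookup(table: bytes, indices: list[int]) -> bytes:
    """Model of TBL: out-of-range indices produce zero bytes."""
    return bytes(table[i] if i < len(table) else 0 for i in indices)


if __name__ == "__main__":
    imm8 = 0b01101011  # selector from the example above
    print(decode_selector(imm8))
    low, high = tbl_indices(imm8)
    print(low, high)
```

Every selector value yields in-range indices here, so under these assumptions any of the 256 selectors could be handled by loading two constant index vectors and issuing two table lookups, with the hand-optimized cases kept as shortcuts.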