author: Christian Krinitsin <mail@krinitsin.com> 2025-07-17 09:10:43 +0200
committer: Christian Krinitsin <mail@krinitsin.com> 2025-07-17 09:10:43 +0200
commit: f2ec263023649e596c5076df32c2d328bc9393d2
tree: 5dd86caab46e552bd2e62bf9c4fb1a7504a44db4 /results/scraper/fex/3782
parent: 63d2e9d409831aa8582787234cae4741847504b7
add downloaded fex bug-reports
Diffstat (limited to 'results/scraper/fex/3782')
-rw-r--r-- results/scraper/fex/3782 | 27
1 files changed, 27 insertions, 0 deletions
diff --git a/results/scraper/fex/3782 b/results/scraper/fex/3782
new file mode 100644
index 000000000..5831076c5
--- /dev/null
+++ b/results/scraper/fex/3782
@@ -0,0 +1,27 @@
AVX128: 256-bit vmovmsk can be improved
```json
{
  "vmovmskps rax, ymm0": {
    "ExpectedInstructionCount": 11,
    "Comment": [
      "Map 1 0b00 0x50 256-bit"
    ],
    "ExpectedArm64ASM": [
      "ldr q2, [x28, #16]",
      "ushr v3.4s, v16.4s, #31",
      "ldr q4, [x28, #2512]",
      "ushl v3.4s, v3.4s, v4.4s",
      "addv s3, v3.4s",
      "mov w20, v3.s[0]",
      "ushr v2.4s, v2.4s, #31",
      "ushl v2.4s, v2.4s, v4.4s",
      "addv s2, v2.4s",
      "mov w21, v2.s[0]",
      "orr x4, x20, x21, lsl #4"
    ]
  }
}
```
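
As a quick cross-check of the entry above, the sketch below parses it (with the `Comment` field dropped and the entry wrapped in braces so it is standalone JSON; the ASM strings are copied verbatim) and confirms that `ExpectedInstructionCount` matches the listed instructions. It also shows the two `addv` horizontal adds and two FPR->GPR `mov` extracts, one pair per 128-bit half, plus the final `orr` merge. The helper `count_mnemonics` is a hypothetical name used only for illustration.

```python
import json
from collections import Counter

# The entry from above ("Comment" omitted), wrapped in braces so it parses
# as standalone JSON. The ASM strings are copied verbatim.
ENTRY = """
{
  "vmovmskps rax, ymm0": {
    "ExpectedInstructionCount": 11,
    "ExpectedArm64ASM": [
      "ldr q2, [x28, #16]",
      "ushr v3.4s, v16.4s, #31",
      "ldr q4, [x28, #2512]",
      "ushl v3.4s, v3.4s, v4.4s",
      "addv s3, v3.4s",
      "mov w20, v3.s[0]",
      "ushr v2.4s, v2.4s, #31",
      "ushl v2.4s, v2.4s, v4.4s",
      "addv s2, v2.4s",
      "mov w21, v2.s[0]",
      "orr x4, x20, x21, lsl #4"
    ]
  }
}
"""


def count_mnemonics(asm_lines):
    """Hypothetical helper: tally mnemonics (first token of each instruction)."""
    return Counter(line.split()[0] for line in asm_lines)


entry = json.loads(ENTRY)["vmovmskps rax, ymm0"]
asm = entry["ExpectedArm64ASM"]
counts = count_mnemonics(asm)

print(entry["ExpectedInstructionCount"], len(asm))   # 11 11
print(counts["addv"], counts["mov"], counts["orr"])  # 2 2 1
```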

The current algorithm just runs the 128-bit algorithm twice, once per 128-bit half, and merges the two results at the end.
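
To make the current sequence concrete, here is a scalar Python model of it, lane by lane. It is a sketch under stated assumptions: `ymm0` is represented as eight 32-bit integers (lanes 0-3 are the low half held in `v16`, lanes 4-7 the high half loaded from `[x28, #16]`), and the constant loaded from `[x28, #2512]` is assumed to hold the per-lane shift amounts `{0, 1, 2, 3}`, which is what the `ushl` step needs to move lane i's sign into bit i. All function names are hypothetical and are not FEX functions.

```python
MASK32 = 0xFFFFFFFF


def ushr(lanes, shift):
    """USHR: logical shift right of each 32-bit lane."""
    return [(x & MASK32) >> shift for x in lanes]


def ushl(lanes, shifts):
    """USHL, restricted to non-negative counts: per-lane shift left, kept to 32 bits."""
    return [(x << s) & MASK32 for x, s in zip(lanes, shifts)]


def addv(lanes):
    """ADDV: sum of all lanes, kept to 32 bits."""
    return sum(lanes) & MASK32


# Assumed contents of the constant loaded from [x28, #2512].
SHIFTS = [0, 1, 2, 3]


def movmskps_256_current(ymm):
    """Model of the current sequence: one 128-bit pass per half, then a scalar merge."""
    lo, hi = ymm[:4], ymm[4:]               # v16 and the upper half from [x28, #16]
    w20 = addv(ushl(ushr(lo, 31), SHIFTS))  # low-half mask in bits 0-3
    w21 = addv(ushl(ushr(hi, 31), SHIFTS))  # high-half mask, also in bits 0-3
    return w20 | (w21 << 4)                 # orr x4, x20, x21, lsl #4


def movmskps_reference(ymm):
    """vmovmskps semantics: bit i of the result is the sign bit of 32-bit lane i."""
    return sum(((x >> 31) & 1) << i for i, x in enumerate(ymm))


example = [0x80000000, 0x3F800000, 0xBF800000, 0, 0, 0xFFFFFFFF, 0, 0x80000001]
print(bin(movmskps_256_current(example)))  # 0b10100101
```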

This can be improved by loading a second shift vector for the high 128 bits, one that shifts the sign bits into different positions (bits 4-7 rather than 0-3). Then add the two 128-bit halves together, do a single horizontal add, and do a single FPR->GPR element extract.
This would shave the sequence from 11 instructions to 10: it removes a horizontal add, an FPR->GPR transfer, and the final `orr`, at the cost of one extra constant load and one vector add.
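
The proposed sequence can be modeled the same way. This is a sketch of the idea as described, not FEX code: it assumes a second, hypothetical constant holding shift amounts `{4, 5, 6, 7}` (where it would live is not specified), so the high half's signs land directly in bits 4-7. The two shifted halves are then added lane-wise, followed by one horizontal add and one extract. `PROPOSED_SEQUENCE` is only a textual description of one plausible 10-instruction ordering, not generated assembly.

```python
MASK32 = 0xFFFFFFFF


def ushr(lanes, shift):
    """USHR: logical shift right of each 32-bit lane."""
    return [(x & MASK32) >> shift for x in lanes]


def ushl(lanes, shifts):
    """USHL, restricted to non-negative counts: per-lane shift left, kept to 32 bits."""
    return [(x << s) & MASK32 for x, s in zip(lanes, shifts)]


def add(a, b):
    """ADD (vector): lane-wise add, kept to 32 bits."""
    return [(x + y) & MASK32 for x, y in zip(a, b)]


def addv(lanes):
    """ADDV: sum of all lanes, kept to 32 bits."""
    return sum(lanes) & MASK32


SHIFTS_LO = [0, 1, 2, 3]  # existing constant (assumed contents)
SHIFTS_HI = [4, 5, 6, 7]  # hypothetical second constant for the high half

# One plausible ordering of the proposed sequence (a description, not generated code).
PROPOSED_SEQUENCE = [
    "ldr  upper half of ymm0",
    "ushr low half, #31",
    "ldr  SHIFTS_LO",
    "ushl low half",
    "ushr high half, #31",
    "ldr  SHIFTS_HI",
    "ushl high half",
    "add  low + high",
    "addv",
    "mov  FPR->GPR",
]


def movmskps_256_proposed(ymm):
    """Model of the proposal: signs land in bits 0-3 (low) and 4-7 (high), one reduction."""
    lo, hi = ymm[:4], ymm[4:]
    lo_bits = ushl(ushr(lo, 31), SHIFTS_LO)
    hi_bits = ushl(ushr(hi, 31), SHIFTS_HI)
    # The set bits never overlap, so the lane-wise add behaves like an OR here.
    return addv(add(lo_bits, hi_bits))


print(len(PROPOSED_SEQUENCE))  # 10
```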

`vmovmskpd` is similar in that it also runs two 128-bit algorithms and then extracts, but its 128-bit algorithm is different, so improving it would need more noodling.
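
For comparison, this is what `vmovmskpd` has to produce for a 256-bit source: one bit per 64-bit lane, so four bits in total, two per 128-bit half. The sketch below is a reference model plus one hypothetical way the same trick might carry over (shift amounts `{0, 1}` for the low half and `{2, 3}` for the high half); it is not a worked-out proposal. One constraint to keep in mind: AArch64 `ADDV` has no `.2D` arrangement, so a 64-bit-lane version would need a different final reduction, for example the scalar pairwise `ADDP Dd, Vn.2D`. Function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def movmskpd_reference(ymm):
    """vmovmskpd ymm semantics: bit i is the sign bit of 64-bit lane i (four lanes)."""
    return sum(((x >> 63) & 1) << i for i, x in enumerate(ymm))


def movmskpd_256_sketch(ymm):
    """Hypothetical carry-over of the vmovmskps idea to 64-bit lanes."""
    lo, hi = ymm[:2], ymm[2:]
    lo_bits = [((x & MASK64) >> 63) << s for x, s in zip(lo, [0, 1])]  # bits 0-1
    hi_bits = [((x & MASK64) >> 63) << s for x, s in zip(hi, [2, 3])]  # bits 2-3
    merged = [a + b for a, b in zip(lo_bits, hi_bits)]                 # lane-wise add
    # ADDV has no .2D form, so the last step is a two-lane sum (e.g. ADDP Dd, Vn.2D).
    return (merged[0] + merged[1]) & MASK64


example = [1 << 63, 0, MASK64, 0x3FF0000000000000]
print(movmskpd_256_sketch(example), movmskpd_reference(example))  # 5 5
```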