diff options
| author | Christian Krinitsin <mail@krinitsin.com> | 2025-07-17 09:10:43 +0200 |
|---|---|---|
| committer | Christian Krinitsin <mail@krinitsin.com> | 2025-07-17 09:10:43 +0200 |
| commit | f2ec263023649e596c5076df32c2d328bc9393d2 (patch) | |
| tree | 5dd86caab46e552bd2e62bf9c4fb1a7504a44db4 /results/scraper/fex/3782 | |
| parent | 63d2e9d409831aa8582787234cae4741847504b7 (diff) | |
| download | qemu-analysis-main.tar.gz qemu-analysis-main.zip | |
Diffstat (limited to 'results/scraper/fex/3782')
| -rw-r--r-- | results/scraper/fex/3782 | 27 |
1 files changed, 27 insertions, 0 deletions
diff --git a/results/scraper/fex/3782 b/results/scraper/fex/3782 new file mode 100644 index 000000000..5831076c5 --- /dev/null +++ b/results/scraper/fex/3782 @@ -0,0 +1,27 @@ +AVX128: 256-bit vmovmsk can be improved +```json + "vmovmskps rax, ymm0": { + "ExpectedInstructionCount": 11, + "Comment": [ + "Map 1 0b00 0x50 256-bit" + ], + "ExpectedArm64ASM": [ + "ldr q2, [x28, #16]", + "ushr v3.4s, v16.4s, #31", + "ldr q4, [x28, #2512]", + "ushl v3.4s, v3.4s, v4.4s", + "addv s3, v3.4s", + "mov w20, v3.s[0]", + "ushr v2.4s, v2.4s, #31", + "ushl v2.4s, v2.4s, v4.4s", + "addv s2, v2.4s", + "mov w21, v2.s[0]", + "orr x4, x20, x21, lsl #4" + ] +``` +Current algorithm just does two 128-bit algorithms and then merges at the end. + +This can be improved by loading a second shift value for the high 128-bits which shifts the signs in to different locations. Then do a add between the two 128-bit halves, then a horizontal add, single FPR->GPR element extract. +Would shave the instructions from 11 to 10, removing a horizontal add and a FPR->GPR transfer. + +vmovmskpd is similar in that it is doing two 128-bit algorithms and then extracting. Different algorithm so it would need more noodling. \ No newline at end of file |