Redundant context loads when a block contains distinct pshufd's
bytemark has a sequence like:
```
"pshufd xmm2,xmm3,0xf5",
"addpd xmm1,xmm12",
"pmuludq xmm3,xmm9",
"pshufd xmm3,xmm3,0xe8",
```
Because the pshufd indices differ, we can't cache the constant, and we end up generating this code:
```
"ldr x0, [x28, #1760]",
"ldr q3, [x0, #3920]",
"tbl v18.16b, {v19.16b}, v3.16b",
"fadd v17.2d, v17.2d, v28.2d",
"uzp1 v4.4s, v19.4s, v19.4s",
"uzp1 v5.4s, v25.4s, v25.4s",
"umull v19.2d, v4.2s, v5.2s",
"ldr x0, [x28, #1760]",
"ldr q4, [x0, #3712]",
"tbl v19.16b, {v19.16b}, v4.16b",
```
Notice that we reload the base vector address within the same block instead of caching it. At minimum we should probably fix that. But I'm also wondering whether there's a better way to do pshufd in general. Not sure.
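To make both points concrete, here is a rough Python sketch. It is a toy model, not the emulator's code: `pshufd_lanes`, `tbl_indices`, `single_insn_alternative` and `emit_block` are hypothetical names, and the register numbering in `emit_block` is made up. It assumes the constant table is indexed as `imm8 * 16`, which is consistent with the offsets in the listing above (0xf5 * 16 = 3920, 0xe8 * 16 = 3712), and that `x0` is not clobbered between the loads.

```python
# Toy model only: none of these helpers exist in the emulator's code base.

def pshufd_lanes(imm8):
    """Source lane (0-3) selected for each of the four 32-bit result lanes."""
    return [(imm8 >> (2 * i)) & 3 for i in range(4)]

def tbl_indices(imm8):
    """The 16 byte indices a tbl-based lowering needs for this immediate."""
    return [4 * lane + b for lane in pshufd_lanes(imm8) for b in range(4)]

# Lane patterns that a single AArch64 permute reproduces when both of its
# inputs are the same register (4s arrangement).
SINGLE_INSN = {
    (0, 2, 0, 2): "uzp1", (1, 3, 1, 3): "uzp2",
    (0, 0, 2, 2): "trn1", (1, 1, 3, 3): "trn2",
    (0, 0, 1, 1): "zip1", (2, 2, 3, 3): "zip2",
    (1, 0, 3, 2): "rev64",
}

def single_insn_alternative(imm8):
    """Return a mnemonic that could replace tbl, or None if tbl is still needed."""
    lanes = tuple(pshufd_lanes(imm8))
    if len(set(lanes)) == 1:
        return "dup"  # broadcast of one lane
    return SINGLE_INSN.get(lanes)

def emit_block(imms, table_offset=1760):
    """Emit the constant loads for a block, loading the table base only once.

    Assumption: x0 stays live for the whole block and entries sit at imm8 * 16.
    """
    out = []
    base_loaded = False
    for n, imm8 in enumerate(imms):
        if not base_loaded:
            out.append(f"ldr x0, [x28, #{table_offset}]")
            base_loaded = True
        out.append(f"ldr q{n}, [x0, #{imm8 * 16}]")
    return out
```

Under this model, `0xf5` decodes to lanes `[1, 1, 3, 3]`, which matches what `trn2` with both inputs set to the same register produces, so that shuffle might not need a table load at all. `0xe8` decodes to `[0, 2, 2, 3]`, which has no entry in the table above and would still go through `tbl`. `emit_block([0xf5, 0xe8])` emits the `[x28, #1760]` load once, followed by the two `ldr q` loads.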