Redundant context loads when distinct pshufd's in a block bytemark has a sequence like: ``` "pshufd xmm2,xmm3,0xf5", "addpd xmm1,xmm12", "pmuludq xmm3,xmm9", "pshufd xmm3,xmm3,0xe8", ``` because the pshufd indices differ, we can't cache the constant. and we end up generating code ``` "ldr x0, [x28, #1760]", "ldr q3, [x0, #3920]", "tbl v18.16b, {v19.16b}, v3.16b", "fadd v17.2d, v17.2d, v28.2d", "uzp1 v4.4s, v19.4s, v19.4s", "uzp1 v5.4s, v25.4s, v25.4s", "umull v19.2d, v4.2s, v5.2s", "ldr x0, [x28, #1760]", "ldr q4, [x0, #3712]", "tbl v19.16b, {v19.16b}, v4.16b", ``` Notice that we reload the base vector addresses in the same block, instead of caching it. At minimum we should probably fix that. But I'm also questioning if there's maybe a better way to do pshufd? Not sure