| author | Alexander Monakov <amonakov@ispras.ru> | 2024-02-06 23:48:08 +0300 |
|---|---|---|
| committer | Richard Henderson <richard.henderson@linaro.org> | 2024-05-03 08:03:05 -0700 |
| commit | f28e0bbefa41fe643cce2f107e868abff312ced9 (patch) | |
| tree | 933db7fedccb1c2590441909271db03ff8cba52f /plugins/api.c | |
| parent | 93a6085618f16fb2cd316d1e84f1a638b7e2d8ff (diff) | |
util/bufferiszero: Optimize SSE2 and AVX2 variants
Increase unroll factor in SIMD loops from 4x to 8x in order to move their bottlenecks from ALU port contention to load issue rate (two loads per cycle on popular x86 implementations).

Avoid using out-of-bounds pointers in loop boundary conditions.

Follow SSE2 implementation strategy in the AVX2 variant. Avoid use of PTEST, which is not profitable there (like in the removed SSE4 variant).

Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Mikhail Romanov <mmromanov@ispras.ru>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240206204809.9859-6-amonakov@ispras.ru>
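The diff itself is not shown in this view, so the following is only a minimal sketch of the ideas the message describes. It is not the code from util/bufferiszero.c. The names `sketch_block128_is_zero` and `sketch_buffer_is_zero` are hypothetical, and the choices of unaligned loads, an overlapping final block, and a scalar fallback for non-SSE2 builds are assumptions made to keep the example short. It shows an 8x unrolled SSE2 step (eight 16-byte loads per iteration), a zero test via PCMPEQB and PMOVMSKB rather than PTEST, and loop bounds computed on offsets so that no pointer past the buffer is ever formed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>

/* Hypothetical helper: test one 128-byte block with eight 16-byte loads. */
static bool sketch_block128_is_zero(const unsigned char *p)
{
    const __m128i *v = (const __m128i *)p;

    /* 8x unroll: eight independent loads, OR-reduced as a tree. */
    __m128i a = _mm_or_si128(_mm_loadu_si128(v + 0), _mm_loadu_si128(v + 1));
    __m128i b = _mm_or_si128(_mm_loadu_si128(v + 2), _mm_loadu_si128(v + 3));
    __m128i c = _mm_or_si128(_mm_loadu_si128(v + 4), _mm_loadu_si128(v + 5));
    __m128i d = _mm_or_si128(_mm_loadu_si128(v + 6), _mm_loadu_si128(v + 7));
    __m128i t = _mm_or_si128(_mm_or_si128(a, b), _mm_or_si128(c, d));

    /* PCMPEQB + PMOVMSKB instead of PTEST: all 16 bytes must equal zero. */
    return _mm_movemask_epi8(_mm_cmpeq_epi8(t, _mm_setzero_si128())) == 0xFFFF;
}
#else
/* Assumed scalar fallback so the sketch also builds without SSE2. */
static bool sketch_block128_is_zero(const unsigned char *p)
{
    uint64_t w[16], t = 0;

    memcpy(w, p, sizeof(w));
    for (int i = 0; i < 16; i++) {
        t |= w[i];
    }
    return t == 0;
}
#endif

/* Hypothetical entry point: true if all len bytes at buf are zero. */
static bool sketch_buffer_is_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    if (len < 128) {
        /* Short buffers: plain byte loop (an assumption of this sketch). */
        unsigned char acc = 0;
        for (size_t i = 0; i < len; i++) {
            acc |= p[i];
        }
        return acc == 0;
    }

    /*
     * Offset of the final block, which may overlap the previous one.
     * The loop condition compares offsets, so p + off always points
     * inside the buffer and no out-of-bounds pointer is computed.
     */
    size_t last = len - 128;
    for (size_t off = 0; off < last; off += 128) {
        if (!sketch_block128_is_zero(p + off)) {
            return false;
        }
    }
    return sketch_block128_is_zero(p + last);
}
```

Calling `sketch_buffer_is_zero(buf, len)` on a zero-filled buffer returns true for any length. Re-reading up to 127 bytes in the overlapping final block is a deliberate trade in this sketch: it removes the need for a separate tail loop.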
Diffstat (limited to 'plugins/api.c')
0 files changed, 0 insertions, 0 deletions