AVX128: Scalar FMA can be optimized with AFP.NEP
Currently the scalar FMA implementation still emits an insert after the calculation, even when AFP.NEP is supported. This is because I haven't implemented the AFP.NEP path.
Using AFP.NEP would cut the scalar FMA implementation by one instruction in those cases; it just needs to be implemented.
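
A rough model of where the saved instruction comes from, written as a minimal C sketch rather than the actual JIT code. Everything here is hypothetical and for illustration only: the `Vec4` type, the function names, and the choice of which register supplies the preserved upper lanes. It assumes that a scalar operation normally zeroes the upper lanes of its destination (so an explicit insert is needed to keep them), while with AFP.NEP enabled the operation keeps the upper lanes itself. The plain `a * b + c` stands in for a fused multiply-add and ignores rounding differences.

```c
#include <stddef.h>

/* Hypothetical 4-lane float vector standing in for a 128-bit register. */
typedef struct { float lane[4]; } Vec4;

/* Model of a scalar FMA WITHOUT NEP: lane 0 is computed,
 * the upper lanes of the result are zeroed. */
static Vec4 scalar_fma_zeroing(float a, float b, float c) {
    Vec4 r = { { a * b + c, 0.0f, 0.0f, 0.0f } };
    return r;
}

/* Model of the extra insert: copy lane 0 of src into dst,
 * leaving the upper lanes of dst untouched. */
static Vec4 insert_lane0(Vec4 dst, Vec4 src) {
    dst.lane[0] = src.lane[0];
    return dst;
}

/* Model of a scalar FMA WITH NEP (assumption): lane 0 is computed,
 * the upper lanes are taken from `merge` instead of being zeroed. */
static Vec4 scalar_fma_merging(Vec4 merge, float a, float b, float c) {
    merge.lane[0] = a * b + c;
    return merge;
}

/* Current path as described above: calculation plus insert (two steps). */
static Vec4 emulate_without_nep(Vec4 upper, float a, float b, float c) {
    Vec4 tmp = scalar_fma_zeroing(a, b, c);
    return insert_lane0(upper, tmp);
}

/* Possible AFP.NEP path: the calculation alone preserves the upper lanes. */
static Vec4 emulate_with_nep(Vec4 upper, float a, float b, float c) {
    return scalar_fma_merging(upper, a, b, c);
}
```

In this model both paths produce the same register contents; the AFP.NEP path just gets there without the separate insert step.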