ppc: softfloat float implementation issues

Per bug #1841491, Richard Henderson (rth) said:

> The float test failure is part of a larger problem for target/powerpc in which all float
> routines are implemented incorrectly. They are all implemented as double operations with
> rounding to float as a second step. Which not only produces incorrect exceptions, as in
> this case, but incorrect numerical results from the double rounding.
>
> This should probably be split to a separate bug...

-- ppc64le native:

    $ gcc -c -O2 ffma.c
    $ gcc -O2 test-ffma.c ffma.o -lm -o test-ffma
    $ ./test-ffma $(./test-ffma)
    ffma(0x1p-149, 0x1p-149, 0x1p-149) 0x0 0xa000000 FE_INEXACT FE_UNDERFLOW 0x1p-149

-- qemu-system-ppc64:

    $ ./test-ffma $(./test-ffma)
    ffma(0x1p-149, 0x1p-149, 0x1p-149) 0x0 0x2000000 FE_INEXACT 0x1p-149

I'm confused by this testcase as it's not a fused multiply-add but as you say two combined operations.

It should be a fused multiply add; you may need to use -ffast-math or something to get the compiler to generate the proper instruction.

However, one can see from target/ppc/translate/fp-impl.inc.c:

    /* fmadd - fmadds */
    GEN_FLOAT_ACB(madd, 0x1D, 1, PPC_FLOAT);

through to _GEN_FLOAT_ACB:

    gen_helper_f##op(t3, cpu_env, t0, t1, t2); \
    if (isfloat) { \
        gen_helper_frsp(t3, cpu_env, t3); \
    } \

That right there is a double-precision fma followed by a round to single precision. This pattern is replicated for all single precision operations, and is of course wrong.

I believe that correct results may be obtained by having single-precision helpers that first convert the double-precision input into a single-precision input using helper_tosingle(), perform the required operation, then convert the result back to double-precision using helper_todouble().

The manual says:

# For single-precision arithmetic instructions, all input values
# must be representable in single format; if they are not, the
# result placed into the target FPR, and the setting of
# status bits in the FPSCR and in the Condition Register
# (if Rc=1), are undefined.

The tosingle/todouble conversions are exact and bit-preserving. They are used by load-single and store-single, which convert between a single-precision in-memory value and the double-precision register value. Therefore the input given to float32_add using this conversion would be exactly the same as if we had given the value unmolested from a memory input.

I don't know what real ppc hw does -- whether it takes all of the double-precision input bits and rounds to 23-bits, like the old 80387 hardware does, or truncates the input as I propose. But for architectural results we don't have to care, because of the UNDEFINED escape clause.

Testing on current master shows the behavior is correct. I guess rth's patch fixed this case.

It looks like the test case isn't properly exercising the code that is likely to be wrong.

It sounds like we need a proper comprehensive testcase for fused operations (along the lines of the ARM fcvt test case). This can probably be a multiarch testcase which we can build for all the various targets.

This is a generic floating point multiply and accumulate test for single precision floating point values. I've split off the common float functions into a helper library so additional tests can use the same common code.

Signed-off-by: Alex Bennée