summary refs log tree commit diff stats
path: root/results/classifier/phi4:14b/output/runtime/1086
diff options
context:
space:
mode:
Diffstat (limited to 'results/classifier/phi4:14b/output/runtime/1086')
-rw-r--r--results/classifier/phi4:14b/output/runtime/108672
1 files changed, 72 insertions, 0 deletions
diff --git a/results/classifier/phi4:14b/output/runtime/1086 b/results/classifier/phi4:14b/output/runtime/1086
new file mode 100644
index 00000000..56d70a3b
--- /dev/null
+++ b/results/classifier/phi4:14b/output/runtime/1086
@@ -0,0 +1,72 @@
+
+
+
+Numpy/scipy test suites fails in QEMU on ppc64le (but not on aarch64)
+Description of problem:
+I'm not really qualified to report this problem, but after being affected by it for ~2 years (and QEMU 7 not fixing things), I decided to give it a shot. Please excuse reporting deficiencies, I'll endeavour to fix them as best I can once pointed out.
+
+In my spare time, I help out for the packaging effort in the [conda-forge](https://conda-forge.org/) ecosystem, which is mostly associated/attached to the python world, but - in contrast to the vanilla python tools - also deals with non-python dependencies, and in particular has strong enough abstractions to deal with ABI-issues and generally provides much better integration than the packages on PyPI.
+
+This strength of abstraction has also allowed conda-forge to publish artefacts for many more architectures than most projects are commonly able to provide precompiled binaries for. Due to the lack of (reliable) public CI for aarch64 & ppc64le, these packages are mostly cross-compiled from linux-x86. Where cross compilation is not possible, the packages are compiled in emulation through QEMU, coming through https://github.com/multiarch/qemu-user-static (this is the part of the infrastructure I don't fully understand myself...). The full infrastructure is somewhat involved, but should not be relevant (hopefully) to the issue at hand (see instructions below) - and even if that turns out to be the case, that would be a great information gain as well.
+
+In either case, the tests for the package (ideally comprising the entire upstream test suite) are then run in emulation.
+
+Two of the so-called "feedstocks" I co-maintain are for [numpy](https://github.com/conda-forge/numpy-feedstock) and [scipy](https://github.com/conda-forge/scipy-feedstock), and there have been persistent issues with running the test suite in emulation on PPC (interestingly, the same setup on a different architecture - aarch64 - has no problems). However, the compiled artefacts on PPC run fine on native hardware.
+
+Said otherwise, it appears numpy/scipy are exercising QEMU enough to uncover some bugs. I've seen similar problems also in other packages (e.g. the cvxpy-stack), reinforcing the impression that this is a QEMU issue, and not one on the level of the individual packages.
+
+Depending on the exact combination of python version, the result of the numpy test suite might be as follows:
+```
+320 failed, 18900 passed, 361 skipped, 36 xfailed, 9 xpassed, 144 warnings in 2516.49s (0:41:56)
+```
+
+Looking at the test failures, sometimes the results are garbage
+```
+>       assert_array_max_ulp(x, x+eps, maxulp=20)
+E       AssertionError: Arrays are not almost equal up to 20 ULP (max difference is 8.55554e+08 ULP)
+
+eps        = 1.1920929e-07
+self       = <numpy.testing.tests.test_utils.TestULP object at 0x401ec8beb0>
+x          = array([ 2.3744986e-38,            nan,  2.2482052e-15,  7.5780330e+28,
+                  nan,            nan,  5.8310814e+29, -5.6511531e+24,
+        1.0010809e+00,  1.0101526e+00], dtype=float32)
+```
+sometimes the values are permuted
+```
+>           assert_array_equal(actual, desired)
+E           AssertionError: 
+E           Arrays are not equal
+E           
+E           x and y nan location mismatch:
+E            x: array([0.000000e+00, 6.704092e-39, 9.000000e+00, 2.350989e-38,
+E                  0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
+E                  6.772341e-39,          nan], dtype=float32)
+E            y: array([6.704092e-39, 6.772341e-39, 0.000000e+00, 0.000000e+00,
+E                  0.000000e+00, 0.000000e+00,          nan, 2.350989e-38,
+E                  2.000000e+00, 7.000000e+00], dtype=float32)
+```
+sometimes the results are fundamentally different (zero vs. non-zero)
+```
+>               raise AssertionError(msg)
+E               AssertionError: 
+E               Arrays are not almost equal to 6 decimals
+E               
+E               Mismatched elements: 72 / 216 (33.3%)
+E               Max absolute difference: 1.
+E               Max relative difference: 1.
+E                x: array([[[[[0., 0., 0.],
+E                         [0., 0., 0.],
+E                         [0., 0., 0.]],...
+E                y: array([[[[[1., 0., 0.],
+E                         [0., 1., 0.],
+E                         [0., 0., 1.]],...
+```
+
+I don't know where it goes wrong, but it's not just a little tolerance violation. One PR that illustrates this is [here](https://github.com/conda-forge/numpy-feedstock/pull/274) and the respective CI run is [here](https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=526218&view=results) (ignore the errors for osx-arm64, those are unrelated).
+Steps to reproduce:
+1. In an emulated ppc64 machine, install miniforge from [here](https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-ppc64le.sh)
+2. Run `conda create -n test_env numpy pytest cython hypothesis typing_extensions` and then `conda activate test_env`
+3. Run `python -c "import numpy; numpy.test()"`
+4. Pick any test that fails and run it as `python -c "import numpy; numpy.test(tests='x.y.z')"`
+Additional information:
+