results/classifier/zero-shot-user-mode/runtime/1086


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75

runtime: 0.432
instruction: 0.308
syscall: 0.261


Numpy/scipy test suites fails in QEMU on ppc64le (but not on aarch64)
Description of problem:
I'm not really qualified to report this problem, but after being affected by it for ~2 years (and QEMU 7 not fixing things), I decided to give it a shot. Please excuse reporting deficiencies, I'll endeavour to fix them as best I can once pointed out.

In my spare time, I help out for the packaging effort in the [conda-forge](https://conda-forge.org/) ecosystem, which is mostly associated/attached to the python world, but - in contrast to the vanilla python tools - also deals with non-python dependencies, and in particular has strong enough abstractions to deal with ABI-issues and generally provides much better integration than the packages on PyPI.

This strength of abstraction has also allowed conda-forge to publish artefacts for many more architectures than most projects are commonly able to provide precompiled binaries for. Due to the lack of (reliable) public CI for aarch64 & ppc64le, these packages are mostly cross-compiled from linux-x86. Where cross compilation is not possible, the packages are compiled in emulation through QEMU, coming through https://github.com/multiarch/qemu-user-static (this is the part of the infrastructure I don't fully understand myself...). The full infrastructure is somewhat involved, but should not be relevant (hopefully) to the issue at hand (see instructions below) - and even if that turns out to be the case, that would be a great information gain as well.

In either case, the tests for the package (ideally comprising the entire upstream test suite) are then run in emulation.

Two of the so-called "feedstocks" I co-maintain are for [numpy](https://github.com/conda-forge/numpy-feedstock) and [scipy](https://github.com/conda-forge/scipy-feedstock), and there have been persistent issues with running the test suite in emulation on PPC (interestingly, the same setup on a different architecture - aarch64 - has no problems). However, the compiled artefacts on PPC run fine on native hardware.

Said otherwise, it appears numpy/scipy are exercising QEMU enough to uncover some bugs. I've seen similar problems also in other packages (e.g. the cvxpy-stack), reinforcing the impression that this is a QEMU issue, and not one on the level of the individual packages.

Depending on the exact combination of python version, the result of the numpy test suite might be as follows:
```
320 failed, 18900 passed, 361 skipped, 36 xfailed, 9 xpassed, 144 warnings in 2516.49s (0:41:56)
```

Looking at the test failures, sometimes the results are garbage
```
>       assert_array_max_ulp(x, x+eps, maxulp=20)
E       AssertionError: Arrays are not almost equal up to 20 ULP (max difference is 8.55554e+08 ULP)

eps        = 1.1920929e-07
self       = <numpy.testing.tests.test_utils.TestULP object at 0x401ec8beb0>
x          = array([ 2.3744986e-38,            nan,  2.2482052e-15,  7.5780330e+28,
                  nan,            nan,  5.8310814e+29, -5.6511531e+24,
        1.0010809e+00,  1.0101526e+00], dtype=float32)
```
sometimes the values are permuted
```
>           assert_array_equal(actual, desired)
E           AssertionError: 
E           Arrays are not equal
E           
E           x and y nan location mismatch:
E            x: array([0.000000e+00, 6.704092e-39, 9.000000e+00, 2.350989e-38,
E                  0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
E                  6.772341e-39,          nan], dtype=float32)
E            y: array([6.704092e-39, 6.772341e-39, 0.000000e+00, 0.000000e+00,
E                  0.000000e+00, 0.000000e+00,          nan, 2.350989e-38,
E                  2.000000e+00, 7.000000e+00], dtype=float32)
```
sometimes the results are fundamentally different (zero vs. non-zero)
```
>               raise AssertionError(msg)
E               AssertionError: 
E               Arrays are not almost equal to 6 decimals
E               
E               Mismatched elements: 72 / 216 (33.3%)
E               Max absolute difference: 1.
E               Max relative difference: 1.
E                x: array([[[[[0., 0., 0.],
E                         [0., 0., 0.],
E                         [0., 0., 0.]],...
E                y: array([[[[[1., 0., 0.],
E                         [0., 1., 0.],
E                         [0., 0., 1.]],...
```

I don't know where it goes wrong, but it's not just a little tolerance violation. One PR that illustrates this is [here](https://github.com/conda-forge/numpy-feedstock/pull/274) and the respective CI run is [here](https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=526218&view=results) (ignore the errors for osx-arm64, those are unrelated).
Steps to reproduce:
1. In an emulated ppc64 machine, install miniforge from [here](https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-ppc64le.sh)
2. Run `conda create -n test_env numpy pytest cython hypothesis typing_extensions` and then `conda activate test_env`
3. Run `python -c "import numpy; numpy.test()"`
4. Pick any test that fails and run it as `python -c "import numpy; numpy.test(tests='x.y.z')"`
Additional information: