Test failures for gges and qz for float32 input in macOS CI #16949

Closed
rgommers opened this issue Sep 2, 2022 · 49 comments
Labels: defect, scipy.linalg

@rgommers
Member

rgommers commented Sep 2, 2022

These failures started showing up consistently over the last day (example log), and only in the macOS tests (meson) job. That job uses conda-forge, so the root cause is very likely the new build (0.3.21 build _3) of OpenBLAS pushed to https://anaconda.org/conda-forge/openblas about 16 hours ago.

___________________________ test_gges_tgexc[float32] ___________________________
[gw0] darwin -- Python 3.10.6 /Users/runner/miniconda3/envs/scipy-dev/bin/python
scipy/linalg/tests/test_lapack.py:3087: in test_gges_tgexc
    assert_allclose(q @ s @ z.conj().T, a, rtol=0, atol=atol)
E   AssertionError: 
E   Not equal to tolerance rtol=0, atol=1.19209e-05
E   
E   Mismatched elements: 100 / 100 (100%)
E   Max absolute difference: 0.5596407
E   Max relative difference: 45.19812
E    x: array([[ 0.269565,  0.510163,  0.307816,  0.720287,  0.713599,  0.25576 ,
E            0.131274,  0.843207,  0.835774,  1.005878],
E          [ 0.364995, -0.002645,  0.148027,  0.409446,  0.200136,  0.596843,...
E    y: array([[0.191519, 0.622109, 0.437728, 0.785359, 0.779976, 0.272593,
E           0.276464, 0.801872, 0.958139, 0.875933],
E          [0.357817, 0.500995, 0.683463, 0.712702, 0.370251, 0.561196,...
        a          = array([[0.19151945, 0.62210876, 0.43772775, 0.7853586 , 0.77997583,
        0.2725926 , 0.27646425, 0.8018722 , 0.9581...43 , 0.9514288 , 0.48035917,
        0.50255954, 0.53687817, 0.81920207, 0.05711564, 0.66942173]],
      dtype=float32)
        atol       = 1.1920928955078125e-05
        b          = array([[0.7671166 , 0.70811534, 0.7968672 , 0.55776083, 0.9658365 ,
        0.1471569 , 0.029647  , 0.59389347, 0.1140...526, 0.7370865 , 0.12702939,
        0.3696499 , 0.604334  , 0.10310444, 0.8023742 , 0.94555324]],
      dtype=float32)
        d1         = -10.973611
        d2         = 0.65421426
        dtype      = <class 'numpy.float32'>
        gges       = <fortran object>
        n          = 10
        q          = array([[ 0.36440596,  0.14539199, -0.48573047,  0.18267933, -0.10118745,
         0.15227516, -0.48237514,  0.06109872....19003534, -0.15249437,
        -0.1685543 , -0.12377075, -0.35968414, -0.3202702 , -0.10042791]],
      dtype=float32)
        result     = (array([[-1.00654995e+00, -1.57352880e-01, -7.44751692e-02,
        -1.94980763e-03,  1.06006734e-01,  7.20489264e-01,... 0.36549678, 0.9152956 ,
       1.0120404 , 0.9464567 , 0.94311225, 0.        , 0.9372638 ],
      dtype=float32), ...)
        s          = array([[-1.00654995e+00, -1.57352880e-01, -7.44751692e-02,
        -1.94980763e-03,  1.06006734e-01,  7.20489264e-01,
...,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         3.35054606e-01]], dtype=float32)
        t          = array([[ 0.09172459,  0.18612614, -0.10655626, -0.10534273, -0.5556871 ,
         0.8352932 , -0.11271314,  0.13819972....        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.9372638 ]],
      dtype=float32)
        tgexc      = <fortran object>
        z          = array([[ 0.45143077,  0.07711446,  0.04053342, -0.06187871, -0.2751032 ,
        -0.56598544, -0.22434531, -0.289968  ....21228388, -0.15313199,
         0.13994187, -0.6340214 ,  0.16412702,  0.56076413,  0.37706485]],
      dtype=float32)
___________________________ test_gges_tgsen[float32] ___________________________
[gw0] darwin -- Python 3.10.6 /Users/runner/miniconda3/envs/scipy-dev/bin/python
scipy/linalg/tests/test_lapack.py:3257: in test_gges_tgsen
    assert_allclose(q @ s @ z.conj().T, a, rtol=0, atol=atol)
E   AssertionError: 
E   Not equal to tolerance rtol=0, atol=1.19209e-05
E   
E   Mismatched elements: 100 / 100 (100%)
E   Max absolute difference: 0.5596407
E   Max relative difference: 45.19812
E    x: array([[ 0.269565,  0.510163,  0.307816,  0.720287,  0.713599,  0.25576 ,
E            0.131274,  0.843207,  0.835774,  1.005878],
E          [ 0.364995, -0.002645,  0.148027,  0.409446,  0.200136,  0.596843,...
E    y: array([[0.191519, 0.622109, 0.437728, 0.785359, 0.779976, 0.272593,
E           0.276464, 0.801872, 0.958139, 0.875933],
E          [0.357817, 0.500995, 0.683463, 0.712702, 0.370251, 0.561196,...
        a          = array([[0.19151945, 0.62210876, 0.43772775, 0.7853586 , 0.77997583,
        0.2725926 , 0.27646425, 0.8018722 , 0.9581...43 , 0.9514288 , 0.48035917,
        0.50255954, 0.53687817, 0.81920207, 0.05711564, 0.66942173]],
      dtype=float32)
        atol       = 1.1920928955078125e-05
        b          = array([[0.7671166 , 0.70811534, 0.7968672 , 0.55776083, 0.9658365 ,
        0.1471569 , 0.029647  , 0.59389347, 0.1140...526, 0.7370865 , 0.12702939,
        0.3696499 , 0.604334  , 0.10310444, 0.8023742 , 0.94555324]],
      dtype=float32)
        d1         = -10.973611
        d2         = 0.65421426
        dtype      = <class 'numpy.float32'>
        gges       = <fortran object>
        n          = 10
        q          = array([[ 0.36440596,  0.14539199, -0.48573047,  0.18267933, -0.10118745,
         0.15227516, -0.48237514,  0.06109872....19003534, -0.15249437,
        -0.1685543 , -0.12377075, -0.35968414, -0.3202702 , -0.10042791]],
      dtype=float32)
        result     = (array([[-1.00654995e+00, -1.57352880e-01, -7.44751692e-02,
        -1.94980763e-03,  1.06006734e-01,  7.20489264e-01,... 0.36549678, 0.9152956 ,
       1.0120404 , 0.9464567 , 0.94311225, 0.        , 0.9372638 ],
      dtype=float32), ...)
        s          = array([[-1.00654995e+00, -1.57352880e-01, -7.44751692e-02,
        -1.94980763e-03,  1.06006734e-01,  7.20489264e-01,
...,  0.00000000e+00,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
         3.35054606e-01]], dtype=float32)
        t          = array([[ 0.09172459,  0.18612614, -0.10655626, -0.10534273, -0.5556871 ,
         0.8352932 , -0.11271314,  0.13819972....        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.9372638 ]],
      dtype=float32)
        tgsen      = <fortran object>
        tgsen_lwork = <fortran object>
        z          = array([[ 0.45143077,  0.07711446,  0.04053342, -0.06187871, -0.2751032 ,
        -0.56598544, -0.22434531, -0.289968  ....21228388, -0.15313199,
         0.13994187, -0.6340214 ,  0.16412702,  0.56076413,  0.37706485]],
      dtype=float32)
____________________________ TestQZ.test_qz_single _____________________________
[gw1] darwin -- Python 3.10.6 /Users/runner/miniconda3/envs/scipy-dev/bin/python
scipy/linalg/tests/test_decomp.py:2039: in test_qz_single
    assert_array_almost_equal(Q @ AA @ Z.T, A, decimal=5)
E   AssertionError: 
E   Arrays are not almost equal to 5 decimals
E   
E   Mismatched elements: 25 / 25 (100%)
E   Max absolute difference: 1.9282217
E   Max relative difference: 12.652091
E    x: array([[ 0.25942,  0.42247,  0.29118,  1.3622 ,  0.07058],
E          [ 1.54704,  1.04254,  0.65235, -1.17932,  1.4969 ],
E          [ 0.37006,  0.76962, -0.09774,  1.07189, -0.13269],...
E    y: array([[0.92962, 0.31638, 0.18392, 0.20456, 0.56773],
E          [0.59554, 0.96451, 0.65318, 0.74891, 0.65357],
E          [0.74771, 0.96131, 0.00839, 0.10644, 0.2987 ],...
        A          = array([[0.9296161 , 0.31637555, 0.18391882, 0.20456028, 0.567725  ],
       [0.5955447 , 0.9645145 , 0.6531771 , 0.748...93, 0.9646476 , 0.7236853 ],
       [0.6424753 , 0.7174536 , 0.467599  , 0.32558468, 0.4396446 ]],
      dtype=float32)
        AA         = array([[ 0.39677864, -0.87905234,  0.31681818,  0.25420108, -0.31528485],
       [ 0.        ,  2.6049669 ,  0.5027338...10585941, -0.06299087],
       [ 0.        ,  0.        ,  0.        ,  0.        , -0.35764655]],
      dtype=float32)
        B          = array([[0.72968906, 0.99401456, 0.6768737 , 0.7908225 , 0.17091426],
       [0.02684928, 0.8003702 , 0.9037225 , 0.024...4 , 0.09596852, 0.21895005],
       [0.25871906, 0.46810576, 0.4593732 , 0.7095098 , 0.178053  ]],
      dtype=float32)
        BB         = array([[ 0.0964531 , -0.7481051 ,  0.0091472 , -0.6136254 ,  0.07135387],
       [ 0.        ,  2.5299823 ,  0.       ...        ,  0.01339715],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.80180556]],
      dtype=float32)
        Q          = array([[-0.5236176 ,  0.43169242,  0.54998064, -0.28841138, -0.39218202],
       [ 0.10638957,  0.3615428 , -0.7044734...05049007,  0.65567476],
       [-0.04425064,  0.36560142, -0.14396492,  0.8444302 , -0.36137164]],
      dtype=float32)
        Z          = array([[-4.1415977e-01,  4.8910376e-01,  6.0941523e-01,  2.2133094e-01,
         4.1094345e-01],
       [ 8.1124866e-0...e-01],
       [-2.1641581e-01,  2.0534211e-01, -2.5105488e-01, -8.4537733e-01,
         3.6511230e-01]], dtype=float32)
        n          = 5
        self       = <scipy.linalg.tests.test_decomp.TestQZ object at 0x12de02ad0>
=========================== short test summary info ============================
FAILED scipy/linalg/tests/test_lapack.py::test_gges_tgexc[float32] - Assertio...
FAILED scipy/linalg/tests/test_lapack.py::test_gges_tgsen[float32] - Assertio...
FAILED scipy/linalg/tests/test_decomp.py::TestQZ::test_qz_single - AssertionE...
rgommers added the scipy.linalg and defect labels Sep 2, 2022
@rgommers
Member Author

rgommers commented Sep 2, 2022

The one change in that build is conda-forge/openblas-feedstock#146. Most of it is arm64-specific, but for x86_64 the Fortran compiler was upgraded from gfortran 9 to 11.

@rgommers
Member Author

rgommers commented Sep 2, 2022

At first sight these tests are written correctly, but it's not clear whether this is a bug in OpenBLAS or in SciPy. We've had a lot of problems with qz in particular before.
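
To put the numbers in the tracebacks above in perspective: the reported atol of 1.1920928955078125e-05 equals 100 times the float32 machine epsilon (2**-23), whereas the largest deviation is 0.5596407. A minimal check in plain Python follows. The helper name is made up for illustration, and reading the tolerance as 100 * eps is an inference from the printed value, not something taken from the test source.

```python
# float32 machine epsilon: spacing between 1.0 and the next float32 value.
FLOAT32_EPS = 2.0 ** -23

# Values copied from the test_gges_tgexc[float32] traceback.
ATOL_FROM_LOG = 1.1920928955078125e-05
MAX_ABS_DIFF_FROM_LOG = 0.5596407


def times_over_tolerance(max_abs_diff: float, atol: float) -> float:
    """Hypothetical helper: how many times the tolerance is exceeded."""
    return max_abs_diff / atol


# Check whether the logged atol matches 100 * eps(float32).
print(ATOL_FROM_LOG == 100 * FLOAT32_EPS)
# Ratio of the observed error to the tolerance.
print(times_over_tolerance(MAX_ABS_DIFF_FROM_LOG, ATOL_FROM_LOG))
```

A ratio in the tens of thousands suggests a wrong result rather than accumulated rounding error.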

@rgommers
Member Author

rgommers commented Sep 2, 2022

Note that qz is implemented in Python and uses gges internally, so it looks like there is only one problem here: gges with float32 input.
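
LAPACK routine names carry a one-letter type prefix, which is why a float32-only failure in qz points at sgges specifically. The mapping below is a plain-Python sketch of that naming convention. lapack_routine_name is a hypothetical helper for illustration, not a SciPy function; SciPy performs an equivalent dtype-based selection when it looks up the wrapper.

```python
# Standard LAPACK type prefixes, keyed by NumPy-style dtype name.
LAPACK_PREFIX = {
    "float32": "s",     # single precision real
    "float64": "d",     # double precision real
    "complex64": "c",   # single precision complex
    "complex128": "z",  # double precision complex
}


def lapack_routine_name(base: str, dtype_name: str) -> str:
    """Hypothetical helper: build the typed routine name, e.g. gges -> sgges."""
    try:
        return LAPACK_PREFIX[dtype_name] + base
    except KeyError:
        raise ValueError(f"no LAPACK flavour for dtype {dtype_name!r}") from None


# Print the typed name of gges for every supported dtype.
for dtype_name in LAPACK_PREFIX:
    print(dtype_name, "->", lapack_routine_name("gges", dtype_name))
```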

@rgommers
Member Author

rgommers commented Sep 2, 2022

On arm64 I also see new (but different) failures:

_____________________________________________________________ TestQRupdate_D.test_unsupported_dtypes ______________________________________________________________
scipy/linalg/tests/test_decomp_update.py:1561: in test_unsupported_dtypes
    r = r0.real.astype(dtype)
E   RuntimeWarning: invalid value encountered in cast
        a          = array([[0.36852295+0.70656594j, 0.97637554+0.88157274j,
        0.73179659+0.53904779j, 0.68487472+0.25772088j,
      ...842j, 0.1716864 +0.8069605j ,
        0.29238511+0.90236963j, 0.08459105+0.7541054j ,
        0.83912041+0.23152167j]])
        dts        = ['int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', ...]
        dtype      = 'uint32'
        q          = array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0,..., 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint32)
        q0         = array([[-0.11519073-0.22085421j,  0.40310762+0.25545139j,
         0.20641454-0.02725359j,  0.04916084+0.1205459j ,
  ...,
         0.07221112-0.17945509j,  0.10256997-0.11285204j,
        -0.05898739+0.03267021j,  0.56949722-0.06679237j]])
        r          = array([[65533, 65534, 65535, 65534, 65534, 65534, 65534],
       [    0,     1,     0,     0,     0,     0,     0],
  ... 0,     0,     0,     0,     0,     0,     0],
       [    0,     0,     0,     0,     0,     0,     0]], dtype=uint16)
        r0         = array([[-3.19924142+0.j        , -2.18995473+0.2389295j ,
        -1.64672593+0.4729315j , -2.40332574+0.41776413j,
  ...,  0.        +0.j        ,
         0.        +0.j        ,  0.        +0.j        ,
         0.        +0.j        ]])
        self       = <scipy.linalg.tests.test_decomp_update.TestQRupdate_D object at 0x144bba580>
        u          = array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=uint16)
        u0         = array([0.97255678+0.26777079j, 0.36071622+0.78994569j,
       0.28201392+0.67496934j, 0.70583085+0.81774514j,
       0...358533j,
       0.09839256+0.43700129j, 0.56218546+0.88397738j,
       0.9777969 +0.18651321j, 0.14721884+0.59979179j])
        v          = array([0, 0, 0, 0, 0, 0, 0], dtype=uint16)
        v0         = array([0.01989467+0.01116458j, 0.23820349+0.06328583j,
       0.58360935+0.43995546j, 0.63492416+0.60213318j,
       0.45233787+0.88219112j, 0.63913689+0.71684251j,
       0.03427202+0.81291815j])
_____________________________________________________________________ test_solve_discrete_are _____________________________________________________________________
scipy/linalg/tests/test_solvers.py:528: in test_solve_discrete_are
    _test_factory(case, min_decimal[ind])
        _test_factory = <function test_solve_discrete_are.<locals>._test_factory at 0x144e44ee0>
        case       = (array([[      0., 1000000.],
       [      0.,       0.]]), array([[0],
       [1]]), array([[1., 0.],
       [0., 1.]]), array([[1]]), None)
        cases      = [(array([[ 2.+0.j,  1.-2.j],
       [ 0.+0.j, -0.-3.j]]), array([[0],
       [1]]), array([[1, 0],
       [0, 2]]), ar...0.005, 0.   ],
       [0.   , 0.02 ]]), array([[0.33333333, 0.        ],
       [0.        , 3.        ]]), None), ...]
        ind        = 15
        min_decimal = (12, 14, 13, 14, 13, 16, ...)
scipy/linalg/tests/test_solvers.py:525: in _test_factory
    assert_array_almost_equal(res, np.zeros_like(res), decimal=dec)
        a          = array([[      0., 1000000.],
       [      0.,       0.]])
        b          = array([[0],
       [1]])
        case       = (array([[      0., 1000000.],
       [      0.,       0.]]), array([[0],
       [1]]), array([[1., 0.],
       [0., 1.]]), array([[1]]), None)
        dec        = 2
        knownfailure = None
        q          = array([[1., 0.],
       [0., 1.]])
        r          = array([[1]])
        res        = array([[0.00000000e+00, 1.22970054e-20],
       [1.22970054e-20, 1.63574219e-02]])
        x          = array([[ 1.00000000e+00, -1.22970054e-20],
       [-1.22970054e-20,  1.00000000e+12]])
/Users/rgommers/mambaforge/envs/scipy-dev/lib/python3.9/contextlib.py:79: in inner
    return func(*args, **kwds)
        args       = (array([[0.00000000e+00, 1.22970054e-20],
       [1.22970054e-20, 1.63574219e-02]]), array([[0., 0.],
       [0., 0.]]))
        func       = <function assert_array_almost_equal at 0x122465e50>
        kwds       = {'decimal': 2}
        self       = <contextlib._GeneratorContextManager object at 0x12246a3a0>
/Users/rgommers/mambaforge/envs/scipy-dev/lib/python3.9/contextlib.py:79: in inner
    return func(*args, **kwds)
E   AssertionError: 
E   Arrays are not almost equal to 2 decimals
E   
E   Mismatched elements: 1 / 4 (25%)
E   Max absolute difference: 0.01635742
E   Max relative difference: inf
E    x: array([[0.00e+00, 1.23e-20],
E          [1.23e-20, 1.64e-02]])
E    y: array([[0., 0.],
E          [0., 0.]])
        args       = (<function assert_array_almost_equal.<locals>.compare at 0x144e448b0>, array([[0.00000000e+00, 1.22970054e-20],
       [1.22970054e-20, 1.63574219e-02]]), array([[0., 0.],
       [0., 0.]]))
        func       = <function assert_array_compare at 0x122465ca0>
        kwds       = {'err_msg': '', 'header': 'Arrays are not almost equal to 2 decimals', 'precision': 2, 'verbose': True}
        self       = <contextlib._GeneratorContextManager object at 0x12246a310>
===================================================================== short test summary info =====================================================================
FAILED scipy/linalg/tests/test_decomp_update.py::TestQRdelete_f::test_unsupported_dtypes - RuntimeWarning: invalid value encountered in cast
FAILED scipy/linalg/tests/test_decomp_update.py::TestQRdelete_F::test_unsupported_dtypes - RuntimeWarning: invalid value encountered in cast
FAILED scipy/linalg/tests/test_decomp_update.py::TestQRdelete_d::test_unsupported_dtypes - RuntimeWarning: invalid value encountered in cast
FAILED scipy/linalg/tests/test_decomp_update.py::TestQRdelete_D::test_unsupported_dtypes - RuntimeWarning: invalid value encountered in cast
FAILED scipy/linalg/tests/test_decomp_update.py::TestQRinsert_f::test_unsupported_dtypes - RuntimeWarning: invalid value encountered in cast
FAILED scipy/linalg/tests/test_decomp_update.py::TestQRinsert_F::test_unsupported_dtypes - RuntimeWarning: invalid value encountered in cast
FAILED scipy/linalg/tests/test_decomp_update.py::TestQRinsert_d::test_unsupported_dtypes - RuntimeWarning: invalid value encountered in cast
FAILED scipy/linalg/tests/test_decomp_update.py::TestQRinsert_D::test_unsupported_dtypes - RuntimeWarning: invalid value encountered in cast
FAILED scipy/linalg/tests/test_decomp_update.py::TestQRupdate_f::test_unsupported_dtypes - RuntimeWarning: invalid value encountered in cast
FAILED scipy/linalg/tests/test_decomp_update.py::TestQRupdate_F::test_unsupported_dtypes - RuntimeWarning: invalid value encountered in cast
FAILED scipy/linalg/tests/test_decomp_update.py::TestQRupdate_d::test_unsupported_dtypes - RuntimeWarning: invalid value encountered in cast
FAILED scipy/linalg/tests/test_decomp_update.py::TestQRupdate_D::test_unsupported_dtypes - RuntimeWarning: invalid value encountered in cast
FAILED scipy/linalg/tests/test_solvers.py::test_solve_discrete_are - AssertionError:

The qr failures are all the same ("invalid value encountered in cast"), so I just copied one of the tracebacks.
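
For the test_solve_discrete_are failure, the arithmetic is tight rather than catastrophic. NumPy documents assert_array_almost_equal as checking abs(desired - actual) < 1.5 * 10**(-decimal). The sketch below re-implements only that scalar criterion in plain Python (the function name is hypothetical) and applies it to values copied from the tracebacks.

```python
def within_decimal(actual: float, desired: float, decimal: int) -> bool:
    """Scalar form of the documented assert_array_almost_equal criterion."""
    return abs(desired - actual) < 1.5 * 10 ** (-decimal)


# test_solve_discrete_are: res[1, 1] from the log, compared against 0 with
# decimal=2, i.e. a threshold of 0.015. Prints False.
print(within_decimal(1.63574219e-02, 0.0, decimal=2))

# test_qz_single: max absolute difference from the log with decimal=5,
# i.e. a threshold of 1.5e-05. Prints False.
print(within_decimal(1.9282217, 0.0, decimal=5))
```

The solver residual of about 0.0164 misses its 0.015 threshold narrowly, while the qz mismatch exceeds its threshold by several orders of magnitude.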

@ngam

ngam commented Sep 3, 2022

How do I run a specific test locally from scipy.test?

>>> scipy.test("test_lapack")
================================== test session starts ===================================
platform darwin -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0
rootdir: /Users/ngam
collected 46281 items / 46281 deselected / 0 selected                                    

=============================== 46281 deselected in 3.23s ================================

@rgommers
Member Author

rgommers commented Sep 3, 2022

That depends on how you've installed or built SciPy. If you have a dev build made with dev.py as described in the docs, then:

$ python dev.py test -s linalg  # for the whole linalg module
$ python dev.py test -t scipy.linalg.tests.test_lapack  # for a specific file
$ python dev.py test -t scipy.linalg.tests.test_lapack::test_gges_tgexc  # for a specific test failing here

If you have SciPy installed, e.g. from conda-forge, then you can use pytest in a similar way:

$ pytest --pyargs scipy.linalg.tests.test_lapack
$ pytest --pyargs scipy.linalg.tests.test_lapack::test_gges_tgexc
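
The same selection can also be made from a Python prompt. The sketch below assumes pytest and an installed SciPy are available; pytest.main is pytest's programmatic entry point, while node_id and run_scipy_test are hypothetical helper names used only here.

```python
from typing import Optional


def node_id(module: str, test: Optional[str] = None) -> str:
    """Build a pytest node ID such as 'pkg.tests.test_mod::test_name'."""
    return module if test is None else f"{module}::{test}"


def run_scipy_test(module: str, test: Optional[str] = None) -> int:
    """Run one test module (or a single test) from an installed package."""
    import pytest  # imported lazily so node_id() works without pytest

    # --pyargs makes pytest resolve the argument as an importable module path.
    return int(pytest.main(["--pyargs", node_id(module, test)]))


# Example call (not executed here):
# run_scipy_test("scipy.linalg.tests.test_lapack", "test_gges_tgexc")
```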

@ngam

ngam commented Sep 3, 2022

On arm64, with the newer builds of openblas (with only the automatic rerendering changes) from conda-forge/openblas-feedstock#147, I can reproduce this error by running pytest --pyargs scipy.linalg.tests.test_solvers:

E   AssertionError: 
E   Arrays are not almost equal to 2 decimals
E   
E   Mismatched elements: 1 / 4 (25%)
E   Max absolute difference: 0.01635742
E   Max relative difference: inf
E    x: array([[0.00e+00, 1.23e-20],
E          [1.23e-20, 1.64e-02]])
E    y: array([[0., 0.],
E          [0., 0.]])

but not the other failure (test_decomp_update passes here):

(test_openblas_scipy) ~$ pytest --pyargs scipy.linalg.tests.test_decomp_update    
============================================== test session starts ===============================================
platform darwin -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0
rootdir: /Users/ngam
collected 665 items                                                                                              

.mambaforge/envs/test_openblas_scipy/lib/python3.10/site-packages/scipy/linalg/tests/test_decomp_update.py . [  0%]
.......................................................................................................... [ 16%]
.......................................................................................................... [ 32%]
.......................................................................................................... [ 47%]
.......................................................................................................... [ 63%]
.......................................................................................................... [ 79%]
.......................................................................................................... [ 95%]
............................                                                                               [100%]

============================================== 665 passed in 1.18s ===============================================
(test_openblas_scipy) ~$ 

Environment:

(test_openblas_scipy) ~$ mamba list | grep scipy
# packages in environment at /Users/ngam/.mambaforge/envs/test_openblas_scipy:
scipy                     1.9.1           py310ha0d8a01_0    conda-forge
(test_openblas_scipy) ~$ mamba list | grep openblas
# packages in environment at /Users/ngam/.mambaforge/envs/test_openblas_scipy:
libblas                   3.9.0           16_osxarm64_openblas    conda-forge
libcblas                  3.9.0           16_osxarm64_openblas    conda-forge
liblapack                 3.9.0           16_osxarm64_openblas    conda-forge
libopenblas               0.3.21          openmp_hc731615_4    ngam
openblas                  0.3.21          openmp_hf78f355_4    ngam
(test_openblas_scipy) ~$ mamba list | grep fortran 
libgfortran               5.0.0           11_3_0_hd922786_25    conda-forge
libgfortran5              11.3.0              hdaf2cc0_25    conda-forge
(test_openblas_scipy) ~$ 

@ngam

ngam commented Sep 3, 2022

I am going to upload all the artifacts to my personal channel. I can then test them in CI (I no longer have easy access to osx-64 machines at home). Done: https://anaconda.org/ngam/openblas/files (-c ngam)

@ngam

ngam commented Sep 3, 2022

Testing the osx-64 here: https://github.com/ngam/test_openblas_scipy

@ngam

ngam commented Sep 3, 2022

It seems only some of the test failures could be reproduced. Not sure what's going on. @rgommers feel free to copy my repo, or I can add you to it so you can edit it as you wish: https://github.com/ngam/test_openblas_scipy

@rgommers
Member Author

rgommers commented Sep 5, 2022

The x86-64 build log has a bunch of -Wconversion warnings for sgges:

2022-09-01T18:29:33.4498390Z $BUILD_PREFIX/bin/x86_64-apple-darwin13.4.0-gfortran -march=core2 -mtune=haswell -ftree-vectorize -fPIC -fstack-protector -O2 -pipe -isystem $PREFIX/include -fdebug-prefix-map=$SRC_DIR=/usr/local/src/conda/openblas-0.3.21 -fdebug-prefix-map=$PREFIX=/usr/local/src/conda-prefix -frecursive -O2 -m128bit-long-double -Wall -frecursive -fno-optimize-sibling-calls -m64 -fdefault-integer-8 -fopenmp -fPIC -c -o sgges.o sgges.f
2022-09-01T18:29:33.4551650Z sggbal.f:340:20:
2022-09-01T18:29:33.4656340Z 
2022-09-01T18:29:33.4759930Z   340 |       LSCALE( M ) = I
2022-09-01T18:29:33.4827060Z       |                    1
2022-09-01T18:29:33.4930060Z Warning: Possible change of value in conversion from INTEGER(8) to REAL(4) at (1) [-Wconversion]
2022-09-01T18:29:33.5016680Z sggbal.f:349:20:
2022-09-01T18:29:33.5103390Z 
2022-09-01T18:29:33.5133300Z   349 |       RSCALE( M ) = J
2022-09-01T18:29:33.5220650Z       |                    1
2022-09-01T18:29:33.5268710Z Warning: Possible change of value in conversion from INTEGER(8) to REAL(4) at (1) [-Wconversion]
2022-09-01T18:29:33.5323880Z sggbal.f:525:14:
2022-09-01T18:29:33.5336790Z 
2022-09-01T18:29:33.5371860Z   525 |          IR = LSCALE( I ) + SIGN( HALF, LSCALE( I ) )
2022-09-01T18:29:33.5426640Z       |              1
2022-09-01T18:29:33.5439640Z Warning: Possible change of value in conversion from REAL(4) to INTEGER(8) at (1) [-Wconversion]
2022-09-01T18:29:33.5474220Z sggbal.f:533:14:
2022-09-01T18:29:33.5528860Z 
2022-09-01T18:29:33.5541710Z   533 |          JC = RSCALE( I ) + SIGN( HALF, RSCALE( I ) )
2022-09-01T18:29:33.5594210Z       |              1
2022-09-01T18:29:33.5632650Z Warning: Possible change of value in conversion from REAL(4) to INTEGER(8) at (1) [-Wconversion]
2022-09-01T18:29:33.5735070Z sgges.f:409:21:
2022-09-01T18:29:33.5776050Z 
2022-09-01T18:29:33.5785510Z   409 |          WORK( 1 ) = MAXWRK
2022-09-01T18:29:33.5875790Z       |                     1
2022-09-01T18:29:33.5984170Z Warning: Possible change of value in conversion from INTEGER(8) to REAL(4) at (1) [-Wconversion]
2022-09-01T18:29:33.6067450Z sgges.f:671:18:
2022-09-01T18:29:33.6168620Z 
2022-09-01T18:29:33.6285770Z   671 |       WORK( 1 ) = MAXWRK
2022-09-01T18:29:33.6370830Z       |                  1
2022-09-01T18:29:33.6388960Z Warning: Possible change of value in conversion from INTEGER(8) to REAL(4) at (1) [-Wconversion]
2022-09-01T18:29:33.6475620Z $BUILD_PREFIX/bin/x86_64-apple-darwin13.4.0-gfortran -march=core2 -mtune=haswell -ftree-vectorize -fPIC -fstack-protector -O2 -pipe -isystem $PREFIX/include -fdebug-prefix-map=$SRC_DIR=/usr/local/src/conda/openblas-0.3.21 -fdebug-prefix-map=$PREFIX=/usr/local/src/conda-prefix -frecursive -O2 -m128bit-long-double -Wall -frecursive -fno-optimize-sibling-calls -m64 -fdefault-integer-8 -fopenmp -fPIC -c -o sgges3.o sgges3.f
2022-09-01T18:29:33.6491860Z sgges3.f:411:21:
2022-09-01T18:29:33.6578700Z 
2022-09-01T18:29:33.6593730Z   411 |          WORK( 1 ) = LWKOPT
2022-09-01T18:29:33.6681530Z       |                     1
2022-09-01T18:29:33.6696390Z Warning: Possible change of value in conversion from INTEGER(8) to REAL(4) at (1) [-Wconversion]
2022-09-01T18:29:33.6768810Z sgges3.f:662:18:
2022-09-01T18:29:33.6783390Z 
2022-09-01T18:29:33.6798350Z   662 |       WORK( 1 ) = LWKOPT
2022-09-01T18:29:33.6871500Z       |                  1
2022-09-01T18:29:33.6886570Z Warning: Possible change of value in conversion from INTEGER(8) to REAL(4) at (1) [-Wconversion]
2022-09-01T18:29:33.7450620Z $BUILD_PREFIX/bin/x86_64-apple-darwin13.4.0-gfortran -march=core2 -mtune=haswell -ftree-vectorize -fPIC -fstack-protector -O2 -pipe -isystem $PREFIX/include -fdebug-prefix-map=$SRC_DIR=/usr/local/src/conda/openblas-0.3.21 -fdebug-prefix-map=$PREFIX=/usr/local/src/conda-prefix -frecursive -O2 -m128bit-long-double -Wall -frecursive -fno-optimize-sibling-calls -m64 -fdefault-integer-8 -fopenmp -fPIC -c -o sggesx.o sggesx.f
2022-09-01T18:29:33.7805580Z sggesx.f:513:21:
2022-09-01T18:29:33.7906910Z 
2022-09-01T18:29:33.7955740Z   513 |          WORK( 1 ) = LWRK
2022-09-01T18:29:33.8057420Z       |                     1
2022-09-01T18:29:33.8087500Z Warning: Possible change of value in conversion from INTEGER(8) to REAL(4) at (1) [-Wconversion]
2022-09-01T18:29:33.8106550Z sggesx.f:810:18:
2022-09-01T18:29:33.8153950Z 
2022-09-01T18:29:33.8255870Z   810 |       WORK( 1 ) = MAXWRK
2022-09-01T18:29:33.8267520Z       |                  1
2022-09-01T18:29:33.8370210Z Warning: Possible change of value in conversion from INTEGER(8) to REAL(4) at (1) [-Wconversion]

No idea if those are relevant though; such warnings are par for the course for old Fortran code.
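
What the -Wconversion warnings are about can be shown without Fortran: a REAL(4) has a 24-bit significand, so integers above 2**24 are not all representable. The sketch below uses Python's struct module to round-trip values through a 4-byte float. Whether this matters for sgges is an open question in this thread; workspace sizes and indices for the small matrices in these tests should be far below that limit.

```python
import struct


def through_float32(value: int) -> float:
    """Round-trip an integer through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


# Integers up to 2**24 survive the conversion unchanged.
print(through_float32(2 ** 24) == 2 ** 24)
# 2**24 + 1 has no float32 representation and gets rounded.
print(through_float32(2 ** 24 + 1) == 2 ** 24 + 1)
```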

I think the last fix in this set of wrappers was gh-13397. @ilayn any thoughts here?

@ilayn
Member

ilayn commented Sep 5, 2022

If it were a wrapper issue, we would probably see a segfault rather than a precision problem, and the precision violation is only ever so slightly increased. So I am cautiously looking at OpenBLAS, but can't be sure. @martin-frbg, has anything changed lately regarding ?gges or ?tgsen, or somewhere else, to your knowledge?

@martin-frbg

@ilayn nothing comes to mind except the general update to Reference-LAPACK 3.10.1. Do you see these problems only on Mac?

@rgommers
Member Author

rgommers commented Sep 6, 2022

@martin-frbg yes, they started appearing on macOS when gfortran was upgraded from version 9 to 11. No other changes.

rgommers added a commit to rgommers/scipy that referenced this issue Sep 11, 2022
xref gh-16949. This doesn't fix the root cause of the problem,
but stops the CI job from failing.

[skip azp] [skip circle]
rgommers added a commit to rgommers/scipy that referenced this issue Sep 20, 2022
rgommers added this to the 1.10.0 milestone Sep 20, 2022
@rgommers
Member Author

The issue in CI was fixed by xfail-ing the relevant tests in gh-17057. The sgges issue needs reassessing before the next release. Things seem to work with OpenBLAS 0.3.18, but are broken with 0.3.20/21. As long as we continue to ship 0.3.18 in our own wheels we should be good there - but that doesn't help conda-forge, Debian et al.

rgommers added a commit to tylerjereddy/scipy that referenced this issue Oct 4, 2022
@ngam

ngam commented Feb 7, 2023

Seems like everything being tested passes now, but perhaps with two xfails.

@mattip
Contributor

mattip commented Feb 7, 2023

The failing tests were skipped. You could try backing out this change and testing again.

@martin-frbg

Trying now on the M1 in the GCC compile farm, with pytest/scipy/openblas installed via pip. Both pytest runs pass with either the original (0.3.18) libopenblas.0.dylib or my own build from current develop (and the xfails in the testsuite backed out). Will try to get things installed on the local x86_64 macbook next. (@ngam can you tell what x86_64 hardware your tests were run on? export OPENBLAS_VERBOSE=2 can be used to print the selected cpu TARGET at runtime.)
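
When driving things from Python rather than a shell, OPENBLAS_VERBOSE has to be in the environment before OpenBLAS is loaded, so one option is to launch a child interpreter. This is a sketch, assuming the child process imports something linked against OpenBLAS; the helper names and the default snippet are illustrative only.

```python
import os
import subprocess
import sys
from typing import Mapping, Optional


def openblas_verbose_env(
    level: int = 2, base: Optional[Mapping[str, str]] = None
) -> dict:
    """Return a copy of the environment with OPENBLAS_VERBOSE set."""
    env = dict(os.environ if base is None else base)
    env["OPENBLAS_VERBOSE"] = str(level)
    return env


def run_verbose(
    code: str = "import scipy.linalg", level: int = 2
) -> subprocess.CompletedProcess:
    """Run a snippet in a fresh interpreter so the variable is set at load time."""
    return subprocess.run(
        [sys.executable, "-c", code],
        env=openblas_verbose_env(level),
        capture_output=True,
        text=True,
    )


# Example call (not executed here); inspect both stdout and stderr of the result:
# result = run_verbose()
```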

@martin-frbg

test_lapack failures are now reproduced on my macbook with the conda 0.3.21 build; I have not checked older or newer versions yet. (As an aside, running just "conda install numpy" without additional qualifiers appears to select an mkl build on this platform.)

@martin-frbg

That macbook is sloooow, but the current indication is that the issue is already fixed in the current develop branch, almost certainly by one of the PRs that handle perceived defects in the tree-vectorizer, which has become active by default at O2 in recent GCC releases. Also, 0.3.18 shows the same failure pattern as 0.3.21 when built with the new compiler.

@mattip
Contributor

mattip commented Feb 8, 2023

Thanks. So if I understand correctly, this may be solved by packaging an interim OpenBLAS 0.3.21+ off the develop branch. That should be relatively easy to put together so it can be tested in SciPy.

@martin-frbg

Yes, and/or get conda-forge to rebuild their openblas package with -fno-tree-vectorize set for gfortran.

@mattip
Contributor

mattip commented Feb 9, 2023

Once #17950 goes in, it could be followed up by using the recently uploaded v0.3.20-571-g3dec11c6, which is the current HEAD of develop. It is hash 3dec11c6, which is 571 commits past v0.3.20. The v0.3.21 tag is not part of the develop branch, so the name is a bit off.
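
The build name reads like git describe output: the nearest tag, the number of commits since that tag, and an abbreviated commit hash prefixed with g. A small parser, as an illustration only (the class and function names are made up here):

```python
import re
from typing import NamedTuple


class Describe(NamedTuple):
    tag: str
    commits_since_tag: int
    short_hash: str


# <tag>-<number of commits>-g<abbreviated hash>
_DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<hash>[0-9a-f]+)$")


def parse_describe(name: str) -> Describe:
    """Split a describe-style name into tag, commit count, and hash."""
    match = _DESCRIBE_RE.match(name)
    if match is None:
        raise ValueError(f"not a describe-style name: {name!r}")
    return Describe(match["tag"], int(match["count"]), match["hash"])


print(parse_describe("v0.3.20-571-g3dec11c6"))
```

Because the count is relative to the nearest tag reachable from the commit, a snapshot of develop keeps reporting v0.3.20 until a newer tag is merged into that branch.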

@martin-frbg

Oops, I must have forgotten to do the mergeback from the release branch then, sorry. (Not sure if it is still safe to do now, or if it is just one more reason to get 0.3.22 out ASAP)

@mattip
Contributor

mattip commented Feb 9, 2023

> do the mergeback

I think changing it now has more potential to do damage than to help; I just wanted to explain why the new lib build has an old name. Thanks for digging into this.

@mattip
Contributor

mattip commented Feb 19, 2023

Now that #17950 is merged, it should be easier to try upgrading OpenBLAS to v0.3.20-571-g3dec11c6 and removing the xfails to see whether the newer OpenBLAS fixes the problem.

@andyfaff
Contributor

I noticed that those OpenBLAS v0.3.20 tarballs were available. I don't understand what's going on with the versioning. Why are those tarballs available when 0.3.21 should be more recent?

@mattip
Contributor

mattip commented Feb 19, 2023

I tried to explain the versioning above. TL;DR: v0.3.20-571-g3dec11c6 is after v0.3.21.

@martin-frbg

See mattip's post above, #16949 (comment). I messed up by forgetting to merge the 0.3.21 tag from the release-0.3 branch back onto develop, so packaging a snapshot of the latter will still show 0.3.20.

@andyfaff
Contributor

Does numpy/scipy have a policy on using non release versions of OpenBLAS?

I have a macosx_arm64 computer. I'll try re-enabling the tests in https://github.com/scipy/scipy/pull/17057/files#diff-447d26812e91bd5b73f8192b28b4f52c5d0a1eaf97e6d1a22657fed4a70226d5 and see if they're fixed by updating the OpenBLAS install to v0.3.20-571-g3dec11c6.

@rgommers
Member Author

> Does numpy/scipy have a policy on using non release versions of OpenBLAS?

Any version that works is fine I think, no policy beyond that. A dev version with a fix we need is also not much different from picking a release version and applying some patches to it.

@andyfaff
Contributor

I re-enabled the tests in test_decomp and test_lapack, and ran python dev.py test -t scipy.linalg.tests.test_decomp and python dev.py test -t scipy.linalg.tests.test_lapack. Both came back green. I used the most recent build of OpenBLAS, openblas-v0.3.20-571-g3dec11c6-macosx_11_0_arm64-gf_5272328.tar.gz.
This was using Python 3.10.6.

A few things came to my attention during the build phase:

  • When I tried running the tests I got the Apple security warning "libopenblas.... is from an untrusted source". I had to manually allow it in the security settings. I've never had that issue before.
  • There are a whole lot of "ld: warning: -undefined dynamic_lookup may not work with chained fixups" warnings.
  • There are a bunch of warnings because boost uses sprintf, e.g.:
    ../scipy/_lib/boost/boost/lexical_cast/detail/converter_lexical_streams.hpp:297:21: warning: 'sprintf' is deprecated: This function is provided for compatibility reasons only. Due to security concerns inherent in the design of sprintf(3), it is highly recommended that you use snprintf(3) instead. [-Wdeprecated-declarations]

It would be nice if the latter were fixed upstream. I don't know about the former.

I'll make a PR to update the OpenBLAS version. Hopefully there are no problems with any other platform/machine.

@rgommers
Member Author

The boost warnings will probably go away after the switch to Boost.Math. I haven't seen the ld: warning: -undefined dynamic_lookup ones before.

> when I tried running the tests I got the Apple security warning libopenblas

That's bad, perhaps something went wrong with code signing in that OpenBLAS build?

@andyfaff
Contributor

andyfaff commented Feb 19, 2023

The lexical_cast warnings are fixed in the next boost release.

To install the OpenBLAS bundle I simply expanded the tarball and copied arm64-builds into /opt. I'm not sure if there's another way to do it.

I don't know how anything is signed in scipy world. There is a whole bunch of stuff one can do to notarise/sign/etc macOS binaries. Do we do anything along those lines for the openblas libs, or indeed the wheels we make?

@mattip
Contributor

mattip commented Feb 28, 2023

#18012 was merged. Did that solve the issue?

@andyfaff
Contributor

Yes, it was solved for 1.11, but won't be backported to the 1.10 branch.
