Fix failing steadystate tests #1694

hodgestar · 2021-10-21T09:37:47Z

Description

We have steadystate tests that almost always fail in CI on Python 3.9 with OpenMP and MKL, and that sometimes fail with just Python 3.9 and MKL.

The issue is currently hard to reproduce locally.

Related issues or PRs

Test failures seen in Remove saving the Hamiltonian in sesolve result #1689 and elsewhere

Progress so far

Fixed a small issue in the steadystate tests so that pytest-repeat can run them with --count=100, in the hope of reproducing the bug locally.
Removed mutable default c_ops arguments for steadystate and liouvillian.
Fixed the reference to method in _pseudo_inverse_sparse.
~~Only set method in pseudo_inverse if one is explicitly defined.~~ (revert)
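
For anyone wondering why the mutable defaults were worth removing regardless of the CI failures, here is a minimal sketch of the pitfall. The function names are hypothetical stand-ins and not QuTiP's real signatures; only the c_ops parameter name is taken from the list above.

```python
# Hypothetical stand-ins; these are not QuTiP's real signatures.
def collect_shared(op, c_ops=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits c_ops appends to the same object.
    c_ops.append(op)
    return c_ops


def collect_fresh(op, c_ops=None):
    # A None sentinel gives each call its own list.
    c_ops = [] if c_ops is None else list(c_ops)
    c_ops.append(op)
    return c_ops


first_shared = list(collect_shared("a"))
second_shared = list(collect_shared("b"))  # still contains "a" from the first call
first_fresh = collect_fresh("a")
second_fresh = collect_fresh("b")          # contains only "b"
```

State leaking between calls like this is the kind of thing that can make a test pass once and fail when repeated, which is why it was cleaned up while hunting the flaky failures.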

Changelog
TODO: Write the changelog once we understand properly what is going on.


hodgestar · 2021-10-21T09:39:44Z

@Ericgig I've started this branch specifically to tackle the strange steadystate (and other) test failures.

coveralls · 2021-10-21T13:03:25Z

Coverage remained the same at 16.832% when pulling 4b027c3 on hodgestar:fix/failing-steady-state-tests into 091574d on qutip:master.


hodgestar · 2021-10-21T17:19:54Z

I think we finally have a concrete error and it's rather mystifying to me how it can happen:

            E = spla.expm(A.toarray())
            if np.isnan(E).any():
                print("A:", A)
                print("A data:", A.indices, A.indptr, A.shape)
                print("A toarray:", A.toarray())
                print("E:", E)
>               raise RuntimeError("NaNs generated by sp_expm.")
E               RuntimeError: NaNs generated by sp_expm.

qutip/sparse.py:408: RuntimeError
----------------------------- Captured stdout call -----------------------------
A:   (0, 1)	(-0.5+0j)
  (1, 0)	(0.5+0j)
  (1, 2)	(-0.7071067811865476+0j)
  (2, 1)	(0.7071067811865476+0j)
  (2, 3)	(-0.8660254037844386+0j)
  (3, 2)	(0.8660254037844386+0j)
  (3, 4)	(-1+0j)
  (4, 3)	(1+0j)
A data: [1 0 2 1 3 2 4 3] [0 1 3 5 7 8] (5, 5)
A toarray: [[ 0.        +0.j -0.5       +0.j  0.        +0.j  0.        +0.j
   0.        +0.j]
 [ 0.5       +0.j  0.        +0.j -0.70710678+0.j  0.        +0.j
   0.        +0.j]
 [ 0.        +0.j  0.70710678+0.j  0.        +0.j -0.8660254 +0.j
   0.        +0.j]
 [ 0.        +0.j  0.        +0.j  0.8660254 +0.j  0.        +0.j
  -1.        +0.j]
 [ 0.        +0.j  0.        +0.j  0.        +0.j  1.        +0.j
   0.        +0.j]]
E: [[nan+nanj nan+nanj nan+nanj nan+nanj nan+nanj]
 [nan+nanj nan+nanj nan+nanj nan+nanj nan+nanj]
 [nan+nanj nan+nanj nan+nanj nan+nanj nan+nanj]
 [nan+nanj nan+nanj nan+nanj nan+nanj nan+nanj]
 [nan+nanj nan+nanj nan+nanj nan+nanj nan+nanj]]

See https://github.com/qutip/qutip/runs/3966808806?check_suite_focus=true#step:6:1646
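
For reference, the matrix in the captured output is real and antisymmetric, so its exponential is mathematically a finite orthogonal matrix, and the NaNs cannot come from the input itself. A minimal NumPy-only sketch of that sanity check follows. It uses an eigendecomposition rather than the spla.expm call from the traceback, and the 0.5 * sqrt(n) pattern is read off the printed values, so treat both as assumptions.

```python
import numpy as np

# Rebuild the 5x5 matrix from the captured stdout (assumed pattern: 0.5 * sqrt(n)).
off = 0.5 * np.sqrt(np.arange(1, 5))  # 0.5, 0.7071..., 0.8660..., 1.0
A = (np.diag(off, k=-1) - np.diag(off, k=1)).astype(complex)

# A is real and antisymmetric, so H = 1j * A is Hermitian and
# exp(A) = V @ diag(exp(-1j * w)) @ V^H, where H = V @ diag(w) @ V^H.
H = 1j * A
w, V = np.linalg.eigh(H)
E = (V * np.exp(-1j * w)) @ V.conj().T

is_antisymmetric = bool(np.allclose(A, -A.T))
is_finite = bool(np.isfinite(E).all())
is_unitary = bool(np.allclose(E @ E.conj().T, np.eye(5)))
```

On a healthy setup all three flags should be True, which points at the numerical library underneath rather than at the matrix.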


hodgestar · 2021-10-22T20:23:48Z

@Ericgig tracked this issue down to only occurring on numpy 1.21.X (and not 1.20.X) on CI workers with certain Intel CPUs (8171 and 8272). There are a number of changes in numpy 1.21 which could have caused this, but it might take a while to track down.

The plan from here is to make a small PR for some of the tiny clean-ups from here that seem good to have anyway, and then to create a new PR off of master to try to get us back onto 1.21.X somehow (probably this will require a numpy fix, but maybe there is another workaround).

hodgestar · 2021-11-12T10:41:41Z

Minimal script to reproduce the error that only uses numpy:

# On a CPU with AVX512 extensions and numpy 1.21.2:
# (only tested on Ubuntu)
# It works again on numpy 1.21.4 (and maybe 1.21.3 -- I did not check because 1.21.3 was not conda installable)

import numpy as np

L = np.diag([1+0j, 1, 1, 1])
b = np.array([1+0j, 0, 0, 0])

# Commenting out the line below makes everything work; with it, solve returns NaNs.
np.exp(0)
# breakpoint()

v = np.linalg.solve(L, b)
np.testing.assert_allclose(v, b)
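
In case it helps with triage, here is a rough sketch of a version check based only on what is reported in this thread: 1.20.X is fine, 1.21.2 fails, 1.21.4 works again, and 1.21.3 was not tested. The helper name is hypothetical, treating untested 1.21.x releases below 1.21.4 as suspect is an assumption, and a matching version only matters on a CPU with AVX512 extensions.

```python
def numpy_possibly_affected(version: str) -> bool:
    """Return True for 1.21.x releases older than 1.21.4 (assumed suspect range)."""
    parts = version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    patch_field = parts[2] if len(parts) > 2 else "0"
    # Keep only the leading digits of the patch field, e.g. "2rc1" -> 2.
    digits = ""
    for ch in patch_field:
        if not ch.isdigit():
            break
        digits += ch
    patch = int(digits or 0)
    return (major, minor) == (1, 21) and patch < 4


suspect = numpy_possibly_affected("1.21.2")
fixed = numpy_possibly_affected("1.21.4")
```

In practice one would pass np.__version__ to the helper. A True result only means the version falls in the range discussed here, not that the bug will actually trigger.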

hodgestar · 2021-11-12T13:02:34Z

Numpy bug report -- numpy/numpy#20356

hodgestar · 2021-11-12T13:03:24Z

Even smaller script for reproducing the issue:

import numpy as np

a = np.diag([1+0j, 1])
np.exp(0)
x = np.linalg.det(a)
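
The same sequence can be wrapped into a small self-check that reports whether the current environment gives a sane determinant. This is only a sketch: the function name is hypothetical, and the check against 1 is simply the mathematically expected determinant of the identity-like matrix above.

```python
import numpy as np


def det_survives_exp() -> bool:
    """Run the exp-then-det sequence and check the determinant is finite and 1."""
    a = np.diag([1 + 0j, 1])
    np.exp(0)  # the call that precedes the bad result in the report
    x = np.linalg.det(a)
    return bool(np.isfinite(x)) and bool(np.isclose(x, 1.0))


healthy = det_survives_exp()
```

On an unaffected setup, healthy should be True. A False result would suggest the environment matches the bug described here.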

Ericgig · 2021-11-25T18:13:06Z

@hodgestar do we close this now, or do we wait for numpy's fix to be on conda?

hodgestar · 2021-11-25T19:20:15Z

@Ericgig I'm happy to leave this open until a new numpy is released and we can update the version of numpy used in CI tests. Probably also good to have an issue open in case users encounter this in the wild.

hodgestar · 2021-12-09T23:01:46Z

The bug fix is scheduled to be included in numpy 1.22.0 -- https://github.com/numpy/numpy/milestone/93.

Avoid modifying global kwargs so that test_driven_cavity can be run multiple times.

e77cfcf

hodgestar added 5 commits October 21, 2021 11:46

Remove mutable default for steadystate c_op_list parameter.

9d2b95b

Remove mutable default for liouvillian c_ops parameter.

0665a3b

Fix reference to method in _pseudo_inverse_sparse.

733ef3c

Only set method in pseudo_inverse if one is explicitly defined.

bc48b54

Only run steadystate tests.

956b5bf

hodgestar added 13 commits October 21, 2021 15:16

Try drop back to pytest-cov 2.12.1.

4ecff27

Revert pytest-cov version pinning.

8815740

Remove use of not thread-safe catch_warnings.

166cda5

Revert "Remove mutable default for steadystate c_op_list parameter."

85ba43f

This reverts commit 9d2b95b.

Make np_pt explicitly a int32 pointer (since we are calling the padiso 32-bit API).

5a9d203

Revert "Revert "Remove mutable default for steadystate c_op_list parameter.""

1060c28

This reverts commit 85ba43f.

Revert "Only set method in pseudo_inverse if one is explicitly defined."

e3e9388

This reverts commit bc48b54.

Add test_states.py to test run.

444a150

Scatter assertions to hunt for the nans in coherent_dm.

bfe2e7e

Add more asserts to coherent.

f44357f

Litter asserts on nans in sp_expm.

10a1a24

Fix assert for diagonal case.

5d94526

Print out more detailed error information.

5c041a7

hodgestar and others added 5 commits October 21, 2021 22:13

Print dtypes of expm matrices.

dba3a24

Add 3.9 tests with nomkl and old numpy

5f4ddf7

Only 3.9 seems to fail, so migrate all tests there.

Repeat coherent_dm test on various sizes.

4b027c3

Tests sometimes pass and sometimes don't. 1) If it is to fail, I want it to always fail. 2) Are tries independent? 3) Does size matter?

Print more information about the vm

baa51c7

Tests seem to pass or fail together. Maybe there is a conflict with some VM configuration or CPUs, so I am storing CPU, RAM, and distribution info.

remove mac tests

a874ee2

Distribution info fails, not pytest.

hodgestar mentioned this pull request Oct 22, 2021

Add support for specifying the numpy version in the CI test matrix. #1696

Merged

hodgestar mentioned this pull request Nov 15, 2021

Unstable qutip.testing.run() prompts abort traps locally (Mac, Python 3.7) #1160

Closed

hodgestar mentioned this pull request Jan 19, 2022

Add a numpy 1.22 and Python 3.10 build to the CI test matrix. #1777

Merged

hodgestar closed this in #1777 Feb 1, 2022

hodgestar deleted the fix/failing-steady-state-tests branch December 10, 2022 15:43
