
Add hypothesis tests for data operators. #1957

Open · hodgestar wants to merge 40 commits into qutip:master from hodgestar:feature/add-hypothesis-tests-for-data-operators
Conversation

@hodgestar (Contributor) commented on Jul 16, 2022:

Description
Add hypothesis strategies for data objects and some simple property-based tests for data operations.

We aim for compatibility with numpy, but there are caveats in a few different cases:

| Operations | Non-finites equivalent | Notes |
| --- | --- | --- |
| Creation & negation | | No caveats |
| Equality checks | | The same tolerances must be specified |
| Addition & subtraction | Yes | |
| Scalar multiplication & division | Yes | CSR may raise an exception if the scalar is non-finite |
| Matrix multiplication | Not tested | Magnitude range is restricted to ensure precision |
| Trace, adjoint, transpose, conj & copy | | No caveats |

In some cases we also need to ignore warnings raised by numpy about operations involving nan and inf when calculating the expected result.
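For illustration, a minimal sketch of the kind of property-based test this adds. The strategy module qst and the shape-strategy argument to qobj_datas follow the test file quoted later in this thread; the test name, shape bounds, and tolerances are illustrative, and to_array() is assumed to convert a Data object to an ndarray:

import numpy as np
from hypothesis import given, strategies as st
from qutip.tests import strategies as qst

# One shared shape strategy, so both operands get identical dimensions.
same_shape = st.shared(st.tuples(st.integers(1, 8), st.integers(1, 8)))

@given(qst.qobj_datas(shape=same_shape), qst.qobj_datas(shape=same_shape))
def test_data_addition_matches_numpy(a, b):
    # Silence numpy's nan/inf warnings while computing the expected result.
    with np.errstate(all="ignore"):
        expected = a.to_array() + b.to_array()
    np.testing.assert_allclose((a + b).to_array(), expected, atol=1e-12, rtol=0)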

Related issues or PRs

  • None

@hodgestar (Contributor, Author) commented:

@Zac-HD Any suggestions for ways we could use hypothesis better? It all looks quite clean right now.

Is there a way we can change what hypothesis prints when an example is found? E.g. we don't want the data layer object repr to include the contents of the array, but it would be nice to see that in the test failure output.

@Zac-HD left a comment:

All looks good to me!

> Is there a way we can change what hypothesis prints when an example is found?

You can't change the failing example (except by changing the repr or hacking our internal pretty-printer - please don't!), but you can add a call to hypothesis.note("whatever representation you want to see") inside the test function, and you'll see that on failure.
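For example, a self-contained sketch; the array strategy and the (deliberately falsifiable) property are placeholders chosen only to show note() in a failure report:

import numpy as np
from hypothesis import given, note
import hypothesis.extra.numpy as npst

@given(npst.arrays(np.complex128, (2, 2)))
def test_note_example(arr):
    # note() output appears in the report only for the falsifying example,
    # so the array contents are visible even if an object's repr omits them.
    note(f"array contents: {arr!r}")
    assert np.isfinite(arr).all()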

@AGaliciaMartinez (Member) left a comment:

The hypothesis testing approach seems to generate very clean and readable code! I really like this approach for testing, as it seems to simplify writing tests. I see that some tests do not fully work yet, but the errors seem to be mostly related to inf and nan values not being properly handled. We could skip those tests by using the elements argument in npst.arrays, but I'm not sure if we should. Also, absolute tolerances may need to be included in tests if we test with values close to zero.
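For reference, the elements= approach mentioned above might look like this (a sketch; the dtype and shape are arbitrary):

import numpy as np
from hypothesis import strategies as st
import hypothesis.extra.numpy as npst

# Restrict generated values to finite complex numbers (no nan or inf).
finite_arrays = npst.arrays(
    np.complex128, (4, 4),
    elements=st.complex_numbers(allow_nan=False, allow_infinity=False),
)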

@hodgestar What is the goal for this PR? Is this meant to ultimately replace the current testing code employed for the data layer, or is it just meant to test Data's operators?

@Zac-HD commented on Aug 6, 2022:

(unsubscribing because I think there's not much for me to add, but please @ me if that changes)

@hodgestar (Contributor, Author) commented:

@Ericgig @AGaliciaMartinez This is ready for another round of review.

I fixed small bugs in mul and iszero -- we should discuss whether the fixes are good or not.

There are still some failing examples I'm encountering locally that we should also decide what to do with:

scalar division:

    # result = a / x
    # numpy returns array([[nan+nanj]])
    # our result:
    result [Data, Dense]: array([[8.98846567e+307+8.98846567e+307j]])
    a [Data, Dense]: array([[9.97920155e+291+1.79769313e+308j]])
    x: (1+1j)

matmul:
The current failing example I'm seeing locally is just an atol issue encountered for large exponents, but I have encountered other failing examples that I still need to dig into.

@hodgestar
Copy link
Contributor Author

@Zac-HD Would you mind doing a quick sanity check of my MatrixShapesStrategy class (

class MatrixShapesStrategy(st.SearchStrategy):
    """ A strategy for producing compatible matrix shapes.

    Parameters
    ----------
    shapes : list of shape specifications
        Each shape specification should be a tuple of dimension labels,
        e.g. ("n", "k"). Dimensions with the same labels will be
        given the same length in the resulting shapes. Fixed dimensions
        may be specified with an integer constant, e.g. ("n", 3).
    min_side : int
        The minimum length of any one dimension. Default 1.
    max_side : int
        The maximum length of any one dimension. Default 64.

    Examples
    --------
    MatrixShapesStrategy([("n", "k"), ("k", "m")])
    MatrixShapesStrategy([("j", 3), (3, "j")])
    """
    def __init__(
        self,
        shapes,
        min_side=1,
        max_side=64,
    ):
        super().__init__()
        self.side_strat = st.integers(min_side, max_side)
        self.shapes = tuple(shapes)
        self.min_side = min_side
        self.max_side = max_side

    def do_draw(self, data):
        dims = {}
        shapes = []
        for shape in self.shapes:
            shapes.append([])
            for name in shape:
                if isinstance(name, int):
                    shapes[-1].append(name)
                    continue
                if name not in dims:
                    # Draw each named dimension once and reuse it everywhere.
                    dims[name] = data.draw(self.side_strat)
                shapes[-1].append(dims[name])
        return tuple(tuple(s) for s in shapes)
)? Mostly I want to know whether I should have been able to do this with mutually_broadcastable_shapes.

I also wouldn't mind a quick check of qobj_shared_shapes (

def qobj_shared_shapes(shapes):
    """
    Return a tuple of strategies that each return one shape from
    a shared MatrixShapesStrategy strategy defined by the given shapes.

    Examples
    --------
    >>> shape_a, shape_b = qobj_shared_shapes([("n", "k"), ("k", "m")])

    In the example above shape_a would return the shapes generated for
    ("n", "k") and shape_b the shapes generated for ("k", "m").
    """
    import operator
    shapes_st = MatrixShapesStrategy(shapes)
    shapes_st = st.shared(shapes_st)
    return tuple(
        st.builds(operator.itemgetter(i), shapes_st)
        for i in range(len(shapes))
    )
). Is this a good thing to be doing with shared strategies?
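For context, here is roughly how these shared strategies are meant to be consumed (a sketch: qst.qobj_datas is used as in the tests above, and the test body and name are illustrative):

from hypothesis import given

shape_a, shape_b = qobj_shared_shapes([("n", "k"), ("k", "m")])

@given(qst.qobj_datas(shape=shape_a), qst.qobj_datas(shape=shape_b))
def test_matmul_shapes(a, b):
    # Both draws come from one shared MatrixShapesStrategy, so the inner
    # dimensions always agree and a @ b is well defined.
    assert (a @ b).shape == (a.shape[0], b.shape[1])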

@hodgestar (Contributor, Author) commented:

> Yes, you should use mutually_broadcastable_shapes(signature=) for that. The signature syntax is a little different, since Numpy does it with strings, but we already support named dimensions, constant-size dimensions, and optional dims. Should be a bit more efficient too.

I tried with signature= quite a bit but (m, k), (k, n) always produced things like (1, 1, 1), (1, 1) despite setting max_dims and min_dims to various things.

> It does technically work? But in such a situation I'd probably just use st.data() and draw in the body of the test; see `hypothesis write numpy.matmul` for an example. If you expected to use it in many tests maybe a custom strategy, but for just test_data_matmul_operator that doesn't seem worth it.

I expected there to be other use cases. I started with the output of `hypothesis write numpy.matmul` but ended up playing around with my own strategy (see the difficulties with mutually_broadcastable_shapes above) and liked the idea of not having to unpack `a` and `b` inside the test, and of being able to supply the shapes for `a` and `b` to other strategies (e.g. maybe we will have a strategy for Qobj and QobjEvo objects and want to try various combinations of multiplying things together).
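For reference, the signature= form under discussion looks roughly like this (a sketch, not the exact code tried; the signature string follows numpy's gufunc syntax, and max_dims=0 suppresses the extra broadcast dimensions):

import hypothesis.extra.numpy as npst

# Each draw is a BroadcastableShapes with input_shapes ((m, k), (k, n))
# and result_shape (m, n).
matmul_shapes = npst.mutually_broadcastable_shapes(
    signature="(m,k),(k,n)->(m,n)",
    max_dims=0,
)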

@Zac-HD commented on Aug 7, 2022:

> I tried with signature= quite a bit but (m, k), (k, n) always produced things like (1, 1, 1), (1, 1) despite setting max_dims and min_dims to various things.

Huh, that's surprising. Share some example code and I'll have a look?

> I expected there to be other use cases. I started with the output of `hypothesis write numpy.matmul` but ended up playing around with my own strategy (see the difficulties with mutually_broadcastable_shapes above) and liked the idea of not having to unpack `a` and `b` inside the test, and of being able to supply the shapes for `a` and `b` to other strategies (e.g. maybe we will have a strategy for Qobj and QobjEvo objects and want to try various combinations of multiplying things together).

👍 sounds good, carry on then!

@hodgestar (Contributor, Author) commented on Aug 8, 2022:

@Ericgig and I had a brief discussion about how to handle inf and nan generally. One option is to forbid them entirely from the data layer -- i.e. to check and raise an error whenever a new data-layer object is created. A potential downside to this is that we'd have to check often to be sure, since any operation can potentially create nan and inf, even simple addition:

>>> 1e308 + 1e308
inf
>>> (1e308 + 1e308) - (1e308 + 1e308)
nan

An alternative is to try to propagate nan and inf only as correctly as, e.g., BLAS does. What happens with BLAS is that nan and inf propagate reasonably correctly, but are considered fairly interchangeable -- i.e. nan, inf, and -inf are all just bad values.
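A minimal sketch of the first option, with a hypothetical validation helper that would run at data-layer construction time:

import numpy as np

def _check_finite(values):
    # Hypothetical helper: reject non-finite entries at creation rather
    # than letting nan/inf propagate through later operations.
    if not np.isfinite(values).all():
        raise ValueError("data-layer values must be finite")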

@Zac-HD commented on Aug 8, 2022:

(out of my wheelhouse again; @ me again if that changes (again!))

@Ericgig (Member) left a comment:

Hypothesis still feels somewhat magical, but I am starting to understand it.
Right now it seems to create data with mostly full arrays. I don't know how hard it would be to create a strategy for different densities and to add the flags for Hermitian, etc., but it will be needed eventually.

The tolerances used are too tight, making some tests fail.

Also, from the old scipy test:

>>> a = np.array([[7.59109827e+14+9.96089747e+14j]])
>>> b = np.array([[7.22335902e+307+1.64618697e+308j]])
>>> np.allclose(b, a), np.allclose(a, b)
(False, True)

(np.allclose(a, b) checks abs(a - b) <= atol + rtol * abs(b); here abs(b) overflows to inf, making the tolerance infinite, which is why the check is not symmetric in its arguments.)
I am not sure we should ignore the overflow warning...


@given(qst.qobj_datas(shape=same_shape), qst.qobj_datas(shape=same_shape))
def test_data_equality_operator_same_shapes(a, b):
    result = (a == b)
A member commented:

Here, qutip uses atol=1e-12, rtol=0 by default.
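So a numpy-based expectation has to mirror those tolerances explicitly, roughly as follows (a sketch; to_array() is assumed to convert a Data object to an ndarray):

import numpy as np

def expected_equality(a, b):
    # Mirror qutip's defaults (atol=1e-12, rtol=0) in the numpy comparison.
    return np.allclose(a.to_array(), b.to_array(), atol=1e-12, rtol=0)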

@hodgestar changed the base branch from dev.major to master on April 11, 2023.
@hodgestar (Contributor, Author) commented on Apr 12, 2023:

@Ericgig I've expanded the tests and reworked this so that it's less of a WIP. I've also documented the expected behaviour of the data layer in the presence of nans, etc., in the description of the PR. I'll find somewhere in the docs to add that before merging the PR.

Would you mind taking another look and letting me know what you think?

P.S. I also haven't had a clean test run on GitHub Actions yet, but the tests pass consistently on my machine now. I think I cleaned up most of the issues from the recent GitHub Actions runs in my latest commits, but I'll know in the morning. Obviously CI needs to pass consistently before merging can happen.

@coveralls commented on Apr 12, 2023:

Coverage Status

Coverage: 75.624% (+0.3%) from 75.317% when pulling 133677f on hodgestar:feature/add-hypothesis-tests-for-data-operators into fb72696 on qutip:master.

@Ericgig (Member) left a comment:

The bug in the 3.9 tests seems to be a real bug with the new tests; it should be fixed.
If it's specific to a numpy version (1.20), we can officially drop that version.

I see no mention of tidyup; some of the tested operations use it. Should it not be turned off for these tests? Otherwise it will cause random failures eventually.

What is the plan for other functions?
Most of the common functions are tested, but some often-used ones are missing (kron, l2, isherm, expect).

I am also curious about how some of the more complex functions (inv, expm, pow) would fare, but I don't expect them to play nicely with this kind of test.

I disagree with making our code worse for common use cases just to handle junk the same way others do.

@hodgestar (Contributor, Author) commented:

> The bug in the 3.9 tests seems to be a real bug with the new tests; it should be fixed. If it's specific to a numpy version (1.20), we can officially drop that version.

Yes, I need to fix that.

> I see no mention of tidyup; some of the tested operations use it. Should it not be turned off for these tests? Otherwise it will cause random failures eventually.

Disabling tidyup during the tests is a good idea.
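One way to do that, assuming the v5-style qutip.CoreOptions context manager and its auto_tidyup flag (treat both names as assumptions here), would be an autouse fixture:

import pytest
import qutip

@pytest.fixture(autouse=True)
def no_tidyup():
    # Assumed API: CoreOptions temporarily overrides core settings,
    # restoring them when the context exits.
    with qutip.CoreOptions(auto_tidyup=False):
        yield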

> What is the plan for other functions? Most of the common functions are tested, but some often-used ones are missing (kron, l2, isherm, expect).

For this PR I just covered all the functions that are methods of the Data class. Some of these call out to dispatch functions, but I included those anyway since they're part of the Data class API.

> I am also curious about how some of the more complex functions (inv, expm, pow) would fare, but I don't expect them to play nicely with this kind of test.

I suspect those will be a bit tricky, but I think we can still find a way to assert that they give the right answer even if it requires restricting a bit what examples hypothesis generates.

> I disagree with making our code worse for common use cases just to handle junk the same way others do.

I don't think I did this anywhere now? In some cases what QuTiP was doing was just wrong, e.g. inf * CSR-zeros is just NaNs everywhere, so the previous QuTiP answer was wrong. We also don't have to handle junk the same way others do, but we do have to handle junk in a way that is sensible and avoids silently propagating or swallowing errors. People are going to make mistakes, and that set of people includes us as core QuTiP developers.

Comment on lines -23 to -24
if value == 0:
    return csr.zeros(matrix.shape[0], matrix.shape[1])
A member commented:

I hate the removal of this shortcut.
mul(data, 0) can happen often in QobjEvo when the coefficient is a pulse or step function.
Without this shortcut, they need to loop over the values twice: once for this mul, once for the add or matmul that follows.
Thus this shortcut actually helps users.
It is also one of the only places where there is a mul(data, 0) in the code.

In some theoretical case it could swallow a NaN, but I don't see any realistic case where this could hide an error.

So everyone loses for a case that will probably never happen...

Can you write a code example where this would save us? Something that looks like it should work, but causes infinities or NaNs that would be erased by a multiplication by 0 without returning garbage.

If we can't even think of such a case, let's keep the shortcut.
If you can find one, let's loop over the values to find NaNs and return the zeros CSR if none are found, so we at least loop only once over the values (see the sketch below).
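That compromise might look roughly like this (a sketch; the helper name is hypothetical, and as_scipy() is used to reach the CSR value buffer):

import numpy as np
from qutip.core.data import csr

def zero_mul_shortcut(matrix):
    """Return csr.zeros when safe, or None so the caller falls back."""
    values = matrix.as_scipy().data
    # Single pass over the stored values: only take the shortcut when
    # there is no nan/inf that multiplication by zero would swallow.
    if np.isfinite(values).all():
        return csr.zeros(matrix.shape[0], matrix.shape[1])
    return None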

@hodgestar (Contributor, Author) commented:

Maybe we just need to make _data.zeros[type(x)](*x.shape) easier to use so that we don't need to write mul(data, 0)? I can add _data.zeros_like(x) if that would be better?

In the specific case where mul(data, 0) is intended to create a matrix of zeros of the right shape, the current fast path can never be wrong, since that is exactly what it does. In general, mul(data, 0) is currently just incorrect floating-point arithmetic.

I was wondering about adding a has_non_finite flag to CSR. It would be a bit of a pain to keep track of everywhere, but it would allow us to fast-path the all-finite case without sacrificing correctness.

If this is a blocker for this PR, I'm happy to undo the current change and make the tests pass by explicitly stating that the expected result is 0 in those cases in the test.

I also don't want to sacrifice performance, but it is important to have a clear set of tests that show how we expect the data layer to behave in these edge cases.
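For concreteness, the proposed helper would be a thin wrapper over the dispatch call quoted above (a sketch):

from qutip.core import data as _data

def zeros_like(x):
    # Dispatch on the data type of x to build a zeros matrix of the same
    # shape and storage format.
    return _data.zeros[type(x)](*x.shape)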

A member commented:

I am thinking more about the use in QobjEvo. I don't think we ever use it as _data.zeros_like in the code base, but that function would be useful, as would an identity_like, as we discussed somewhere.

I would be fine with a has_non_finite flag, but I think we should raise an error any time we would set that flag.
I think we could do the check when we tidyup: we already check the scale of all the values there, and tidyup runs in most operations that could create zeros, which are exactly the functions at risk.

I see a CSR matrix full of explicit zeros as wrong, if not an error, so this is a blocker for me.
That, and the tests should pass reliably.

A member commented:

I am using mul as _data.zeros_like in stochastic...
