Add hypothesis tests for data operators. #1957
Conversation
@Zac-HD Any suggestions for ways we could use hypothesis better? It all looks quite clean right now. Is there a way we can change what hypothesis prints when an example is found? E.g. we don't want the data layer object repr to include the contents of the array, but it would be nice to see that in the test failure output.
All looks good to me!
Is there a way we can change what hypothesis prints when an example is found?
You can't change the failing example (except by changing the repr or hacking our internal pretty-printer - please don't!), but you can add a call to hypothesis.note("whatever representation you want to see") inside the test function, and you'll see that on failure.
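(For reference, a minimal sketch of what that looks like, using a plain numpy array strategy rather than this PR's own strategies:)

```python
import hypothesis
import hypothesis.extra.numpy as npst
import numpy as np
from hypothesis import given

@given(npst.arrays(np.complex128, (2, 2)))
def test_with_note(array):
    # note() output is only printed alongside a failing example, so the
    # object repr can stay compact while failures still show the data.
    hypothesis.note(f"array contents:\n{array!r}")
    assert array.shape == (2, 2)
```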
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hypothesis testing approach seems to generate very clean and readable code! I really like this approach for testing as it seems to simplify writing tests. I see that some tests do not fully work yet, but the errors seem to be mostly related to inf and nan values not being properly handled. We can skip those cases by using the elements argument in npst.arrays (see the sketch below), but I'm not sure if we should. Also, absolute tolerances may need to be included in tests if we test with values close to zero.
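(For reference, a sketch of that option; the dtype and shape here are placeholders:)

```python
import hypothesis.extra.numpy as npst
import numpy as np
from hypothesis import strategies as st

# Only generate finite complex values, so inf and nan never appear.
finite_elements = st.complex_numbers(allow_infinity=False, allow_nan=False)
finite_arrays = npst.arrays(np.complex128, (4, 4), elements=finite_elements)
```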
@hodgestar What is the goal for this PR? Is this meant to ultimately replace the current testing code employed for the data layer, or is it just meant to test Data's operators?
(unsubscribing because I think there's not much for me to add, but please @ me if that changes)
@Ericgig @AGaliciaMartinez This is ready for another round of review. I fixed some small bugs. There are still some failing examples I'm encountering locally that we should also decide what to do with, in scalar division and matmul.
@Zac-HD Would you mind doing a quick sanity check of my strategies in qutip/qutip/tests/strategies.py (lines 15 to 62 in 2aee70c)? I couldn't get what I needed out of mutually_broadcastable_shapes. I also wouldn't mind a quick check of qutip/qutip/tests/strategies.py (lines 113 to 131 in 2aee70c).
I tried with mutually_broadcastable_shapes. I expected there to be other use cases; I started with the output of that strategy.
Huh, that's surprising. Share some example code and I'll have a look?
👍 sounds good, carry on then!
@Ericgig and I had a brief discussion about how to handle overflow to inf and the nan values it produces:

```python
>>> 1e308 + 1e308
inf
>>> (1e308 + 1e308) - (1e308 + 1e308)
nan
```

An alternative is to try to propagate nan.
(out of my wheelhouse again; @ me again if that changes (again!))
Hypothesis still feels somewhat magic, but I am starting to understand it.
Right now it seems to create data with mostly full (dense) arrays. I don't know how hard it is to create a strategy for different densities or to add flags for Hermitian matrices, etc., but that will be needed eventually (see the sketch below).
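(A rough, entirely hypothetical sketch of what a density-aware strategy might look like; none of these names exist in the PR:)

```python
import numpy as np
from hypothesis import strategies as st

@st.composite
def sparse_arrays(draw, shape=(4, 4), density=0.1):
    # Fill roughly ``density`` of the entries with finite complex values
    # and leave the rest as explicit zeros.
    rows, cols = shape
    array = np.zeros(shape, dtype=np.complex128)
    n_filled = max(1, int(density * rows * cols))
    for _ in range(n_filled):
        i = draw(st.integers(0, rows - 1))
        j = draw(st.integers(0, cols - 1))
        array[i, j] = draw(st.complex_numbers(
            allow_infinity=False, allow_nan=False, max_magnitude=1e8,
        ))
    return array
```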
The tolerances used are too tight, making some tests fail.
Also, from the old scipy test:

```python
>>> a = np.array([[7.59109827e+14+9.96089747e+14j]])
>>> b = np.array([[7.22335902e+307+1.64618697e+308j]])
>>> np.allclose(b, a), np.allclose(a, b)
(False, True)
```
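(For context: np.allclose(x, y) tests abs(x - y) <= atol + rtol * abs(y), which is asymmetric in its arguments. With b this close to the floating point maximum, abs(a - b) and rtol * abs(b) both overflow to inf, and inf <= inf holds, while rtol * abs(a) stays finite, which is why swapping the arguments flips the result.)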
I am not sure we should ignore the overflow warning...
```python
@given(qst.qobj_datas(shape=same_shape), qst.qobj_datas(shape=same_shape))
def test_data_equality_operator_same_shapes(a, b):
    result = (a == b)
```
Here, qutip uses atol=1e-12, rtol=0 by default.
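(In other words, equality is effectively a pure absolute-difference check; a small illustration, not code from the PR:)

```python
import numpy as np

def close_like_qutip(x, y, atol=1e-12):
    # atol-only comparison: symmetric in x and y, unlike numpy's default
    # allclose, which scales its tolerance by abs(y).
    return bool(np.all(np.abs(x - y) <= atol))
```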
@Ericgig I've expanded the tests and reworked this so that it's less of a WIP. I've also documented the expected behaviour of the data layer in the presence of nan and inf. Would you mind taking another look and letting me know what you think? P.S. I also haven't had a clean test run on GitHub Actions yet, but the tests pass consistently on my machine now. I think I cleaned up most of the issues from the recent GitHub Actions runs in my recent commits, but I'll know in the morning. Obviously CI needs to pass consistently before merging could happen.
The bug in the 3.9 tests seems to be a real bug with the new tests; it should be fixed.
If it's numpy version specific, we can officially drop that version (1.20).
I see no mention of tidyup: some of the tested operations use it. Should it not be turned off for these tests? Otherwise it will cause random failures eventually.
What is the plan for other functions? Most of the common functions are tested, but some often-used ones are missing (kron, l2, isherm, expect). I am also curious how some of the more complex functions (inv, expm, pow) would fare, but I don't expect them to play nicely with this kind of test.
I disagree with making our code worse for the common use case just to handle junk the same way others do.
Yes, I need to fix that.
Disabling tidyup during the tests is a good idea.
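(One hedged way to do that in a fixture, assuming qutip 5's settings interface with an auto_tidyup option; a sketch, not this PR's code:)

```python
import pytest
import qutip

@pytest.fixture(autouse=True)
def no_tidyup():
    # Turn automatic tidyup off for these tests so that small values are
    # not silently truncated to zero, then restore the previous setting.
    old = qutip.settings.core["auto_tidyup"]
    qutip.settings.core["auto_tidyup"] = False
    yield
    qutip.settings.core["auto_tidyup"] = old
```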
For this PR I just covered all the functions that are methods of the Data class.
I suspect those will be a bit tricky, but I think we can still find a way to assert that they give the right answer, even if it requires restricting a bit what examples hypothesis generates (e.g. something like the sketch below).
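(A hedged sketch of that kind of restriction for inv, using hypothesis.assume to discard non-finite and ill-conditioned inputs; the bounds and tolerances are arbitrary placeholders:)

```python
import numpy as np
import hypothesis.extra.numpy as npst
from hypothesis import assume, given

@given(npst.arrays(np.complex128, (4, 4)))
def test_inv_roundtrip(array):
    # Discard non-finite and ill-conditioned matrices so that inv() has
    # a numerically stable answer to check against.
    assume(np.all(np.isfinite(array)))
    assume(np.linalg.cond(array) < 1e6)
    assert np.allclose(array @ np.linalg.inv(array), np.eye(4), atol=1e-6)
```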
I don't think I did this anywhere now? In some cases what QuTiP was doing was just wrong, e.g. the shortcut below:
```python
if value == 0:
    return csr.zeros(matrix.shape[0], matrix.shape[1])
```
I hate the removal of this shortcut. mul(data, 0) can happen often in QobjEvo when the coefficient is a pulse or step function. Without this shortcut, we need to loop over the values twice: once for this mul, once for the add or matmul following. Thus this shortcut will actually help users. It is also one of the only places where there is a mul(data, 0) in the code.
In some theoretical case it could swallow a NaN, but I don't see any realistic case where this could hide an error.
So everyone loses for a case that will probably never happen...
Can you write a code example where this would save us? Something that looks like it should work, but causes infinities or NaNs that would be erased by a multiplication by 0 without returning garbage.
If we can't even think of such a case, let's keep the shortcut.
If you can find one, let's loop over the values to find NaNs and return the zeros CSR if none are found, so we at least loop only once over the values (roughly like the sketch below).
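(Roughly, in plain numpy terms; the real implementation would live in the data layer, and these names are illustrative:)

```python
import numpy as np

def mul_with_shortcut(values, scalar):
    # Keep the fast path for scalar == 0, but only after a single pass
    # confirming there is no inf/nan that 0 * x would turn into nan.
    if scalar == 0 and np.all(np.isfinite(values)):
        return np.zeros_like(values)
    return values * scalar
```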
Maybe we just need to make _data.zeros[type(x)](*x.shape) easier to use so that we don't need to write mul(data, 0)? I can add _data.zeros_like(x) if that would be better.
In the specific case where mul(data, 0) is intended to create a matrix of zeros of the right shape, the current fast path can never be wrong, since that is exactly what it does. In general, though, mul(data, 0) is currently just incorrect floating point arithmetic.
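(A one-line illustration of the arithmetic point:)

```python
>>> import numpy as np
>>> 0 * np.array([1.0, np.inf, np.nan])
array([ 0., nan, nan])
```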
I was wondering about adding a has_non_finite flag to CSRs. It will be a bit of a pain to keep track of everywhere, but it would allow us to fast-path the all-finite cases without sacrificing correctness.
If this is a blocker for this PR, I'm happy to undo the current change and make the tests pass by explicitly stating that the expected result is 0 in those cases.
I also don't want to sacrifice performance, but it is important to have a clear set of tests that show how we expect the data layer to behave in these edge cases.
I am more thinking about the use in QobjEvo. I don't think we ever use it as _data.zeros_like in the code base, but that function would be useful, as would an identity_like, as we discussed somewhere.
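(A minimal sketch of those hypothetical helpers, mirroring the _data.zeros[type(x)](*x.shape) expression above and assuming the usual qutip.core data import; neither helper exists yet:)

```python
from qutip.core import data as _data

def zeros_like(x):
    # Hypothetical helper: an all-zero data-layer object with the same
    # type and shape as ``x``.
    return _data.zeros[type(x)](*x.shape)

def identity_like(x):
    # Hypothetical companion helper; assumes ``x`` is square.
    return _data.identity[type(x)](x.shape[0])
```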
I would be fine with a has_non_finite flag, but I think we should raise an error anytime we would set that flag.
I think we could do the check when we tidyup: we already check the scale of all values there, and it's done in most operations that could create zeros, i.e. the functions at risk.
I see a CSR matrix full of explicit zeros as wrong, if not an error, so this is a blocker for me. This, and the tests should pass reliably.
I am using mul as _data.zeros_like in stochastic...
Description
Add hypothesis strategies for data objects and some simple property-based tests for data operations.
We aim for compatibility with numpy, but there are caveats in a few different cases. In some cases we also need to ignore warnings raised by numpy about operations with nan and inf when calculating the expected result.
Related issues or PRs