NDArraysRegressionFixture: regression on arrays with arbitrary shape. #72
Conversation
Fixes ESSS#71. This is a first attempt, including unit tests. Some minor issues in DataFrameRegressionFixture, which was used as a starting point for this PR, were also fixed. The new fixture only uses NumPy, so the error messages are formatted without pandas.
Thanks for this new feature @tovrstra, and thanks for the small fixes in other parts, awesome work!
expected_array = expected_data.get(k)
if expected_array is None:
    error_msg = f"Could not find key '{k}' in the expected results.\n"
Nice check! What happens for the opposite case, when there's a key in expected, but it's not present in obtained?
That's a good point. At the least, there should be a check both ways: when the obtained results contain new keys and when keys are missing from them. Both situations may indicate either an error or a change in features, for which the file with expected results should be updated. (A few other questions came to mind too, unrelated to this. See message below.)
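A minimal sketch of what such a two-way check could look like (`check_keys` is a hypothetical helper written for illustration, not the code from this PR):

```python
def check_keys(obtained_data, expected_data):
    """Collect error messages for keys missing on either side (hypothetical helper)."""
    error_msg = ""
    # Keys present in the obtained results but absent from the expected file.
    for k in obtained_data:
        if k not in expected_data:
            error_msg += f"Could not find key '{k}' in the expected results.\n"
    # Keys present in the expected file but absent from the obtained results.
    for k in expected_data:
        if k not in obtained_data:
            error_msg += f"Could not find key '{k}' in the obtained results.\n"
    return error_msg
```

An empty return value would mean the two dictionaries have the same set of keys.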
This should be fixed.
Looking back, the following might also be of interest:
The first two would fit well in this PR. The third one is potentially a can of worms, so I'd rather keep that for later.
I agree. About 1: the number and fraction of mismatched elements seems to be a good metric to report in case of differences. Perhaps other interesting metrics would be the maximum and the mean error (for both absolute and relative errors, ignoring the sign). About 2: indeed, these tests would make the test suite more complete! About 3: yeah, it would be OK, but perhaps it could be added when the demand for this kind of comparison arises, as there are other regression methods that support text.
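The metrics discussed above could be computed along these lines (`diff_statistics` is a hypothetical sketch, not the fixture's actual implementation; the tolerance handling is simplified):

```python
import numpy as np

def diff_statistics(obtained, expected, tolerance=1e-8):
    """Count mismatched elements and compute max/mean absolute and relative
    errors over the differing elements only (hypothetical helper)."""
    abs_error = np.abs(obtained - expected)
    diff_mask = abs_error > tolerance
    n_diff = int(np.sum(diff_mask))
    stats = {
        "number_of_differences": n_diff,
        "fraction_of_differences": n_diff / obtained.size,
    }
    if n_diff > 0:
        abs_diff = abs_error[diff_mask]
        # Relative error ignores the sign and guards against division by zero.
        denom = np.abs(expected[diff_mask])
        rel_diff = abs_diff / np.where(denom == 0, 1.0, denom)
        stats.update(
            max_abs_error=float(abs_diff.max()),
            mean_abs_error=float(abs_diff.mean()),
            max_rel_error=float(rel_diff.max()),
            mean_rel_error=float(rel_diff.mean()),
        )
    return stats
```

Restricting the statistics to the differing elements keeps them informative when only a handful of values out of a large array are off.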
Changes:
- Add statistics of differing elements.
- Add support for complex and string arrays.
- Add support for 0-D arrays.
- Add more unit tests.
- Also raise AssertionError when new items are present in the dictionary.
- Simplify code and clarify error messages.
I've added support for string arrays already in this PR. My main motivation was that string arrays were mostly working already, yet with misleading results in some cases. (It was also easy to fix.) I've added a few commented lines of code to the … This PR contains a lot of code that is similar to that of …

# Case of 1D arrays with same length:
data = {"ar1": np.array([2.3, 9.4]), "ar2": np.array([3, 4])}
dataframe = pd.DataFrame.from_dict(data)
array_regression(data)  # same as current num_regression
array_regression(dataframe)  # same as current dataframe_regression, based on type
array_regression(data, format="npz")  # same as ndarrays_regression
array_regression(dataframe, format="npz")  # not yet possible, but easy enough

# Case of ND arrays with different shapes:
data = {"ar1": np.array([2.3, 9.4]), "ar2": np.array([[3, 4], [2, 1]])}
dataframe = pd.DataFrame.from_dict(data)  # raises ValueError
array_regression(data)  # raises ValueError, suggesting npz format
array_regression(data, format="npz")  # same as ndarrays_regression

Future extensions could include support for HDF5 or user-defined container formats. The name …
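A unified `array_regression` entry point as proposed in this comment could dispatch roughly as follows. This is a hypothetical sketch: the name, the `format` keyword, and the returned backend labels come only from the example above, not from existing code.

```python
import numpy as np

def array_regression(data, format="csv"):
    """Hypothetical dispatcher: the 'csv' backend only accepts 1-D arrays of a
    common length; anything else must go through the 'npz' backend."""
    arrays = {k: np.asarray(v) for k, v in data.items()}
    if format == "csv":
        shapes = {a.shape for a in arrays.values()}
        if any(a.ndim != 1 for a in arrays.values()) or len(shapes) > 1:
            raise ValueError(
                "Arrays are not 1-D with a common length; use format='npz'."
            )
        return "csv"  # would delegate to the num/dataframe backend
    if format == "npz":
        return "npz"  # would delegate to ndarrays_regression
    raise ValueError(f"Unknown format: {format}")
```

The point of the sketch is the dispatch rule: same-length 1-D data keeps the human-readable CSV path, while arbitrary shapes force the binary NPZ path with a helpful error in between.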
)
if np.issubdtype(obtained_array.dtype, np.number) and len(diff_ids) > 1:
    error_msg += (
        " Statistics are computed for differing elements only.\n"
Nice messages for the statistics, complete and informative! 👍
@@ -238,6 +238,11 @@ def test_string_array(dataframe_regression):
    data1 = {"potato": ["delicious", "nutritive", "yummy"]}
    dataframe_regression.check(pd.DataFrame.from_dict(data1))

    # TODO: The following fails with a confusing error message.
👍
I agree with the suggestion for … Nice new improvements, too! There's one failing test; most probably a file was not committed:

def test_common_case_zero_expected(ndarrays_regression, no_regen):
    # Most common case: Data is valid, is present and should pass
    data = {"data1": np.array([0, 0, 2, 3, 0, 5, 0, 7])}
>   ndarrays_regression.check(data)
E   Failed: File not found in data directory, created:
E   - D:\a\pytest-regressions\pytest-regressions\tests\test_ndarrays_regression\test_common_case_zero_expected.npz
- Add missing files.
- Remove unused ones.
- Facilitate regeneration of all NPZ files.
Yes, I must have overlooked it. It should be fixed now. I made some changes that make it possible to regenerate all NPZ files if needed. That makes it easier to remove files that are no longer used: just remove them all and rerun the tests.
I'll take a look at the …
By the way, with the introduction of a new API like …
Indeed!
Not that I know of, but we can define one in the issue for adding …
Sounds all good. Thanks for pointing out the issue regarding … Some input from others in an …
I think that we should merge this one first. @tarcisiofischer or @nicoddemus, could you please also take a look here before it's merged, and perhaps provide some feedback on #73?
Nice work. I have some further questions though:
- What happens in the error message when the number of elements is too big? I'm guessing pytest will truncate it somewhere, right?
- Is the performance reasonable for medium-size arrays? For example, how much time does it take to check a 100x100 and a 1000x1000 array? And 100x100x100 vs 200x200x200? I'm not planning to use any of this; the main point is to check whether there is any bottleneck we can anticipate, or at least be aware of.
- What happens (which error message) if the npz file is corrupted?
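For the performance question, a rough standalone timing of the kind of element-wise comparison the fixture performs could look like this (a sketch with made-up data, not the fixture's actual code path):

```python
import time
import numpy as np

def time_check(shape, seed=0):
    """Time a vectorized element-wise comparison of two random arrays of the
    given shape (illustrative benchmark, not the fixture's real check)."""
    rng = np.random.default_rng(seed)
    expected = rng.random(shape)
    obtained = expected.copy()
    start = time.perf_counter()
    # The comparison itself is a single O(n) vectorized pass over the data.
    np.isclose(obtained, expected, rtol=1e-5, atol=1e-8).all()
    return time.perf_counter() - start

for shape in [(100, 100), (1000, 1000), (100, 100, 100)]:
    print(shape, f"{time_check(shape):.4f} s")
```

Since the comparison is a single vectorized pass, the cost should scale linearly with the number of elements rather than with the number of dimensions.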
@@ -123,8 +123,7 @@ def _check_fn(self, obtained_filename, expected_filename):
     self._check_data_types(k, obtained_column, expected_column)
     self._check_data_shapes(obtained_column, expected_column)

-    data_type = obtained_column.values.dtype
-    if data_type in [float, np.float16, np.float32, np.float64]:
+    if np.issubdtype(obtained_column.values.dtype, np.inexact):
Please keep in mind that this will change behavior a bit, because complex numbers are also np.inexact:
>>> a = np.array([], dtype=np.complex128)
>>> np.issubdtype(a.dtype, np.inexact)
True
Perhaps add a test for this case (ignore me if you already did; by the time I'm writing this, I hadn't finished the review yet), just to make sure it doesn't crash or anything.
That's correct. I wanted to support complex numbers and there is a unit test for it.
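For reference, a quick check of where np.inexact sits in the dtype hierarchy: it covers floating-point and complex floating-point dtypes, but not integers or booleans, which is why the old explicit float list and the new np.issubdtype check differ exactly for complex arrays.

```python
import numpy as np

# Floating-point and complex dtypes are subtypes of np.inexact.
for dtype in [np.float16, np.float32, np.float64, np.complex64, np.complex128]:
    assert np.issubdtype(dtype, np.inexact)

# Integer and boolean dtypes are not, so they take the exact-comparison path.
for dtype in [np.int32, np.int64, np.bool_]:
    assert not np.issubdtype(dtype, np.inexact)
```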
data = {"data1": np.array([True] * 10)}
with pytest.raises(
    AssertionError,
    match="Data types are not the same.\nkey: data1\nObtained: bool\nExpected: int64\n",
Really minor, but perhaps something like this would be a bit more readable?

match = "\n".join([
    "Data types are not the same.",
    "key: data1",
    "Obtained: bool",
    "Expected: int64",
])
I agree.
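One subtlety worth keeping in mind here: the `match` argument of `pytest.raises` is treated as a regular expression (checked with `re.search`), so the literal dots in the message actually match any character. Escaping the joined string makes the intent explicit. A small illustration using `re` directly, with the message taken from the test above:

```python
import re

message = "Data types are not the same.\nkey: data1\nObtained: bool\nExpected: int64\n"
expected_lines = [
    "Data types are not the same.",
    "key: data1",
    "Obtained: bool",
    "Expected: int64",
]
# pytest.raises(..., match=...) uses re.search under the hood, so a pattern
# meant literally should be escaped first.
pattern = re.escape("\n".join(expected_lines))
assert re.search(pattern, message)
```

Passing `re.escape(...)` as the `match` value would work the same way inside `pytest.raises`.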
def test_arrays_of_same_shape(ndarrays_regression):
    same_size_int_arrays = {
I believe you meant "same_shape" in the variable name? (Since the sizes are different)
I missed that one. There is not much use in giving the dictionary an elaborate name. Just data would be good enough, because there is no potential for confusion with other dictionaries.
Thanks for all the reviewing! I'll first try to answer some of the questions:
(1) Do what you think is best. I'm thinking that, as a developer, I would end up assuming that I have to review the results with an external tool if the error messages are too long. Feel free to keep this as it is, no problem, IMO.

(2) Thanks. I just wanted to have an overall idea. It doesn't seem necessary to optimize, given the run times you reported :)

(3) Argh, yeah. Nice to be aware of this. I won't make you implement anything right now, as this PR seems pretty nice already; it's just something to be aware of. Thanks!
In the end, the main change was released.