
NDArraysRegressionFixture: regression on arrays with arbitrary shape. #72

Merged
merged 4 commits into from Sep 15, 2021

Conversation


@tovrstra tovrstra commented Sep 3, 2021

Fixes #71

This is a first attempt, including unit tests. It also fixes some minor issues in DataFrameRegressionFixture, which was used as a starting point for this PR. The new fixture only uses NumPy, so the error messages are formatted without pandas.


@tadeu tadeu left a comment


Thanks for this new feature @tovrstra, and thanks for the small fixes in other parts, awesome work!

expected_array = expected_data.get(k)

if expected_array is None:
    error_msg = f"Could not find key '{k}' in the expected results.\n"
Member


Nice check! What happens for the opposite case, when there's a key in expected, but it's not present in obtained?

Contributor Author


That's a good point. At the least, there should be a check both ways: when new keys appear in the obtained results and when keys are missing from them. Both situations may indicate either an error or a change in features, for which the file with expected results should be updated. (A few other questions came to mind too, unrelated to this. See message below.)
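Such a two-way check could be sketched as follows (check_keys and the exact message wording are illustrative, not the fixture's actual implementation):

```python
def check_keys(obtained_data, expected_data):
    """Report keys missing from either side, not just from the expected file."""
    obtained_keys = set(obtained_data)
    expected_keys = set(expected_data)
    error_msg = ""
    missing_in_expected = sorted(obtained_keys - expected_keys)
    missing_in_obtained = sorted(expected_keys - obtained_keys)
    if missing_in_expected:
        error_msg += f"Keys not found in expected results: {missing_in_expected}\n"
    if missing_in_obtained:
        error_msg += f"Keys missing from obtained results: {missing_in_obtained}\n"
    if error_msg:
        raise AssertionError(error_msg)
```

Using set differences in both directions ensures that an added result fails the test just as a removed one does, prompting a regeneration of the expected file.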

Contributor Author


This should be fixed.


tovrstra commented Sep 6, 2021

Looking back, the following might also be of interest:

  1. Reporting the number and fraction of mismatched elements.
  2. Use both int and float arrays in the test functions test_common_case_*; currently only int is used.
  3. Comparison of arrays of (unicode) strings. Computing a numerical difference seems not so useful in this case. Other dtypes might be considered too, as long as they do not result in pickling the data inside the NPZ file, because the pickle format is not very robust.

The first two would fit well in this PR. The third one is potentially a can of worms, so I'd rather keep that for later.
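A quick sketch of the dtype concern in point 3 (the file path is a throwaway temporary invented for the demonstration): unicode string arrays have a fixed-width dtype and round-trip through NPZ without pickling, whereas object-dtype arrays can only be stored by pickling them, which `np.load(..., allow_pickle=False)` refuses to read back.

```python
import os
import tempfile

import numpy as np

# Hypothetical throwaway path, just for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.npz")

# Unicode string arrays use a fixed-width "U" dtype, so no pickling is needed:
np.savez(path, words=np.array(["delicious", "nutritive", "yummy"]))
loaded = np.load(path, allow_pickle=False)
assert loaded["words"].dtype.kind == "U"
```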


tadeu commented Sep 6, 2021

I agree. About 1: the number and fraction of mismatched elements seems to be a good metric to report in case of differences. Perhaps other interesting metrics would be the maximum and the mean error (for both absolute and relative errors, ignoring the sign).

About 2, indeed these tests would make the test suite more complete!

About 3, yeah, it would be OK, but perhaps it could be added when demand for this kind of comparison arises, as there are other regression methods that support text.
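The statistics discussed in point 1 could be sketched as below, assuming np.isclose-style tolerances. This is illustrative only, not the fixture's code; note that the relative error divides by the expected values, so zeros among the differing elements would need special handling.

```python
import numpy as np

def mismatch_statistics(obtained, expected, rtol=1e-5, atol=1e-8):
    """Summarize differing elements: count, fraction, max/mean abs and rel error."""
    diffs = ~np.isclose(obtained, expected, rtol=rtol, atol=atol)
    n_diff = int(diffs.sum())
    stats = {"number": n_diff, "fraction": n_diff / obtained.size}
    if n_diff:
        abs_err = np.abs(obtained[diffs] - expected[diffs])
        # Relative error with respect to the expected values, ignoring the sign.
        rel_err = abs_err / np.abs(expected[diffs])
        stats.update(
            max_abs=abs_err.max(), mean_abs=abs_err.mean(),
            max_rel=rel_err.max(), mean_rel=rel_err.mean(),
        )
    return stats
```

Computing the statistics over the differing elements only keeps the report meaningful even when most of a large array matches.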

Changes:
- Add statistics of differing elements.
- Add support for complex and string arrays.
- Add support for 0-D arrays.
- Add more unit tests.
- Also raise AssertionError when new items are present in the dictionary.
- Simplify code and clarify error messages.

tovrstra commented Sep 7, 2021

I've added support for string arrays already in this PR. My main motivation was that string arrays were mostly working already, yet with misleading results in some cases. (It was also easy to fix.) I've added a few commented lines of code to test_string_array in tests/test_dataframe_regression.py, which would show similarly confusing behavior of dataframe_regression.

This PR contains a lot of code that is similar to that of dataframe_regression.py, which is not so nice. Also, the API is becoming rather complicated, with several different regression fixtures for closely related use cases. It would be possible and fairly easy to support the functionality of dataframe_regression, num_regression and ndarrays_regression in one regression fixture, e.g. to be used as follows:

# Case of 1D arrays with same length:
data = {"ar1": np.array([2.3, 9.4]), "ar2": np.array([3, 4])}
dataframe = pd.DataFrame.from_dict(data)
array_regression(data)  # same as current num_regression
array_regression(dataframe)  # same as current dataframe_regression, based on type
array_regression(data, format="npz")  # same as ndarrays_regression
array_regression(dataframe, format="npz")  # not yet possible, but easy enough
# Case of ND arrays with different shapes
data = {"ar1": np.array([2.3, 9.4]), "ar2": np.array([[3, 4], [2, 1]])}
dataframe = pd.DataFrame.from_dict(data)  # raises ValueError
array_regression(data)  # raises ValueError, suggesting npz format
array_regression(data, format="npz")  # same as ndarrays_regression

Future extensions could include support for HDF5 or user-defined container formats. The name array_regression is intended to be general, i.e. no reference to the dtype, leaving room for future support of more dtypes.
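A rough sketch of the dispatch behind this hypothetical array_regression API (choose_storage_format and the "csv"/"npz" labels are invented for illustration; the real fixture would also accept DataFrames):

```python
import numpy as np

def choose_storage_format(data, format=None):
    """Pick a storage backend for a hypothetical unified array_regression.

    Equal-length 1-D arrays can keep the current CSV-based num_regression
    path; anything else needs the NPZ format, as in the examples above.
    """
    if format is not None:
        return format
    arrays = list(data.values())
    if all(a.ndim == 1 for a in arrays) and len({a.shape for a in arrays}) <= 1:
        return "csv"
    raise ValueError(
        "Arrays with different shapes or more than one dimension "
        "require array_regression(data, format='npz')."
    )
```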

)
if np.issubdtype(obtained_array.dtype, np.number) and len(diff_ids) > 1:
    error_msg += (
        " Statistics are computed for differing elements only.\n"
Member


Nice messages for the statistics, complete and informative! 👍

@@ -238,6 +238,11 @@ def test_string_array(dataframe_regression):
data1 = {"potato": ["delicious", "nutritive", "yummy"]}
dataframe_regression.check(pd.DataFrame.from_dict(data1))

# TODO: The following fails with a confusing error message.
Member


👍


tadeu commented Sep 8, 2021

I agree with the suggestion for array_regression, that would be awesome!

Nice new improvements, too! There's one failing test; most probably a file was not committed:

    def test_common_case_zero_expected(ndarrays_regression, no_regen):
        # Most common case: Data is valid, is present and should pass
        data = {"data1": np.array([0, 0, 2, 3, 0, 5, 0, 7])}
>       ndarrays_regression.check(data)
E       Failed: File not found in data directory, created:
E       - D:\a\pytest-regressions\pytest-regressions\tests\test_ndarrays_regression\test_common_case_zero_expected.npz

- Add missing files
- Remove unused ones
- Facilitate regeneration of all NPZ files.

tovrstra commented Sep 8, 2021

Yes, I must have overlooked it. It should be fixed now. I made some changes that make it possible to regenerate all NPZ files if needed. That makes it easier to remove files that are no longer used. (Just remove them all and rerun the tests.)


tovrstra commented Sep 8, 2021

I'll take a look at the array_regression idea later. I assume it is best to keep the old API alive for a while. Do you have a deprecation policy?


tadeu commented Sep 8, 2021

By the way, with the introduction of a new API like array_regression, it would be a good moment to improve the default tolerances. Currently we use the same defaults as numpy.isclose, and having an atol that is not 0.0 by default is confusing (see numpy/numpy#10161 for example; we might even consider something different from isclose).
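A minimal demonstration of the pitfall: with NumPy's defaults, values that are merely tiny compare "close" to zero, which is rarely what a regression test intends.

```python
import numpy as np

# Defaults are rtol=1e-5, atol=1e-8; isclose checks |a - b| <= atol + rtol * |b|.
print(np.isclose(1e-9, 0.0))            # True: 1e-9 is within atol of 0
print(np.isclose(1e-9, 0.0, atol=0.0))  # False: a purely relative check
```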


tadeu commented Sep 8, 2021

I assume it is best to keep the old API alive for a while.

Indeed!

Do you have a deprecation policy?

Not that I know of, but we can define one in the issue for adding array_regression.


tovrstra commented Sep 8, 2021

Sounds all good. Thanks for pointing out the issue regarding isclose. This is indeed something to keep in mind. At the moment, the analysis of the relative errors in this PR is consistent with np.isclose, so both should change at the same time.

Some input from others in an array_regression issue would be good before the actual coding. Do you think we should continue patching in this PR, or merge it first and open a second one for the API? I'm fine either way.


tadeu commented Sep 8, 2021

I think that we should merge this one first. @tarcisiofischer or @nicoddemus, could you please also take a look here before it's merged, and perhaps provide some feedback on #73?


@tarcisiofischer tarcisiofischer left a comment


Nice work. I have some further questions though:

  • What happens in the error message when the number of elements is too big? I'm guessing pytest will truncate it somewhere, right?
  • Is the performance reasonable for medium-size arrays? For example, how much time to check a 100x100 and a 1000x1000 array? And 100x100x100 vs 200x200x200? I'm not planning to use any of this; the main point is to check if there is any bottleneck we can anticipate or at least be aware of.
  • What happens (which error message) if the npz file is corrupted?

@@ -123,8 +123,7 @@ def _check_fn(self, obtained_filename, expected_filename):
             self._check_data_types(k, obtained_column, expected_column)
             self._check_data_shapes(obtained_column, expected_column)

-            data_type = obtained_column.values.dtype
-            if data_type in [float, np.float16, np.float32, np.float64]:
+            if np.issubdtype(obtained_column.values.dtype, np.inexact):
Contributor


Please keep in mind that this will change behavior a bit, because complex numbers are also np.inexact:

>>> a = np.array([], dtype=np.complex128)
>>> np.issubdtype(a.dtype, np.inexact)
True

Perhaps add a test for this case (ignore me if you already did; at the time of writing, I haven't finished the review yet), just to make sure it doesn't crash or anything...
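One way to make the dtype branching explicit (a sketch; tolerance_kind is a made-up helper, not part of the codebase): np.inexact covers both floating and complexfloating, so a fixture that wants to treat them differently must test for the complex case first.

```python
import numpy as np

def tolerance_kind(dtype):
    """Classify a dtype for tolerance-based comparison (illustrative only)."""
    if np.issubdtype(dtype, np.complexfloating):
        return "complex"  # also a subtype of np.inexact
    if np.issubdtype(dtype, np.floating):
        return "float"
    return "exact"  # int, bool, str: compare exactly
```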

Contributor Author


That's correct. I wanted to support complex numbers and there is a unit test for it.

data = {"data1": np.array([True] * 10)}
with pytest.raises(
    AssertionError,
    match="Data types are not the same.\nkey: data1\nObtained: bool\nExpected: int64\n",
Contributor


Really minor, but perhaps something like this would be a bit more readable?

match = "\n".join([
    "Data types are not the same.",
    "key: data1",
    "Obtained: bool",
    "Expected: int64",
])

Contributor Author


I agree.



def test_arrays_of_same_shape(ndarrays_regression):
    same_size_int_arrays = {
Contributor


I believe you meant "same_shape" in the variable name? (Since the sizes are different)

Contributor Author


I missed that one. There is not much point in giving the dictionary an elaborate name. Just data would be good enough, because there is no potential for confusion with other dictionaries.


tovrstra commented Sep 9, 2021

Thanks for all the reviewing! I'll first try to answer some of the questions:

  • pytest does not seem to truncate the error messages. The length is currently limited by the THRESHOLD class attribute, which sets the maximum number of individual array elements that are compared in the error message. (It is set to 100 now, but this may be too much in practice.) We can reduce this and decide to show only the largest mismatches instead of the first ones.
  • For large arrays, the main performance bottleneck is writing the ZIP-compressed NPZ file. Loading is much faster. I noticed that NPZ files are also written when the reference already exists (*.obtained.npz). The biggest performance improvement would be to disable writing this file and pass the data directly to _check_fn. I'm not sure if that is an option? When filling the four suggested array sizes with random data, it takes about 3 seconds to write the 73 MB compressed ZIP file. Disabling compression makes it orders of magnitude faster, but results in bigger files. Building the error message takes about 1 second for this case, which is worse than I expected. That would be the second target for optimization.
  • When loading a corrupt NPZ file, two error messages may appear. If the file has the ZIP prefix, but some bytes are wrong further down, one gets zipfile.BadZipFile: File is not a zip file. Without the prefix, one gets ValueError: Cannot load file containing pickled data when allow_pickle=False. Both are not too informative and can be caught and reraised with more informative error messages.
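The reraising mentioned in the last point could be sketched like this (load_expected_npz is a hypothetical helper, not the fixture's actual code); it catches both failure modes and turns them into one clear message:

```python
import zipfile

import numpy as np

def load_expected_npz(path):
    """Load an expected-results NPZ file, reraising cryptic errors clearly."""
    try:
        return np.load(path, allow_pickle=False)
    except (zipfile.BadZipFile, ValueError) as exc:
        raise AssertionError(
            f"The expected results file is not a valid NPZ file: {path}\n"
            f"Original error: {exc}"
        ) from exc
```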

@tarcisiofischer
Contributor

(1) Do what you think is best. I'm thinking that, as a developer, I would end up assuming that I have to review the results with an external tool if the error messages are too long. Feel free to keep this as it is, no problem, IMO.

(2) Thanks. I just wanted to have an overall idea. It doesn't seem necessary to optimize, given the run times you reported :)

(3) Argh, yeah. Good to be aware of this. I won't ask you to implement anything right now, as this PR already seems in good shape; it's just something to keep in mind.

Thanks!

@tarcisiofischer tarcisiofischer merged commit 983a6db into ESSS:master Sep 15, 2021
nicoddemus added a commit that referenced this pull request Jan 4, 2022
In the end the main change was released.
Successfully merging this pull request may close these issues.

num_regression with storage in NPZ files?