
array_regression combining functionality of num_regression, dataframe_regression and ndarrays_regression #73

Open
tovrstra opened this issue Sep 8, 2021 · 1 comment


tovrstra commented Sep 8, 2021

I'm copying the relevant bits from #72 below:

It is possible and fairly easy to support the functionality of dataframe_regression, num_regression and ndarrays_regression in one regression fixture. E.g. to be used as follows:

```python
# Case of 1D arrays with the same length:
data = {"ar1": np.array([2.3, 9.4]), "ar2": np.array([3, 4])}
dataframe = pd.DataFrame.from_dict(data)
array_regression(data)  # same as current num_regression
array_regression(dataframe)  # same as current dataframe_regression, based on type
array_regression(data, format="npz")  # same as ndarrays_regression
array_regression(dataframe, format="npz")  # not yet possible, but easy enough

# Case of ND arrays with different shapes:
data = {"ar1": np.array([2.3, 9.4]), "ar2": np.array([[3, 4], [2, 1]])}
dataframe = pd.DataFrame.from_dict(data)  # raises ValueError
array_regression(data)  # raises ValueError, suggesting the npz format
array_regression(data, format="npz")  # same as ndarrays_regression
```
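A minimal sketch of the format-dispatch logic implied by the example above. The helper name `choose_format` and its behavior are invented here for illustration, not part of pytest-regressions: arrays that are all 1D with equal lengths can use the CSV-style backend, anything else requires npz.

```python
import numpy as np

def choose_format(data, format=None):
    """Hypothetical dispatch sketch for an array_regression fixture.

    Returns the storage backend to use: "csv" (num_regression-style)
    or "npz" (ndarrays_regression-style).
    """
    arrays = dict(data)
    # CSV-style storage only works for 1D columns of equal length.
    all_1d_same_len = (
        all(np.asarray(a).ndim == 1 for a in arrays.values())
        and len({len(np.asarray(a)) for a in arrays.values()}) <= 1
    )
    if format is None:
        format = "csv"
    if format == "csv" and not all_1d_same_len:
        raise ValueError(
            "Arrays with mixed shapes cannot be stored as CSV; "
            "use array_regression(data, format='npz') instead."
        )
    return format
```

With this shape of dispatch, the "raises ValueError, suggesting npz format" case in the example falls out naturally from the CSV backend's constraints.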

Future extensions could include support for HDF5 or user-defined container formats. The name array_regression is intended to be general, i.e. no reference to the dtype, leaving room for future support of more dtypes.

This could also be a good opportunity to revise default tolerances in the new API as mentioned by @tadeu:

With the introduction of a new API like array_regression, it would be a good moment to improve the default tolerances. Currently we use the same defaults of numpy.isclose, and having an atol that is not 0.0 by default is confusing (see numpy/numpy#10161 for example, we might even consider something different than isclose).
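To make the quoted concern concrete: with `np.isclose`'s default `atol=1e-8`, two values that differ by a factor of two can still compare as "close" when their magnitudes are small, because the absolute term dominates. A quick demonstration:

```python
import numpy as np

# Default tolerances: rtol=1e-05, atol=1e-08.
# 1e-10 and 2e-10 differ by 100% in relative terms, yet:
print(np.isclose(1e-10, 2e-10))            # True: 1e-10 <= 1e-8 + 1e-5 * 2e-10
# With atol forced to zero, the comparison behaves as expected:
print(np.isclose(1e-10, 2e-10, atol=0.0))  # False: 1e-10 > 1e-5 * 2e-10
```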

Other changes can be considered too, obviously. (E.g. NPZ as default, because it supports more array shapes.) Feel free to comment.

For code reuse, it could be preferable to rewrite the dataframe_regression and num_regression fixtures on top of the same underlying ArrayRegressionFixture class. This is probably not strictly necessary.
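As a rough illustration of that reuse (class layout and method names invented here, not the actual pytest-regressions internals), the two storage paths could live behind one class, with the existing fixtures delegating to it:

```python
import io
import numpy as np

class ArrayRegressionFixture:
    """Hypothetical shared base: serializes a dict of arrays for comparison."""

    def serialize(self, data, format="csv"):
        if format == "csv":
            # num_regression/dataframe_regression style: 1D columns in a CSV.
            columns = [np.asarray(v) for v in data.values()]
            buf = io.StringIO()
            buf.write(",".join(data) + "\n")
            for row in zip(*columns):
                buf.write(",".join(str(x) for x in row) + "\n")
            return buf.getvalue()
        elif format == "npz":
            # ndarrays_regression style: arbitrary shapes in an NPZ archive.
            raw = io.BytesIO()
            np.savez(raw, **data)
            return raw.getvalue()
        raise ValueError(f"Unknown format: {format}")
```

The existing fixtures would then only differ in the default format (and input type) they pass down, rather than duplicating the serialize/compare logic.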

The ndarrays_regression fixture could eventually be removed, because it has not been part of any release yet (at the moment).

@tovrstra
Contributor Author

I checked out the issue on np.isclose. I agree that the default atol should be zero, because a nonzero default is a potential source of bugs and major confusion. (See the NumPy issue for details.)

Looking further at the issue, there are a few ways to perform the comparison:

```python
# np.isclose
abs(a - b) <= (atol + rtol * abs(b))
# math.isclose (symmetric form)
abs(a - b) <= max(rtol * max(abs(a), abs(b)), atol)
# third option
abs(a - b) <= max(rtol * abs(b), atol)
```

Any of these would be acceptable, as long as the default atol is zero; some are just slightly nicer than others.
For our use case, the third option might be conceptually ideal, because it avoids the odd sum used in np.isclose and correctly reflects that b is special, being the expected value. The symmetric form is nicest when the compared numbers have equal status. The first form may occasionally confuse the user, but that risk is small.

I'd be pragmatic and not care too much about the three different forms: just impose a default atol of zero and use np.isclose.
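For reference, the three forms can be written out and compared directly (a sketch; the function names are made up here). Even with atol=0 they can disagree near the boundary, for example when a is slightly larger than b:

```python
import math

def isclose_np_form(a, b, rtol=1e-05, atol=1e-08):
    # Form used by np.isclose: tolerances summed, b is the reference value.
    return abs(a - b) <= atol + rtol * abs(b)

def isclose_symmetric(a, b, rtol=1e-05, atol=0.0):
    # Symmetric form used by math.isclose (its rel_tol/abs_tol parameters).
    return abs(a - b) <= max(rtol * max(abs(a), abs(b)), atol)

def isclose_expected(a, b, rtol=1e-05, atol=0.0):
    # Third option: max instead of sum, b treated as the expected value.
    return abs(a - b) <= max(rtol * abs(b), atol)

# Boundary case with atol=0: diff = 1.0, rtol * |b| = 0.99,
# but rtol * max(|a|, |b|) = 1.0, so only the symmetric form accepts it.
a, b, rtol = 100.0, 99.0, 0.01
print(isclose_np_form(a, b, rtol=rtol, atol=0.0))  # False
print(isclose_symmetric(a, b, rtol=rtol))          # True
print(isclose_expected(a, b, rtol=rtol))           # False
# Sanity check against the stdlib's symmetric form:
print(math.isclose(a, b, rel_tol=rtol))            # True
```

This disagreement only occurs right at the tolerance boundary, which supports the pragmatic conclusion above: the choice of form matters far less than setting atol to zero.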
