Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when using arg (instead of kwargs) for optional args in geo_strf_dyn_height #97

Open
rcaneill opened this issue Jun 23, 2022 · 14 comments

Comments

@rcaneill
Copy link
Contributor

Due to the transformations of arguments to arrays, if optional arguments are passed as arguments in geo_strf_dyn_height, an error is raised.
I guess that a check should be implemented in

for i, arg in enumerate(args):
if ismasked[i]:
newargs.append(masked_to_nan(arg))
elif isduck[i]:
newargs.append(arg)
else:
newargs.append(np.asarray(arg, dtype=float))
before transforming into arrays.

exemple 1

gsw.geo_strf_dyn_height([34, 34.1, 34.2], [0, 1, 2], [0, 10, 20], 0, 0, 1.0, 'pchip')

raises

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-88-9be1c2b56a01> in <module>
----> 1 gsw.geo_strf_dyn_height([34, 34.1, 34.2], [0, 1, 2], [0, 10, 20], 0, 0, 1.0, 'pchip')

~/.cache/pypoetry/virtualenvs/gsw-xarray-NsrEXKiZ-py3.8/lib/python3.8/site-packages/gsw/_utilities.py in wrapper(*args, **kw)
     55                 newargs.append(arg)
     56             else:
---> 57                 newargs.append(np.asarray(arg, dtype=float))
     58 
     59         if p is not None:

ValueError: could not convert string to float: 'pchip'

exemple 2

gsw.geo_strf_dyn_height([34, 34.1, 34.2], [0, 1, 2], [0, 10, 20], 0, 0, )

raises

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-ced139a8dd2a> in <module>
----> 1 gsw.geo_strf_dyn_height([34, 34.1, 34.2], [0, 1, 2], [0, 10, 20], 0, 0, )

~/.cache/pypoetry/virtualenvs/gsw-xarray-NsrEXKiZ-py3.8/lib/python3.8/site-packages/gsw/_utilities.py in wrapper(*args, **kw)
     60             kw['p'] = newargs.pop()
     61 
---> 62         ret = f(*newargs, **kw)
     63 
     64         if isinstance(ret, tuple):

~/.cache/pypoetry/virtualenvs/gsw-xarray-NsrEXKiZ-py3.8/lib/python3.8/site-packages/gsw/geostrophy.py in geo_strf_dyn_height(SA, CT, p, p_ref, axis, max_dp, interp_method)
     67     with np.errstate(invalid='ignore'):
     68         # The need for this context seems to be a bug in np.ma.any.
---> 69         if np.ma.any(np.ma.diff(np.ma.masked_invalid(p), axis=axis) <= 0):
     70             raise ValueError('p must be increasing along the specified axis')
     71     p = np.broadcast_to(p, SA.shape)

~/.cache/pypoetry/virtualenvs/gsw-xarray-NsrEXKiZ-py3.8/lib/python3.8/site-packages/numpy/ma/core.py in __call__(self, *args, **params)
   8200             _extras[p] = params.pop(p)
   8201         # Get the result
-> 8202         result = self._func.__call__(*args, **params).view(MaskedArray)
   8203         if "fill_value" in common_params:
   8204             result.fill_value = _extras.get("fill_value", None)

~/.cache/pypoetry/virtualenvs/gsw-xarray-NsrEXKiZ-py3.8/lib/python3.8/site-packages/numpy/core/overrides.py in diff(*args, **kwargs)

~/.cache/pypoetry/virtualenvs/gsw-xarray-NsrEXKiZ-py3.8/lib/python3.8/site-packages/numpy/lib/function_base.py in diff(a, n, axis, prepend, append)
   1412     if nd == 0:
   1413         raise ValueError("diff requires input that is at least one dimensional")
-> 1414     axis = normalize_axis_index(axis, nd)
   1415 
   1416     combined = []

TypeError: only integer scalar arrays can be converted to a scalar index
@ocefpaf
Copy link
Member

ocefpaf commented Jun 23, 2022

I have to confess that my preference would be to make them keyword-only arguments as in https://peps.python.org/pep-3102/ instead of changing the match_args_return. What do you think @efiring?

@efiring
Copy link
Member

efiring commented Jun 23, 2022

Agreed, I think that making optional arguments keyword-only makes good sense, in terms of general API design. I haven't looked at how this would be applied here, but I hope it would turn out to be painless.

@DocOtak
Copy link
Contributor

DocOtak commented Jun 23, 2022

How about the other way around, where we call the positional arguments using keywords?

How this came up is we are attempting to add some pint.Quantity validation to the inputs over in the gsw-xarray project. The GSW-Python library is very consistent with argument variable names and the expected input units (yay!). We found it super convenient to just convert everything to a kwargs dict and do the unit lookups (based on key) and other transformations (based on value) we need to get things compatible with GSW-Python.

As written, the match_args_return decorator ignores everything in kwargs that isn't p

@rcaneill
Copy link
Contributor Author

I opened an issue in gsw-xarray, but it is relevant to duplicate it here (DocOtak/gsw-xarray#49)

In gsw, iterables are transformed into numpy arrays when used in args, but not when used in kwargs

def match_args_return(f):
"""
Decorator for most functions that operate on profile data.
"""
@wraps(f)
def wrapper(*args, **kw):
p = kw.get('p', None)
if p is not None:
args = list(args)
args.append(p)
isarray = [hasattr(a, '__iter__') for a in args]
ismasked = [np.ma.isMaskedArray(a) for a in args]
isduck = [hasattr(a, '__array_ufunc__')
and not isinstance(a, np.ndarray) for a in args]
hasarray = np.any(isarray)
hasmasked = np.any(ismasked)
hasduck = np.any(isduck)
def fixup(ret):
if hasduck:
return ret
if hasmasked:
ret = np.ma.masked_invalid(ret)
if not hasarray and isinstance(ret, np.ndarray) and ret.size == 1:
try:
ret = ret[0]
except IndexError:
pass
return ret
newargs = []
for i, arg in enumerate(args):
if ismasked[i]:
newargs.append(masked_to_nan(arg))
elif isduck[i]:
newargs.append(arg)
else:
newargs.append(np.asarray(arg, dtype=float))

Minimal example

import gsw

# works
gsw.geo_strf_dyn_height([34, 34.1, 34.2], [0, 1, 2], [0, 10, 20])
# Does not work
gsw.geo_strf_dyn_height(SA=[34, 34.1, 34.2], CT=[0, 1, 2], p=[0, 10, 20])

Error raised

~/.cache/pypoetry/virtualenvs/gsw-xarray-NsrEXKiZ-py3.8/lib/python3.8/site-packages/gsw/geostrophy.py in geo_strf_dyn_height(SA, CT, p, p_ref, axis, max_dp, interp_method)
     53         raise ValueError('interp_method must be one of %s'
     54                          % (interp_methods.keys(),))
---> 55     if SA.shape != CT.shape:
     56         raise ValueError('Shapes of SA and CT must match; found %s and %s'
     57                          % (SA.shape, CT.shape))

AttributeError: 'list' object has no attribute 'shape'

@ocefpaf
Copy link
Member

ocefpaf commented Jun 24, 2022

How about the other way around, where we call the positional arguments using keywords?

I'm on the fence b/c S, T, p are pretty much "default" inputs and making they kw would force users some extra typing that seems counter intuitive.

@rcaneill
Copy link
Contributor Author

How I understood Andrew's question was that the way it is done today does not allow the user to use keyword argument for T, S and p (i.e. no check is done to transform them into arrays)

@DocOtak
Copy link
Contributor

DocOtak commented Jun 24, 2022

Because the ufuncs are wrapped in python funcs, python allows you to call the functions with kwargs for the positional arguments (in whatever order you want), this behavior is as defined/outlines in PEP-3102. match_args_return does not account for this particular way of calling the functions making the kwargs effectively pass though when called this way. I think there are basically two options: restrict the argument to positional only using the / function signature syntax, or modify match_args_return to accommodate this usage.

What I think should happen is the following:

  • optional kwargs are kwarg only (using the * function signature syntax)
  • positional arguments should continue to allow kwarg usage and match_args_return is modified to work when arguments are kwargs.

@efiring
Copy link
Member

efiring commented Jun 24, 2022

I see the attraction of your proposal, but I'm initially inclined to prefer the simpler alternative (on the GSW-Python side) of using / in the signature. For GSW-Python direct users, I don't see any major advantage in letting those positional arguments be keyword arguments. Rather than make match_args_return more complicated, why not have gsw-xarray assume the responsibility of supplying the positional arguments as positional arguments?

@DocOtak
Copy link
Contributor

DocOtak commented Jun 25, 2022

@efiring can you describe the intent behind match_args_return? That is, I'd like to understand what it is meant to be doing and not assume from the implementation.

@efiring
Copy link
Member

efiring commented Jun 25, 2022

It derives from my liking for masked arrays, and I think I have a couple of versions: the original takes masked or nan-style
array inputs, normalizes to masked, passes them to the inner function, and then converts the returned array to match the inputs. So if an input was masked, then the return will be a masked array. The second, currently in GSW-Python, is the same except that it normalizes to nan-style as input to the inner function. In addition, it normalizes inputs to double precision, and returns a scalar if the inputs are all scalar. Hence the name: it tries to make the output match the inputs, while providing normalized inputs to whatever it is decorating.

@DocOtak
Copy link
Contributor

DocOtak commented Jun 25, 2022

Isn't almost all of the desired behavior done by the ufunc api already? Specifically:

  • Matching the return type is handled by the various __array_* that ndarray subclasses implement (including masked arrays), even scalars work.
  • ufuncs know what their input types are and have casting rules already, this defaults to "same_kind" but is controllable, setting casting to "unsafe" would emulate a call to .astype
  • For masked arrays, I think you can pass the inverted mask to the ufunc where argument and will skip over those positions entirely when doing calculations

The only thing I can see that is missing is setting a new mask when newly invalid data is the result. Passing in masked arrays with different mask shapes and values appears to result in a new mask that is the logical OR of the broadcasted masks.

@efiring
Copy link
Member

efiring commented Jun 25, 2022

I don't see how masked arrays can be handled properly, maintaining the greater speed of doing all the calculations with ordinary arrays, without the conversion I am doing, or additional logic to handle the where kwarg. Perhaps a decorator based on that logic could be written that would be superior to what we have. I doubt it would be better for both ndarrays and masked arrays.
Also, match_args_return is used for functions other than ufuncs; of course separate decorators could be used for ufuncs and ordinary functions.

Do you want to propose an alternative to match_args_return and try it out?

@DocOtak
Copy link
Contributor

DocOtak commented Jun 25, 2022

Here is an initial crack at it, I'm not sure the best way I can share a jupyter notebook here so I've attached a zip with the ipynb and html output versions. Let me know if there is a better way.

For now, this is only for the gsw._warpped_ufunc.* functions, I haven't looked at the more "python" funcs.

I took a look at how masked_arrays implement their own ufuncs, the two argument ufuncs are wrapped by a _MaskedBinaryOperation class, with the interesting bits being in the call method. Within a numpy error context, it calls the ufunc on the entire underlying _data array of the masked array then does some boolean ORs on the mask to figure out the new mask. For large arrays or expensive functions, this can be slow. It seems that the where argument is very speedy so I just tried inverting the mask to be compatible with where.

ufunc_proposal.zip

@efiring
Copy link
Member

efiring commented Jun 25, 2022

That looks very promising. My machine doesn't have enough memory to do the big data tests in a reasonable amount of time, but otherwise the results are similar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants