Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds the functionality of Ufunc to NDCube #666

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ViciousEagle03
Copy link
Member

@ViciousEagle03 ViciousEagle03 commented Mar 10, 2024

PR Description
This PR address the issue: #591

I've tried to add the support of Arithmetic operations to NDCube using Numpy Ufunc
I have removed the test cases in ndcube/tests/test_ndcube.py in the arithmetic section which doesn't support __array_ufunc__

Note: Before proceeding to add the remaining test cases and Changelog, I wanted to ensure that the proof of concept is correct.

@DanRyanIrish
Copy link
Member

Thanks @ViciousEagle03 for this PR and for opening it earlier rather than later so we can discuss the approach.

Firstly, I'll say that I'm not an expert in ufuncs so it's possible I've misunderstood some details in your implementation. Secondly, despite my concerns below, I think supporting ufuncs is a useful feature for users. The question is how to do this, and this PR is a great chance to discuss it.

My understanding is that the API you're implementing for ufuncs is different to that for the arithmetic operators. The ufuncs basically extract the data array and then proceed as normal, whereas the arithmetic operations try to take the uncertainties and mask into account, where appropriate. For example, cube + np.ones(cube.data.shape) will return an NDCube where all elements in the data array have been incremented by 1 (same WCS, uncertainties, mask, etc.), whereas np.add(cube, np.ones(cube.data.shape)) will return a numpy array giving the result of cube.data + np.ones(cube.data.shape). Is this correct?

Assuming that it is, this opens an important discussion. My intuition as a user would be that np.add and + should behave the same when adding an array to an NDCube. By this I mean both should consider the WCS, uncertainties and mask if and where appropriate, and return an NDCube. If users are only interested in the data array without consideration of the WCS, uncertainty and mask, a simple and explicit API is already available, namely np.add(cube.data, np.ones(cube.data.shape)). I would argue the same applies to np.subtract and np.multiply. What do you think? Is there another perspective I'm not considering?

If we continue to assume that ufuncs should also consider the wcs, mask and uncertainty, it has implications for np.maximum, np.equal as well. Specifically to your question about whether np.equal should take into account the wcs, I would argue yes. In fact this is a major reason it has not yet been implemented. Determining whether two APE-14 wcs's are equivalent is not always trivial, or it can be computationally expensive. Another consideration is whether values in the data array should be the same, or just have overlapping uncertainty ranges. Furthermore, should masked values be excluded from the calculation? I would argue yes. But these are questions that we haven't agreed a consensus to.

As for np.maximum, I gave this some thought a while back and asked around to get a sense of what users' intuitions would be. Again, the handling of uncertainties means that the behaviour is ambiguous. Should the maximum be the maximum value in the data array? Or the maximum of data + uncertainty? Or even the maximum of data - uncertainty? Perhaps it should return both the maximum value and the uncertainty associated with that. And the uncertainty could be asymmetric if drawn from multiple elements in the NDCube. I realised that there wasn't an obvious, unified expectation from users, and so we decided not to implement maximum. But that isn't to say that it shouldn't be. But it is more complicated than simply np.maximum(cube.data)).

Fundamentally, the issue lies in how to treat the wcs, mask and uncertainty, whether it be arithmetic operators or ufuncs. If these are going to be ignored and a numpy array returned, users can easily perform their operation on cube.data, and there's no real need to implement operators on NDCube.

What are your thoughts? And have I misunderstood your implementation?

@ViciousEagle03
Copy link
Member Author

ViciousEagle03 commented Mar 15, 2024

Hey @DanRyanIrish, Thanks for the feedback
I think both the ufunc and arithmetic operators APIs should provide the same implementation for the ufuncs and arithmetic operators and so I have changed the __array_ufunc__ method to take in account the same.The ufuncs now also considers the uncertainties and mask where necessary.
Regarding the implementation of np.equal ufunc, I think it is ambiguous to define the implementation of it
Given the computational expense of comparing the WCSes of two NDCube ,if one is only comparing the data array of the cubes then it can be achieved by just np.equal(cube1.data, cube2.data) or if equal means comparing the reference of both the cubes then we can check it by is.So, you are right to first establish what the np.equal should achieve and if you have any particular implementation you would like me to add to np.equal, I would be happy to change the patch accordingly.
I'm a bit uncertain about np.maximum,
As far as I understand it, np.maximum typically operates as a binary ufunc, changing the convention of np.maximum could be misleading or confusing, Or did I misunderstood the above?
And if we implement the np.maximum, we are returning the two things below, right?

  • the maximum value in the NDCube.data array
  • the maximum of the data array and the corresponding uncertainty value for each corresponding data

Copy link
Member

@DanRyanIrish DanRyanIrish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ViciousEagle03. Thanks for the update to your PR and apologies for the delay in my reply. I am still consumed with another project. I am still trying to understand exactly the API you're implementing. I've left some in line comments in places where I'm confused.

Perhaps it would be a good idea to list out as a comment on this PR the exact behaviour you are trying to implement. For example, the output types of

  • np.add(cube1, array2)
  • np.add(cube1, cube2)
  • np.add(array1, cube2)
    and what attributes are considered in the calculation, i.e. .data, .mask, .uncertainty. And the same for np.subtract, and np.multiply.

@@ -886,28 +886,32 @@ def _new_instance_from_op(self, new_data, new_unit, new_uncertainty):
def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
if method == '__call__':
if ufunc == np.add:
if isinstance(inputs[0], NDCube):
if isinstance(inputs[0], NDCube) and kwargs.get("dunder"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "dunder" a standard kwarg or custom for this implementation? And what does it mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have used the dunder (not a standard kwarg) kwarg here to distinguish between the operators and numpy ufuncs to properly handle the operator overloading.

@@ -886,28 +886,32 @@ def _new_instance_from_op(self, new_data, new_unit, new_uncertainty):
def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
if method == '__call__':
if ufunc == np.add:
if isinstance(inputs[0], NDCube):
if isinstance(inputs[0], NDCube) and kwargs.get("dunder"):
new_data = inputs[0].data + inputs[1]
return new_data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that np.add(cube1, array2) would return an array?

This looks different to the implementation for np.sub which appears like it would return an NDCube.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.add(cube1, array2) would return an NDCube.
The np.subtract utilises the dunder method __sub__ and in turn the __sub__ dunder method uses the __add__ dunder method and since np.subtract utilises the __add__ so we don't need to check the way we did in np.add as __add__ dunder method will call the np.add which in turn will do the checking and hence both returns the same type.

@ViciousEagle03
Copy link
Member Author

Hey @DanRyanIrish,
Apologies for the delay, I have introduced the kwarg dunder to handle the operator overloading properly and to leverage the existing python operators. Since there is little documentation available regarding the implementation of array_ufunc functionality for complex custom objects so the way i have implemented it might be a little confusing so i will try to explain the API I have implemented here,
If array2 supports array_ufunc
• Calling np.add(cube1, array2) would return an NDCube.
• Attempting np.add(cube1, cube2) would return NotImplented, triggering a TypeError.
• Calling np.add(array1, cube2) would also return an NDCube.
By default, ufuncs triggers the array_ufunc mechanism, while arithmetic operators invoke the respective dunder methods.
To ensure both ufuncs and arithmetic operators account for masks and uncertainties appropriately and follow the same API, I've used dunder to specify when an arithmetic operator is being used.
For example in case of, np.add(cube1, array2), the control proceeds as below:
array_ufunc()[dunder=false] --> __add__() --> array_ufunc() [dunder=True] --> now it proceeds similarly as + operator.
All numpy ufuncs leverage the respective existing dunder methods, which ensures the implementation is consistent.

@nabobalis nabobalis added this to the 2.4.0 milestone May 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants