Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving Masked Correlation Methods #750

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

CSSFrancis
Copy link
Member

@CSSFrancis CSSFrancis commented Apr 8, 2021


name: Improving Masked Correlation Methods
about: I think that pyXem might be benefited by a couple of general correlation methods which can be worked to improved rather than having multiple different spots where correlations are calculated. This PR attempts to create a general method (or set of methods) which can be called.


Checklist

  • Add in _cross_correlate_masked function
  • Add in _autocorrelate_masked function
  • Add in _cross_correlate_unmasked function
  • Add in _autocorrelate_unmasked function
  • Add in tests

What does this PR do? Please describe and/or link to an open issue.
This code expands the ability of correlating two images to give much more flexibility in how the correlation method is called. More specifically this method allows for:

  • Normalized Cross-Correlation with a mask (or set of masks)
  • Ability to perform Correlation on N-Dimensions
  • Control on which axes are padded before the correlation (allows for wrapping in the case of a polar image)
  • Auto correlations with masks (in N-Dimensions)

I think there is also a possibility that this will help out with some of the code in indexation_utils. After this PR I want to write a template matching function that matches a full template in polar coordinates.

Work in progress?
Because these methods are mostly FFT's they could be greatly improved by finding a GPU implementation. I might try to play around with adding in GPU support but it might be that the GPU implementation has to be upstreamed to the map function.

@coveralls
Copy link

coveralls commented Apr 8, 2021

Coverage Status

Coverage remained the same at 100.0% when pulling ae48035 on CSSFrancis:improve_correlation into 2395401 on pyxem:master.

@din14970
Copy link
Contributor

din14970 commented May 6, 2021

Because these methods are mostly FFT's they could be greatly improved by finding a GPU implementation. I might try to play around with adding in GPU support but it might be that the GPU implementation has to be upstreamed to the map function.

I've been playing around a bit with GPU stuff recently; in my opinion the map function itself should be GPU/CPU independent, which it already is I think as it relies on Dask. Only the function passed to map should have a GPU equivalent implementation. If the function relies only on basic numpy and scipy functionality you might get away with dropping in cupy and re-using most of the function. You may use a construct like

to check if cupy is available to the user, and some boilerplate like https://github.com/pyxem/pyxem/blob/153083bd252a11e09aa187bac43834ca9238364c/pyxem/utils/cuda_utils.py to try to be clever in minimized code duplication. For "loopy" stuff a custom cuda kernel may be in order.

The main issue one has to keep in mind is that copying the data from memory or disk to the GPU and back is expensive, so ideally one does a copy, does all the processing on the GPU, and then copies back. Creating a cupy array from numpy/disk does this automatically, creating a numpy array from cupy sends it back. It does not make sense for each operation to send the data to the GPU and back, so hyperspy signals should support cupy arrays natively as data. One concern especially for large 4D-STEM datasets could be GPU memory management. If the dataset is 10+ Gb, some of the cheaper GPU's can't fit the entire thing into their memory. In this case Dask may come to the rescue by mapping cp.array over the data, then mapping all the operations on this lazy dataset, then once finished mapping cp.asnumpy over it. I'm not sure how this could be streamlined within hyperspy.

@CSSFrancis
Copy link
Member Author

The main issue one has to keep in mind is that copying the data from memory or disk to the GPU and back is expensive, so ideally one does a copy, does all the processing on the GPU, and then copies back. Creating a cupy array from numpy/disk does this automatically, creating a numpy array from cupy sends it back. It does not make sense for each operation to send the data to the GPU and back, so hyperspy signals should support cupy arrays natively as data. One concern especially for large 4D-STEM datasets could be GPU memory management. If the dataset is 10+ Gb, some of the cheaper GPU's can't fit the entire thing into their memory. In this case Dask may come to the rescue by mapping cp.array over the data, then mapping all the operations on this lazy dataset, then once finished mapping cp.asnumpy over it. I'm not sure how this could be streamlined within hyperspy.

@din14970 Thanks for the comment. For the most part I think I am more interested in the second case. My datasets are in the 10-100+ Gb in size so there isn't really an easy way out for me as I can't load the data entirely onto a GPU, or even into RAM for that fact. I think the best case scenario is just adding in the line x = x.map_blocks(cupy.asarray) to the map function if you want it to run on the GPU like you said here hyperspy/hyperspy#2670 (comment)

I might try and mess around with this to see if I can't get this to just run right out of the box just dropping in a Cupy array rather than a numpy one.

@CSSFrancis
Copy link
Member Author

@pc494 I think that this one should be ready to go. I'm going to hold off on adding any GPU enabled stuff for now.

Copy link
Member

@pc494 pc494 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a bit of cleaning up in terms of the docstrings.

pad_axes=(-1, -2),
inplace=False,
**kwargs):
r"""Returns cross correlation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring could do with being more specific, as there are a number of subtly different "cross correlations" in the literature.

inplace=False,
**kwargs):
r"""Returns cross correlation.
Parameters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these no longer match the function signature.

Angular correlation with a static matst for

"""
correlation = self.map(cross_correlate,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would put this inside the if clause for clarity

overlap_ratio=0.3):
"""
Masked normalized cross-correlation between two arrays.
Parameters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these don't agree with the function signature

pad_axis=(-1),
overlap_ratio=0.3):
"""
Masked normalized cross-correlation between two arrays.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the latex form of the equation would help clarity here

@dnjohnstone
Copy link
Member

@CSSFrancis there are just a few small things here to close this one out, are you able to do them, or should someone else pick it up?

@CSSFrancis
Copy link
Member Author

@dnjohnstone I was planning on doing this quickly today. I've got most of the changes done, just needed to find the time to clean it up some more.

@CSSFrancis
Copy link
Member Author

After looking at this a little bit more I think that this might better be added to skimage than into pyxem. I will start a PR there and see if there is any interest adding to skimage before copying the code into pyxem.

@CSSFrancis
Copy link
Member Author

CSSFrancis commented Sep 17, 2021

This PR should be closed with the addition into scikit-image below. All further correlation methods should just call the cross_correlate_masked function from skimage.

scikit-image/scikit-image#5573

Holding off for now until this is merged...

@CSSFrancis CSSFrancis added status: stalled won't fix yet Issues that will be dealt with post 1.0.0 labels Apr 27, 2022
@CSSFrancis CSSFrancis self-assigned this Apr 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: stalled won't fix yet Issues that will be dealt with post 1.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants