Refactor find peaks 1d #3196

CSSFrancis · 2023-07-19T16:42:47Z

Description of the change

Refactor find_peaks1D to copy the find_peaks2D functionality

Add support for lazy peak finding
Removed custom dtype (for lazy plotting etc)
Renamed the function to be more in line with the 2D case

Progress of the PR

Change implemented (can be split into several points),
update docstring (if appropriate),
update user guide (if appropriate),
add a changelog entry in the upcoming_changes folder (see upcoming_changes/README.rst),
Check formatting changelog entry in the readthedocs doc build of this PR (link in github checks)
add tests,
ready for review.

Minimal example of the bug fix or the new feature

import hyperspy.api as hs
import numpy as np
s = hs.signals.Signal1D(np.arange(10))
peaks = s.find_peaks()

Note that this example can be useful to update the user guide.

codecov · 2023-07-19T17:00:37Z

Codecov Report

Attention: 17 lines in your changes are missing coverage. Please review.

Comparison is base (56a5db3) 81.30% compared to head (641f48a) 81.38%.
Report is 670 commits behind head on RELEASE_next_major.

Files	Patch %	Lines
hyperspy/utils/peakfinders1D.py	72.58%	10 Missing and 7 partials ⚠️

Additional details and impacted files

@@                  Coverage Diff                   @@
##           RELEASE_next_major    #3196      +/-   ##
======================================================
+ Coverage               81.30%   81.38%   +0.08%     
======================================================
  Files                     176      177       +1     
  Lines                   24406    24461      +55     
  Branches                 5681     5688       +7     
======================================================
+ Hits                    19843    19908      +65     
+ Misses                   3258     3246      -12     
- Partials                 1305     1307       +2


jlaehne · 2023-09-05T22:15:26Z

Good idea, but how does the ragged signal ease access to e.g. the array of peak-centre values? The data object now contains an array of 3 elements at every pixel. And even a multidimensional array if maxpeak>1.

In the ideal case, I'd like to access the peak parameters in a similar way to the parameters of a fit result, e.g. Gaussian.center.map['values'] to get the array of center values of a Gaussian component. So something like peaks[0]['center'] for the center values of the first peak found.

CSSFrancis · 2023-09-06T01:31:16Z

Good idea, but how does the ragged signal ease access to e.g. the array of peak-centre values? The data object now contains an array of 3 elements at every pixel. And even a multidimensional array if maxpeak>1.

In the ideal case, I'd like to access the peak parameters in a similar way to the parameters of a fit result, e.g. Gaussian.center.map['values'] to get the array of center values of a Gaussian component. So something like peaks[0]['center'] for the center values of the first peak found. As a general rule directly working with the data attribute can cause some

@jlaehne That's a good point. I would prefer to avoid the custom dtype implementations in numpy, especially when all of the column types are equivalent. You end up losing a fair bit of functionality such as slicing the data using a boolean array which is necessary if you want to say exclude some peak. There is a fairly big speed hit as well. Not being able to select some column using integers can start to be a little frustrating and it's not always very clear what the names of the indexes are. Additionally, when doing things like creating markers from a list of vectors the custom dtype starts to become problematic.

In any case custom dtypes don't really work terribly well in hyperspy. For example to get the centers at position 10,15 you would have to do peaks.data[15,10]["center"] as peaks.inav[10,15]["center"] returns an error because peaks.inav[10,15] returns a ragged signal. I'm not the biggest fan of directly dealing with the data attribute as it can cause weird things with the axes_manager
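To make the trade-off above concrete, here is a minimal NumPy sketch comparing a plain (n, 3) float array with a structured (custom dtype) array holding the same peaks. The field names position, height and width and the peak values are assumptions chosen for illustration; this is not code from the PR.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Three made-up peaks, one per row: position, height, width.
plain = np.array([[2.0, 10.0, 0.5],
                  [6.0, 4.0, 1.5],
                  [9.0, 7.0, 0.8]])

# The same data with a custom dtype (field names are illustrative).
peak_dtype = np.dtype([("position", float), ("height", float), ("width", float)])
structured = rfn.unstructured_to_structured(plain, dtype=peak_dtype)

# Plain array: columns are selected by integer, several at once if needed.
positions = plain[:, 0]
position_and_width = plain[:, [0, 2]]
narrow = plain[plain[:, 2] < 1.0]

# Structured array: fields are selected by name only.
positions_named = structured["position"]
# Row-wise boolean masks still work, but the mask must come from a named field.
narrow_named = structured[structured["width"] < 1.0]

# Integer column indexing fails because the array is 1D with 3 fields.
try:
    structured[:, 0]
    integer_columns_ok = True
except IndexError:
    integer_columns_ok = False

# Converting back gives the plain (n, 3) layout again.
round_trip = rfn.structured_to_unstructured(structured)
```

The structured form is self-describing, while the plain form keeps ordinary integer slicing available, which seems to be the behaviour argued for above.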

I understand the desire to slice the data using strings rather than indexes in some cases. Unfortunately, this is currently a little difficult in hyperspy. Ragged signals are a big headache and to be honest I've never really figured out how to properly deal with them.

#3055 might help...

jlaehne · 2023-09-06T07:04:50Z

Well, the old implementation is not practical at all either. So I'm happy to change it. I like the custom dtype for the components, where you can get a whole map as it is defined for the overall array. But for the peak-finder result, the custom dtype is defined at every position, which definitely is bogus. However, if we are already breaking the api, I would like to have an easy to use solution - and we should add some examples to the documentation on how to access the data. I thought keywords of a custom dtype might help, as slicing the multi-dimensional dataset is not straightforward. I would be happy with another solution if it is well documented. Currently, I struggle with both the old and new implementation to get the information out.

What spontaneously comes to my mind is that we should be able to extract:

Peak parameters at a certain position
1st ... nth peak if maxpeaks>1
Maps of individual peak parameters for one of multiple peaks found
While allowing for slicing with boolean arrays
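As a rough sketch of the four access patterns listed above, the following uses a plain NumPy object array as a stand-in for the ragged signal data. The column order (center, height, width), the helper names nth_peak_map and filter_peaks, and the values are all assumptions for illustration, not part of the PR.

```python
import numpy as np

# 2x2 navigation grid; each cell holds an (n_peaks, 3) array:
# columns are center, height, width (assumed order).
peaks = np.empty((2, 2), dtype=object)
peaks[0, 0] = np.array([[1.0, 5.0, 0.2], [4.0, 3.0, 0.4]])
peaks[0, 1] = np.array([[1.1, 6.0, 0.2]])
peaks[1, 0] = np.empty((0, 3))  # no peaks found here
peaks[1, 1] = np.array([[0.9, 4.0, 0.3], [4.2, 2.0, 0.5], [7.0, 1.0, 0.1]])


def nth_peak_map(ragged, n, column):
    """Map of one parameter of the n-th peak; NaN where fewer peaks exist."""
    out = np.full(ragged.shape, np.nan)
    for idx in np.ndindex(ragged.shape):
        if len(ragged[idx]) > n:
            out[idx] = ragged[idx][n, column]
    return out


def filter_peaks(ragged, predicate):
    """Apply a boolean row mask, built by predicate, at every position."""
    out = np.empty(ragged.shape, dtype=object)
    for idx in np.ndindex(ragged.shape):
        vectors = ragged[idx]
        out[idx] = vectors[predicate(vectors)]
    return out


at_position = peaks[1, 1]                    # all peak parameters at one position
second_peak = peaks[1, 1][1]                 # n-th peak at that position
first_center_map = nth_peak_map(peaks, 0, 0)  # map of the first peak's center
tall = filter_peaks(peaks, lambda v: v[:, 1] > 3.5)  # boolean slicing on height
```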

CSSFrancis · 2023-09-06T16:18:49Z

Well, the old implementation is not practical at all either. So I'm happy to change it. I like the custom dtype for the components, where you can get a whole map as it is defined for the overall array. But for the peak-finder result, the custom dtype is defined at every position, which definitely is bogus. However, if we are already breaking the api, I would like to have an easy to use solution - and we should add some examples to the documentation on how to access the data. I thought keywords of a custom dtype might help, as slicing the multi-dimensional dataset is not straightforward. I would be happy with another solution if it is well documented. Currently, I struggle with both the old and new implementation to get the information out.

Yea that part frustrates me as well. I find it very ambiguous about what you are actually looking at.

What spontaneously comes to my mind is that we should be able to extract:

Peak parameters at a certain position

1st ... nth peak if maxpeaks>1

Maps of individual peak parameters for one of multiple peaks found
While allowing for slicing with boolean arrays

Additionally I would like to add:

Information about pixel vs calibrated values
simple method for annotation (i.e. to peaks method)
Implicit casting from the find_peaks method.

I've played around with this a lot as well/ looked for different implementations in other packages and haven't really seen anything that I've actually liked. Honestly the fastest way to handle vectors is to flatten everything into a long sorted list, but in that case you can't really have lazy lists of vectors and the data is a little awkward.
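One way to picture the "long sorted list" mentioned above is a single 2D array in which each row carries its navigation indices followed by the vector columns. The sketch below is an assumption about what such a layout could look like, using plain NumPy; flatten_ragged is a hypothetical helper, not an existing hyperspy or pyxem function.

```python
import numpy as np

# Ragged stand-in: 2x2 navigation grid of (n_peaks, 3) arrays
# with assumed columns center, height, width.
ragged = np.empty((2, 2), dtype=object)
ragged[0, 0] = np.array([[1.0, 5.0, 0.2], [4.0, 3.0, 0.4]])
ragged[0, 1] = np.array([[1.1, 6.0, 0.2]])
ragged[1, 0] = np.empty((0, 3))
ragged[1, 1] = np.array([[0.9, 4.0, 0.3], [4.2, 2.0, 0.5], [7.0, 1.0, 0.1]])


def flatten_ragged(ragged):
    """Stack all vectors into one array, prefixed by their navigation index."""
    rows = []
    # np.ndindex iterates in C order, so the output is sorted by index.
    for idx in np.ndindex(ragged.shape):
        vectors = ragged[idx]
        nav = np.tile(idx, (len(vectors), 1))
        rows.append(np.hstack([nav, vectors]))
    return np.vstack(rows)


flat = flatten_ragged(ragged)
# One vectorised operation filters every position at once (height > 3.5).
strong = flat[flat[:, 3] > 3.5]
```

Filtering becomes a single vectorised operation, but the whole array has to exist in memory, which fits the remark that this layout does not combine well with lazy computation.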

One thing that I am going to do with pyxem is add the alias "diffraction_signal" to the DiffractionVector class which should make it so that the find_peaks method returns a DiffractionVector rather than a BaseSignal. At that point my plan is to just make all of the above options available through custom methods.

We could add a VectorSignal class if we want to share some methods:

from hyperspy.signal import BaseSignal


class VectorSignal(BaseSignal):

    """A Vector signal is a Ragged array of Vectors.  At each navigation position there is some list of
    n x m vectors where n is variable but m is fixed and the dimensions of m are defined by the `.axes_manager.vector_axes`
    attribute"""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.ivec = SpecialVectorSlicer(self)  # for slicing along the ragged dimension using the map function


class SpecialVectorSlicer:

    def __init__(self, signal):
        self.signal = signal

    def __getitem__(self, value):
        # allow getting an item based on the signal.axes_manager.vector_signals indexes (i.e. s.ivec["center"])
        # allow for more complex slicing as well?  s.ivec["center", "width" ]
        # Boolean operations?  s.ivec["center>5", "width<1"]
        # Filtering based vector slicing? s[s.ivec["center>5", "width<1"]]
        # getitem_vector is a helper that still has to be written; it receives
        # the vectors at one navigation position plus the requested key.
        return self.signal.map(getitem_vector, key=value, inplace=False)

I've implemented something fairly similar a couple of times but it's never really been clean enough to add to hyperspy.

I realize that isn't the best answer but every implementation I've written has been too fragile for a good generic addition to hyperspy. That's why I've moved towards trying to shift the implementation to pyxem. At least there I can have a bit better control and as long as upstream in hyperspy:

The signal axes are saved in the metadata
The signal is automatically cast to a pyxem DiffractionVector Signal

Then it should be easy to handle things downstream in pyxem and maybe eventually we can move a more stable implementation back to hyperspy.

jlaehne · 2023-09-06T17:45:12Z

Just thinking out loud: could find_peaks return a model, where each peak is a registered component? It would not result from a fit, but the type of data is similar enough that it would be nice to have a similar data structure and way to access it.

CSSFrancis · 2023-09-06T20:19:34Z

So the map function could return a model or a list of components but I don't know if it would work the same way that a model would in hyperspy.

I've considered this in the past as I would like the ability to use the map function for model fitting if you already have a preset idea of the fitting parameters. The problem seems to be that the model class is not really designed with multidimensionality in mind. I'm not necessarily sure that using the Model class is the exact right way to go because of that.

Another thing to consider is that Components are separate, whereas for peaks we necessarily have all of the same components. Components would therefore unnecessarily complicate things and make the data slow to manipulate and operate on.

ericpre

Regarding the structure that it should return, what about returning separate ragged signals, one for each characteristic: position, width, and height?
I am not sure we would expect the find_peaks function to be used for mapping purposes, or at least not directly. Maybe find the peaks on sum/average data and parse these to know the features of interest of the dataset, then use them to create a model or to get some maps.

ericpre · 2023-09-24T16:14:08Z

hyperspy/_signals/signal1d.py

+        peaks.metadata.add_node("Peaks")  # add information about the signal Axes
+        peaks.metadata.Peaks.signal_axes = deepcopy(self.axes_manager.signal_axes)


Does it mean that the data are in "pixels"? Why not use the axes information to convert to calibrated values? If in some scenarios it is convenient to have the data in "pixels", maybe add an argument to make it optional, with the default of returning calibrated values?

CSSFrancis · 2023-09-24

That's just how the find_peaks (2D) method works so I copied it. We should have the option of returning real units in both cases. In any case saving the axes_manager in the metadata is a good backup.

ericpre · 2023-09-24

Indeed, it would be very good to have both (1D and 2D) returning similar output!

jlaehne · 2023-09-25

Yes, but I would want to get calibrated units without a detour through the axes manager. For me that would actually be the expected default behavior.
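For a uniform (linear) axis, the pixel-to-calibrated conversion discussed in this review thread is a scale and an offset. The following is a minimal sketch under that assumption; the function names are hypothetical, and non-uniform axes would need the axis object itself rather than two numbers.

```python
import numpy as np


def pixels_to_calibrated(pixel_positions, scale, offset):
    """Map (possibly fractional) pixel positions onto a uniform axis."""
    return offset + scale * np.asarray(pixel_positions, dtype=float)


def calibrated_to_pixels(values, scale, offset):
    """Inverse mapping, returning fractional pixel positions."""
    return (np.asarray(values, dtype=float) - offset) / scale


# Example axis: 0.02 units per pixel, starting at 1.5 (made-up numbers).
calibrated = pixels_to_calibrated([0, 10, 25.5], scale=0.02, offset=1.5)
recovered = calibrated_to_pixels(calibrated, scale=0.02, offset=1.5)
```

Storing the signal axes alongside the peaks, as the diff above does, keeps the scale and offset available so that either representation can be derived later.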

CSSFrancis · 2023-09-24T17:42:05Z

Regarding the structure that it should return, what about returning separate ragged signals, one for each characteristic: position, width, and height?

@ericpre I don't think that is a great idea. First, returning multiple outputs isn't (currently) supported by the map function. I've got some "workingish" code that would allow multiple outputs but we would have to add some things like a hs.compute() function to merge the task graphs so things run efficiently.

I am not sure we would expect the find_peaks function to be used for mapping purposes, or at least not directly. Maybe find the peaks on sum/average data and parse these to know the features of interest of the dataset, then use them to create a model or to get some maps.

I'm not sure either. I'm mostly expecting what is returned from this function to get pushed to the sub packages. For pyxem there are so many things we want to do with vectors and it's better to just handle it there and make sure all of the information is maintained.

ericpre · 2023-09-24T17:44:49Z

If there is consensus on what would be the API, I think that it would be good (and easily feasible) to get it done for the 2.0 release as this will be the API. @jlaehne, @CSSFrancis, what do you think?

CSSFrancis · 2023-09-24T17:53:25Z

@ericpre I'm all for getting this in before the 2.0.0 release :) I'd love some suggestions on what to do here.

What do you think about creating a new ragged signal?

I think we want:

units for each column
name for each column
if each column is in pixels or calibrated units.

With some functions to:

convert from pixel units to real units
get the values from some column
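The wish list above could be sketched as a small description object that travels with the vectors. Everything here (ColumnInfo, VectorColumns, the column names and units) is hypothetical and only meant to show the shape of the idea, not an existing hyperspy or pyxem API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class ColumnInfo:
    """Description of one column of an (n, m) array of vectors."""
    name: str
    units: str
    calibrated: bool  # False means the values are in pixels


class VectorColumns:
    """Look up columns of a vector array by name."""

    def __init__(self, columns):
        self.columns = tuple(columns)

    def index(self, name):
        for i, column in enumerate(self.columns):
            if column.name == name:
                return i
        raise KeyError(name)

    def get(self, vectors, name):
        """Return the values of the named column."""
        return np.asarray(vectors)[:, self.index(name)]

    def label(self, name):
        """Human-readable label, e.g. for plot axes."""
        column = self.columns[self.index(name)]
        units = column.units if column.calibrated else "px"
        return f"{column.name} [{units}]"


columns = VectorColumns([
    ColumnInfo("center", "eV", calibrated=False),
    ColumnInfo("height", "counts", calibrated=True),
])
vectors = np.array([[12.0, 150.0], [40.0, 90.0]])
centers = columns.get(vectors, "center")
center_label = columns.label("center")
```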

ericpre · 2023-09-24T18:15:12Z

Sorry, I didn't see your message above! Not as easy then... 😅

In principle, the calibration information should go in s.metadata.Signal.quantity:
https://hyperspy.org/hyperspy-doc/dev/reference/metadata.html#signal and in the gain (https://hyperspy.org/hyperspy-doc/dev/reference/metadata.html#variance-linear-model)? Currently, as this is for "intensity", it may not work well for 2D signal...

Maybe we need to leave this for after the 2.0 release... in this case, we could have the old and the new function living side by side with the old one deprecated and they would be fairly standalone.

ericpre · 2023-09-25T07:15:01Z

The Signal2D.find_peaks method has a get_intensity argument, which adds a column to the ragged array. Maybe we should still use this approach for now, as this is already an improvement on the current situation and will be consistent with the Signal2D counterpart?

Regarding utilities to convert units, etc. maybe it would be best to leave for later, as this will most likely not be an API break?

CSSFrancis · 2023-09-25T14:01:19Z

@ericpre this is fine. I think that I will probably start to write a VectorSignal class in pyxem and then once we figure out some of the bugs we can start to move some of the implementation upstream. As long as I set the signal_type to "diffraction" the find_peaks function returns a pyxem signal. I'll just need to add a 0-D Signal for each of the signals.

The Signal2D.find_peaks method has a get_intensity argument, which adds a column to the ragged array. Maybe we should still use this approach for now, as this is already an improvement on the current situation and will be consistent with the Signal2D counterpart?

So what would be the action here before the 2.0.0 release?

Not to crush the hopes of getting the real units to work, but currently that would be helped significantly by #3031 and #3055 as they add in features like converting arrays of points to calibrated values for all axes types. Otherwise you would have to rewrite that part.

Regarding utilities to convert units, etc. maybe it would be best to leave for later, as this will most likely not be an API break?

Yea I think this requires a better handle on the Axes classes in hyperspy. :) So I think we have come full circle.

ericpre · 2023-09-25T15:52:36Z

So what would be the action here before the 2.0.0 release?

I would suggest:

use the same approach as Signal2D.find_peaks: return a ragged array with 1, 2 or 3 columns (?) as defined by an argument. The main difference from the current 1D behaviour is to return a ragged array.
(good to have) Add an argument to use "pixel" or "calibrated values"? This doesn't break the API, so it can well be done later.
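A toy version of this suggestion, written with NumPy only, might look like the sketch below. The peak detection is a deliberately naive local-maximum test and is not the algorithm used in the PR; find_local_maxima is a hypothetical name, and get_intensity simply mirrors the argument mentioned for Signal2D.find_peaks.

```python
import numpy as np


def find_local_maxima(y, get_intensity=False):
    """Return an (n_peaks, 1) or (n_peaks, 2) array for one spectrum.

    Column 0 is the pixel position; the optional column 1 is the intensity.
    """
    y = np.asarray(y, dtype=float)
    # Interior points strictly higher than both neighbours.
    is_peak = (y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])
    positions = np.flatnonzero(is_peak) + 1
    columns = [positions.astype(float)]
    if get_intensity:
        columns.append(y[positions])
    return np.column_stack(columns)


# Two spectra with a different number of peaks give a ragged result.
spectra = np.array([[0, 2, 1, 3, 0],
                    [0, 1, 2, 1, 0]], dtype=float)
ragged = np.empty(len(spectra), dtype=object)
for i, spectrum in enumerate(spectra):
    ragged[i] = find_local_maxima(spectrum, get_intensity=True)
```

The number of columns depends only on the arguments, while the number of rows varies per navigation position, which is what makes the output ragged.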

ericpre · 2023-10-31T21:51:05Z

@CSSFrancis, what are your thoughts for this PR?

CSSFrancis · 2023-11-01T01:46:13Z

@ericpre I don't know if I'll have time to come back to this in the next couple of days or if we have a great answer for how to handle this kind of data.

I don't really need this for anything I am planning on doing so it's a little lower priority for me :) I would say just let this slide and then we can deprecate and replace it

ericpre · 2023-11-01T09:02:20Z

Okay, let's park this for now then, particularly if you think that it would benefit from other PRs, like #3031 and #3055. In any case, the deprecation cycle will be simple to handle as we can simply add Signal1D.find_peaks and keep Signal1D.find_peaks1D_ohaver.
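The side-by-side deprecation described here can be illustrated with the standard library warnings module. The Spectrum class below is a stand-in written for this example, not HyperSpy's Signal1D, and the return formats of both methods are invented; hyperspy may well use its own deprecation helpers instead.

```python
import warnings


class Spectrum:
    """Stand-in class used only for this illustration."""

    def __init__(self, data):
        self.data = list(data)

    def find_peaks(self):
        """New API (hypothetical output): positions of local maxima."""
        d = self.data
        return [i for i in range(1, len(d) - 1) if d[i - 1] < d[i] > d[i + 1]]

    def find_peaks1D_ohaver(self):
        """Old API kept standalone: (position, height) tuples, plus a warning."""
        warnings.warn(
            "find_peaks1D_ohaver is deprecated; use find_peaks instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        d = self.data
        return [(i, d[i]) for i in range(1, len(d) - 1) if d[i - 1] < d[i] > d[i + 1]]


spectrum = Spectrum([0, 2, 1, 3, 0])
new_result = spectrum.find_peaks()
```

Because the old method keeps its own implementation and output format, existing code keeps working during the deprecation period while new code can adopt the new method.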

ericpre · 2023-12-22T14:38:32Z

Re-opening because this has been closed automatically by mistake!

CSSFrancis added 4 commits July 19, 2023 11:14

Refactor: Renamed peak_finding in 1D to match 2D Case

c0d7bcc

Refactor: Documentation and testing to find_peaks

ef8f482

Refactor: Find peaks 1D now operated lazily

efdaea1

Refactor: EDS example working

d56b0f9

CSSFrancis changed the base branch from RELEASE_next_minor to RELEASE_next_major July 19, 2023 16:43

CSSFrancis added 2 commits July 19, 2023 11:47

Documentation: Added Changelog for hyperspy#3196

d986608

NewFeature: Added tracking of signal_axes

641f48a

CSSFrancis mentioned this pull request Jul 19, 2023

Release 1.7.x and 2.0.0 #2996

Closed

57 tasks

CSSFrancis added this to the v2.0 Split milestone Jul 20, 2023

CSSFrancis added the status: needs review label Jul 20, 2023

ericpre reviewed Sep 24, 2023


ericpre added status: WIP and removed status: needs review labels Sep 25, 2023

ericpre removed this from the v2.0 Split milestone Nov 1, 2023

CSSFrancis mentioned this pull request Nov 21, 2023

Hyperspy 2.1.0 Release Tracker #3272

Closed

10 tasks

CSSFrancis deleted the branch hyperspy:RELEASE_next_minor December 22, 2023 14:03

CSSFrancis closed this Dec 22, 2023

ericpre reopened this Dec 22, 2023

ericpre changed the base branch from RELEASE_next_major to RELEASE_next_minor December 22, 2023 16:34
