Refactor find peaks 1d #3196
base: RELEASE_next_minor
Conversation
Codecov Report

Additional details and impacted files

@@           Coverage Diff            @@
##  RELEASE_next_major    #3196   +/- ##
========================================
+ Coverage     81.30%   81.38%   +0.08%
========================================
  Files           176      177       +1
  Lines         24406    24461      +55
  Branches       5681     5688       +7
========================================
+ Hits          19843    19908      +65
+ Misses         3258     3246      -12
- Partials       1305     1307       +2

☔ View full report in Codecov by Sentry.
Good idea, but how does the ragged signal ease access to, e.g., the array of peak-centre values? The data object now contains an array of 3 elements at every pixel, and possibly even a multidimensional array. In the ideal case, I'd like to access the peak parameters in a similar way to the parameters of a fit result.
@jlaehne That's a good point. I would prefer to avoid the custom dtype implementations in numpy, especially when all of the column types are equivalent. You end up losing a fair bit of functionality, such as slicing the data using a boolean array, which is necessary if you want to, say, exclude some peak. There is a fairly big speed hit as well. Not being able to select a column using integers can start to be a little frustrating, and it's not always very clear what the names of the indexes are. Additionally, when doing things like creating markers from a list of vectors, the custom dtype starts to become problematic. In any case, custom dtypes don't really work terribly well in hyperspy; consider, for example, what it takes to get the centers at position 10, 15.

I understand the desire to slice the data using strings rather than indexes in some cases. Unfortunately, this is currently a little difficult in hyperspy. Ragged signals are a big headache, and to be honest I've never really figured out how to properly deal with them. #3055 might help...
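To make the trade-off concrete, here is a small NumPy sketch (the peak values are made up) contrasting a plain 2-D array with a structured dtype for a table of peaks:

```python
import numpy as np

# Hypothetical peak table with columns (center, width, height).
rows = [(1.0, 0.2, 5.0), (3.5, 0.4, 2.0), (7.2, 0.1, 9.0)]

# Plain 2-D array: columns are integer-indexed and boolean masks just work.
plain = np.array([list(r) for r in rows])
centers = plain[:, 0]                    # column by integer index
tall = plain[plain[:, 2] > 4.0]          # filter rows by height

# Structured dtype: columns need field names; integer column access
# (struct[:, 0]) raises IndexError because the array is 1-D.
dt = np.dtype([("center", float), ("width", float), ("height", float)])
struct = np.array(rows, dtype=dt)
centers_s = struct["center"]             # column by field name only
tall_s = struct[struct["height"] > 4.0]  # row masking does still work
```

Note that boolean row selection still works on structured arrays; the friction is with integer column access and with code that expects a homogeneous 2-D array.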
Well, the old implementation is not practical at all either, so I'm happy to change it. I like the custom dtype for the components, where you can get a whole map as it is defined for the overall array. But for the peak-finder result, the custom dtype is defined at every position, which is definitely bogus. However, if we are already breaking the API, I would like to have an easy-to-use solution, and we should add some examples to the documentation on how to access the data. I thought keywords of a custom dtype might help, as slicing the multi-dimensional dataset is not straightforward. I would be happy with another solution if it is well documented. Currently, I struggle with both the old and the new implementation to get the information out. What spontaneously comes to my mind is that we should be able to extract:
Yea, that part frustrates me as well. I find it very ambiguous what you are actually looking at.
Additionally, I would like to add:
I've played around with this a lot as well and looked for different implementations in other packages, and I haven't really seen anything that I've actually liked. Honestly, the fastest way to handle vectors is to flatten everything into a long sorted list, but in that case you can't really have lazy lists of vectors and the data is a little awkward. One thing that I am going to do with pyxem is add the alias "diffraction_signal". We could add a VectorSignal class if we want to share some methods:

```python
class VectorSignal(BaseSignal):
    """A VectorSignal is a ragged array of vectors. At each navigation
    position there is some list of n x m vectors, where n is variable, m is
    fixed, and the dimensions of m are defined by the
    `.axes_manager.vector_axes` attribute."""

    def __init__(self):
        # for slicing along the ragged dimension using the map function
        self.ivec = SpecialVectorSlicer(self)


class SpecialVectorSlicer:
    def __init__(self, signal):
        self.signal = signal

    def __getitem__(self, value):
        # allow getting an item based on the signal.axes_manager.vector_axes
        # indexes (i.e. s.ivec["center"])
        # allow for more complex slicing as well? s.ivec["center", "width"]
        # boolean operations? s.ivec["center>5", "width<1"]
        # filtering based on vector slicing? s[s.ivec["center>5", "width<1"]]
        # getitem_vector would slice one position's vector array
        return self.signal.map(getitem_vector, value, inplace=False)
```

I've implemented something fairly similar a couple of times, but it's never really been clean enough to add to hyperspy. I realize that isn't the best answer, but every implementation I've written has been too fragile for a good generic addition to hyperspy. That's why I've moved towards trying to shift the implementation to pyxem. At least there I can have a bit better control, and as long as upstream in hyperspy:
Then it should be easy to handle things downstream in pyxem, and maybe eventually we can move a more stable implementation back to hyperspy.
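As a plain-NumPy illustration of the ragged-vector layout described above (the column names and the helper are hypothetical, not hyperspy API):

```python
import numpy as np

# Each navigation position holds an (n, 3) array of (center, width, height)
# rows, with n varying from position to position.
ragged = np.empty(2, dtype=object)
ragged[0] = np.array([[1.0, 0.2, 5.0], [3.5, 0.4, 2.0]])
ragged[1] = np.array([[7.2, 0.1, 9.0]])

COLUMNS = {"center": 0, "width": 1, "height": 2}

def get_column(ragged, name):
    """Extract one named column at every navigation position,
    keeping the ragged (object-array) structure."""
    out = np.empty(len(ragged), dtype=object)
    for i, vecs in enumerate(ragged):
        out[i] = vecs[:, COLUMNS[name]]
    return out

centers = get_column(ragged, "center")
```

This is essentially what mapping a per-position slicing function over a ragged signal would do, just without the lazy/dask machinery.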
Just thinking aloud, could the map function return a model or a list of components?
So the map function could return a model or a list of components, but I don't know if it would work the same way that a model would. I've considered this in the past.

Another thing to consider is that Components are separate, and for peaks we necessarily have all of the same components. As a result, Components would unnecessarily complicate things and make them slow to manipulate and operate on.
Regarding the structure that it should return, what about returning separate ragged signals, one for each characteristic: position, width, and height?
I am not sure that we would expect the find_peaks function to be used for mapping purposes, or at least not directly; maybe find the peaks on a sum/average of the data, parse these to identify the features of interest in the dataset, and use them to create a model or to get some maps.
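For reference, SciPy already returns these characteristics as separate arrays, which could map naturally onto separate ragged signals, one per characteristic (the spectrum below is synthetic):

```python
import numpy as np
from scipy.signal import find_peaks, peak_widths

# Synthetic 1D spectrum with two Gaussian peaks at indices 50 and 140.
x = np.arange(200)
y = np.exp(-((x - 50) ** 2) / 18) + 0.5 * np.exp(-((x - 140) ** 2) / 50)

positions, props = find_peaks(y, height=0.1)  # indices + properties dict
heights = props["peak_heights"]               # height of each peak found
widths = peak_widths(y, positions)[0]         # width of each peak, in pixels
```

Each of the three arrays (positions, heights, widths) has one entry per detected peak, so per-characteristic signals would fall out directly.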
```python
peaks.metadata.add_node("Peaks")  # add information about the signal axes
peaks.metadata.Peaks.signal_axes = deepcopy(self.axes_manager.signal_axes)
```
Does it mean that the data are in "pixels"? Why not use the axes information to convert to calibrated values? If, in some scenarios, it is convenient to have the data in "pixels", maybe add an argument to make it optional, with the default being to return calibrated values?
That's just how the find_peaks (2D) method works, so I copied it. We should have the option of returning real units in both cases. In any case, saving the axes_manager in the metadata is a good backup.
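A sketch of the conversion in question, assuming a uniform (linear) axis with the usual offset/scale calibration (the numbers are made up):

```python
import numpy as np

# Hypothetical calibration of a linear signal axis, e.g. an energy axis.
offset, scale = 200.0, 0.5             # axis origin and units per pixel
peak_indices = np.array([10, 42, 99])  # peak positions in pixels

# Pixel index -> calibrated value; the inverse is (value - offset) / scale.
peak_positions = offset + scale * peak_indices
```

With the signal axes stored in the metadata, this conversion can be applied after the fact even if the method itself returns pixels.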
Indeed, it would be very good to have both (1D and 2D) returning similar output!
Yes, but I would want to get calibrated units without a detour through the axes manager. For me, that would actually be the expected default behavior.
@ericpre I don't think that is a great idea. First, returning multiple outputs isn't (currently) supported by the map function. I've got some "working-ish" code that would allow multiple outputs, but we would have to add some things like a hs.compute() function to merge the task graphs so things run efficiently.
I'm not sure either. I'm mostly expecting what is returned from this function to get pushed to the sub-packages.
If there is consensus on what the API would be, I think it would be good (and easily feasible) to get it done for the 2.0 release, as this will be the API. @jlaehne, @CSSFrancis, what do you think?
@ericpre I'm all for getting this in before the 2.0.0 release :) I'd love some suggestions on what to do here. What do you think about creating a new ragged signal? I think we want:
With some functions to:
Sorry, I didn't see your message above! Not as easy then... 😅 In principle, the calibration information should go in
Maybe we need to leave this for after the 2.0 release; in this case, we could have the old and the new function living side by side, with the old one deprecated, and they would be fairly standalone.
Regarding utilities to convert units, etc., maybe it would be best to leave those for later, as they will most likely not be an API break?
@ericpre this is fine. I think that I will probably start to write a
So what would the action here be before the 2.0.0 release? Not to crush the hopes of getting real units to work, but currently that would be helped significantly by #3031 and #3055, as they add features like converting arrays of points to calibrated values for all axis types. Otherwise you would have to rewrite that part.
Yea, I think this requires a better handle on the
I would suggest:
@CSSFrancis, what are your thoughts on this PR?
@ericpre I don't know if I'll have time to come back to this in the next couple of days, or whether we have a great answer for how to handle this kind of data. I don't really need this for anything I am planning on doing, so it's a little lower priority for me :) I would say just let this slide, and then we can deprecate and replace it.
Re-opening because this was closed automatically by mistake!
Description of the change
Refactor find_peaks1D to copy the find_peaks2D functionality.

Progress of the PR
- `upcoming_changes` folder (see `upcoming_changes/README.rst`)
- `readthedocs` doc build of this PR (link in GitHub checks)

Minimal example of the bug fix or the new feature
Note that this example can be useful to update the user guide.