Handling Axis attributes with @property decorator #3031
base: RELEASE_next_minor
Conversation
Codecov Report
Attention: patch coverage is …

Additional details and impacted files

@@ Coverage Diff @@
## RELEASE_next_minor #3031 +/- ##
======================================================
+ Coverage 80.54% 80.59% +0.05%
======================================================
  Files 147 147
  Lines 21868 21883 +15
  Branches 5147 5185 +38
======================================================
+ Hits 17613 17637 +24
+ Misses 3037 3024 -13
- Partials 1218 1222 +4
The next PR, which handles the slicing, is found here: https://github.com/CSSFrancis/hyperspy/tree/FancySlicingRefactor; it may help explain some of the motivations for these changes.
Thank you @CSSFrancis, this is very good!
There are a few details worth tweaking further, but this is a significant simplification of the axes implementation.
I guess this is one of those things which grew over time with more functionality and ended up being a bit of a mess!
hyperspy/axes.py (outdated)

-    def update_axis(self):
+    @property
+    def axis(self):
It may be worth caching it in case the function calculation is slow.
This could potentially be useful, but I am a little hesitant about caching the axis: I think we would then start going down the rabbit hole of adding a bunch of extra traits and listeners just to make sure things work correctly, and even then it could get complicated and isn't guaranteed. I would rather that a function which repeatedly calls the axis property grab it once, e.g. ax1 = FunctionalDataAxis.axis, and then reuse that instance.
I can add properties which directly calculate the high and low values rather than getting them from the axis, which might help some.
Either way it might be worth thinking about a case where caching speeds things up significantly. Even for a size of 1,000,000 values it only takes about 10 ms to calculate something like the sin or log of each value. I think in this case the trade-off of speed for simplicity might be worth it.
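That order-of-magnitude claim can be checked with a quick timing sketch (a hypothetical stand-in: HyperSpy axes are NumPy arrays, and vectorised np.sin over 10^6 values is on the order of 10 ms; this pure-Python loop is slower, but it illustrates the recomputation cost that caching would avoid on repeated `axis` accesses):

```python
import math
import time

# Stand-in for a functional axis recomputed on every property access.
values = [i * 1e-6 for i in range(1_000_000)]

start = time.perf_counter()
result = [math.sin(v) for v in values]
elapsed = time.perf_counter() - start

print(f"computed {len(result)} values in {elapsed * 1000:.1f} ms")
```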
Okay, so I think the way to do this, if we want to go down this path, would be to keep the _axis attribute for all axes.
Then we could define a function like this:

@observe('scale', postinit=True)
@observe('offset', postinit=True)
@observe('size', postinit=True)
def update_axis(self, event):
    self._axis = np.arange(self.size) * self.scale + self.offset
Then for the FunctionalDataAxis we could create a function:

@observe(trait("x._axis", notify=False).list_items(), postinit=True)
@observe(self.parameters_list, postinit=True)
def update_axis(self, event):
    kwargs = {}
    for kwarg in self.parameters_list:
        kwargs[kwarg] = getattr(self, kwarg)
    new_axis = self._function(x=self.x.axis, **kwargs)
    # Set non-finite values to np.nan
    new_axis[np.logical_not(np.isfinite(new_axis))] = np.nan
    self._axis = new_axis
Part of the reason I don't really like this is that even though the ._axis attribute is protected, you could still directly modify the values for a UniformDataAxis and FunctionalDataAxis.
Either way it might be worth thinking about a case where caching speeds up things significantly. Even for a size of 1,000,000 values it only takes about 10 ms to calculate something like the sin or log of each value. I think in this case the trade off of speed for simplicity might be worth it.
The calculation itself may not take very long, but when it is called repetitively it can slow down other operations. An example of this is fitting (#2949), but in this specific case I haven't checked whether it introduces a performance regression!
Better to leave it as it is for now and come back to it later.
Then we could define a function like this:

@observe('scale', postinit=True)
@observe('offset', postinit=True)
@observe('size', postinit=True)
def update_axis(event):
    self._axis = np.arange(self.size)*self.scale+self.offset
Using traits.api.observe is good, as the old API is deprecated. It seems that it comes with advantages that may be beneficial for us.
I added an item to the 2.0 release list to migrate to the new Traits API in other places of the code.
Part of the reason I don't really like this is the fact that even though the ._axis property is protected you could still directly modify the values for a UniformDataAxis and FunctionalDataAxis

I wouldn't worry about it, because it is a common pattern in Python.
After looking through the documentation for Traits I think it should be fairly easy to cache the property. I hadn't looked into the additional capabilities for caching.

from traits.api import cached_property, Property

axis = Property(observe=["scale", "size", "offset"])

@cached_property
def _get_axis(self):
    return np.arange(self.size) * self.scale + self.offset
https://docs.enthought.com/traits/traits_user_manual/advanced.html#caching-a-property-value
Or maybe it isn't quite this easy... Nothing with the traits package has been quite as intuitive as I first thought.
@ericpre So there are a few problems with this, mostly that in-place editing of traits is not supported. We have been operating by adding traits to each object individually rather than adding traits to the class as a whole, which is not well supported. The problem is that when we change the __class__ variable and call __init__, we skip over the __new__ function where the listeners are set for some classes.
Unfortunately I think that means that in order to cache axes we would have to copy axes when converting from one type to the next.
That means that s.axes_manager[0].convert_to_nonuniform_axis()
becomes s.axes_manager[0] = s.axes_manager[0].convert_to_nonuniform_axis()
It isn't a huge change but would allow us to define the Axes classes like this.
from traits.api import (cached_property, HasTraits, Instance, Str, Float,
                        Int, observe, Property, Array, Undefined)

class BaseDataAxis(HasTraits):
    name = Str(Undefined)
    units = Str(Undefined)

class DataAxis(BaseDataAxis):
    axis = Array()

class FunctionalDataAxis(DataAxis):
    x = Instance(BaseDataAxis)
    axis = Property(observe=["x.axis"])  # overwrites axis in DataAxis

    @cached_property
    def _get_axis(self):
        print("calculating axis")
        return self.func(self.x.axis, self.parameters)

class UniformDataAxis(DataAxis):
    scale = Float(1)
    offset = Float(0)
    size = Int(1)
    axis = Property(observe=["scale", "offset", "size"])  # overwrites axis in DataAxis

    @cached_property
    def _get_axis(self):
        print("calculating axis")
        return np.arange(self.size) * self.scale + self.offset
An alternative would be not to use @cached_property and to do the cache manually. I can have a look at it.
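A manual cache along those lines can be sketched in plain Python (a minimal illustration, not HyperSpy's implementation; the attribute names mirror UniformDataAxis, and for brevity only `scale` gets an invalidating setter):

```python
class UniformAxisSketch:
    """Manually cached axis: the cached array is discarded whenever a
    parameter it depends on (here, scale) changes."""

    def __init__(self, size=1, scale=1.0, offset=0.0):
        self._size, self._scale, self._offset = size, scale, offset
        self._cached_axis = None  # None means "needs recomputing"

    def _invalidate(self):
        self._cached_axis = None

    @property
    def scale(self):
        return self._scale

    @scale.setter
    def scale(self, value):
        self._scale = value
        self._invalidate()

    @property
    def axis(self):
        # Recompute only when the cache has been invalidated.
        if self._cached_axis is None:
            self._cached_axis = [
                i * self._scale + self._offset for i in range(self._size)
            ]
        return self._cached_axis


ax = UniformAxisSketch(size=3, scale=2.0, offset=1.0)
print(ax.axis)  # computed once, then served from the cache
```

In a full version, `offset` and `size` would get the same setter treatment; the traits `@observe` machinery discussed above automates exactly this invalidation step.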
hyperspy/axes.py (outdated)

        navigate=navigate,
        is_binned=is_binned,
        **kwargs)
    self.add_trait("_axis", t.Array)
Why does it need to be a trait?
I kept this just because I didn't want anyone trying to set the _axis with anything other than an array as that would complicate the fancy slicing quite a bit.
I should note that self.events.any_axis_changed.trigger(obj=self) should probably trigger when the _axis is changed. I am not entirely sure what in the drawing is linked to that event. That does argue for keeping the _axis attribute as a trait.
I kept this just because I didn't want anyone trying to set the _axis with anything other than an array as that would complicate the fancy slicing quite a bit.
Okay, maybe worth adding a short comment.
I should note that the self.events.any_axis_changed.trigger(obj=self) should probably trigger when the _axis is changed. I am not entirely sure what in the drawing is linked to that event. That does argue for keeping the _axis attribute as a trait.
Yes.
hyperspy/axes.py (outdated)

    navigate = t.Bool(False)
    is_binned = t.Bool(False)
    index = t.Range('low_index', 'high_index')
index or _index still needs to be a trait to be synchronised with the GUI widgets.
I have the index as a trait for the DataAxis and not for the BaseDataAxis. Is that okay? My idea is that the BaseDataAxis would only ever be used if you are trying to define something like a ragged dimension, in which case the index doesn't have a physical sense. I can set it as a trait there as well if that works better.
Sorry, it would have been less confusing if my comment was in a better place. What I meant is that the index or _index needs to be a traits type, e.g. an instance of traits.api.Range.
Ahh, that makes more sense. That requires that the low and high indexes be traits as well, or at least to my understanding that is the case. I currently have it so that the index is set with an @index.setter function which won't update outside of the high and low index. I'm not terribly familiar with how else this might interact with the API.
Ahh, that makes more sense. That requires that the low and high indexes be traits as well. Or at least to my understanding that is the case.
Yes, this is correct.
I currently have it so that the index is set with an @index.setter function which won't update outside of the high and low index. I'm not terribly familiar with how else this might interact with the API.

This is a detail which will need to be sorted in this PR; we can come back to it toward the end when other things are settled.
So this runs into some complications, as t.Range needs to be set with a value and not a Property initially.
We could do this:

class DataAxis(t.HasTraits):
    axis = t.Array()  # axis added as a trait to track changes to it
    high_index = t.Int(1)
    low_index = t.Int(0)
    _index = t.Range("low_index", "high_index")

    @observe("axis")
    def update_high_index(self, event):
        self.high_index = self.size - 1

Or keep it the way it is. I don't think it is too confusing right now, but either works!
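The plain-property alternative discussed above can be sketched without traits (a hypothetical stand-in, not the PR's code): a setter clamps index to [low_index, high_index], and high_index follows the axis size.

```python
class IndexedAxisSketch:
    """Plain-Python stand-in for t.Range('low_index', 'high_index'):
    the setter enforces the bounds that the traits type would enforce."""

    low_index = 0

    def __init__(self, size):
        self._size = size
        self._index = 0

    @property
    def high_index(self):
        # Tracks the axis size, like an @observe("axis") handler would.
        return self._size - 1

    @property
    def index(self):
        return self._index

    @index.setter
    def index(self, value):
        if not (self.low_index <= value <= self.high_index):
            raise ValueError(
                f"index must be in [{self.low_index}, {self.high_index}]"
            )
        self._index = value
```

The difference from the traits version is that nothing is notified on change here; GUI synchronisation is exactly what the trait type adds on top of this.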
We should take this opportunity to add comments describing the expected use of properties, the reasoning for the relationships between the different axes, why different attributes/properties are defined in this specific way, etc.
Sorry, I have done a review but forgot to submit it... I updated my comments with the latest changes.
@@ -2001,18 +2011,16 @@ def update_axes_attributes_from(self, axes,
         if changes:
             self.events.any_axis_changed.trigger(obj=self)

-    def _update_attributes(self):
+    @observe("_axes")
+    @observe(trait("_axes", notify=False).list_items())
Just to make sure that I am following correctly: @observe(trait("_axes", notify=False).list_items()) is to call the add_axis_manger method when the attributes, which are traits instances, are changed? If so, I don't see why we need it here. The axes_manager attribute of the axis needs to be set only when the axis is added to an AxesManager (currently covered by @observe("_axes"))?
So the first case is triggered by someone setting the axes value, something like am.axes = [axis1, axis2].
Because lists are mutable in place, that observation isn't triggered by an in-place operation. For that we need the second observation, which triggers when something in-place occurs, i.e. am.axes[1] = axis1.
See https://docs.enthought.com/traits/traits_user_manual/notification.html#notification-handler for more details.
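The distinction can be illustrated without traits (a stdlib sketch with made-up names, not HyperSpy code): reassigning the list attribute and assigning into the list are different operations, so each needs its own hook.

```python
class NotifyingList(list):
    """A list that reports in-place item assignment, standing in for
    observing trait("_axes").list_items()."""

    def __init__(self, items, on_item_set):
        super().__init__(items)
        self._on_item_set = on_item_set

    def __setitem__(self, index, value):
        super().__setitem__(index, value)
        self._on_item_set(index, value)


class AxesManagerSketch:
    def __init__(self):
        self.events = []
        self._axes = NotifyingList([], self._item_set)

    @property
    def axes(self):
        return self._axes

    @axes.setter
    def axes(self, new_axes):
        # Analogue of @observe("_axes"): fires only when the whole
        # attribute is reassigned, e.g. am.axes = [axis1, axis2].
        self._axes = NotifyingList(new_axes, self._item_set)
        self.events.append(("reassigned", list(new_axes)))

    def _item_set(self, index, value):
        # Analogue of observing list_items(): fires on am.axes[i] = axis.
        self.events.append(("item_set", index, value))


am = AxesManagerSketch()
am.axes = ["axis0", "axis1"]      # only the reassignment hook fires
am.axes[1] = "axis1_replacement"  # only the item hook fires
```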
hyperspy/axes.py (outdated)

    navigate = t.Bool(False)
    is_binned = t.Bool(False)
    index = t.Range('low_index', 'high_index')
All good :) thanks for the review. To be honest I did the same thing the first time around when I replied...
@ericpre This should be in a little bit better shape now. I added in the ability to cache the axis value for both the UniformDataAxis and FunctionalDataAxis. The new changes improve the self-documentation a little bit as well and are more in line with how the Traits package should be used.

The downside of this is that there is no way (at least to my knowledge) to properly handle the in-place conversion from one axis to another. This type of operation doesn't properly set up the listeners for the different Axes types. Additionally, adding traits to the FunctionalDataAxis is not well supported, so different parameters will have to be adjusted using … For converting some axis we need to do: …
Not a huge API change, but this can get a little clunky at times. It might be better to add something along the lines of: …
hyperspy/axes.py (outdated)

-    self.__class__ = UniformDataAxis
-    self.__init__(**d, size=self.size, scale=scale, offset=self.low_value)
-    self.axes_manager = axes_manager
+    new_axis = UniformDataAxis(**d, size=size, scale=scale, offset=offset)
It is convenient to convert inplace and it would be good to avoid returning a different object.
I can see if there is a way to do it. I think it means that we have to manage how the changes to different attributes are handled, which is a bit more involved.
I think I'll become an expert at the Traits package by the end of this :)
@ericpre Well it only took me a couple of days but I think the solution was a lot easier than I was making it. How do you feel about this?
def convert_to_non_uniform_axis(self):
    """Convert to a non-uniform axis."""
    d = super().get_axis_dictionary()
    d["_type"] = 'DataAxis'
    new_axis = DataAxis(**d)
    if self.axes_manager is not None:
        self.axes_manager[self] = new_axis
    else:
        return new_axis
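The `self.axes_manager[self] = new_axis` idea relies on the manager rebinding the slot in place, so code that reaches the axis through the manager keeps seeing the current object. A minimal stand-in (hypothetical names, not HyperSpy's AxesManager):

```python
class AxesManagerSketch:
    """Replaces an axis object in place by identity, so holders of the
    manager (rather than of the old axis) see the converted axis."""

    def __init__(self, axes):
        self._axes = list(axes)

    def __getitem__(self, key):
        # Integer lookup by position; any other key is treated as an
        # axis object to locate by identity.
        if isinstance(key, int):
            return self._axes[key]
        return self._axes[self._index_of(key)]

    def __setitem__(self, old_axis, new_axis):
        self._axes[self._index_of(old_axis)] = new_axis

    def _index_of(self, axis):
        for i, a in enumerate(self._axes):
            if a is axis:
                return i
        raise KeyError("axis not managed by this AxesManager")


uniform, other = object(), object()
am = AxesManagerSketch([uniform, other])
converted = object()       # stands in for the new DataAxis
am[uniform] = converted    # rebind the slot by identity
```

The caveat raised below still applies: anything holding a direct reference to the old axis object, including its connected events, is not updated by this rebinding.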
The problem with returning a new object is that the events connected to the old one will be lost.
Nice job @CSSFrancis! This is a much needed refreshment of the axes code. What do you mean by "unordered (or unorderable) axes"? Is it something like:

velocity_axis = hs.axes.UniformDataAxis(name="velocity", units="m/s", size=3)
velocity_axis.labels = {0: "v_x", 1: "v_y", 2: "v_z"}

The advantage of using an extra attribute is that … The advantage of using a … :

temp = hs.axes.UniformDataAxis(name="temperature", units="degC", size=100)
temp.labels = {0: "Water freezing point", 100: "Water boiling point"}
@francisco-dlp Thanks for the response. This was exactly the kind of idea I was hoping this PR would generate.

Yes, I would say that there are both unordered axes and unorderable axes, something like:

hs.axes.DataAxis(axis=[hs.axes.LabeledColumn(name="x", units="nm", scale=.1),
                       hs.axes.LabeledColumn(name="y", units="nm", scale=0.1),
                       hs.axes.LabeledColumn(name="kx", units="nm^-1", scale=0.01),
                       hs.axes.LabeledColumn(name="ky", units="nm^-1", scale=0.01)])

There are a couple of points for allowing each of these types, but I do like the idea of adding a labels dictionary attribute. Let's start with the unordered axis, which is a numeric array but not necessarily increasing or decreasing:

data = np.ones((10, 10, 10))
s = hs.signals.Signal2D(data)
s.axes_manager.signal_axes[1].scale = .1
data[:, :, [1, 2, 5, 4]].shape  # (10, 10, 4): picks the 1st, 2nd, 5th and 4th indexes
s.isig[:, [1, 2, 5, 4]]         # shape == (10, 10, 4)
s.isig[:, [.1, .2, .5, .4]]     # equivalent to the above, indexing by value

This requires the support of unordered axes. I don't think that this is too hard of an ask: there is already a fair bit of missing support for the DataAxis and for functions which assume that axes have a scale or offset attribute; this is just an additional extension of that. In this case, any time an unordered axis is asked to do something that it can't, it will throw an error. This might be something we can be more explicit about, but I don't think this is necessarily a bad thing.

For the unorderable axis, this is an array of strings or some other object. I think in this case I go back and forth. You might be right about adding in a labels attribute: I like the concept of not having labels for each index, and I think it has a fair bit of potential for indexing using custom labels. The hard part is managing both parts of slicing: slicing with a label, and slicing the labels object. I will also say that the opposite syntax makes more sense to me, i.e.

1: Slicing with a label:
def label2index(label, label_dict):
if label in label_dict:
return label_dict[label]
else:
return label
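A runnable version of that helper with a usage sketch (the label dictionary here is hypothetical):

```python
def label2index(label, label_dict):
    # Map a known label to its stored index; anything that isn't a
    # label (a plain index, a float value) is passed through unchanged.
    if label in label_dict:
        return label_dict[label]
    else:
        return label


labels = {"v_x": 0, "v_y": 1, "v_z": 2}
print(label2index("v_y", labels))  # a known label resolves to its index
print(label2index(2, labels))      # a plain index passes through
```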
2: Slicing the label dictionary:
def slice_labels(self, _slice, labels):
    new_indexes = np.arange(self.size)[_slice]
    for key, value in list(labels.items()):
        if isinstance(value, float):
            value = self.value2index(value)
        if value not in new_indexes:
            labels.pop(key)
        elif isinstance(value, int):
            labels[key] = np.where(value == new_indexes)[0][0]
    return labels

The case where we might have a long list of labels is after something like slicing with an n-dimensional boolean array. In that case any of the sliced dimensions are flattened, and the resultant axis should be an index of the values for each of the axes. Let's consider a 4-dimensional array sliced with a 4-dimensional boolean array: this results in a 1-dimensional array of values. If we call that one axis the index_axis, then …
I feel like this could be extremely useful in many cases! I think the general concept would be that labels would be an easy way to annotate and manipulate data without using ROIs.
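The label-remapping part of that idea can be sketched without NumPy (a hedged stand-in: the axis size and a slice object replace the axis internals, and labels already hold integer indexes):

```python
def slice_labels(size, sl, labels):
    """Keep only the labels whose index survives the slice, remapped to
    their new position on the sliced axis."""
    surviving = list(range(size))[sl]
    return {
        label: surviving.index(old_index)
        for label, old_index in labels.items()
        if old_index in surviving
    }


labels = {"freezing": 0, "boiling": 100}
sliced = slice_labels(101, slice(50, 101), labels)  # keep indexes 50..100
print(sliced)  # only "boiling" survives, remapped to position 50
```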
Not sure about it: there are things that you can do with abstract arrays that you shouldn't do with the sort of data that HyperSpy deals with. HyperSpy is about data with axes that can be separated into "signal" and "navigation" axes. These simple constraints are HyperSpy's raison d'être. They enable things like … Technically we could support:

s.isig[:, [1, 2, 5, 4]]      # shape == (10, 10, 4)
s.isig[:, [.1, .2, .5, .4]]  # equivalent to the above

but what for? And how would we implement it in a way that makes sense?
I see your point.
Nice idea. Maybe this should operate at the level of the … :

s["liquid_water_range"]  # gets the right ROI from `s.metadata` and returns `roi(s)`

We have to be careful not to implement several ways of performing the same thing. Your comment has made me realize that the …
I agree that a lot of what makes HyperSpy work is the constraints. That being said, I think that the above code does fall under those constraints and is easily implemented using the existing infrastructure (simply changing from a UniformDataAxis to a DataAxis and slicing, or keeping an axis a DataAxis and slicing, works).

From a HyperSpy standpoint, I would like to think of supporting slicing with multi-indexes or boolean arrays as very much on brand: manipulating data while saving the underlying metadata and information about the axis. I use these kinds of functions many times to mask some data, take the sum of some area, compare two data points, or get the intensity at a set of points on the data set. Additionally, in the case of having a list of vectors this type of slicing is very necessary to down-select from a large list. Unfortunately, this often requires me to operate on the …

I think that my workflow is about the worst thing we can have our users doing, as it undermines the whole persistence of metadata, axes, etc. That is why I think supporting as much of numpy as reasonable is important, as it stops much of this behavior. The implementation is all in #3055 if you are interested in taking a look. The diff might be large because it includes this PR, but I submitted it as a draft because I think it helps make sense of some of these changes.
I think that the distinction is that the labels attribute is associated with some axis rather than with some signal, whereas an ROI is related to, or even independent of, a signal. The Point1D is just another way of directly indexing the data, but we support both cases. I would rather see this as a way to annotate some axis or provide a secondary view. Maybe another way to think about this is in terms of how we support conversions between units; this is just an extension of that to a custom mapping. Ultimately, I just think something that worked like this:

…

is very intuitive and uses the nice new capabilities that HyperSpy has added with the non-uniform axis pretty effectively.
@francisco-dlp The relevant change is here for the uniform axis:

if isinstance(_slice, (np.ndarray, list)):
    new_axis = self.convert_to_non_uniform_axis()
    return new_axis._slice_me(_slice)

In the slicing code we then just treat arrays and lists differently than tuples. numpy/numpy#9686 is the relevant change to numpy, whose syntax I really think we should follow, as that is what many of our users understand. I really don't think we want to diverge too much from this, also because then we lose a lot of the functionality of a Signal being a duck array (if numpy/scipy ever figures that out 😃). Currently something like …
@francisco-dlp did we ever fall either way on this? I can move some of the code downstream to pyxem, but I do want to get as much into hyperspy as I can. Let me know if there is any part of this that might need some more discussion/information.
Hi @CSSFrancis. Sorry to be slow to reply. I have some further thoughts on this and #3055, but I am too busy at the moment. I'll come back to you in the next few days.
@francisco-dlp No worries! Whenever you get around to it is fine, thanks for taking your time to review this.
Thinking about it, I backtrack from my suggestion to use a …
@francisco-dlp Thanks for the comments.

I think that makes sense. Maybe this does bring up the point that our ideas of ROIs and markers are pretty similar in some cases. Maybe we can start to think of markers as a collection of ROIs, or even as a BaseSignal with a collection of ROIs? I sometimes think that the markers code can be a little obtuse, especially when you want different markers as a function of navigation position. Maybe an alternative is to make a …

I think this is a pretty good compromise and I should be able to implement this fairly easily. That being said, I am slightly hesitant about what happens if someone adjusts the … For example, how can we handle …
Pitching in as an extension developer on some points raised by @CSSFrancis. (Feel free to move this comment to another issue/discussion.)

kikuchipy uses HyperSpy's markers actively to draw Kikuchi bands on patterns (see the figures in this tutorial). They are used as a visual aid to confirm a correct indexing of electron backscatter diffraction (EBSD) patterns or a particular position of a pattern feature. I don't think this use fits the description of them as ROIs, at least not as mentioned here?

The implementation in kikuchipy uses the same set of bands for all nav positions, but sets those not visible in a certain pattern to NaN. Each band's line coordinates vary per nav position. The current implementation of markers in HyperSpy fits this use perfectly. However, I have plans to make the marker coordinates calculated on the fly (lazily, using Dask). Might be you've thought along these lines yourself, @CSSFrancis.
I see your point there. I was just commenting on some of the similarities between some markers and ROIs. There might even be some cases, like creating a virtual dark field image from a set of ROIs, where the line starts to get a little blurred: in that case we want the slicing capabilities of an ROI and the labeling and plotting capabilities of a marker. It's hard to say that something like an arrow or a text marker should be an ROI. The way things are is good, but this conversation sparked that thought.

Sorry, that was kind of a throwaway line and I should have explained better. I also realize that this should probably be moved to a separate issue. Maybe this isn't as straightforward as I thought :) I always considered having to create as many markers as you need for the signal with the most markers, and setting some markers to NaN, a bug and not a feature; in your case I guess I could go either way.

I will say that the way we are currently doing it is slow. When I want to make a set of markers that change with every navigation dimension, I have to create a set of markers that is as long as the longest set of vectors (which can be very long) and add a bunch of np.nan markers. Once you add a bunch of these markers to a signal, this slows down the plotting a lot: each marker is plotted as a separate artist, and having 30+ artists really slows down Matplotlib and makes navigating a pain.

A better way would be to be able to define all the markers as a ragged array of vectors, then add just one marker to the signal with the data equal to that ragged array. But to that point, when does a ragged array of vectors defining some set of markers start to look suspiciously like a ragged signal? We can even think of that set of markers as having a defined navigation dimension which can now be sliced alongside the signal, or think about returning a set of markers from the …
Thank you for elaborating. I agree with all the points you made, and I'd say your experience with markers describes mine well. The creation of markers (geometrical simulations) in kikuchipy currently uses too much memory, so they cannot be added to a large dataset (> a couple GB) anyway. If we ever fix that, I hope to find time to look more into using ragged arrays as you say.
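A hedged sketch of the ragged-array idea discussed above, using a plain NumPy object array (this layout and the names are hypothetical, not the current HyperSpy marker API):

```python
import numpy as np

# Hypothetical layout: one entry per navigation position, each holding an
# (n_markers, 2) coordinate array. n_markers varies freely per position,
# so no padding with np.nan up to the longest set is needed.
markers = np.empty(3, dtype=object)
markers[0] = np.array([[1.0, 2.0], [3.0, 4.0]])               # 2 markers
markers[1] = np.array([[5.0, 6.0]])                           # 1 marker
markers[2] = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])   # 3 markers

# The ragged array can be sliced alongside the navigation axis:
subset = markers[1:]
print([m.shape[0] for m in subset])  # -> [1, 3]
```

A single artist could then draw all coordinates for the current navigation index, instead of one artist per marker.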
hyperspy/axes.py (outdated)
-    def update_axis(self):
+    @property
+    def axis(self):
An alternative would be not to use @cached_property and to do the cache manually. I can have a look at it.
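A minimal sketch of what doing the cache manually could look like: the computed axis array is stored once and invalidated whenever a defining attribute changes (class and attribute names here are hypothetical, not HyperSpy's actual implementation):

```python
import numpy as np

class UniformAxisSketch:
    """Hypothetical sketch: cache the computed axis array manually and
    invalidate the cache whenever a defining attribute changes."""

    def __init__(self, offset=0.0, scale=1.0, size=10):
        self._offset, self._scale, self._size = offset, scale, size
        self._axis_cache = None

    @property
    def axis(self):
        # Recompute only when the cache has been invalidated.
        if self._axis_cache is None:
            self._axis_cache = self._offset + self._scale * np.arange(self._size)
        return self._axis_cache

    @property
    def scale(self):
        return self._scale

    @scale.setter
    def scale(self, value):
        self._scale = value
        self._axis_cache = None  # invalidate the cached axis array
```

Unlike functools.cached_property, this makes the invalidation explicit in each setter, so the cached array can never go stale when scale, offset or size change.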
     dcshape = dc.shape
     for _axis in self.axes_manager._axes:
         if _axis.index_in_array < len(dcshape):
-            _axis.size = int(dcshape[_axis.index_in_array])
+            if _axis.size != int(dcshape[_axis.index_in_array]):
+                if _axis.is_uniform:
+                    _axis.size = int(dcshape[_axis.index_in_array])
+                else:
+                    _logger.warning(f"Axis {_axis} converted to a UniformDataAxis")
+                    _axis.convert_to_uniform_axis()
+                    _axis.size = int(dcshape[_axis.index_in_array])
This function is about getting chunks, and it is a surprise that it needs to edit the axis to do that! Could it be a workaround for the data shape and the axis being out of sync?
The problem here is related to the fact that the axis and the array shape can be out of sync. This causes some issues when trying to chunk the data while they are out of sync.
Use BaseSignal.get_dimensions_from_data instead?
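To illustrate the suggestion, here is a hedged sketch of deriving axis sizes from the data shape in one place, so the axes and the array cannot drift apart (the helper name and signature below are hypothetical; only BaseSignal.get_dimensions_from_data is the method actually being suggested):

```python
import numpy as np

def sync_axis_sizes(axis_sizes, data):
    """Hypothetical helper: return the axis sizes updated from the data
    shape, leaving any trailing sizes without a matching dimension as-is."""
    shape = data.shape
    return [int(shape[i]) if i < len(shape) else size
            for i, size in enumerate(axis_sizes)]

print(sync_axis_sizes([2, 2], np.zeros((4, 5))))  # -> [4, 5]
```

Centralising this logic means the chunking code only reads sizes and never has to mutate axes as a side effect.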
hyperspy/axes.py (outdated)
            else:
                # the axis is not ordered
                return None
        except np.core._exceptions._UFuncNoLoopError:  # subtracting array of objects/strings will fail
Avoid using numpy private API: np.core._exceptions._UFuncNoLoopError is a subclass of TypeError:
-        except np.core._exceptions._UFuncNoLoopError:  # subtracting array of objects/strings will fail
+        except TypeError:  # subtracting array of objects/strings will fail
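The substitution is safe because numpy's private _UFuncNoLoopError derives from TypeError, and plain Python subtraction of objects (e.g. str - str inside np.diff on an object array) raises TypeError directly. A hedged sketch of the ordering check, mirroring the logic under review (the function name is hypothetical):

```python
import numpy as np

def axis_is_ordered(axis):
    """Return True/False for numeric axes; None when differences cannot
    be computed, e.g. for arrays of strings or arbitrary objects."""
    try:
        d = np.diff(axis)
    except TypeError:  # also catches numpy's _UFuncNoLoopError subclass
        return None
    if np.all(d > 0) or np.all(d < 0):
        return True
    return False

print(axis_is_ordered(np.array([1.0, 2.0, 3.0])))           # -> True
print(axis_is_ordered(np.array(["a", "b"], dtype=object)))  # -> None
```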
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericpre Good point! I'll make the substitution
hyperspy/axes.py (outdated)
-        self.__class__ = UniformDataAxis
-        self.__init__(**d, size=self.size, scale=scale, offset=self.low_value)
-        self.axes_manager = axes_manager
+        new_axis = UniformDataAxis(**d, size=size, scale=scale, offset=offset)
The problem with returning a new object is that the events connected to the old one will be lost.
@ericpre I'll make sure that this isn't the case... That seems like one of the things we need to make sure is handled correctly.
Any other things to check?
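One way to keep the connected events from being lost is to carry the old object's connections over to the new axis. A minimal, self-contained sketch, assuming a simple callback-list event (EventSketch and replace_axis are stand-ins, not HyperSpy's hyperspy.events API):

```python
class EventSketch:
    """Minimal stand-in for an event with connected callbacks."""
    def __init__(self):
        self.connected = []

    def connect(self, fn):
        self.connected.append(fn)

    def trigger(self, **kw):
        for fn in self.connected:
            fn(**kw)

class AxisSketch:
    def __init__(self):
        self.events_changed = EventSketch()

def replace_axis(old_axis):
    """Create a new axis and carry the old connections over, so callers
    connected to the old object keep receiving notifications."""
    new_axis = AxisSketch()
    new_axis.events_changed.connected = list(old_axis.events_changed.connected)
    return new_axis

calls = []
old = AxisSketch()
old.events_changed.connect(lambda **kw: calls.append(kw))
new = replace_axis(old)
new.events_changed.trigger(value=1)
print(calls)  # -> [{'value': 1}]
```

Anything still holding a reference to the old object would of course need updating too, which is the harder part of the problem.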
Regarding the changes with convert_to_uniform_axis: previously, remove_trait/add_trait were used to handle traits dynamically, but why is that not possible anymore?
@@ -933,25 +969,20 @@ def __init__(self,
                  size=1,
                  is_binned=False,
                  **parameters):
-        super().__init__(
+        super(DataAxis, self).__init__(
Is it necessary to add DataAxis, self in the super call here? Isn't that old Python 2.x syntax?
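Right — in Python 3 the two spellings are equivalent for this case, and the zero-argument form is the preferred one. A quick illustration:

```python
class Base:
    def __init__(self):
        self.initialised = True

class Py3Style(Base):
    def __init__(self):
        super().__init__()  # preferred Python 3 form

class Py2Style(Base):
    def __init__(self):
        super(Py2Style, self).__init__()  # legacy spelling, still valid

print(Py3Style().initialised, Py2Style().initialised)  # -> True True
```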
    @observe("_axes")
    @observe(trait("_axes", notify=False).list_items())
-    @observe("_axes")
-    @observe(trait("_axes", notify=False).list_items())
+    # These two `observe` decorators are necessary to make sure that an item of the list is changed in place
+    @observe("_axes")
+    @observe(trait("_axes", notify=False).list_items())
Nothing that I can think of, but even if something is broken after this PR, this is a good time to do it!
@CSSFrancis, are you happy if I take over this PR? I suspect you may be busy with others things! 😃 |
@ericpre yea, I am kind of trying to take a step back to finish up some pyxem stuff, so if you pick this up I'd be very happy! Let me know if you need help with anything, and I can make some time to look over this again.
cc21fbc to c89e8c5
Description of the change
This set of changes supersedes #3012 and #2876 in the hope of creating small, incremental changes to how axes and slicing work before fully supporting vector signals. @ericpre sorry about the confusion, but the changes to how the slicing works are rather larger than anticipated, so I wanted to do this slowly and incrementally.
This change better defines what the key attributes of an axis are and focuses on only allowing changes to those attributes.
These changes help to prevent a lot of small bugs within the axes code. Being able to set the size of a DataAxis doesn't make sense when the size is directly derived from the axis array. Similarly, for a UniformDataAxis, directly setting the axis array should throw an error.
This also adds limited support for unordered (or unorderable) axes, which are necessary for vector applications.
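The @property approach described above can be sketched with a simplified axis class (names are hypothetical, not the full HyperSpy implementation): the axis array is the defining attribute, and size is derived from it rather than stored.

```python
import numpy as np

class DataAxisSketch:
    """Simplified sketch of the @property approach: ``axis`` is the
    defining attribute and ``size`` is derived, so the two can never
    be set out of sync."""

    def __init__(self, axis):
        self.axis = np.asarray(axis)

    @property
    def size(self):
        return self.axis.size  # always consistent with the axis array

    @size.setter
    def size(self, value):
        raise AttributeError(
            "size is derived from the axis array and cannot be set directly"
        )

ax = DataAxisSketch([0.0, 1.5, 4.0])
print(ax.size)  # -> 3
```

Attempting ax.size = 5 raises AttributeError, which is exactly the kind of small bug this refactor is meant to prevent.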
Progress of the PR
Minimal example of the bug fix or the new feature