
Handling Axis attributes with @property decorator #3031

Open
wants to merge 19 commits into base: RELEASE_next_minor
Conversation

CSSFrancis
Member

@CSSFrancis CSSFrancis commented Sep 23, 2022

Description of the change

This set of changes supersedes #3012 and #2876 in the hope of making small incremental changes to how axes and slicing work before fully supporting vector signals. @ericpre sorry about the confusion, but the changes to how the slicing works are rather larger than anticipated, so I wanted to do this slowly and incrementally.

This change better defines what the key attributes of an axis are and focuses on only allowing changes to those attributes.

  • BaseDataAxis: The base class. It has only a name and units, as well as attributes related to interfacing with the AxesManager.
  • DataAxis: The most flexible of the axes. It has an axis attribute which is a one-dimensional array; the values can be in increasing order, decreasing order, or arbitrary order. (Note that the last case does not support any kind of indexing yet.)
  • FunctionalDataAxis: Has an axis property that is calculated from x and the given function.
  • UniformDataAxis: Has an axis property calculated from the size, offset and scale (see the sketch after this list).
  • AxesManager: Properties in the axes manager are pulled directly from the axes in its list.
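To illustrate the intent (a minimal sketch only, not the actual HyperSpy implementation), a uniform axis built around the @property decorator could look like this:

import numpy as np

class UniformAxisSketch:
    """Illustrative only: the axis array is derived, never stored or set directly."""

    def __init__(self, size, scale=1.0, offset=0.0):
        self._size = int(size)
        self.scale = float(scale)
        self.offset = float(offset)

    @property
    def axis(self):
        # Read-only: no setter is defined, so `ax.axis = ...` raises AttributeError.
        return np.arange(self._size) * self.scale + self.offset

    @property
    def size(self):
        return self._size

    @size.setter
    def size(self, value):
        # Setting the size is meaningful here because the axis is recomputed on access.
        self._size = int(value)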

These changes help to prevent a lot of small bugs within the axes code. Things like being able to set the size of a DataAxis don't make sense when the size is directly derived from the axis array. Similarly, for the UniformDataAxis, directly setting the axis should throw an error.

This also adds some limited support for unordered (or unorderable) axes, which are necessary for vector applications.

Progress of the PR

  • Change implemented (can be split into several points),
  • add tests,
  • ready for review.

Minimal example of the bug fix or the new feature

import hyperspy.api as hs

ax = hs.axes.DataAxis(["a", "b", "c"])  # No longer throws an error
ax.size = 10  # Throws AttributeError

ax = hs.axes.UniformDataAxis(size=10, scale=3, offset=1)
ax.size = 10  # Allowed: the axis is recalculated from size, scale and offset
ax.axis = [10, 12, 13, 14]  # Throws AttributeError

@CSSFrancis CSSFrancis mentioned this pull request Sep 23, 2022
@codecov

codecov bot commented Sep 23, 2022

Codecov Report

Attention: Patch coverage is 94.65649%, with 14 lines in your changes missing coverage. Please review.

Project coverage is 80.59%. Comparing base (18caceb) to head (c89e8c5).

Files Patch % Lines
hyperspy/axes.py 97.81% 1 Missing and 4 partials ⚠️
hyperspy/_signals/lazy.py 33.33% 3 Missing and 1 partial ⚠️
hyperspy/signal.py 50.00% 2 Missing and 1 partial ⚠️
hyperspy/models/model1d.py 89.47% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@                  Coverage Diff                   @@
##           RELEASE_next_minor    #3031      +/-   ##
======================================================
+ Coverage               80.54%   80.59%   +0.05%     
======================================================
  Files                     147      147              
  Lines                   21868    21883      +15     
  Branches                 5147     5185      +38     
======================================================
+ Hits                    17613    17637      +24     
+ Misses                   3037     3024      -13     
- Partials                 1218     1222       +4     


@ericpre ericpre added the run-extension-tests Run extension test suites label Sep 24, 2022
@CSSFrancis
Member Author

The next PR, which handles the slicing, can be found at https://github.com/CSSFrancis/hyperspy/tree/FancySlicingRefactor; it may help to understand some of the motivations for these changes.

@ericpre ericpre removed the run-extension-tests Run extension test suites label Sep 29, 2022

@ericpre ericpre left a comment


Thank you @CSSFrancis, this is very good!
There are a few details which are worth tweaking further, but this is a significant simplification of the axes implementation.
I guess this is one of those things which grew over time with more functionality and ended up being a bit of a mess!

hyperspy/axes.py Outdated

def update_axis(self):
@property
def axis(self):
Member

It may be worth caching it in case the function calculation is slow.

Member Author

This could potentially be useful, but I am a little hesitant about caching the axis: I think we would then start going down the rabbit hole of adding a bunch of extra traits and listeners just to make sure things work correctly, and even then it could get complicated and isn't guaranteed. I would rather have a function which repeatedly calls the axis property use something like ax1 = FunctionalDataAxis.axis and then reuse that instance.

I can add properties which directly calculate the high and low values, rather than getting them from the axis, which might help some.

Either way, it might be worth thinking about a case where caching speeds things up significantly. Even for a size of 1,000,000 values it only takes about 10 ms to calculate something like the sin or log of each value. I think in this case the trade-off of speed for simplicity might be worth it.
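For reference, the order of magnitude of that claim can be checked with a quick micro-benchmark (timings are machine dependent; this is only an illustrative sketch):

import timeit

import numpy as np

x = np.linspace(0.0, 10.0, 1_000_000)
# Average time of a single np.sin evaluation over one million values,
# typically a few milliseconds on a modern machine.
per_call_ms = timeit.timeit(lambda: np.sin(x), number=100) / 100 * 1e3
print(f"{per_call_ms:.2f} ms per call")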

Member Author

Okay, so I think the way to do this, if we want to go down this path, would be to keep the _axis attribute for all axes.

Then we could define a function like this:

@observe('scale', postinit=True)
@observe('offset', postinit=True)
@observe('size', postinit=True)
def update_axis(self, event):
    self._axis = np.arange(self.size) * self.scale + self.offset

Then for the FunctionalDataAxis we could create a function

@observe(trait("x._axis", notify=False).list_items(), postinit=True)
@observe(self.parameters_list, postinit=True)
def update_axis(event):
    kwargs = {}
    for kwarg in self.parameters_list:
        kwargs[kwarg] = getattr(self, kwarg)
    new_axis = self._function(x=self.x.axis, **kwargs)
    # Set not valid values to np.nan
    new_axis[np.logical_not(np.isfinite(new_axis))] = np.nan
    self._axis = new_axis

Member Author

Part of the reason I don't really like this is that even though the ._axis attribute is protected, you could still directly modify the values for a UniformDataAxis or FunctionalDataAxis.

Member

Either way, it might be worth thinking about a case where caching speeds things up significantly. Even for a size of 1,000,000 values it only takes about 10 ms to calculate something like the sin or log of each value. I think in this case the trade-off of speed for simplicity might be worth it.

The calculation itself may not take very long, but when it is called repetitively it can slow down other operations. An example of this is fitting (#2949), but in this specific case I haven't checked whether it introduces a performance regression!
Better to leave it as it is for now and come back to it later.

Then we could define a function like this:

@observe('scale', postinit=True)
@observe('offset', postinit=True)
@observe('size', postinit=True)
def update_axis(self, event):
    self._axis = np.arange(self.size) * self.scale + self.offset

Using traits.api.observe is good, as the old one is deprecated. It seems that it comes with advantages that may be beneficial for us.
I added an item to the 2.0 release list to migrate to the new Traits API in other places of the code.

Part of the reason I don't really like this is that even though the ._axis attribute is protected, you could still directly modify the values for a UniformDataAxis or FunctionalDataAxis.

I wouldn't worry about it, because it is a common pattern in Python.

Member Author

After looking through the documentation for Traits, I think it should be fairly easy to cache the property. I hadn't looked into the additional capabilities for caching.

from traits.api import cached_property, Property

axis = Property(observe=["scale", "size", "offset"])


@cached_property
def _get_axis(self):
    return np.arange(self.size) * self.scale + self.offset

https://docs.enthought.com/traits/traits_user_manual/advanced.html#caching-a-property-value

Member Author

Or maybe it isn't quite this easy... Nothing with the traits package has been quite as intuitive as I first thought.

Member Author

@ericpre So there are a few problems with this, mostly that in-place editing of traits is not supported. We have been operating by adding traits to each object individually rather than adding traits to the class as a whole, which is not well supported. The problem is that when we change the __class__ variable and call __init__, we skip over the __new__ function where the listeners are set for some classes.

Unfortunately, I think that means that in order to cache axes we would have to copy axes when converting from one type to the next.

That means that s.axes_manager[0].convert_to_nonuniform_axis() --> s.axes_manager[0] = s.axes_manager[0].convert_to_nonuniform_axis()

It isn't a huge change, but it would allow us to define the Axes classes like this.

import numpy as np
from traits.api import (cached_property, HasTraits, Instance, Str, Float, Int,
                        observe, Property, Array, Undefined)

class BaseDataAxis(HasTraits):
    name = Str(Undefined)
    units = Str(Undefined)

class DataAxis(BaseDataAxis):
    axis = Array()

class FunctionalDataAxis(DataAxis):
    x = Instance(BaseDataAxis)
    axis = Property(observe=["x.axis"])  # overwrites axis in DataAxis

    @cached_property
    def _get_axis(self):
        print("calculating axis")
        return self.func(self.x.axis, self.parameters)

class UniformDataAxis(DataAxis):
    scale = Float(1)
    offset = Float(0)
    size = Int(1)
    axis = Property(observe=["scale", "offset", "size"])  # overwrites axis in DataAxis

    @cached_property
    def _get_axis(self):
        print("calculating axis")
        return np.arange(self.size) * self.scale + self.offset
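A minimal usage sketch of the classes above, assuming Property(observe=...) combined with @cached_property behaves as described in the traits documentation:

ax = UniformDataAxis(size=5, scale=0.5, offset=1.0)
print(ax.axis)   # prints "calculating axis" once, then [1.  1.5 2.  2.5 3. ]
print(ax.axis)   # served from the cache, no recalculation
ax.scale = 2.0   # changing an observed trait invalidates the cache
print(ax.axis)   # recalculated: [1. 3. 5. 7. 9.]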

Member

An alternative would be not to use @cached_property and to do the cache manually. I can have a look at it.

hyperspy/axes.py Outdated
navigate=navigate,
is_binned=is_binned,
**kwargs)
self.add_trait("_axis", t.Array)
Member

Why does it need to be a trait?

Member Author

I kept this just because I didn't want anyone trying to set the _axis with anything other than an array as that would complicate the fancy slicing quite a bit.

@CSSFrancis CSSFrancis Oct 7, 2022

I should note that the self.events.any_axis_changed.trigger(obj=self) should probably trigger when the _axis is changed. I am not entirely sure what in the drawing is linked to that event. That does argue for keeping the _axis attribute as a trait.

Member

I kept this just because I didn't want anyone trying to set the _axis with anything other than an array as that would complicate the fancy slicing quite a bit.

Okay, maybe worth adding a short comment.

I should note that the self.events.any_axis_changed.trigger(obj=self) should probably trigger when the _axis is changed. I am not entirely sure what in the drawing is linked to that event. That does argue for keeping the _axis attribute as a trait.

Yes.

hyperspy/axes.py Outdated
navigate = t.Bool(False)
is_binned = t.Bool(False)
index = t.Range('low_index', 'high_index')
Member

index or _index still needs to be a trait to be synchronised with the GUI widgets.

Member Author

I have the index as a trait for the DataAxis and not for the BaseDataAxis. Is that okay? My idea is that the BaseDataAxis would only ever be used if you are trying to define something like a ragged dimension, in which case the index doesn't have a physical sense. I can set it as a trait there as well if that works better.

Member

Sorry, it would have been less confusing if my comment was in a better place. What I meant is that the index or _index needs to be a traits type, e.g. an instance of traits.api.Range.

@CSSFrancis CSSFrancis Oct 6, 2022

Ahh, that makes more sense. That requires that the low and high indexes be traits as well, or at least to my understanding that is the case. I currently have it so that the index is set with an @index.setter function which won't update outside of the high and low index range. I'm not terribly familiar with how else this might interact with the API.

Member

Ahh, that makes more sense. That requires that the low and high indexes be traits as well. Or at least to my understanding that is the case.

Yes, this is correct.

I currently have it so that the index is set with a @index.setter function which won't update outside of the index high and low. I'm not terribly familiar with how else this might interact with the API.

This is a detail which will need to be sorted in this PR; we can come back to it toward the end when other things are settled.

Member Author

So this runs into some complications as t.Range needs to be set with a value and not a Property initially.

We could do this:

class DataAxis:
    axis = t.Array()  # Axis added as a trait to track changes to the axis
    low_index = t.Int(0)
    high_index = t.Int(1)
    _index = t.Range('low_index', 'high_index')

    @observe("axis")
    def update_high_index(self, event):
        self.high_index = self.size - 1

Or we could keep it the way it is. I don't think it is too confusing right now, but either works!

@ericpre ericpre left a comment

We should take this opportunity to add comments describing the expected use of properties, the reasoning for the relationships between the different axis classes, why the different attributes/properties are defined in this specific way, etc.

@ericpre ericpre left a comment

Sorry, I have done a review but forgot to submit it... I updated my comments with the latest changes.



@@ -2001,18 +2011,16 @@ def update_axes_attributes_from(self, axes,
if changes:
self.events.any_axis_changed.trigger(obj=self)

def _update_attributes(self):
@observe("_axes")
@observe(trait("_axes", notify=False).list_items())
Member

Just to make sure that I am following correctly: @observe(trait("_axes", notify=False).list_items()) is there to call the add_axis_manager method when the attributes, which are traits instances, are changed? If so, I don't see why we need it here: the axes_manager attribute of the axis needs to be set only when the axis is added to an AxesManager (currently covered by @observe("_axes"))?

Member Author

So the first case is triggered by someone setting the axes value, something like am.axes = [axis1, axis2].

Because lists are mutable in place, that observation isn't triggered by an in-place operation. For that we need the second observation, which triggers when something changes in place, i.e. am.axes[1] = axis1.

See https://docs.enthought.com/traits/traits_user_manual/notification.html#notification-handler for more details.
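A minimal, self-contained sketch of the two notification paths (plain traits, not HyperSpy code; the class and method names are made up for the example):

from traits.api import HasTraits, Int, List, observe
from traits.observation.api import trait

class Manager(HasTraits):
    _axes = List(Int)

    @observe("_axes")                                    # fires when the list itself is reassigned
    @observe(trait("_axes", notify=False).list_items())  # fires when an item is changed in place
    def _on_axes_changed(self, event):
        print("axes changed:", event)

m = Manager(_axes=[1, 2, 3])
m._axes = [4, 5, 6]   # triggers the first observer
m._axes[1] = 10       # triggers the second observer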


@CSSFrancis
Member Author

Sorry, I have done a review but forgot to submit it... I updated my comments with the latest changes.

All good :) thanks for the review. To be honest I did the same thing the first time around when I replied...

@CSSFrancis
Member Author

CSSFrancis commented Oct 14, 2022

@ericpre This should be in a little bit better shape now. I added the ability to cache the axis value for both the FunctionalDataAxis and the UniformDataAxis. Let me know what still needs to be done to get this merged. There is some documentation lacking, but for the User Guide I was intending to do that all at once. Once this is merged I will submit the PR for fancy slicing and we can go from there.

The new changes improve the self-documentation a little bit as well and are more in line with how the Traits package should be used. The downside is that there is no way (at least to my knowledge) to properly handle the in-place conversion from one axis type to another; this type of operation doesn't properly set up the listeners for the different axis types. Additionally, adding traits to the FunctionalDataAxis is not well supported, so different parameters will have to be adjusted using ax.parameters["power"] = 3.

For converting some axis we need to do:

s.axes_manager[0] = s.axes_manager[0].convert_to_non_uniform_axis()

Not a huge API change, but this can get a little clunky at times.

It might be better to add something along the lines of:

s.axes_manager.convert(0, "non_uniform")

hyperspy/axes.py Outdated
self.__class__ = UniformDataAxis
self.__init__(**d, size=self.size, scale=scale, offset=self.low_value)
self.axes_manager = axes_manager
new_axis = UniformDataAxis(**d, size=size, scale=scale, offset=offset)
Member

It is convenient to convert inplace and it would be good to avoid returning a different object.

Member Author

I can see if there is a way to do it. I think it means that we have to manage how changes to the different attributes are handled, which is a bit more involved.

Member Author

I think I'll become an expert at the Traits package by the end of this :)

Member Author

@ericpre Well it only took me a couple of days but I think the solution was a lot easier than I was making it. How do you feel about this?

    def convert_to_non_uniform_axis(self):
        """Convert to a non-uniform axis."""
        d = super().get_axis_dictionary()
        d["_type"] = 'DataAxis'
        new_axis = DataAxis(**d)
        if self.axes_manager is not None:
            self.axes_manager[self] = new_axis
        else:
            return new_axis
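A hypothetical usage of that proposal (the names follow the snippet above; whether anything should be returned is exactly the point under discussion):

# Axis registered in an AxesManager: it is replaced in place, nothing is returned.
s.axes_manager[0].convert_to_non_uniform_axis()

# Standalone axis: a new DataAxis is returned instead.
ax = hs.axes.UniformDataAxis(size=10, scale=0.1, offset=0.0)
data_axis = ax.convert_to_non_uniform_axis()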

Member

The problem with returning a new object is that the events connected to the old one will be lost.

@francisco-dlp
Member

Nice job @CSSFrancis! This is a much needed refresh of the axes code.

What do you mean by "unordered (or unorderable) axes"? Is it hs.axes.DataAxis(["a", "b", "c"])? If it is what it seems to me, I wonder if it wouldn't be better to implement this feature simply as axis labels. For example, an axis could have an extra optional attribute, labels, that could be a dictionary containing labels. For example:

velocity_axis = hs.axes.UniformDataAxis(name="velocity", units="m/s", size=3)
velocity_axis.labels = {0: "v_x", 1: "v_y", 2: "v_z"}

The advantage of using an extra attribute is that .axis remains a numeric array, so all code assuming this doesn't break. This also enables adding labels that are not uniformly distributed over the axis (using DataAxis or NonUniformDataAxis).

The advantage of using a dict is that it enables adding labels to a subset of the axis points e.g.:

temp = hs.axes.UniformDataAxis(name="temperature", units="degC", size=100)
temp.labels = {0: "Water freezing point", 100: "Water boiling point"}

@CSSFrancis
Member Author

CSSFrancis commented Oct 28, 2022

Nice job @CSSFrancis! This is a much needed refresh of the axes code.

@francisco-dlp Thanks for the response. This was exactly the kind of idea I was hoping this PR would generate.

What do you mean by "unordered (or unorderable) axes"? Is it hs.axes.DataAxis(["a", "b", "c"])? If it is what it seems to me, I wonder if it wouldn't be better to implement this feature simply as axis labels. For example, an axis could have an extra optional attribute, labels, that could be a dictionary containing labels. For example:

Yes, I would say that there are both unordered axes, something like hs.axes.DataAxis(axis=[1,3,2,5,4,7]), and unorderable axes, which would be something like hs.axes.DataAxis(axis=["a","b","c","d","e"]), or even something like

hs.axes.DataAxis(axis=[hs.axes.LabeledColumn(name="x", units="nm", scale=.1),
                       hs.axes.LabeledColumn(name="y", units="nm", scale=0.1),
                       hs.axes.LabeledColumn(name="kx", units="nm^-1", scale=0.01),
                       hs.axes.LabeledColumn(name="ky", units="nm^-1", scale=0.01)])

There are a couple of points in favour of allowing each of these types, but I do like the idea of adding a labels dictionary attribute.

Let's start with the unordered axis, which is a numeric array that is not necessarily increasing or decreasing.

  • I think that hyperspy should try to support as much numpy as possible. This includes something like:
data = np.ones((10, 10, 10))
s = hs.signals.Signal2D(data)
s.axes_manager.signal_axes[1].scale = .1
data[:, :, [1, 2, 5, 4]]  # returns the 1st, 2nd, 5th and 4th indexes
data[:, :, [1, 2, 5, 4]].shape  # (10, 10, 4)
s.isig[:, [1, 2, 5, 4]]  # shape == (10, 10, 4)
s.isig[:, [.1, .2, .5, .4]]  # equivalent to above

This requires support for unordered axes. I don't think that this is too hard an ask. There is already a fair bit of missing support for the DataAxis and for functions which assume that axes have a scale or offset attribute; this is just an additional extension of that. In this case, any time an unordered axis is asked to do something that it can't, it will throw an error. This might be something we can be more explicit about, but I don't think it is necessarily a bad thing.

For the unorderable axis, this is an array of strings or some other objects.

I think in this case I go back and forth. You might be right about adding a labels attribute. I like the concept of not needing a label for every index, and I think it has a fair bit of potential for indexing using custom labels. The hard part is managing both parts of slicing: slicing with a label and slicing the labels object. I will also say that the opposite syntax makes more sense to me, i.e. velocity_axis.labels = {0: "v_x", 1: "v_y", 2: "v_z"} --> velocity_axis.labels = {"v_x": 0, "v_y": 1, "v_z": 2}

1: Slicing with a label: velocity_signal.isig["v_x"]

  • In order to implement this we could add a simple method:
def label2index(label, label_dict):
    if label in label_dict:
        return label_dict[label]
    else:
        return label
  • There is very low overhead for this, the implementation is trivial, and it would even allow things like temp.labels["liquid_water_range"] = slice(0., 100.) (a usage sketch follows below).
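A hypothetical usage sketch of label2index, assuming a labels dict that mixes plain indices and slices:

labels = {"v_x": 0, "liquid_water_range": slice(0., 100.)}
label2index("v_x", labels)                 # -> 0
label2index("liquid_water_range", labels)  # -> slice(0.0, 100.0)
label2index(42, labels)                    # -> 42 (not a label, passed through unchanged)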

2: Slicing the label dictionary:

  • This is slightly more difficult and could be computationally expensive if you have many different labels. Ideally we want our labels to support at least indexes and real values (and possibly slices/arrays/ROIs?).
def slice_labels(self, slicer, labels):
    new_indexes = np.arange(self.size)[slicer]
    for key, value in list(labels.items()):
        if isinstance(value, float):
            value = self.value2index(value)
        if value not in new_indexes:
            labels.pop(key)
        elif isinstance(value, int):
            labels[key] = np.where(value == new_indexes)[0][0]
    return labels

The case where we might have a long list of labels is after something like slicing with an n-dimensional boolean array. In that case the sliced dimensions are flattened and the resulting axis should be an index of the values for each of the original axes. Let's consider a four-dimensional array sliced with a four-dimensional boolean array: this results in a one-dimensional array of values. If we call that one axis the index_axis, then index_axis.values = {(1,1,2,1): 0, (1,1,2,2): 1, ..., (x,y,kx,ky): n}. This way we maintain the axis information by labeling each value with its original indices.
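A small numpy-only sketch of that flattening behaviour and of the proposed index_axis mapping (the variable names are assumptions, not HyperSpy API):

import numpy as np

data = np.random.random((3, 3, 3, 3))
mask = data > 0.9                 # four-dimensional boolean mask
values = data[mask]               # one-dimensional array of the selected values
coords = np.argwhere(mask)        # shape (n, 4): the (x, y, kx, ky) index of each value
index_axis_values = {tuple(c): i for i, c in enumerate(coords)}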

The advantage of using a dict is that it enables adding labels to a subset of the axis points e.g.:

temp = hs.axes.UniformDataAxis(name="temperature", units="degC", size=100)
temp.labels = {0: "Water freezing point", 100: "Water boiling point"}

I feel like this could be extremely useful in many cases! I think the general concept would be that labels would be an easy way to annotate and manipulate data without using ROIs.

@francisco-dlp
Member

I think that hyperspy should try to support as much numpy as possible.

Not sure about it: there are things that you can do with abstract arrays that you shouldn't do with the sort of data that HyperSpy deals with. HyperSpy is about data with axes that can be separated into "signal" and "navigation" axes. These simple constraints are HyperSpy's raison d'etre. They enable things like the map and plot methods, and, at the same time, limit the (e.g. numpy) features that it can support.

Technically we could support:

s.isig[:, [1, 2, 5, 4]]  # shape == (10, 10, 4)
s.isig[:, [.1, .2, .5, .4]]  # equivalent to above

but what for? And how to implement it in a way that makes sense?

I will also say that the opposite syntax makes more sense to me. i.e. velocity_axis.labels = {0: "v_x", 1: "v_y", 2: "v_z"} --> velocity_axis.labels = {"v_x":0, "v_y":1, "v_z":2}

I see your point.

it would even allow things like temp.labels["liquid_water_range"]=slice(0.,100.)

Nice idea. Maybe this should operate at the level of the Signal, because we are talking about ROIs, and ROIs are classes that operate on Signals. Currently we can save markers but not ROIs. I can see the appeal of storing ROIs in a way that makes them easy to use. What about:

> s["liquid_water_range"] # Get's the right ROI from `s.metadata` and returnts `roi(s)`.

I feel like this could be extremely useful in many cases! I think the general concept would be that labels would be an easy way to annotate and manipulate data without using ROI's.

We have to be careful not to implement several ways of performing the same thing. Your comment has made me realize that the labels dictionary is somehow an alternative to Point1D ROIs.

@CSSFrancis
Member Author

CSSFrancis commented Oct 30, 2022

I think that hyperspy should try to support as much numpy as possible.

Not sure about it: there are things that you can do with abstract arrays that you shouldn't do with the sort of data that HyperSpy deals with. HyperSpy is about data with axes that can be separated into "signal" and "navigation" axes. These simple constraints are HyperSpy's raison d'etre. They enable things like the map and plot methods, and, at the same time, limit the (e.g. numpy) features that it can support.

Technically we could support:

s.isig[:, [1, 2, 5, 4]]  # shape == (10, 10, 4)
s.isig[:, [.1, .2, .5, .4]]  # equivalent to above

but what for? And how to implement it in a way that makes sense?

I agree that a lot of what makes hyperspy work is the constraints. That being said, I think that the above code does fall under those constraints and is easily implemented using the existing infrastructure (simply changing from a UniformDataAxis to a DataAxis and slicing, or keeping an axis a DataAxis and slicing, works). From a hyperspy standpoint, I would like to think of supporting slicing with multi-indexes or boolean arrays as very much on brand: manipulating data while preserving the underlying metadata and information about the axes.

I use these kinds of functions many times: to mask some data, take the sum of some area, compare two data points, or get the intensity at a set of points in the data set. Additionally, in the case of having a list of vectors, this type of slicing is very necessary to down-select from a large list.

Unfortunately, this often requires me to operate on the .data attribute, which turns into creating a new signal rather than dealing with the hassle of editing the data attribute and changing the axes myself.

I think that my workflow is about the worst thing we can have our users doing, as it undermines the whole persistence of metadata, axes, etc. That is why I think supporting as much of numpy as is reasonable is important, as it stops much of this behavior.

The implementation is all in #3055 if you are interested in taking a look. The diff might be large because it includes this PR but I submitted it as a draft because I think it helps make sense of some of these changes.

I will also say that the opposite syntax makes more sense to me. i.e. velocity_axis.labels = {0: "v_x", 1: "v_y", 2: "v_z"} --> velocity_axis.labels = {"v_x":0, "v_y":1, "v_z":2}

I see your point.

it would even allow things like temp.labels["liquid_water_range"]=slice(0.,100.)

Nice idea. Maybe this should operate at the level of the Signal, because we are talking about ROIs and ROIs are classes that operate on Signals. Currently we can save markers but not ROIs. I can see the appeal of storing ROIs in a way that they are easy to use. What about:

> s["liquid_water_range"] # Get's the right ROI from `s.metadata` and returnts `roi(s)`.

I feel like this could be extremely useful in many cases! I think the general concept would be that labels would be an easy way to annotate and manipulate data without using ROI's.

We have to be careful not to implement several ways of performing the same thing. Your comment has made me realize that the labels dictionary is somehow an alternative to Point1D ROIs.

I think the distinction is that the labels attribute is associated with some axis, whereas an ROI is related to, or even independent of, a signal. The Point1D ROI is just another way of directly indexing the data, but we support both cases. I would rather see this as a way to annotate some axis or provide a secondary view. Maybe another way to think about this is in terms of how we support conversions between units; this is just an extension of that to a custom mapping.

Ultimately, I just think something that worked like this:

inten = s.isig["Cu_K", "Ni_Ka", "Si_Ka"]
print(s.axes_manager.signal_axes[0].axis) # [8.040, 7.477, 1.74]
inten.plot(labels=True) # Add in labels to graph

is very intuitive and uses the nice new capabilities that hyperspy has added with the non-uniform axis pretty effectively.

@CSSFrancis
Member Author

@francisco-dlp The relevant change for the uniform axis is here:

if isinstance(_slice, (np.ndarray, list)):
    new_axis = self.convert_to_non_uniform_axis()
    return new_axis._slice_me(_slice)

In the slicing code we then just treat arrays and lists differently from tuples. numpy/numpy#9686 is the relevant change to numpy, whose syntax I really think we should follow, as that is what many of our users understand.

I really don't think we want to diverge too much from this, also because then we lose a lot of the functionality of a Signal being a duck array (if numpy/scipy ever figures that out 😃).

Currently something like s.isig[[1,2]] works but has a different behavior from numpy, as the list gets cast to a tuple. That seems like a recipe for confusion that is only going to get worse.
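For reference, the numpy behaviour referred to above (numpy/numpy#9686 deprecated treating a list of indices like a tuple), shown as a plain numpy sketch:

import numpy as np

data = np.arange(27).reshape(3, 3, 3)
print(data[(0, 1)].shape)   # tuple: indexes the first two dimensions -> (3,)
print(data[[0, 1]].shape)   # list: fancy-indexes the first dimension -> (2, 3, 3)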

@CSSFrancis
Member Author

@francisco-dlp Did we ever fall either way on this? I can move some of the code downstream to pyxem, but I do want to get as much into hyperspy as I can. Let me know if there is any part of this that might need some more discussion or information.

@francisco-dlp
Member

Hi @CSSFrancis. Sorry to be slow to reply. I have some further thoughts on this and #3055, but I am too busy at the moment. I'll come back to you in the next few days.

@CSSFrancis
Member Author

@francisco-dlp No worries! Whenever you get around to it is fine, thanks for taking your time to review this.

@francisco-dlp
Member

francisco-dlp commented Nov 9, 2022

Thinking about it, I backtrack from my suggestion to use a dict to store axis labels. As you mention, that would provide an alternative way to annotate data, but I don't think that that's a good idea. To annotate data we already have markers. Therefore, according to the Zen of Python, we shouldn't provide an alternative. Besides, flexible axis labels cannot offer the level of customization that markers do. So, what about simply storing labels in a numpy array? This is still different from your current implementation, which stores the labels in the axis array. I think that it is better to store them in their own array so that i) axis items remain numbers (no need to consider a different case), ii) slicing simply slices the labels array too if present (simpler implementation), and iii) the placement of the axis labels in a plot can be adjusted by adjusting the axis values.
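A minimal sketch of what storing the labels in their own array could look like, reusing the X-ray line example from earlier in the thread (the names and layout are assumptions, not an existing HyperSpy API):

import numpy as np

axis = np.array([8.040, 7.477, 1.740])            # axis values stay numeric
labels = np.array(["Cu_K", "Ni_Ka", "Si_Ka"])     # optional parallel array of labels
sel = slice(0, 2)
sliced_axis, sliced_labels = axis[sel], labels[sel]  # slicing keeps the two in sync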

@CSSFrancis
Member Author

CSSFrancis commented Nov 14, 2022

@francisco-dlp Thanks for the comments.

Thinking about it, I backtrack from my suggestion to use a dict to store axis labels. As you mention, that would provide an alternative way to annotate data, but I don't think that that's a good idea. To annotate data we already have markers. Therefore, according to the Zen of Python, we shouldn't provide an alternative.

I think that makes sense. Maybe this does bring up the point that our ideas of ROIs and markers are pretty similar in some cases. Maybe we can start to think of markers as a collection of ROIs, or even as a BaseSignal with a collection of ROIs? I sometimes think that the markers code can be a little obtuse, especially when you want different markers as a function of navigation position. Maybe an alternative is to make a MarkerSignal1D (or ROISignal1D) which can either have no navigation dimensions and be applied to the full signal, or have the same number of navigation dimensions, meaning it iterates alongside the signal when plotted.
I can maybe move this discussion elsewhere.

Besides, flexible axis labels cannot offer the level of customization that markers do. So, what about simply storing labels in a numpy array? This is still different from your current implementation, which stores the labels in the axis array. I think that it is better to store them in their own array so that i) axis items remain numbers (no need to consider a different case), ii) slicing simply slices the labels array too if present (simpler implementation), and iii) the placement of the axis labels in a plot can be adjusted by adjusting the axis values.

I think this is a pretty good compromise and I should be able to implement it fairly easily. That being said, I am slightly hesitant: if someone adjusts the axis values, it is difficult to keep the labels in sync with the changes, and that could cause confusion.

For example, how do we handle s.axes_manager[0].axis = np.array([1,2,4,5,6])? We would probably have to drop the labels and then throw a warning. That's not the worst thing, but not exactly ideal.

@hakonanes
Contributor

Pitching in as an extension developer on some points raised by @CSSFrancis. (Feel free to move this comment to another issue/discussion.)

Maybe we can start to think of markers as a collection of ROI's or even as a baseSignal with a collection of ROI's?

kikuchipy uses HyperSpy's markers actively to draw Kikuchi bands on patterns (see the figures in this tutorial). They are used as a visual aid to confirm a correct indexing of electron backscatter diffraction (EBSD) patterns or a particular position of a pattern feature. I don't think this use fits the description of them as ROIs, at least not as mentioned here?

especially when you want different markers as a function of navigation position

The implementation in kikuchipy uses the same set of bands for all nav positions, but sets those not visible in a certain pattern to NaN. Each band's line coordinates vary per nav position. The current implementation of markers in HyperSpy fits this use perfectly. However, I have plans to make the marker coordinates calculated on-the-fly (lazily using Dask). Might be you've thought along these lines yourself, @CSSFrancis.

@CSSFrancis
Member Author

CSSFrancis commented Nov 15, 2022

Pitching in as an extension developer on some points raised by @CSSFrancis. (Feel free to move this comment to another issue/discussion.)

Maybe we can start to think of markers as a collection of ROI's or even as a baseSignal with a collection of ROI's?

kikuchipy uses HyperSpy's markers actively to draw Kikuchi bands on patterns (see this figures in this tutorial). They are used as visual aid to confirm a correct indexing of electron backscatter diffraction (EBSD) patterns or a particular position of a pattern feature. I don't think this use fits the description of them as ROIs, at least not as mentioned here?

I see your point there. I was just commenting on some of the similarities between some markers and ROIs. There might even be some cases, like creating a virtual dark field image from a set of ROIs, where the line starts to get a little blurred: in that case we want the slicing capabilities of an ROI and the labeling and plotting capabilities of a marker. It's hard to say that something like an arrow or a text marker should be an ROI.

The way things are is good but this conversation sparked that thought.

especially when you want different markers as a function of navigation position

The implementation in kikuchipy uses the same set of bands for all nav positions, but sets those not visible in a certain pattern to NaN. Each band's line coordinates vary per nav position. The current implementation of markers in HyperSpy fits this use perfectly. However, I have plans to make the marker coordinates calculated on-the-fly (lazily using Dask). Might be you've thought along these lines yourself, @CSSFrancis.

Sorry that was kind of a throwaway line and I should have explained better. I also realize that this should probably be moved to a separate issue.

Maybe this isn't as straightforward as I thought :) I always considered having to create as many markers as needed for the signal with the most markers, and setting some of them to NaN, a bug and not a feature. In your case I guess I could go either way. I will say that the way we are currently doing it is slow.

When I want to make a set of markers that changes with every navigation position, I have to create a set of markers as long as the longest set of vectors (which can be very long) and add a bunch of np.nan markers. Once you add a bunch of these markers to a signal, plotting slows down a lot: each marker is plotted as a separate artist, and having 30+ artists really slows down Matplotlib and makes navigating a pain.

A better way would be to define all the markers as a ragged array of vectors and then add a single marker to the signal with its data equal to that ragged array.
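A minimal numpy sketch of such a ragged array of per-navigation-position vectors (illustrative only, not the current markers API):

import numpy as np

# Each navigation position stores a different number of (x, y) marker positions.
vectors = np.empty((2, 2), dtype=object)
vectors[0, 0] = np.array([[1.0, 2.0], [3.0, 4.0]])
vectors[0, 1] = np.array([[0.5, 0.5]])
vectors[1, 0] = np.array([[2.0, 2.0], [4.0, 1.0], [5.0, 5.0]])
vectors[1, 1] = np.empty((0, 2))
# A single marker object could then carry `vectors` and draw the entry matching
# the current navigation index.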

But to that point, when does this ragged array of vectors defining a set of markers start to look suspiciously like a ragged signal? We can even think of that set of markers as having a defined navigation dimension which can now be sliced alongside the signal, or think about returning a set of markers from the map function. The idea of lazy markers is much easier to consider in this case as well, since you don't need to know how many sets of markers you need.

@hakonanes
Contributor

Thank you for elaborating. I agree with all your points made, and I'd say your experience with markers describes mine well. The creation of markers (geometrical simulations) in kikuchipy is currently using too much memory, so they cannot be added to a large dataset (> a couple GB) anyway. If we ever fix that, I hope to find time to look more into using ragged arrays as you say.


Comment on lines 335 to +342
dcshape = dc.shape
for _axis in self.axes_manager._axes:
if _axis.index_in_array < len(dcshape):
_axis.size = int(dcshape[_axis.index_in_array])
if _axis.size != int(dcshape[_axis.index_in_array]):
if _axis.is_uniform:
_axis.size = int(dcshape[_axis.index_in_array])
else:
_logger.warning(f"Axis {_axis} converted to a UniformDataAxis")
_axis.convert_to_uniform_axis()
_axis.size = int(dcshape[_axis.index_in_array])
Member

This function is about getting chunks, and it is a surprise that to do that it needs to edit the axis! Could it be a workaround for the data shape and the axis being out of sync?

Member Author

The problem here is that the axis and the array shape can be out of sync, which causes some issues when we then try to chunk the data.

Member

Use BaseSignal.get_dimensions_from_data instead?

hyperspy/axes.py Outdated
else:
# the axis is not ordered
return None
except np.core._exceptions._UFuncNoLoopError: # subtracting array of objects/strings will fail
Member

Avoid using numpy private API: np.core._exceptions._UFuncNoLoopError is a subclass of TypeError:

Suggested change
except np.core._exceptions._UFuncNoLoopError: # subtracting array of objects/strings will fail
except TypeError: # subtracting array of objects/strings will fail

Member Author

@ericpre Good point! I'll make the substitution


@CSSFrancis
Member Author

The problem with returning a new object is that the events connected to the old one will be lost.

@ericpre I'll make sure that this isn't the case...

Seems like the things we need to make sure are handled correctly:

  1. Cases where non-ordered non-uniform axes don't work (i.e. models) should raise errors.
  2. Cases where data.shape != the axes_manager axes shapes should raise errors or, more ideally, be automatically fixed.
  3. Events and object propagation: axes should maintain events when changing types, and axes in axes managers should be adjusted in place.

Any other things to check?

@ericpre ericpre left a comment

Regarding the changes to convert_to_uniform_axis: previously remove_trait/add_trait were used to handle traits dynamically, so why is that not possible anymore?


@@ -933,25 +969,20 @@ def __init__(self,
size=1,
is_binned=False,
**parameters):
super().__init__(
super(DataAxis, self).__init__(
Member

Is it necessary to add DataAxis, self in the super call here? Isn't it old Python 2.x syntax?

Comment on lines +2063 to +2136
@observe("_axes")
@observe(trait("_axes", notify=False).list_items())
Member

Suggested change
@observe("_axes")
@observe(trait("_axes", notify=False).list_items())
# These two `observe` decorators are necessary so that changing an item of the list in place is also observed
@observe("_axes")
@observe(trait("_axes", notify=False).list_items())

@ericpre
Member

ericpre commented Jan 13, 2024

Any other things to check?

Nothing that I can think of but even if something is broken after this PR, this is a good time to do it!

@ericpre
Member

ericpre commented Feb 15, 2024

@CSSFrancis, are you happy for me to take over this PR? I suspect you may be busy with other things! 😃

@CSSFrancis
Member Author

@ericpre Yeah, I am kind of trying to take a step back to finish up some pyxem stuff, so if you pick this up I'd be very happy!

Let me know if you need help with anything! And I can make some time to look over this again.

@ericpre ericpre mentioned this pull request Mar 2, 2024
@ericpre ericpre added this to the v2.1 milestone Apr 7, 2024
@ericpre ericpre force-pushed the PropertyAttributeAxisRefactor branch from cc21fbc to c89e8c5 Compare April 12, 2024 20:01
@ericpre ericpre modified the milestones: v2.1, v2.2 May 8, 2024
Development

Successfully merging this pull request may close these issues.

Protect axis attribute for uniform data axes
5 participants