
Handling Axis attributes with @property decorator #3031

Open
wants to merge 19 commits into base: RELEASE_next_minor
Conversation

CSSFrancis
Member

@CSSFrancis CSSFrancis commented Sep 23, 2022

Description of the change

This set of changes supersedes #3012 and #2876 in the hope of making small incremental changes to how axes and slicing work before fully supporting vector signals. @ericpre sorry about the confusion, but the changes to how the slicing works are rather larger than anticipated, so I wanted to do this slowly and incrementally.

This change better defines what the key attributes of an axis are and focuses on only allowing changes to those attributes.

  • BaseDataAxis: The base class. It has only a name and units, as well as attributes related to interfacing with the AxesManager.
  • DataAxis: The most flexible of the axes. It has an axis attribute which is a one-dimensional array; the values can be in increasing order, decreasing order, or arbitrary order. (Note that the last case does not support any kind of indexing yet.)
  • FunctionalDataAxis: Has an axis property that is calculated from x and the given function.
  • UniformDataAxis: Has an axis property calculated from the size, offset and scale (see the sketch after this list).
  • AxesManager: Properties in the axes manager are pulled directly from the axes in its list.
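To illustrate the intent (a minimal sketch only, not the actual HyperSpy implementation), a uniform axis built around the @property decorator could look like this:

import numpy as np

class UniformAxisSketch:
    """Illustrative only: the axis array is derived, never stored or set directly."""

    def __init__(self, size, scale=1.0, offset=0.0):
        self._size = int(size)
        self.scale = float(scale)
        self.offset = float(offset)

    @property
    def axis(self):
        # Read-only: no setter is defined, so `ax.axis = ...` raises AttributeError.
        return np.arange(self._size) * self.scale + self.offset

    @property
    def size(self):
        return self._size

    @size.setter
    def size(self, value):
        # Setting the size is meaningful here because the axis is recomputed on access.
        self._size = int(value)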

These changes help to prevent a lot of small bugs within the axes code. Things like being able to set the size of a DataAxis don't make sense when the size is directly derived from the axis array. Similarly, for the UniformDataAxis, directly setting the axis should throw an error.

This also adds some limited support for unordered (or unorderable) axes, which are necessary for vector applications.

Progress of the PR

  • Change implemented (can be split into several points),
  • add tests,
  • ready for review.

Minimal example of the bug fix or the new feature

import hyperspy.api as hs

ax = hs.axes.DataAxis(["a", "b", "c"])  # No longer throws an error
ax.size = 10  # Throws AttributeError

ax = hs.axes.UniformDataAxis(size=10, scale=3, offset=1)
ax.size = 10  # Allowed: the axis is recalculated from size, scale and offset
ax.axis = [10, 12, 13, 14]  # Throws AttributeError

@CSSFrancis CSSFrancis mentioned this pull request Sep 23, 2022
@codecov

codecov bot commented Sep 23, 2022

Codecov Report

Attention: Patch coverage is 94.65649%, with 14 lines in your changes missing coverage. Please review.

Project coverage is 80.59%. Comparing base (18caceb) to head (c89e8c5).

Files Patch % Lines
hyperspy/axes.py 97.81% 1 Missing and 4 partials ⚠️
hyperspy/_signals/lazy.py 33.33% 3 Missing and 1 partial ⚠️
hyperspy/signal.py 50.00% 2 Missing and 1 partial ⚠️
hyperspy/models/model1d.py 89.47% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@                  Coverage Diff                   @@
##           RELEASE_next_minor    #3031      +/-   ##
======================================================
+ Coverage               80.54%   80.59%   +0.05%     
======================================================
  Files                     147      147              
  Lines                   21868    21883      +15     
  Branches                 5147     5185      +38     
======================================================
+ Hits                    17613    17637      +24     
+ Misses                   3037     3024      -13     
- Partials                 1218     1222       +4     


@ericpre ericpre added the run-extension-tests Run extension test suites label Sep 24, 2022
@CSSFrancis
Member Author

The next PR, which handles the slicing, can be found at https://github.com/CSSFrancis/hyperspy/tree/FancySlicingRefactor; it may help to understand some of the motivations for these changes.

@ericpre ericpre removed the run-extension-tests Run extension test suites label Sep 29, 2022

@ericpre ericpre left a comment


Thank you @CSSFrancis, this is very good!
There are a few details which are worth tweaking further, but this is a significant simplification of the axes implementation.
I guess this is one of those things which grew over time with more functionality and ended up being a bit of a mess!

hyperspy/axes.py Outdated

def update_axis(self):
@property
def axis(self):
Member

It may be worth caching it in case the function calculation is slow.

Member Author

This could potentially be useful, but I am a little hesitant about caching the axis: I think we would then start going down the rabbit hole of adding a bunch of extra traits and listeners just to make sure things work correctly, and even then it could get complicated and isn't guaranteed. I would rather have a function which repeatedly calls the axis property use something like ax1 = FunctionalDataAxis.axis and then reuse that instance.

I can add properties which directly calculate the high and low values, rather than getting them from the axis, which might help some.

Either way, it might be worth thinking about a case where caching speeds things up significantly. Even for a size of 1,000,000 values it only takes about 10 ms to calculate something like the sin or log of each value. I think in this case the trade-off of speed for simplicity might be worth it.
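For reference, the order of magnitude of that claim can be checked with a quick micro-benchmark (timings are machine dependent; this is only an illustrative sketch):

import timeit

import numpy as np

x = np.linspace(0.0, 10.0, 1_000_000)
# Average time of a single np.sin evaluation over one million values,
# typically a few milliseconds on a modern machine.
per_call_ms = timeit.timeit(lambda: np.sin(x), number=100) / 100 * 1e3
print(f"{per_call_ms:.2f} ms per call")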

Member Author

Okay, so I think the way to do this, if we want to go down this path, would be to keep the _axis attribute for all axes.

Then we could define a function like this:

@observe('scale', postinit=True)
@observe('offset', postinit=True)
@observe('size', postinit=True)
def update_axis(self, event):
    self._axis = np.arange(self.size) * self.scale + self.offset

Then for the FunctionalDataAxis we could create a function

@observe(trait("x._axis", notify=False).list_items(), postinit=True)
@observe(self.parameters_list, postinit=True)
def update_axis(event):
    kwargs = {}
    for kwarg in self.parameters_list:
        kwargs[kwarg] = getattr(self, kwarg)
    new_axis = self._function(x=self.x.axis, **kwargs)
    # Set not valid values to np.nan
    new_axis[np.logical_not(np.isfinite(new_axis))] = np.nan
    self._axis = new_axis

Member Author

Part of the reason I don't really like this is that even though the ._axis attribute is protected, you could still directly modify the values for a UniformDataAxis or FunctionalDataAxis.

Member

Either way, it might be worth thinking about a case where caching speeds things up significantly. Even for a size of 1,000,000 values it only takes about 10 ms to calculate something like the sin or log of each value. I think in this case the trade-off of speed for simplicity might be worth it.

The calculation itself may not take very long, but when it is called repetitively it can slow down other operations. An example of this is fitting (#2949), but in this specific case I haven't checked whether it introduces a performance regression!
Better to leave it as it is for now and come back to it later.

Then we could define a function like this:

@observe('scale', postinit=True)
@observe('offset', postinit=True)
@observe('size', postinit=True)
def update_axis(self, event):
    self._axis = np.arange(self.size) * self.scale + self.offset

Using traits.api.observe is good, as the old one is deprecated. It seems that it comes with advantages that may be beneficial for us.
I added an item to the 2.0 release list to migrate to the new Traits API in other places of the code.

Part of the reason I don't really like this is that even though the ._axis attribute is protected, you could still directly modify the values for a UniformDataAxis or FunctionalDataAxis.

I wouldn't worry about it, because it is a common pattern in Python.

Member Author

After looking through the documentation for Traits, I think it should be fairly easy to cache the property. I hadn't looked into the additional capabilities for caching.

from traits.api import cached_property, Property

axis = Property(observe=["scale", "size", "offset"])


@cached_property
def _get_axis(self):
    return np.arange(self.size) * self.scale + self.offset

https://docs.enthought.com/traits/traits_user_manual/advanced.html#caching-a-property-value

Member Author

Or maybe it isn't quite this easy... Nothing with the traits package has been quite as intuitive as I first thought.

Member Author

@ericpre So there are a few problems with this, mostly that in-place editing of traits is not supported. We have been operating by adding traits to each object individually rather than adding traits to the class as a whole, which is not well supported. The problem is that when we change the __class__ variable and call __init__, we skip over the __new__ function where the listeners are set for some classes.

Unfortunately, I think that means that in order to cache axes we would have to copy axes when converting from one type to the next.

That means that s.axes_manager[0].convert_to_nonuniform_axis() --> s.axes_manager[0] = s.axes_manager[0].convert_to_nonuniform_axis()

It isn't a huge change, but it would allow us to define the Axes classes like this.

import numpy as np
from traits.api import (cached_property, HasTraits, Instance, Str, Float, Int,
                        observe, Property, Array, Undefined)

class BaseDataAxis(HasTraits):
    name = Str(Undefined)
    units = Str(Undefined)

class DataAxis(BaseDataAxis):
    axis = Array()

class FunctionalDataAxis(DataAxis):
    x = Instance(BaseDataAxis)
    axis = Property(observe=["x.axis"])  # overwrites axis in DataAxis

    @cached_property
    def _get_axis(self):
        print("calculating axis")
        return self.func(self.x.axis, self.parameters)

class UniformDataAxis(DataAxis):
    scale = Float(1)
    offset = Float(0)
    size = Int(1)
    axis = Property(observe=["scale", "offset", "size"])  # overwrites axis in DataAxis

    @cached_property
    def _get_axis(self):
        print("calculating axis")
        return np.arange(self.size) * self.scale + self.offset
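A minimal usage sketch of the classes above, assuming Property(observe=...) combined with @cached_property behaves as described in the traits documentation:

ax = UniformDataAxis(size=5, scale=0.5, offset=1.0)
print(ax.axis)   # prints "calculating axis" once, then [1.  1.5 2.  2.5 3. ]
print(ax.axis)   # served from the cache, no recalculation
ax.scale = 2.0   # changing an observed trait invalidates the cache
print(ax.axis)   # recalculated: [1. 3. 5. 7. 9.]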

Member

An alternative would be not to use @cached_property and to do the cache manually. I can have a look at it.

hyperspy/axes.py Outdated
navigate=navigate,
is_binned=is_binned,
**kwargs)
self.add_trait("_axis", t.Array)
Member

Why does it need to be a trait?

Member Author

I kept this just because I didn't want anyone trying to set the _axis with anything other than an array as that would complicate the fancy slicing quite a bit.

@CSSFrancis CSSFrancis Oct 7, 2022

I should note that the self.events.any_axis_changed.trigger(obj=self) should probably trigger when the _axis is changed. I am not entirely sure what in the drawing is linked to that event. That does argue for keeping the _axis attribute as a trait.

Member

I kept this just because I didn't want anyone trying to set the _axis with anything other than an array as that would complicate the fancy slicing quite a bit.

Okay, maybe worth adding a short comment.

I should note that the self.events.any_axis_changed.trigger(obj=self) should probably trigger when the _axis is changed. I am not entirely sure what in the drawing is linked to that event. That does argue for keeping the _axis attribute as a trait.

Yes.

hyperspy/axes.py Outdated
navigate = t.Bool(False)
is_binned = t.Bool(False)
index = t.Range('low_index', 'high_index')
Member

index or _index still needs to be a trait to be synchronised with the GUI widgets.

Member Author

I have the index as a trait for the DataAxis and not for the BaseDataAxis. Is that okay? My idea is that the BaseDataAxis would only ever be used if you are trying to define something like a ragged dimension, in which case the index doesn't have a physical sense. I can set it as a trait there as well if that works better.

Member

Sorry, it would have been less confusing if my comment was in a better place. What I meant is that the index or _index needs to be a traits type, e.g. an instance of traits.api.Range.

@CSSFrancis CSSFrancis Oct 6, 2022

Ahh, that makes more sense. That requires that the low and high indexes be traits as well, or at least to my understanding that is the case. I currently have it so that the index is set with an @index.setter function which won't update outside of the high and low index range. I'm not terribly familiar with how else this might interact with the API.

Member

Ahh, that makes more sense. That requires that the low and high indexes be traits as well. Or at least to my understanding that is the case.

Yes, this is correct.

I currently have it so that the index is set with a @index.setter function which won't update outside of the index high and low. I'm not terribly familiar with how else this might interact with the API.

This is a detail which will need to be sorted in this PR; we can come back to it toward the end when other things are settled.

Member Author

So this runs into some complications as t.Range needs to be set with a value and not a Property initially.

We could do this:

class DataAxis:
    axis = t.Array()  # Axis added as a trait to track changes to the axis
    low_index = t.Int(0)
    high_index = t.Int(1)
    _index = t.Range('low_index', 'high_index')

    @observe("axis")
    def update_high_index(self, event):
        self.high_index = self.size - 1

Or we could keep it the way it is. I don't think it is too confusing right now, but either works!

@ericpre ericpre left a comment

We should take this opportunity to add comments describing the expected use of properties, the reasoning for the relationships between the different axis classes, why the different attributes/properties are defined in this specific way, etc.

@ericpre ericpre left a comment

Sorry, I have done a review but forgot to submit it... I updated my comments with the latest changes.



@@ -2001,18 +2011,16 @@ def update_axes_attributes_from(self, axes,
if changes:
self.events.any_axis_changed.trigger(obj=self)

def _update_attributes(self):
@observe("_axes")
@observe(trait("_axes", notify=False).list_items())
Member

Just to make sure that I am following correctly: @observe(trait("_axes", notify=False).list_items()) is there to call the add_axis_manager method when the attributes, which are traits instances, are changed? If so, I don't see why we need it here: the axes_manager attribute of the axis needs to be set only when the axis is added to an AxesManager (currently covered by @observe("_axes"))?

Member Author

So the first case is triggered by someone setting the axes value, something like am.axes = [axis1, axis2].

Because lists are mutable in place, that observation isn't triggered by an in-place operation. For that we need the second observation, which triggers when something changes in place, i.e. am.axes[1] = axis1.

See https://docs.enthought.com/traits/traits_user_manual/notification.html#notification-handler for more details.
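A minimal, self-contained sketch of the two notification paths (plain traits, not HyperSpy code; the class and method names are made up for the example):

from traits.api import HasTraits, Int, List, observe
from traits.observation.api import trait

class Manager(HasTraits):
    _axes = List(Int)

    @observe("_axes")                                    # fires when the list itself is reassigned
    @observe(trait("_axes", notify=False).list_items())  # fires when an item is changed in place
    def _on_axes_changed(self, event):
        print("axes changed:", event)

m = Manager(_axes=[1, 2, 3])
m._axes = [4, 5, 6]   # triggers the first observer
m._axes[1] = 10       # triggers the second observer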


@CSSFrancis
Member Author

Sorry, I have done a review but forgot to submit it... I updated my comments with the latest changes.

All good :) thanks for the review. To be honest I did the same thing the first time around when I replied...

@CSSFrancis
Member Author

CSSFrancis commented Oct 14, 2022

@ericpre This should be in a little bit better shape now. I added the ability to cache the axis value for both the FunctionalDataAxis and the UniformDataAxis. Let me know what still needs to be done to get this merged. There is some documentation lacking, but for the User Guide I was intending to do that all at once. Once this is merged I will submit the PR for fancy slicing and we can go from there.

The new changes improve the self-documentation a little bit as well and are more in line with how the Traits package should be used. The downside is that there is no way (at least to my knowledge) to properly handle the in-place conversion from one axis type to another; this type of operation doesn't properly set up the listeners for the different axis types. Additionally, adding traits to the FunctionalDataAxis is not well supported, so different parameters will have to be adjusted using ax.parameters["power"] = 3.

For converting some axis we need to do:

s.axes_manager[0] = s.axes_manager[0].convert_to_non_uniform_axis()

Not a huge API change, but this can get a little clunky at times.

It might be better to add something along the lines of:

s.axes_manager.convert(0, "non_uniform")

hyperspy/axes.py Outdated
self.__class__ = UniformDataAxis
self.__init__(**d, size=self.size, scale=scale, offset=self.low_value)
self.axes_manager = axes_manager
new_axis = UniformDataAxis(**d, size=size, scale=scale, offset=offset)
Member

It is convenient to convert inplace and it would be good to avoid returning a different object.

Member Author

I can see if there is a way to do it. I think it means that we have to manage how changes to the different attributes are handled, which is a bit more involved.

Member Author

I think I'll become an expert at the Traits package by the end of this :)

Member Author

@ericpre Well it only took me a couple of days but I think the solution was a lot easier than I was making it. How do you feel about this?

    def convert_to_non_uniform_axis(self):
        """Convert to a non-uniform axis."""
        d = super().get_axis_dictionary()
        d["_type"] = 'DataAxis'
        new_axis = DataAxis(**d)
        if self.axes_manager is not None:
            self.axes_manager[self] = new_axis
        else:
            return new_axis
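A hypothetical usage of that proposal (the names follow the snippet above; whether anything should be returned is exactly the point under discussion):

# Axis registered in an AxesManager: it is replaced in place, nothing is returned.
s.axes_manager[0].convert_to_non_uniform_axis()

# Standalone axis: a new DataAxis is returned instead.
ax = hs.axes.UniformDataAxis(size=10, scale=0.1, offset=0.0)
data_axis = ax.convert_to_non_uniform_axis()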

Member

The problem with returning a new object is that the events connected to the old one will be lost.

@francisco-dlp
Member

Nice job @CSSFrancis! This is a much needed refresh of the axes code.

What do you mean by "unordered (or unorderable) axes"? Is it hs.axes.DataAxis(["a", "b", "c"])? If it is what it seems to me, I wonder if it wouldn't be better to implement this feature simply as axis labels. For example, an axis could have an extra optional attribute, labels, that could be a dictionary containing labels. For example:

velocity_axis = hs.axes.UniformDataAxis(name="velocity", units="m/s", size=3)
velocity_axis.labels = {0: "v_x", 1: "v_y", 2: "v_z"}

The advantage of using an extra attribute is that .axis remains a numeric array, so all code assuming this doesn't break. This also enables adding labels that are not uniformly distributed over the axis (using DataAxis or NonUniformDataAxis).

The advantage of using a dict is that it enables adding labels to a subset of the axis points e.g.:

temp = hs.axes.UniformDataAxis(name="temperature", units="degC", size=100)
temp.labels = {0: "Water freezing point", 100: "Water boiling point"}

@CSSFrancis
Member Author

CSSFrancis commented Oct 28, 2022

Nice job @CSSFrancis! This is a much needed refresh of the axes code.

@francisco-dlp Thanks for the response. This was exactly the kind of idea I was hoping this PR would generate.

What do you mean by "unordered (or unorderable) axes"? Is it hs.axes.DataAxis(["a", "b", "c"])? If it is what it seems to me, I wonder if it wouldn't be better to implement this feature simply as axis labels. For example, an axis could have an extra optional attribute, labels, that could be a dictionary containing labels. For example:

Yes, I would say that there are both unordered axes, something like hs.axes.DataAxis(axis=[1,3,2,5,4,7]), and unorderable axes, which would be something like hs.axes.DataAxis(axis=["a","b","c","d","e"]), or even something like

hs.axes.DataAxis(axis=[hs.axes.LabeledColumn(name="x", units="nm", scale=.1),
                       hs.axes.LabeledColumn(name="y", units="nm", scale=0.1),
                       hs.axes.LabeledColumn(name="kx", units="nm^-1", scale=0.01),
                       hs.axes.LabeledColumn(name="ky", units="nm^-1", scale=0.01)])

There are a couple of points in favour of allowing each of these types, but I do like the idea of adding a labels dictionary attribute.

Let's start with the unordered axis, which is a numeric array that is not necessarily increasing or decreasing.

  • I think that hyperspy should try to support as much numpy as possible. This includes something like:
data = np.ones((10, 10, 10))
s = hs.signals.Signal2D(data)
s.axes_manager.signal_axes[1].scale = .1
data[:, :, [1, 2, 5, 4]]  # returns the 1st, 2nd, 5th and 4th indexes
data[:, :, [1, 2, 5, 4]].shape  # (10, 10, 4)
s.isig[:, [1, 2, 5, 4]]  # shape == (10, 10, 4)
s.isig[:, [.1, .2, .5, .4]]  # equivalent to above

This requires support for unordered axes. I don't think that this is too hard an ask. There is already a fair bit of missing support for the DataAxis and for functions which assume that axes have a scale or offset attribute; this is just an additional extension of that. In this case, any time an unordered axis is asked to do something that it can't, it will throw an error. This might be something we can be more explicit about, but I don't think it is necessarily a bad thing.

For the unorderable axis, this is an array of strings or some other objects.

I think in this case I go back and forth. You might be right about adding a labels attribute. I like the concept of not needing a label for every index, and I think it has a fair bit of potential for indexing using custom labels. The hard part is managing both parts of slicing: slicing with a label and slicing the labels object. I will also say that the opposite syntax makes more sense to me, i.e. velocity_axis.labels = {0: "v_x", 1: "v_y", 2: "v_z"} --> velocity_axis.labels = {"v_x": 0, "v_y": 1, "v_z": 2}

1: Slicing with a label: velocity_signal.isig["v_x"]

  • In order to implement this we could add a simple method:
def label2index(label, label_dict):
    if label in label_dict:
        return label_dict[label]
    else:
        return label
  • There is very low overhead for this, the implementation is trivial, and it would even allow things like temp.labels["liquid_water_range"] = slice(0., 100.) (a usage sketch follows below).
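A hypothetical usage sketch of label2index, assuming a labels dict that mixes plain indices and slices:

labels = {"v_x": 0, "liquid_water_range": slice(0., 100.)}
label2index("v_x", labels)                 # -> 0
label2index("liquid_water_range", labels)  # -> slice(0.0, 100.0)
label2index(42, labels)                    # -> 42 (not a label, passed through unchanged)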

2: Slicing the label dictionary:

  • This is slightly more difficult and could be computationally expensive if you have many different labels. Ideally we want our labels to support at least indexes and real values (and possibly slices/arrays/ROIs?).
def slice_labels(self, slicer, labels):
    new_indexes = np.arange(self.size)[slicer]
    for key, value in list(labels.items()):
        if isinstance(value, float):
            value = self.value2index(value)
        if value not in new_indexes:
            labels.pop(key)
        elif isinstance(value, int):
            labels[key] = np.where(value == new_indexes)[0][0]
    return labels

The case where we might have a long list of labels is after something like slicing with an n-dimensional boolean array. In that case the sliced dimensions are flattened and the resulting axis should be an index of the values for each of the original axes. Let's consider a four-dimensional array sliced with a four-dimensional boolean array: this results in a one-dimensional array of values. If we call that one axis the index_axis, then index_axis.values = {(1,1,2,1): 0, (1,1,2,2): 1, ..., (x,y,kx,ky): n}. This way we maintain the axis information by labeling each value with its original indices.
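A small numpy-only sketch of that flattening behaviour and of the proposed index_axis mapping (the variable names are assumptions, not HyperSpy API):

import numpy as np

data = np.random.random((3, 3, 3, 3))
mask = data > 0.9                 # four-dimensional boolean mask
values = data[mask]               # one-dimensional array of the selected values
coords = np.argwhere(mask)        # shape (n, 4): the (x, y, kx, ky) index of each value
index_axis_values = {tuple(c): i for i, c in enumerate(coords)}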

The advantage of using a dict is that it enables adding labels to a subset of the axis points e.g.:

temp = hs.axes.UniformDataAxis(name="temperature", units="degC", size=100)
temp.labels = {0: "Water freezing point", 100: "Water boiling point"}

I feel like this could be extremely useful in many cases! I think the general concept would be that labels would be an easy way to annotate and manipulate data without using ROIs.

@francisco-dlp
Member

I think that hyperspy should try to support as much numpy as possible.

Not sure about it: there are things that you can do with abstract arrays that you shouldn't do with the sort of data that HyperSpy deals with. HyperSpy is about data with axes that can be separated into "signal" and "navigation" axes. These simple constraints are HyperSpy's raison d'etre. They enable things like the map and plot methods, and, at the same time, limit the (e.g. numpy) features that it can support.

Technically we could support:

s.isig[:, [1, 2, 5, 4]]  # shape == (10, 10, 4)
s.isig[:, [.1, .2, .5, .4]]  # equivalent to above

but what for? And how to implement it in a way that makes sense?

I will also say that the opposite syntax makes more sense to me. i.e. velocity_axis.labels = {0: "v_x", 1: "v_y", 2: "v_z"} --> velocity_axis.labels = {"v_x":0, "v_y":1, "v_z":2}

I see your point.

it would even allow things like temp.labels["liquid_water_range"]=slice(0.,100.)

Nice idea. Maybe this should operate at the level of the Signal, because we are talking about ROIs, and ROIs are classes that operate on Signals. Currently we can save markers but not ROIs. I can see the appeal of storing ROIs in a way that makes them easy to use. What about:

> s["liquid_water_range"] # Get's the right ROI from `s.metadata` and returnts `roi(s)`.

I feel like this could be extremely useful in many cases! I think the general concept would be that labels would be an easy way to annotate and manipulate data without using ROI's.

We have to be careful not to implement several ways of performing the same thing. Your comment has made me realize that the labels dictionary is somehow an alternative to Point1D ROIs.

@CSSFrancis
Member Author

CSSFrancis commented Oct 30, 2022

I think that hyperspy should try to support as much numpy as possible.

Not sure about it: there are things that you can do with abstract arrays that you shouldn't do with the sort of data that HyperSpy deals with. HyperSpy is about data with axes that can be separated into "signal" and "navigation" axes. These simple constraints are HyperSpy's raison d'etre. They enable things like the map and plot methods, and, at the same time, limit the (e.g. numpy) features that it can support.

Technically we could support:

s.isig[:, [1, 2, 5, 4]]  # shape == (10, 10, 4)
s.isig[:, [.1, .2, .5, .4]]  # equivalent to above

but what for? And how to implement it in a way that makes sense?

I agree that a lot of what makes hyperspy work is the constraints. That being said, I think that the above code does fall under those constraints and is easily implemented using the existing infrastructure (simply changing from a UniformDataAxis to a DataAxis and slicing, or keeping an axis a DataAxis and slicing, works). From a hyperspy standpoint, I would like to think of supporting slicing with multi-indexes or boolean arrays as very much on brand: manipulating data while preserving the underlying metadata and information about the axes.

I use these kinds of functions many times: to mask some data, take the sum of some area, compare two data points, or get the intensity at a set of points in the data set. Additionally, in the case of having a list of vectors, this type of slicing is very necessary to down-select from a large list.

Unfortunately, this often requires me to operate on the .data attribute, which turns into creating a new signal rather than dealing with the hassle of editing the data attribute and changing the axes myself.

I think that my workflow is about the worst thing we can have our users doing, as it undermines the whole persistence of metadata, axes, etc. That is why I think supporting as much of numpy as is reasonable is important, as it stops much of this behavior.

The implementation is all in #3055 if you are interested in taking a look. The diff might be large because it includes this PR but I submitted it as a draft because I think it helps make sense of some of these changes.

I will also say that the opposite syntax makes more sense to me. i.e. velocity_axis.labels = {0: "v_x", 1: "v_y", 2: "v_z"} --> velocity_axis.labels = {"v_x":0, "v_y":1, "v_z":2}

I see your point.

it would even allow things like temp.labels["liquid_water_range"]=slice(0.,100.)

Nice idea. Maybe this should operate at the level of the Signal, because we are talking about ROIs and ROIs are classes that operate on Signals. Currently we can save markers but not ROIs. I can see the appeal of storing ROIs in a way that they are easy to use. What about:

> s["liquid_water_range"] # Get's the right ROI from `s.metadata` and returnts `roi(s)`.

I feel like this could be extremely useful in many cases! I think the general concept would be that labels would be an easy way to annotate and manipulate data without using ROI's.

We have to be careful not to implement several ways of performing the same thing. Your comment has made me realize that the labels dictionary is somehow an alternative to Point1D ROIs.

I think the distinction is that the labels attribute is associated with some axis, whereas an ROI is related to, or even independent of, a signal. The Point1D ROI is just another way of directly indexing the data, but we support both cases. I would rather see this as a way to annotate some axis or provide a secondary view. Maybe another way to think about this is in terms of how we support conversions between units; this is just an extension of that to a custom mapping.

Ultimately, I just think something that worked like this:

inten = s.isig["Cu_K", "Ni_Ka", "Si_Ka"]
print(s.axes_manager.signal_axes[0].axis) # [8.040, 7.477, 1.74]
inten.plot(labels=True) # Add in labels to graph

is very intuitive and uses the nice new capabilities that hyperspy has added with the non-uniform axis pretty effectively.

@CSSFrancis
Member Author

@francisco-dlp The relevant change for the uniform axis is here:

if isinstance(_slice, (np.ndarray, list)):
    new_axis = self.convert_to_non_uniform_axis()
    return new_axis._slice_me(_slice)

In the slicing code we then just treat arrays and lists differently from tuples. numpy/numpy#9686 is the relevant change to numpy, whose syntax I really think we should follow, as that is what many of our users understand.

I really don't think we want to diverge too much from this, also because then we lose a lot of the functionality of a Signal being a duck array (if numpy/scipy ever figures that out 😃).

Currently something like s.isig[[1,2]] works but has a different behavior from numpy, as the list gets cast to a tuple. That seems like a recipe for confusion that is only going to get worse.
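For reference, the numpy behaviour referred to above (numpy/numpy#9686 deprecated treating a list of indices like a tuple), shown as a plain numpy sketch:

import numpy as np

data = np.arange(27).reshape(3, 3, 3)
print(data[(0, 1)].shape)   # tuple: indexes the first two dimensions -> (3,)
print(data[[0, 1]].shape)   # list: fancy-indexes the first dimension -> (2, 3, 3)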

@CSSFrancis
Member Author

@francisco-dlp Did we ever fall either way on this? I can move some of the code downstream to pyxem, but I do want to get as much into hyperspy as I can. Let me know if there is any part of this that might need some more discussion or information.

@francisco-dlp
Member

Hi @CSSFrancis. Sorry to be slow to reply. I have some further thoughts on this and #3055, but I am too busy at the moment. I'll come back to you in the next few days.

@CSSFrancis
Member Author

@francisco-dlp No worries! Whenever you get around to it is fine, thanks for taking your time to review this.

@francisco-dlp
Member

francisco-dlp commented Nov 9, 2022

Thinking about it, I backtrack from my suggestion to use a dict to store axis labels. As you mention, that would provide an alternative way to annotate data, but I don't think that that's a good idea. To annotate data we already have markers. Therefore, according to the Zen of Python, we shouldn't provide an alternative. Besides, flexible axis labels cannot offer the level of customization that markers do. So, what about simply storing labels in a numpy array? This is still different from your current implementation, which stores the labels in the axis array. I think that it is better to store them in their own array so that i) axis items remain numbers (no need to consider a different case), ii) slicing simply slices the labels array too if present (simpler implementation), and iii) the placement of the axis labels in a plot can be adjusted by adjusting the axis values.
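A minimal sketch of what storing the labels in their own array could look like, reusing the X-ray line example from earlier in the thread (the names and layout are assumptions, not an existing HyperSpy API):

import numpy as np

axis = np.array([8.040, 7.477, 1.740])            # axis values stay numeric
labels = np.array(["Cu_K", "Ni_Ka", "Si_Ka"])     # optional parallel array of labels
sel = slice(0, 2)
sliced_axis, sliced_labels = axis[sel], labels[sel]  # slicing keeps the two in sync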

@CSSFrancis
Member Author

CSSFrancis commented Nov 14, 2022

@francisco-dlp Thanks for the comments.

Thinking about it, I backtrack from my suggestion to use a dict to store axis labels. As you mention, that would provide an alternative way to annotate data, but I don't think that that's a good idea. To annotate data we already have markers. Therefore, according to the Zen of Python, we shouldn't provide an alternative.

I think that makes sense. Maybe this does bring up the point that our ideas of ROIs and markers are pretty similar in some cases. Maybe we can start to think of markers as a collection of ROIs, or even as a BaseSignal with a collection of ROIs? I sometimes think that the markers code can be a little obtuse, especially when you want different markers as a function of navigation position. Maybe an alternative is to make a MarkerSignal1D (or ROISignal1D) which can either have no navigation dimensions and be applied to the full signal, or have the same number of navigation dimensions, meaning it iterates alongside the signal when plotted.
I can maybe move this discussion elsewhere.

Besides, flexible axis labels cannot offer the level of customization that markers do. So, what about simply storing labels in a numpy array? This is still different from your current implementation, which stores the labels in the axis array. I think that it is better to store them in their own array so that i) axis items remain numbers (no need to consider a different case), ii) slicing simply slices the labels array too if present (simpler implementation), and iii) the placement of the axis labels in a plot can be adjusted by adjusting the axis values.

I think this is a pretty good compromise and I should be able to implement it fairly easily. That being said, I am slightly hesitant: if someone adjusts the axis values, it is difficult to keep the labels in sync with the changes, and that could cause confusion.

For example, how do we handle s.axes_manager[0].axis = np.array([1,2,4,5,6])? We would probably have to drop the labels and then throw a warning. That's not the worst thing, but not exactly ideal.

@hakonanes
Contributor

Pitching in as an extension developer on some points raised by @CSSFrancis. (Feel free to move this comment to another issue/discussion.)

Maybe we can start to think of markers as a collection of ROI's or even as a baseSignal with a collection of ROI's?

kikuchipy uses HyperSpy's markers actively to draw Kikuchi bands on patterns (see the figures in this tutorial). They are used as a visual aid to confirm a correct indexing of electron backscatter diffraction (EBSD) patterns or a particular position of a pattern feature. I don't think this use fits the description of them as ROIs, at least not as mentioned here?

especially when you want different markers as a function of navigation position

The implementation in kikuchipy uses the same set of bands for all nav positions, but sets those not visible in a certain pattern to NaN. Each band's line coordinates vary per nav position. The current implementation of markers in HyperSpy fits this use perfectly. However, I have plans to make the marker coordinates calculated on-the-fly (lazily using Dask). Might be you've thought along these lines yourself, @CSSFrancis.

@CSSFrancis
Member Author

CSSFrancis commented Nov 15, 2022

Pitching in as an extension developer on some points raised by @CSSFrancis. (Feel free to move this comment to another issue/discussion.)

Maybe we can start to think of markers as a collection of ROI's or even as a baseSignal with a collection of ROI's?

kikuchipy uses HyperSpy's markers actively to draw Kikuchi bands on patterns (see this figures in this tutorial). They are used as visual aid to confirm a correct indexing of electron backscatter diffraction (EBSD) patterns or a particular position of a pattern feature. I don't think this use fits the description of them as ROIs, at least not as mentioned here?

I see your point there. I was just commenting on some of the similarities between some markers and ROIs. There might even be some cases, like creating a virtual dark field image from a set of ROIs, where the line starts to get a little blurred: in that case we want the slicing capabilities of an ROI and the labeling and plotting capabilities of a marker. It's hard to say that something like an arrow or a text marker should be an ROI.

The way things are is good but this conversation sparked that thought.

especially when you want different markers as a function of navigation position

The implementation in kikuchipy uses the same set of bands for all nav positions, but sets those not visible in a certain pattern to NaN. Each band's line coordinates vary per nav position. The current implementation of markers in HyperSpy fits this use perfectly. However, I have plans to make the marker coordinates calculated on-the-fly (lazily using Dask). Might be you've thought along these lines yourself, @CSSFrancis.

Sorry that was kind of a throwaway line and I should have explained better. I also realize that this should probably be moved to a separate issue.

Maybe this isn't as straightforward as I thought :) I always considered having to create as many markers as needed for the signal with the most markers, and setting some of them to NaN, a bug and not a feature. In your case I guess I could go either way. I will say that the way we are currently doing it is slow.

When I want to make a set of markers that changes with every navigation position, I have to create a set of markers as long as the longest set of vectors (which can be very long) and add a bunch of np.nan markers. Once you add a bunch of these markers to a signal, plotting slows down a lot: each marker is plotted as a separate artist, and having 30+ artists really slows down Matplotlib and makes navigating a pain.

A better way would be to define all the markers as a ragged array of vectors and then add a single marker to the signal with its data equal to that ragged array.
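A minimal numpy sketch of such a ragged array of per-navigation-position vectors (illustrative only, not the current markers API):

import numpy as np

# Each navigation position stores a different number of (x, y) marker positions.
vectors = np.empty((2, 2), dtype=object)
vectors[0, 0] = np.array([[1.0, 2.0], [3.0, 4.0]])
vectors[0, 1] = np.array([[0.5, 0.5]])
vectors[1, 0] = np.array([[2.0, 2.0], [4.0, 1.0], [5.0, 5.0]])
vectors[1, 1] = np.empty((0, 2))
# A single marker object could then carry `vectors` and draw the entry matching
# the current navigation index.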

But to that point, when does this ragged array of vectors defining a set of markers start to look suspiciously like a ragged signal? We can even think of that set of markers as having a defined navigation dimension which can now be sliced alongside the signal, or think about returning a set of markers from the map function. The idea of lazy markers is much easier to consider in this case as well, since you don't need to know how many sets of markers you need.

@hakonanes
Contributor

Thank you for elaborating. I agree with all your points made, and I'd say your experience with markers describes mine well. The creation of markers (geometrical simulations) in kikuchipy is currently using too much memory, so they cannot be added to a large dataset (> a couple GB) anyway. If we ever fix that, I hope to find time to look more into using ragged arrays as you say.


Comment on lines 335 to +342
dcshape = dc.shape
for _axis in self.axes_manager._axes:
if _axis.index_in_array < len(dcshape):
_axis.size = int(dcshape[_axis.index_in_array])
if _axis.size != int(dcshape[_axis.index_in_array]):
if _axis.is_uniform:
_axis.size = int(dcshape[_axis.index_in_array])
else:
_logger.warning(f"Axis {_axis} converted to a UniformDataAxis")
_axis.convert_to_uniform_axis()
_axis.size = int(dcshape[_axis.index_in_array])
Member

This function is about getting chunks, and it is a surprise that to do that it needs to edit the axis! Could it be a workaround for the data shape and the axis being out of sync?

Member Author

The problem here is that the axis and the array shape can be out of sync, which causes some issues when we then try to chunk the data.

Member

Use BaseSignal.get_dimensions_from_data instead?

hyperspy/axes.py Outdated
else:
# the axis is not ordered
return None
except np.core._exceptions._UFuncNoLoopError: # subtracting array of objects/strings will fail
Member

Avoid using numpy private API: np.core._exceptions._UFuncNoLoopError is a subclass of TypeError:

Suggested change
except np.core._exceptions._UFuncNoLoopError: # subtracting array of objects/strings will fail
except TypeError: # subtracting array of objects/strings will fail

Member Author

@ericpre Good point! I'll make the substitution


@CSSFrancis
Member Author

The problem with returning a new object is that the events connected to the old one will be lost.

@ericpre I'll make sure that this isn't the case...

Seems like the things we need to make sure are handled correctly:

  1. Cases where non-ordered non-uniform axes don't work (i.e. models) should raise errors.
  2. Cases where data.shape != the axes_manager axes shapes should raise errors or, more ideally, be automatically fixed.
  3. Events and object propagation: axes should maintain events when changing types, and axes in axes managers should be adjusted in place.

Any other things to check?

@ericpre ericpre left a comment

Regarding the changes to convert_to_uniform_axis: previously remove_trait/add_trait were used to handle traits dynamically, so why is that not possible anymore?


@@ -933,25 +969,20 @@ def __init__(self,
size=1,
is_binned=False,
**parameters):
super().__init__(
super(DataAxis, self).__init__(
Member

Is it necessary to add DataAxis, self in the super call here? Isn't it old Python 2.x syntax?

Comment on lines +2063 to +2136
@observe("_axes")
@observe(trait("_axes", notify=False).list_items())
Member

Suggested change
@observe("_axes")
@observe(trait("_axes", notify=False).list_items())
# These two `observe` decorators are necessary so that changing an item of the list in place is also observed
@observe("_axes")
@observe(trait("_axes", notify=False).list_items())

@ericpre
Member

ericpre commented Jan 13, 2024

Any other things to check?

Nothing that I can think of but even if something is broken after this PR, this is a good time to do it!

@ericpre
Member

ericpre commented Feb 15, 2024

@CSSFrancis, are you happy for me to take over this PR? I suspect you may be busy with other things! 😃

@CSSFrancis
Member Author

@ericpre Yeah, I am kind of trying to take a step back to finish up some pyxem stuff, so if you pick this up I'd be very happy!

Let me know if you need help with anything! And I can make some time to look over this again.

@ericpre ericpre mentioned this pull request Mar 2, 2024
@ericpre ericpre added this to the v2.1 milestone Apr 7, 2024
@ericpre ericpre force-pushed the PropertyAttributeAxisRefactor branch from cc21fbc to c89e8c5 Compare April 12, 2024 20:01
@ericpre ericpre modified the milestones: v2.1, v2.2 May 8, 2024
Development

Successfully merging this pull request may close these issues.

Protect axis attribute for uniform data axes
5 participants