Adding in a VectorSignal #2876

CSSFrancis · 2022-01-22T00:26:32Z

Description of the change

This change is motivated by the fact that the ragged signal is kind of lacking these days. #2856 talked about this, and it has also come up in other places, such as pyxem/pyxem#789.

In a broader sense, the find_peaks function should probably return a list of peak positions in pixel units, but should also have access to the calibration. My thought is that vectors of any dimension should be accepted.

What I have implemented is a Vector signal class, which is just an extension of a ragged signal except that it has a number of special VectorDataAxes as its signal_axes rather than having no signal axes.

I also implemented a couple of methods for changing the vectors into real units and for transforming the navigation axes into vector components as well. The reasoning is that this allows the user to use the cluster() method to combine vectors along whatever dimensions they want. The idea here is that there are lots of tools from sklearn that label vectors, which we could utilize to compare vectors along multiple dimensions.
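As a rough illustration of the pixel-to-real-units step described above, here is a minimal NumPy-only sketch. It assumes a linear calibration (offset plus scale times pixel index) per vector component. The function name `to_real_units` and the calibration values are made up for the example and are not part of this PR.

```python
import numpy as np

# Hypothetical calibration for two signal axes (kx, ky); values are made up.
scales = np.array([0.1, 0.1])     # units per pixel, e.g. nm^-1
offsets = np.array([-5.0, -5.0])  # calibrated value at pixel index 0


def to_real_units(pixel_vectors, scales, offsets):
    """Convert an (n, d) array of pixel positions to calibrated units."""
    return offsets + np.asarray(pixel_vectors, dtype=float) * scales


# Ragged peak list: a different number of peaks at each navigation position.
peaks = np.empty(3, dtype=object)
peaks[0] = np.array([[50, 50], [10, 90]])
peaks[1] = np.array([[50, 50]])
peaks[2] = np.array([[0, 0], [100, 100], [25, 75]])

# Apply the same calibration to every navigation position.
real = np.empty_like(peaks)
for i, p in enumerate(peaks):
    real[i] = to_real_units(p, scales, offsets)
```

Keeping the positions in pixel units and applying the calibration on demand means the stored data never changes when the axes are recalibrated.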

Progress of the PR

Minimal example of the bug fix or the new feature

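import hyperspy.api as hs
import numpy as np

v = np.random.random((7, 2))
v2 = np.random.random((5, 2)) * 10
# dtype=object keeps the differently sized arrays as a ragged 1D array
a = np.array([v, v, v2, v], dtype=object)
vect = hs.signals.BaseSignal(a).T
vect.vector = True  # attribute added in this PR
vect.set_signal_type("vector")
print(vect)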

<VectorSignal, title: , dimensions: (4|1, 1)>

@pc494 @hakonanes @magnunor Any suggestions or features you are interested in would be helpful.

@ericpre any advice on implementing a VectorDataAxis would be nice as well.

Long term my goal is to replace the diffraction_vectors class in pyxem with something that is more flexible and intuitive as well as something that can save all of the data.

codecov · 2022-01-22T00:40:02Z

Codecov Report

Merging #2876 (9a152d8) into RELEASE_next_minor (b625c5c) will increase coverage by 0.12%.
The diff coverage is 96.62%.

@@                  Coverage Diff                   @@
##           RELEASE_next_minor    #2876      +/-   ##
======================================================
+ Coverage               80.80%   80.92%   +0.12%     
======================================================
  Files                     209      210       +1     
  Lines                   32706    32950     +244     
  Branches                 7329     7405      +76     
======================================================
+ Hits                    26428    26665     +237     
- Misses                   4512     4515       +3     
- Partials                 1766     1770       +4

Impacted Files	Coverage Δ
hyperspy/_signals/signal2d.py	`81.17% <ø> (ø)`
hyperspy/signal.py	`77.17% <90.62%> (+0.16%)`	⬆️
hyperspy/_signals/vector_signal.py	`95.96% <95.96%> (ø)`
hyperspy/axes.py	`91.68% <98.46%> (+0.42%)`	⬆️
hyperspy/_signals/lazy.py	`92.00% <100.00%> (+0.17%)`	⬆️
hyperspy/drawing/_markers/point.py	`100.00% <100.00%> (ø)`
hyperspy/drawing/marker.py	`89.20% <100.00%> (+0.15%)`	⬆️
hyperspy/io.py	`86.95% <100.00%> (+0.13%)`	⬆️
hyperspy/misc/slicing.py	`87.30% <100.00%> (+2.10%)`	⬆️

hakonanes · 2022-01-23T12:47:44Z

Thanks for pinging me, @CSSFrancis. I'm interested in this class, although I don't have any use for it myself at the moment.

> The reasoning is that this allows the user to use the cluster() method to combine vectors along whatever dimensions they want. The idea here is that there are lots of tools from sklearn that label vectors, which we could utilize to compare vectors along multiple dimensions.

Instead of diffraction spots we have band positions and zone axes in EBSD patterns. In kikuchipy's geometrical simulations, which are used to project band centers and zone axes as HyperSpy markers to plot on top of experimental patterns for visual inspection, we allow for varying numbers of bands (and zone axes) per navigation position. Band positions for the whole 4D data set are stored in an array of fixed shape (n rows, n columns, n unique bands, detector row coordinate, detector column coordinate), and coordinates of bands not present in a pattern are set to NaN. It looks like the VectorSignal class could replace this array. I don't know how clustering could be used since we already have orientations to cluster if patterns are indexed. Perhaps band clustering (detected via the Hough transform) could aid in Hough indexing in some way.
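To make the comparison concrete, here is a small NumPy-only sketch of converting a NaN-padded, fixed-shape array into a ragged object array. It assumes the last axis holds the two detector coordinates. The helper name `padded_to_ragged` is hypothetical, and this is not kikuchipy code.

```python
import numpy as np

# Fixed-shape storage: (rows, columns, max bands, 2), NaN where a band is absent.
padded = np.full((2, 2, 3, 2), np.nan)
padded[0, 0, :2] = [[10.0, 20.0], [30.0, 40.0]]
padded[0, 1, :1] = [[5.0, 6.0]]
padded[1, 0, :3] = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Navigation position (1, 1) has no bands at all.


def padded_to_ragged(padded):
    """Drop NaN rows and store one (n_i, 2) array per navigation position."""
    nav_shape = padded.shape[:-2]
    ragged = np.empty(nav_shape, dtype=object)
    for idx in np.ndindex(nav_shape):
        rows = padded[idx]
        keep = ~np.isnan(rows).any(axis=1)
        ragged[idx] = rows[keep]
    return ragged


ragged = padded_to_ragged(padded)
counts = [ragged[idx].shape[0] for idx in np.ndindex(ragged.shape)]
```

The ragged form stores only the bands that are present, at the cost of losing the fixed "n unique bands" slot that the padded array gives each band.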

magnunor · 2022-01-24T09:40:11Z

This is really nice! As I mentioned in pyxem/pyxem#789 (comment), better handling of vector data would be a big improvement for a range of applications.

With regards to this pull request, I think keeping it within a reasonable size is a good idea. With this, I mean adding the basics, where the more advanced features can be added in future pull requests. Code review of very large pull requests is quite difficult.

For functionality, a basic set could be:

* Some type of `VectorDataAxis`
* A plotting function, at least for 2D vectors

I agree that any vector dimensionality should be possible, but I think the 2D kind will be the most applicable, both for pyxem diffraction spots and potentially for atomap atomic column positions.

So I'm not sure exactly how best to structure this. Maybe similarly to the current BaseSignal, Signal1D, Signal2D structure, with a BaseVectorSignal and a VectorSignal2D?

CSSFrancis · 2022-01-24T15:39:58Z

> Instead of diffraction spots we have band positions and zone axes in EBSD patterns. In kikuchipy's geometrical simulations, which are used to project band centers and zone axes as HyperSpy markers to plot on top of experimental patterns for visual inspection, we allow for varying numbers of bands (and zone axes) per navigation position. Band positions for the whole 4D data set are stored in an array of fixed shape (n rows, n columns, n unique bands, detector row coordinate, detector column coordinate), and coordinates of bands not present in a pattern are set to NaN. It looks like the VectorSignal class could replace this array. I don't know how clustering could be used since we already have orientations to cluster if patterns are indexed. Perhaps band clustering (detected via the Hough transform) could aid in Hough indexing in some way.

@hakonanes I would hope that this is used more as an easy way for positions in a pixel-wise image to be converted to real units. I'm still working on the saving aspect as well, but that would be another potential application where you could easily save and reload analysis, the idea being that you could publish your results in a fashion that is interactive. The clustering might be interesting if you wanted to cluster in both reciprocal space and real space. It also just gives many different options to try with little overhead, which I think is important.

@magnunor

> With regards to this pull request, I think keeping it within a reasonable size is a good idea. With this, I mean adding the basics, where the more advanced features can be added in future pull requests. Code review of very large pull requests is quite difficult.

That is a good reminder before I expand this too much. I'll try to keep it pretty basic.

> For functionality, a basic set could be:
> * Some type of `VectorDataAxis`
> * A plotting function, at least for 2D vectors

I'm not sure about how to deal with the plotting. I have implemented a to_marker function that works pretty well. The only problem is that you have to plot it on a signal. I could probably just make a plot function though as well, it might just take me a while to figure out how hyperspy plots signals.

> I agree that any vector dimensionality should be possible, but I think the 2D kind will be the most applicable, both for pyxem diffraction spots and potentially for atomap atomic column positions.

I agree with you to a certain extent; I should probably create a BaseVectorSignal class and then a VectorSignal2D. I'm kind of struggling with how something like a 4-dimensional vector (x, y, kx, ky) or even a 5-dimensional vector (group, x, y, kx, ky) should be represented. Right now I have it as 4 signal axes. This probably works best because it doesn't mess with the map_reduce behavior in hyperspy. That being said, I realize it might be a little confusing. I could do something like allow navigation axes to be vector axes, but I would rather not.

As to why I think that the 4-D vector is important:

At a higher level, I think that in 4-dimensional datasets we are limiting ourselves by operating on each (x, y) position separately. There are lots of ML methods which cluster, filter and work very well on high-dimensional vectors. I think that our community has been rather slow to implement these because we are stuck working on each diffraction pattern separately. We have lots of redundancies in our data that aren't exploited. That is what motivated this work in the first place. As data gets larger and larger, identifying a reduced set of diffraction features (peak positions etc.) that can be analyzed becomes very powerful.
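A minimal sketch of what "4-D vectors" could look like in plain NumPy. It assumes each navigation position holds an (n_i, 2) array of (kx, ky) peaks and prepends the navigation indices to each row. The helper name `flatten_with_navigation` is hypothetical, and this is not the PR's implementation.

```python
import numpy as np


def flatten_with_navigation(ragged):
    """Stack a 2D ragged array of (n_i, 2) vectors into one (N, 4) array.

    Each output row is (x, y, kx, ky), where x and y are navigation indices.
    """
    rows = []
    for iy, ix in np.ndindex(ragged.shape):
        vecs = ragged[iy, ix]
        nav = np.tile([ix, iy], (len(vecs), 1))
        rows.append(np.hstack([nav, vecs]))
    return np.vstack(rows)


ragged = np.empty((2, 2), dtype=object)
ragged[0, 0] = np.array([[1.0, 2.0], [3.0, 4.0]])
ragged[0, 1] = np.array([[1.1, 2.1]])
ragged[1, 0] = np.empty((0, 2))  # no peaks found at this position
ragged[1, 1] = np.array([[3.1, 4.1]])

flat = flatten_with_navigation(ragged)  # shape (4, 4)
```

An array like `flat` is the kind of input that a clustering estimator with a `fit_predict` method would accept, so peaks that are close in both real and reciprocal space could receive the same label.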

CSSFrancis · 2022-02-26T05:56:46Z

I was working on this tonight and was wondering how people feel about slicing vectors? I got it working so you can use .isig to select only vectors in a certain area. I think this would be an easy way to only look at some of the signal vectors. This code needs to be streamlined a little but this is what I have now.

import numpy as np
import hyperspy.api as hs
from hyperspy._signals.vector_signal import BaseVectorSignal  # module added in this PR

# Six 500 x 500 test images, each the outer product of a sine wave with itself
data = np.array([np.sin(np.linspace(0, i * np.pi, 500)) for i in range(4, 10)])
data = np.array([np.multiply(d[:, np.newaxis], d[np.newaxis, :]) for d in data])
s = hs.signals.Signal2D(data)

peaks = s.find_peaks(method="local_max", interactive=False)
peaks = BaseVectorSignal(peaks)
peaks.vector = True

print(peaks)  # <BaseVectorSignal, title: , dimensions: (6|Vect , Vect)>

markers = peaks.to_markers()
s.plot()
s.add_marker(markers)

And then after slicing the vectors:

markers = peaks.isig[100:,100:].to_markers()
s.plot()
s.add_marker(markers)

This slicing should still work with real units etc.
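For readers without the PR branch, the effect of `isig[100:, 100:]` on a set of vectors can be approximated with a boolean mask in plain NumPy. This is only a sketch under the assumption that slicing keeps vectors whose components fall inside the requested range. The helper `slice_vectors` is hypothetical and says nothing about how the PR implements it.

```python
import numpy as np


def slice_vectors(ragged, lower, upper=None):
    """Keep vectors with lower <= component < upper in every column."""
    lower = np.asarray(lower, dtype=float)
    if upper is None:
        upper = np.full_like(lower, np.inf)
    else:
        upper = np.asarray(upper, dtype=float)
    out = np.empty(ragged.shape, dtype=object)
    for idx in np.ndindex(ragged.shape):
        v = ragged[idx]
        keep = np.all((v >= lower) & (v < upper), axis=1)
        out[idx] = v[keep]
    return out


peaks = np.empty(2, dtype=object)
peaks[0] = np.array([[50.0, 150.0], [120.0, 130.0], [200.0, 90.0]])
peaks[1] = np.array([[10.0, 10.0], [300.0, 300.0]])

# Rough equivalent of isig[100:, 100:] on pixel positions
sliced = slice_vectors(peaks, lower=[100, 100])
```

If the bounds were given in calibrated units instead, the same mask would apply after converting the vectors, which is presumably why the slicing carries over to real units.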

I am still working on a better way to just plot the vectors without a signal to plot them on. I have thought about just making a hacky signal with a signal dim of 1 and then plotting the markers on that. That might be easier than creating a whole new plotting tool for dealing with vectors.

magnunor · 2022-03-21T17:04:10Z

> I'm not sure about how to deal with the plotting. I have implemented a to_marker function that works pretty well. The only problem is that you have to plot it on a signal. I could probably just make a plot function though as well, it might just take me a while to figure out how hyperspy plots signals.

The to_marker function is a good idea! And having a plot function which just plots the marker should be possible as well, however the plotting code can be a bit difficult to approach, as there are many "levels" of different objects calling each other.

> I was working on this tonight and was wondering how people feel about slicing vectors? I got it working so you can use .isig to select only vectors in a certain area. I think this would be an easy way to only look at some of the signal vectors.

I think that sounds really good and useful.

CSSFrancis · 2022-03-22T03:43:29Z

> The to_marker function is a good idea! And having a plot function which just plots the marker should be possible as well, however the plotting code can be a bit difficult to approach, as there are many "levels" of different objects calling each other.

So my quick fix to plotting a 4D Diffraction vector would be to make a signal that is [Navshape|1,1] and then just set the scale, offset, etc so that when you plot the markers on the signal it would display them like it would on a normal signal.

There is also the option of messing with the plotting code, but like you said it is a little more difficult. I would have to look at that a little more to get it to work.

> > I was working on this tonight and was wondering how people feel about slicing vectors? I got it working so you can use .isig to select only vectors in a certain area. I think this would be an easy way to only look at some of the signal vectors.
>
> I think that sounds really good and useful.

The slicing works pretty well. I might consider adding support for ROIs and the like. The idea is to support most of the things that are currently in pyxem.

This would cover:
* Excluding edges (filter_detector_edge) --> replaced with slicing
* Focusing on some area --> replaced with slicing
* Filtering by some magnitude (filter_magnitude) --> replaced with slicing using an ROI
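The magnitude case from the list above can be sketched as an annular selection in plain NumPy. This is an illustration only: the helper `filter_by_magnitude` and its parameters are hypothetical, and it is neither pyxem's `filter_magnitude` nor a HyperSpy ROI.

```python
import numpy as np


def filter_by_magnitude(ragged, r_min, r_max, center=(0.0, 0.0)):
    """Keep vectors whose distance from `center` lies in [r_min, r_max]."""
    center = np.asarray(center, dtype=float)
    out = np.empty(ragged.shape, dtype=object)
    for idx in np.ndindex(ragged.shape):
        v = ragged[idx]
        r = np.linalg.norm(v - center, axis=1)
        out[idx] = v[(r >= r_min) & (r <= r_max)]
    return out


vectors = np.empty(2, dtype=object)
vectors[0] = np.array([[3.0, 4.0], [0.0, 1.0], [6.0, 8.0]])  # |v| = 5, 1, 10
vectors[1] = np.array([[0.5, 0.5]])                          # |v| ~ 0.71

kept = filter_by_magnitude(vectors, r_min=2.0, r_max=6.0)
```

A rectangular slice and an annular mask are both just boolean selections over the rows of each sub-array, so one selection mechanism could in principle back all three items.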

magnunor · 2022-03-22T16:16:01Z

With regards to the plot function, I can have a look at that at some point. Probably after this current release. I have done some previous work with the hyperspy plotting code, which should be useful.

CSSFrancis · 2022-03-22T17:29:46Z

> With regards to the plot function, I can have a look at that at some point. Probably after this current release. I have done some previous work with the hyperspy plotting code, which should be useful.

@magnunor That would be very helpful. I think that this PR belongs pretty solidly in the next release after 1.7, considering that 1.7 should go out sooner rather than later and this PR still isn't where I would like it to be. Everything seems to work fairly well at the moment, but it will take some additional thought as to how we should handle the AxesManager, especially considering #2830, which I think should be helpful.

My goal was to kind of work on this in parallel to some changes in pyxem over the next couple of months. After the 1.7 release we could consider making a separate branch specifically related to the BaseVector in hyperspy and the DiffractionVector class in pyxem just so there is a central place to submit smaller PRs to.

magnunor · 2022-03-22T17:42:22Z

Yeah, I agree. Getting the basics right in the initial implementation is going to save us a lot of hassle later, since a lot of functionality is going to be built on top of this.

For example: this functionality would be really nice for a reworking of Atomap. But there, the vector signal would not need a navigation dimension. However, for pyxem, with scanning diffraction, having navigation dimensions for all the vector signals would probably be best.

CSSFrancis · 2022-05-19T15:47:57Z

@ericpre is there a reason that the ragged property of a signal needs to be set explicitly, rather than making the ragged property more of a strict type check? If the dtype of the array is object, then the array has to be ragged.

ericpre · 2022-05-22T15:36:19Z

> @ericpre is there a reason that the ragged property of a signal needs to be set explicitly, rather than making the ragged property more of a strict type check? If the dtype of the array is object, then the array has to be ragged.

See #2944 (comment).
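The linked comment is not reproduced in this thread. One general NumPy point that may be relevant to the question: an object dtype says that the elements are arbitrary Python objects, not that their shapes actually differ. The sketch below only demonstrates that fact. The helper names are hypothetical, and it is not necessarily the reasoning given in #2944.

```python
import numpy as np


def element_shapes(arr):
    """Set of shapes of the elements stored in an object array."""
    return {np.shape(arr[idx]) for idx in np.ndindex(arr.shape)}


def has_varying_shapes(arr):
    """True only for an object array whose elements differ in shape."""
    return arr.dtype == object and len(element_shapes(arr)) > 1


# Object dtype, but every element has the same shape.
uniform = np.empty(2, dtype=object)
uniform[0] = np.zeros((3, 2))
uniform[1] = np.ones((3, 2))

# Object dtype with elements of different lengths.
varying = np.empty(2, dtype=object)
varying[0] = np.zeros((3, 2))
varying[1] = np.ones((5, 2))
```

Both arrays report `dtype == object`, but only `varying` has elements of different shapes.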

CSSFrancis · 2022-08-19T19:23:30Z

If anyone has time to review this, that would be much appreciated. I tried to change as little as possible in how hyperspy functions, and the overall code changes should be rather small. This is probably a fairly stable base that we can continue to work with downstream at pyxem. I'm hoping that will help to find any bugs before the 1.8 release.

@magnunor I know that you had some interest in the past, so any input you have would be appreciated.

There are a lot of places in hyperspy where we could consider returning vectors (any of the find peaks methods etc.). I think that should be a separate PR, however. While I don't think it should break anyone's workflows, there are a couple of differences which could be problematic. I would suggest adding something like a return_vectors kwarg to start and then going from there.

CSSFrancis · 2022-08-19T21:28:36Z

Failing tests are related to equality in the traits package. ax.size == t.Undefined is evaluating to False instead of True for the VectorDataAxis on some of the test instances. Does anyone know of a different function to test this using the traits package, or should I change to ax.size = None for a VectorDataAxis?

NewFeature: Added in index in vector axis property to `VectorDataAxis` class

ericpre

Thanks @CSSFrancis for making good progress on this!

VectorDataAxis is a misleading name because the only specificity of this axis type is that its size varies between navigation positions, as in the case of a ragged signal. It is used for vector signals, but intrinsically it is not a "vector". Actually, would it be more accurate to rename VectorSignal to VariableSizeSignal, as it is more generic?

I suspect that it should be possible to structure the relationship between the different types of axes more consistently. Here is a suggestion:

class BaseDataAxis:
    # size is not defined
    pass

class VariableSizeUniformDataAxis(BaseDataAxis):
    # construct from offset and scale
    # size is not defined
    pass

class DataAxis(BaseDataAxis):
    # construct from an array
    # size is defined
    pass

class FunctionalDataAxis(BaseDataAxis):
    # construct from an expression
    # size is defined
    pass

class UniformDataAxis(BaseDataAxis):
    # construct from offset, scale and size
    # size is defined
    pass

Then we could make ragged and vector signals more consistent and define them in the AxesManager:

* a vector signal has one or several VariableSizeUniformDataAxis signal axes; all signal_axes must have the same size at each navigation position
* a ragged signal has a single VariableSizeUniformDataAxis signal axis
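One possible reading of the "same size at each navigation position" constraint, sketched in plain NumPy. It assumes each variable-size axis is represented by a ragged column of values. The helper names are hypothetical, and this is an interpretation of the proposal, not an implementation of it.

```python
import numpy as np


def sizes(column):
    """Number of entries at each navigation position of a ragged column."""
    return [len(column[idx]) for idx in np.ndindex(column.shape)]


def is_valid_vector_signal(*columns):
    """True if every variable-size axis has the same size at each position."""
    first = columns[0]
    return all(
        c.shape == first.shape and sizes(c) == sizes(first) for c in columns
    )


def ragged_column(values):
    """Build a 1D object array holding one float array per position."""
    out = np.empty(len(values), dtype=object)
    for i, v in enumerate(values):
        out[i] = np.asarray(v, dtype=float)
    return out


kx = ragged_column([[0.1, 0.2], [0.3], [0.4, 0.5, 0.6]])
ky = ragged_column([[1.1, 1.2], [1.3], [1.4, 1.5, 1.6]])
bad = ragged_column([[1.1], [1.3], [1.4, 1.5, 1.6]])

vector_ok = is_valid_vector_signal(kx, ky)    # sizes match everywhere
vector_bad = is_valid_vector_signal(kx, bad)  # first position differs
```

Under this reading, a ragged signal is the single-column case, where the check is trivially satisfied.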

I would expect that if we improve the structure and the consistency of how we define the axes and signals, the implementation of dependent features would be simpler.

I didn't review the slicing and drawing code, because I think that we need to focus on the relationship between axes, axes_manager and signal first. Would it make sense to make another PR with only the changes related to the latter, and then we can rebase/cherry-pick the changes of this PR?

ericpre · 2022-09-04T10:32:39Z

doc/user_guide/vector_signal.rst

+is organized as a collection of vectors of some length. These arrays are ragged, such
+that they can have arbitrary length but constant width.  Vectors in hyperspy usually
+represent pixel positions related to some Signal.  Like Signals, they can have signal dimensions
+return


Something is missing here! :)

ericpre · 2022-09-04T10:35:36Z

doc/user_guide/vector_signal.rst

+The :py:method:`~._signals.vector_signal.BaseVectorSignal.cluster` is designed to take
+some `ClusterMixin <https://scikit-learn.org/stable/modules/generated/sklearn.base.ClusterMixin.html#sklearn.base.ClusterMixin>`


Use intersphinx, something along the lines of (needs checking):

Suggested change

The :py:method:`~._signals.vector_signal.BaseVectorSignal.cluster` is designed to take

some `ClusterMixin <https://scikit-learn.org/stable/modules/generated/sklearn.base.ClusterMixin.html#sklearn.base.ClusterMixin>`

The :py:meth:`~._signals.vector_signal.BaseVectorSignal.cluster` is designed to take

some :py:class:`sklearn.base.ClusterMixin`

ericpre · 2022-09-04T10:36:53Z

doc/user_guide/vector_signal.rst

+
+The :py:method:`~._signals.vector_signal.BaseVectorSignal.cluster` is designed to take
+some `ClusterMixin <https://scikit-learn.org/stable/modules/generated/sklearn.base.ClusterMixin.html#sklearn.base.ClusterMixin>`
+object which has a method `fit_predict` which returns a set of labels. More information on the


Suggested change

object which has a method `fit_predict` which returns a set of labels. More information on the

object which has a method :py:meth:`sklearn.base.ClusterMixin.fit_predict` which returns a set of labels. More information on the

ericpre · 2022-09-04T10:44:58Z

doc/user_guide/vector_signal.rst

+
+The Vector class is a new class added in version 2.0 which is designed to store data which
+is organized as a collection of vectors of some length. These arrays are ragged, such
+that they can have arbitrary length but constant width.  Vectors in hyperspy usually


It may be worth covering the difference between ragged and vector in more detail in its own section, as this can easily be confusing! A cartoon showing the possible navigation / signal dimensions may be good here.

ericpre · 2022-09-04T10:57:44Z

hyperspy/io.py

+    # Add missing vector
+    for s in signals:
+        if "vector" not in signals[s]:
+            signals[s]["vector"]=False


Since this is going to be released with 2.0, I would suggest removing this and updating the signal specifications.

ericpre · 2022-09-04T11:32:31Z

hyperspy/tests/signals/test_vector_signal.py

+        x = da.empty(shape=(4), dtype=object)
+        for i in np.ndindex(x.shape):
+            x[i] = da.random.random((6, 4))
+        s = L(x).T


What is L? This fixture doesn't seem to be used, but it would add a test for lazy signals!

ericpre · 2022-09-04T13:57:35Z

hyperspy/_signals/lazy.py

@@ -367,7 +370,7 @@ def _get_dask_chunks(self, axis=None, dtype=None):
        dc = self.data
        dcshape = dc.shape
        for _axis in self.axes_manager._axes:
-            if _axis.index_in_array < len(dcshape):
+            if _axis.index_in_array != () and _axis.index_in_array < len(dcshape):


Suggested change

if _axis.index_in_array != () and _axis.index_in_array < len(dcshape):

if not isinstance(_axis, VectorDataAxis) and _axis.index_in_array < len(dcshape):

ericpre · 2022-09-04T13:58:28Z

hyperspy/_signals/lazy.py

@@ -382,7 +385,8 @@ def _get_dask_chunks(self, axis=None, dtype=None):
        elif not isinstance(dtype, np.dtype):
            dtype = np.dtype(dtype)
        typesize = max(dtype.itemsize, dc.dtype.itemsize)
-        want_to_keep = multiply([ax.size for ax in need_axes]) * typesize
+        want_to_keep = multiply([ax.size for ax in need_axes
+                                 if ax.size != Undefined and ax.size > 0]) * typesize


Suggested change

if ax.size != Undefined and ax.size > 0]) * typesize

if not isinstance(ax, VectorDataAxis)]) * typesize

ericpre · 2022-09-04T13:58:52Z

hyperspy/_signals/lazy.py

@@ -449,7 +453,7 @@ def get_chunk_size(self, axes=None):
        if not np.iterable(axes):
            axes = (axes,)

-        axes = tuple([axis.index_in_array for axis in axes])
+        axes = tuple([axis.index_in_array for axis in axes if axis.index_in_array is not ()])


Suggested change

axes = tuple([axis.index_in_array for axis in axes if axis.index_in_array is not ()])

axes = tuple([axis.index_in_array for axis in axes if not isinstance(axis, VectorDataAxis)])

ericpre · 2022-09-04T14:03:01Z

hyperspy/axes.py

+        # These traits need to added dynamically to be removed when necessary
+        self.update_axis()
+        self.size = -1
+        self.vector = True


I don't think we need to keep it?

Suggested change

self.vector = True

CSSFrancis · 2022-09-05T12:46:13Z

@ericpre that sounds like a good idea. I can resubmit a PR with just the changes to the Axes, axes manager and the signal. I'll see what I can do tonight and tomorrow. I'll try to reply to some of the other comments then.

Defining a VariableSizeUniformDataAxis should probably be done by defining the index in the array and then the column in the array. The difference between a vector and a ragged signal is that vectors always have the same number of columns.

I've gone back and forth on how to represent these columns. While each column usually maps to some axis in a signal, it's more of a pandas representation than a numpy one. Maybe each DataAxis can have a columns attribute, which can be an empty list in most cases or a list of UniformColumn objects.

I'll mention one last thing because I've been thinking about it a fair amount. This is the idea that vectors shouldn't be ragged, i.e. for a 4D STEM dataset you should have an n x 4 set of (x, y, kx, ky) vectors. Then you just create different ragged views of the data based on labels, positions, or any grouping. In this case we can still utilize lots of the vectorized speed-ups from numpy, and slicing is much, much faster. I think this blog post describes this approach fairly well. (This is still probably something to think about later, as I think we will still have the same problems.)
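A minimal NumPy sketch of the "flat array plus grouped views" idea, assuming the vectors are stored as one (n, 4) array of (x, y, kx, ky) rows. The helper `group_rows` and the integer label encoding are hypothetical choices made for the example.

```python
import numpy as np


def group_rows(flat, labels):
    """Split the rows of `flat` into one sub-array per distinct label."""
    labels = np.asarray(labels)
    order = np.argsort(labels, kind="stable")
    sorted_labels = labels[order]
    # Index of the first row of each group in the sorted array
    unique, starts = np.unique(sorted_labels, return_index=True)
    groups = np.split(flat[order], starts[1:])
    return dict(zip(unique.tolist(), groups))


flat = np.array([
    [0, 0, 1.0, 2.0],
    [1, 0, 1.1, 2.1],
    [0, 0, 3.0, 4.0],
    [1, 1, 3.1, 4.1],
])

# Group by navigation position, encoded as y * width + x with width = 2.
nav_labels = flat[:, 1].astype(int) * 2 + flat[:, 0].astype(int)
by_position = group_rows(flat, nav_labels)

# Slicing on any column is a single vectorized mask on the flat array.
large_kx = flat[flat[:, 2] > 3.0]
```

The same `group_rows` call would work with cluster labels instead of navigation positions, which is what makes the grouping interchangeable.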

CSSFrancis · 2022-09-05T13:18:29Z

Maybe it's best to start with the use cases:

1: Ragged signal of vectors with a constant number of columns. (Think something like the results from find_peaks.)

2: Non-ragged signal with specific column information. (Think of a center of mass calculation with an x and a y column.)

3: Pure list of vectors. This is closest to a pandas.DataFrame, where the data is represented as a 2D array.

There are additional questions, such as whether you can have UniformColumn objects for an axis which is a navigation axis.

CSSFrancis · 2022-09-08T21:25:37Z

@ericpre I've looked over this a little bit more and I think that the best course of action is to create a new class, ColumnAxis. The idea is that an axis can be described by a list of ColumnAxis objects. This is similar to the idea of a non-linear axis where DataAxis.axis is manually set, but the array is a list of ColumnAxis objects.

The previous implementation is kind of a hacky way to treat columns of some dataset as axes. While that works well from the standpoint of consistent methods for operating on vectors and signals I think it limits the functionality of labeled column values.

I'll start with my idea for a treatment of ragged axes and go from there:

As far as ragged axes go, I think it makes sense that every type of axis can be cast to a raggedAxis. The idea is that ragged axes define a consistent array inside of another array. So for some axes_manager which is ragged, the default is to assume that there is no underlying structure to the ragged array.

In the case of a different sized 2D array at every navigation position this can be defined as two RaggedUniformDataAxis objects. The index_in_array is undefined and the attribute index_in_ragged_array is set. I've denoted ragged arrays with a (r) but we could also consider something like making them italic etc.

            Name |   size |  index |  offset |   scale |  units 
================ | ====== | ====== | ======= | ======= | ====== 
       x         |     10 |      0 |       0 |       1 | "nm"
       y         |     10 |      0 |       0 |       1 | "nm" 
---------------- | ------ | ------ | ------- | ------- | ------ 
    raggedAx(r)  |      - |      0 |       0 |       1 | <undefined> 
    raggedAx1(r) |      - |      0 |      0  |       1 | <undefined>

class BaseDataAxis:
    # size is not defined
    # ragged attribute (if ragged, then self.index_in_array is None
    # and self.index_in_ragged_array is not None)
    pass

class DataAxis(BaseDataAxis):
    # construct from an array (can be an array of ColumnAxis objects)
    # size is defined
    pass

class FunctionalDataAxis(BaseDataAxis):
    # construct from an expression
    # size is defined
    pass

class UniformDataAxis(BaseDataAxis):
    # construct from offset and scale
    # size can be defined
    pass

class ColumnAxis:
    # size not defined
    # scale defined
    # name defined
    # offset defined
    pass
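To make the ColumnAxis idea a bit more tangible, here is a hedged sketch of it as a small dataclass whose scale and offset act on the stored values rather than on an index. The `units` field, the `to_real` method and the `calibrate` helper are assumptions made for illustration and are not part of the proposal.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ColumnAxis:
    """Hypothetical labeled column: name, scale and offset, but no size."""

    name: str
    scale: float = 1.0
    offset: float = 0.0
    units: str = ""

    def to_real(self, values):
        """Calibrate stored values (not indices) for this column."""
        return self.offset + self.scale * np.asarray(values, dtype=float)


def calibrate(data, columns):
    """Apply each column's calibration along the last axis of `data`."""
    scales = np.array([c.scale for c in columns])
    offsets = np.array([c.offset for c in columns])
    return offsets + scales * np.asarray(data, dtype=float)


columns = [
    ColumnAxis("cx", scale=0.1, units="nm^-1"),
    ColumnAxis("cy", scale=0.1, offset=1.0, units="nm^-1"),
]
pixel_values = np.array([[10, 20], [30, 40]])
real_values = calibrate(pixel_values, columns)
```

Because a column carries no size, the same list of `ColumnAxis` objects could describe a dense last axis or the fixed-width axis of a ragged sub-array.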

Let's try some more examples, but with labeled columns.

Case 1:

This case and case 3 are just extensions of each other. In the simplest case, let's say you have some center of mass data:

            Name |   size |  index |  offset |   scale |  units 
================ | ====== | ====== | ======= | ======= | ====== 
       x         |     10 |      0 |       0 |       1 | "nm"
       y         |     10 |      0 |       0 |       1 | "nm" 
---------------- | ------ | ------ | ------- | ------- | ------ 
     Centers     |      4 |      0 |       - |       - | <undefined> 
         |- cx   |     -  |      - |       0 |      .1 |  "nm^-1" 
         L  cy   |     -  |      - |       0 |      .1 |  "nm^-1"

Then if you wanted all of the "cx" data you could go:

s.isig["cx"]  # or s.isig[0]
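In plain NumPy terms, the label lookup above amounts to mapping a column name to a position on the last axis. The sketch below assumes a dense (10, 10, 2) array. `select_column` is a hypothetical helper, not the proposed `isig` behavior itself.

```python
import numpy as np

column_names = ["cx", "cy"]
# Dense center-of-mass data: one (cx, cy) pair per navigation position.
data = np.arange(10 * 10 * 2, dtype=float).reshape(10, 10, 2)


def select_column(data, key, names):
    """Return one labeled column; `key` may be a name or an integer."""
    index = names.index(key) if isinstance(key, str) else key
    return data[..., index]


cx = select_column(data, "cx", column_names)       # by label
cx_by_index = select_column(data, 0, column_names)  # by position
```

Since the array is dense, the selection is a single view and needs no loop over navigation positions.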

Case 2: For a ragged set of vectors then we could imagine something like this:

            Name |   size |  index |  offset |   scale |  units 
================ | ====== | ====== | ======= | ======= | ====== 
       x         |     10 |      0 |       0 |       1 | "nm"
       y         |     10 |      0 |       0 |       1 | "nm" 
---------------- | ------ | ------ | ------- | ------- | ------ 
    RaggedAxis(r)|      2 |      0 |       - |       - | <undefined> 
         |- kx   |     -  |     -  |       0 |       1 | "nm^-1" 
         L  ky   |     -  |     -  |       0 |       1 | "nm^-1" 
  RaggedIndex(r) |     -  |      0 |       - |       1 | <undefined>

In this case we have ragged signal dimensions. This makes things like slicing or indexing a little more difficult. We could still do the same thing but this requires us to go through every array using a for loop which isn't that fast.

s.isig["kx"]  # or s.isig[0]
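The for-loop cost mentioned above can be seen in a plain NumPy sketch of the ragged case, where the column has to be pulled out of every sub-array in turn. The helper `select_column_ragged` is hypothetical and only illustrates the access pattern.

```python
import numpy as np

column_names = ["kx", "ky"]

ragged = np.empty(3, dtype=object)
ragged[0] = np.array([[1.0, 10.0], [2.0, 20.0]])
ragged[1] = np.array([[3.0, 30.0]])
ragged[2] = np.array([[4.0, 40.0], [5.0, 50.0], [6.0, 60.0]])


def select_column_ragged(ragged, key, names):
    """Select one labeled column from every sub-array of a ragged array."""
    index = names.index(key) if isinstance(key, str) else key
    out = np.empty(ragged.shape, dtype=object)
    # One Python-level iteration per navigation position
    for idx in np.ndindex(ragged.shape):
        out[idx] = ragged[idx][:, index]
    return out


kx = select_column_ragged(ragged, "kx", column_names)
```

The loop runs once per navigation position, so its cost grows with the size of the scan rather than with the number of vectors.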

Case 3:

This is just an extension of case 1 where an additional "Index" axis is added.

Let's start with something like a non-ragged signal of vectors (x, y, kx and ky). In this case we have a two-dimensional array (n x 4), where n is the number of vectors (let's just say 10 in this example). We would then have an axes manager that looks like this:

            Name |   size |  index |  offset |   scale |  units 
================ | ====== | ====== | ======= | ======= | ====== 
---------------- | ------ | ------ | ------- | ------- | ------ 
     vectorAxis  |      4 |      0 |       - |       - | <undefined> 
         |- x    |     -  |     -  |       0 |       1 | "nm"
         |- y    |     -  |     -  |       0 |       1 | "nm" 
         |- kx   |     -  |     -  |       0 |       1 | "nm^-1" 
         L  ky   |     -  |     -  |       0 |       1 | "nm^-1" 
     Index       |     10  |      0 |       1 |       1 | <undefined>

Then if we wanted to index the array we could use the following syntax

s.isig["x"] # get just the x column
s.isig["x", :2] # get just the x column and first 2 vectors

Of course the functionality that we are probably more interested in is something like this:

s.isig[:, s.isig["x"]<10] # bool Indexing

But maybe that deserves some more discussion.
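For reference, the boolean-indexing idea has a direct NumPy analogue on a flat (n, 4) array. The sketch assumes the column order (x, y, kx, ky) and uses made-up values. It shows the NumPy operation only, not the proposed `isig` syntax.

```python
import numpy as np

names = ["x", "y", "kx", "ky"]
vectors = np.array([
    [2.0, 1.0, 0.10, 0.20],
    [12.0, 1.0, 0.30, 0.40],
    [8.0, 3.0, 0.50, 0.60],
    [15.0, 4.0, 0.70, 0.80],
])

# Look up the labeled column, then build a boolean mask over the vectors.
x = vectors[:, names.index("x")]
selected = vectors[x < 10]  # keeps the rows with x = 2.0 and x = 8.0
```

Masks on different columns combine with `&` and `|`, so a selection in both real and reciprocal space is still one indexing operation.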

ericpre · 2022-09-10T10:59:40Z

@CSSFrancis, please comment or correct me:

case 1

Can you specify what you mean by center of mass? Are you talking about several centers of mass per navigation index?

case 2

* with the suggested representation of RaggedAxis(r), do you mean that both axes have the same length at each navigation index?
* I don't understand what the RaggedIndex(r) axis is

case 3

* Same as previous, I don't think that I understand what you mean.
* if this is not ragged, it should already be possible without any change?
* in the case of a ragged array, by vector signal, do you mean that the start coordinates (x, y) and the end coordinates (kx, ky) of the vectors are captured in the signal space?
* similar to case 2, I don't understand what the Index is

CSSFrancis · 2022-09-10T12:23:26Z

@ericpre Sorry I think this was maybe overly confusing. I was still trying to figure things out.

Basically I tried to figure out the non-ragged case first and then apply that same idea to ragged signals, specifically ragged signals with the same dimensions in the sub-array but not necessarily the same shape.

> @CSSFrancis, please comment or correct me:
>
> case 1
>
> Can you specify what you mean by center of mass? Are you talking about several centers of mass per navigation index?

In this case no, that is not a ragged axis. That is just an axis with a set of labeled columns. This can be implemented similarly to how we handle non-linear axes. In this case the scale and the offset operate on the array values and not the index. The array shape would be something like (10, 10, 2), as you said.

> case 2
>
> With the suggested representation of RaggedAxis(r), do you mean that both axes have the same length at each navigation index?
>
> I don't understand what the RaggedIndex(r) axis is.

This is the only truly ragged case. Here there are 4 axes: 2 in the array and always 2 in the sub-array. I define the two in the sub-array as being ragged. That is kind of confusing because, like you said, they don't have to be ragged, so maybe calling them sub-array axes makes more sense.

One of the axes has a constant size, while the index can be any size. For the one with constant size, we can label the columns and calibrate it that way.
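A sketch of this layout with a NumPy object array, assuming a 2x2 navigation grid and two labeled columns. The peak counts and calibration values are made up; the structure is what matters: the column axis has a fixed length, while the index (row) axis varies per position.

```python
import numpy as np

rng = np.random.default_rng(0)

# Number of vectors found at each navigation position (illustrative values).
counts = [[3, 0], [5, 1]]

# Two navigation axes in the outer array; each element is an (n_i, 2) sub-array.
peaks = np.empty((2, 2), dtype=object)
for idx in np.ndindex(peaks.shape):
    n = counts[idx[0]][idx[1]]
    peaks[idx] = rng.uniform(0, 64, size=(n, 2))

# The constant-size column axis can carry labels and a calibration.
scale = np.array([0.1, 0.1])
offset = np.array([-3.2, -3.2])
calibrated = np.empty_like(peaks)
for idx in np.ndindex(peaks.shape):
    calibrated[idx] = peaks[idx] * scale + offset

n_columns = {peaks[idx].shape[1] for idx in np.ndindex(peaks.shape)}
n_rows = [peaks[idx].shape[0] for idx in np.ndindex(peaks.shape)]
```

Here `n_columns` contains a single value while `n_rows` changes from position to position, which is the distinction between the labeled column axis and the index axis.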

> case 3
>
> Same as the previous case: I don't think I understand what you mean.
>
> If this is not ragged, it should already be possible without any change?
>
> In the case of a ragged array, by vector signal do you mean that the start coordinates (x, y) and the end coordinates (kx, ky) of the vectors are captured in the signal space?
>
> Similar to case 2, I don't understand what the Index is.

Like you said, this is already mostly possible. The only change necessary is to make the non-linear axis accept any values, rather than being constrained to strictly increasing axes, and not care if the indexes aren't values.
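As a rough illustration of the constraint being discussed, the sketch below assumes the existing restriction amounts to a strictly-increasing check on the axis values; the actual check in hyperspy may differ, and `is_strictly_increasing` is a hypothetical helper, not a library function.

```python
import numpy as np


def is_strictly_increasing(values):
    """True if every value is larger than the one before it."""
    values = np.asarray(values, dtype=float)
    return bool(np.all(np.diff(values) > 0))


ordered = [0.0, 0.5, 2.0]     # passes a strictly-increasing check
arbitrary = [3.1, -1.0, 3.1]  # vector components: unordered, may repeat

ordered_ok = is_strictly_increasing(ordered)
arbitrary_ok = is_strictly_increasing(arbitrary)
```

Vector data generally looks like `arbitrary`, so an axis that stores such values would have to skip or relax a check of this kind.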

I set this example up similarly to how pandas deals with data. The idea is that for a 2D array you have two axes: one which is labeled columns and the other which is an index. Theoretically, you could import a pandas DataFrame into hyperspy with this change and it would look similar to this.

I'm trying to get some working code examples up and maybe that will make things less confusing.

The fundamental changes necessary for this are small: some of the axes functions need to be rearranged to allow a non-linear axis to accept any values in the axis parameter, but after that things should work well in the non-ragged case.

For the ragged case, there is no reason those axes shouldn't look the same but describe a constant sub-array.

CSSFrancis marked this pull request as draft (January 24, 2022 20:27)

CSSFrancis force-pushed the VectorSignal branch 2 times, most recently from 8f0ea60 to 521acfb (March 14, 2022 21:48)

magnunor mentioned this pull request (Mar 21, 2022): Removing (some) Generators From PyXEM pyxem/pyxem#776 [Open]

magnunor mentioned this pull request (Mar 22, 2022): Ragged BaseSignal navigation shape #2907 [Open]

CSSFrancis mentioned this pull request (Apr 14, 2022): Protect axes manager attribute #2830 [Merged, 8 tasks]

CSSFrancis force-pushed the VectorSignal branch from 102ae75 to ad31374 (May 19, 2022 15:38)

CSSFrancis mentioned this pull request (May 19, 2022): Reference Signal in Axis Class #2944 [Open]

CSSFrancis mentioned this pull request (Aug 8, 2022): pyxem-1.0.0 pyxem/pyxem#624 [Open]

CSSFrancis force-pushed the VectorSignal branch from ff66e1f to 2dfcf89 (August 10, 2022 20:06)

CSSFrancis mentioned this pull request (Aug 17, 2022): Diffraction Vectors Overhaul pyxem/pyxem#872 [Open, 21 tasks]

CSSFrancis marked this pull request as ready for review (August 19, 2022 19:14)

ericpre added the status: needs review label (Aug 25, 2022)

ericpre mentioned this pull request (Aug 25, 2022): Release 1.7.x and 2.0.0 #2996 [Closed, 57 tasks]

jlaehne added the release: next minor label (Aug 28, 2022)

NewSignal: Added in Vector class for keeping track of real units for vectors (da6210d)

CSSFrancis added 16 commits (August 30, 2022 14:09):

fixed some zero dimensional edge cases (b056902)
Refactor: Added more test coverage and small documentation changes (f8ca399)
Testing: Testing converting to real units (15bdd55)
Testing: Added initialization test for VectorDataAxis (a519c42)
Testing: Fixed slicing with float values (8e38de8)
Bugfix: Vectors slice using the same notation as signals. (86e35e2)
NewFeature: Added in index in vector axis property to `VectorDataAxis` class; Refactor: Cleaned up some code (84e4bff)
Testing: Increased coverage (69ac6b6)
Bug Fix: Improved clustering workflow and coverage (14869bc)
Bug Fix: Improved have vector signal handles axes (60e044a)
Testing: Fixing failing tests when casting to lazy (a1ec207)
BugFix: Fixed Signal Initialization, and deepcopying (7bf8f0d)
Revert:Revert Black (cc573a9)
BugFix:Changed Equality for inequality. Still problems with traits.api.Undefined (9357685)
BugFix:Set size of vector axis from Undefined to -1 (57b5e8b)
Testing: Increased test coverage (94fe3ab)

CSSFrancis force-pushed the VectorSignal branch from b15a9c5 to 94fe3ab (August 30, 2022 19:10)

CSSFrancis added 3 commits (August 30, 2022 14:27):

Testing: Skipped tests requiring sklearn (3cb9869)
BugFix: To_marker working for 0 dim navigation axes (fb95bf3)
Documentation: Added additional information about the BaseVectorSignal class to vector_signal.rst (9a152d8)

ericpre reviewed (Sep 4, 2022)

ericpre mentioned this pull request (Sep 10, 2022): Consistant Implmentation of Axes and Slicing #3011 [Open]

CSSFrancis mentioned this pull request (Sep 23, 2022): Handling Axis attributes with @property decorator #3031 [Open, 3 tasks]

CSSFrancis mentioned this pull request (Oct 24, 2022): Support slicing with Boolean Arrays #3051 [Open]

CSSFrancis mentioned this pull request (Nov 21, 2023): Hyperspy 2.1.0 Release Tracker #3272 [Closed, 10 tasks]

		The :py:method:`~._signals.vector_signal.BaseVectorSignal.cluster` is designed to take
		some `ClusterMixin <https://scikit-learn.org/stable/modules/generated/sklearn.base.ClusterMixin.html#sklearn.base.ClusterMixin>`

	object which has a method `fit_predict` which returns a set of labels. More information on the
	object which has a method :py:meth:`sklearn.base.ClusterMixin.fit_predict` which returns a set of labels. More information on the

	if _axis.index_in_array != () and _axis.index_in_array < len(dcshape):
	if not isinstance(_axis, VectorDataAxis) and _axis.index_in_array < len(dcshape):

	if ax.size != Undefined and ax.size > 0]) * typesize
	if not isinstance(ax, VectorDataAxis)]) * typesize

	axes = tuple([axis.index_in_array for axis in axes if axis.index_in_array is not ()])
	axes = tuple([axis.index_in_array for axis in axes if not isinstance(axis, VectorDataAxis)])
