Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ClusterArray class to import any kind of microstates maps #118

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

RunKZhang
Copy link

@RunKZhang RunKZhang commented Aug 2, 2023

Hello, this is the new feature I described in Issues, and I open a pull requests here. This is a draft of the thoughts, and not heavily tested.

Closes #117

@mscheltienne
Copy link
Collaborator

mscheltienne commented Aug 2, 2023

I think that addition to the API would be very welcome. My initial impression is that it should mimic the ClassArray API from MNE-Python. Examples:

In all 5 cases, the class signature is ClassArray(data, info, optional_arguments). Thus, I would define a ClusterArray class which would ingest fitted data.

class ClusterArray(_BaseCluster):

    def __init__(
        data: NDArray[float], 
        info: Union[Info, ChInfo], 
        cluster_names: Optional[List[str]] = None, 
        fitted_data: Optional[NDArray[float]] = None,
        labels: Optional[NDArray[int]] = None,
    ):

Where data would be the cluster_centers_ (named data to match MNE-Python's API). The number of clusters n_clusters can be inferred from data. Note that I've set cluster_names, fitted_data, and labels as Optional. I don't think they are key elements that we require to run other methods/functions from the pycrostates API. They are "nice-to-have" variables, but not mandatory. I could be wrong however, I would have to check (especially for labels).

WDYT @RunKZhang ?

@mscheltienne
Copy link
Collaborator

@RunKZhang I fixed my proposed class signature in the previous post, I defined it as a function for some reason 🙈

@RunKZhang
Copy link
Author

I agree with you for the ClassArray, and data could be cluster_centers_. But I think labels should be mandatory, since it is essential to calculate gev_.

@RunKZhang
Copy link
Author

Sorry I don't get why define it as a function. Since the predict method helps make segmentation of the raw object, and it is inherited from _BaseCluster, I think define it as a class is preferable.

@mscheltienne
Copy link
Collaborator

Sorry I don't get why define it as a function. Since the predict method helps make segmentation of the raw object, and it is inherited from _BaseCluster, I think define it as a class is preferable.

Definitely! I just made a mistake in my original comment. prior to edit, where I wrote:

class ClusterArray(
    data: NDArray[float], 
    info: Union[Info, ChInfo], 
    cluster_names: Optional[List[str]] = None, 
    fitted_data: Optional[NDArray[float]] = None,
    labels: Optional[NDArray[int]] = None,
):

which was obviously wrong!

@mscheltienne
Copy link
Collaborator

But I think labels should be mandatory, since it is essential to calculate gev_

OK, then let's put it as third argument, after info.
@vferat Sounds good to you?

@vferat
Copy link
Owner

vferat commented Aug 6, 2023

Hey @RunKZhang, thanks for your contribution !

In my opinion, fitted_data and labels should be optional, as it can happen that only cluster centers (i.e. microstates maps) are shared from one study to another.

  • fitted_data can be defined if labels is not defined (for example, if the maps come from a non-clustering algorithm such as ICA, then there are no labels).
  • labels must not be defined if fitted_data is not defined.
  • If both fitted_data and labels are defined, then gev_ should be calculated automatically.
  • If neither fitted_data nor labels are defined, an specific error should be raise when using clustering metric.
  • We should add another argument ignore_polarity to know if the imported maps were computed/are intended to be used with or without polarity (i.e. Add support for ERP analysis: ignore_polarity = False #93)
  • We need to make sure that saving in fif format takes these different possibilities into account.

What do you think about it ?

Otherwise, the rest of your discussion seems good.

@RunKZhang
Copy link
Author

Hello @vferat , I agree with your idea!

As you have mentioned, there are no labels if the cluster_centers are from ICA or other algorithms, and I have to admit that your design of this feature is thoughtful. For the remaining design, I have no objection to them.

Finally, I am grateful my suggestions is taken into consideration. Thank you guys so much!

@mscheltienne
Copy link
Collaborator

mscheltienne commented Aug 7, 2023

OK, then let's go with this class signature:

class ClusterArray(_BaseCluster):

    def __init__(
        data: NDArray[float], 
        info: Union[Info, ChInfo], 
        cluster_names: Optional[List[str]] = None, 
        fitted_data: Optional[NDArray[float]] = None,
        labels: Optional[NDArray[int]] = None,
        ignore_polarity: bool = False,
    ):

I think ignore_polarity=False is the sensible default, @vferat?

@vferat
Copy link
Owner

vferat commented Aug 8, 2023

We touch a sensitive point here:

  • We do not support .predict() for ignore_polarity=False (yet?). So ClusterArray(ignore_polarity=False) would be of no use.
  • ignore_polarity is a critical parameter that may completely change the analysis. I would rather ensure the user has set it on purpose, than relying on an optional parameter.

The solution I see is to set ignore_polarity as a mandatory argument, and throw a NotImplementedError error if someone tries to set ClusterArray(ignore_polarity=False) for now.

@mscheltienne
Copy link
Collaborator

OK, that works for me.

@mscheltienne mscheltienne changed the title Add some codes for Custom class Add ClusterArray class to import any kind of microstates maps Mar 21, 2024
@mscheltienne
Copy link
Collaborator

mscheltienne commented Mar 21, 2024

I picked up this PR, and started with (1) clean-up, (2) input validation on the class, (3) I/O roundtrip to FIFF.
No test for now, but that should be a good basis. @vferat Can you have a first look to the main class and to what could be missing. Mostly, could we compute fitting variables from the information provided with this signature, e.g. GEV?

Repository owner deleted a comment from codecov bot Mar 21, 2024
Copy link
Owner

@vferat vferat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fitted_data and labels are not mandatory.

  • If both fitted_data and labels are provided, then it is possible to compute GEV_.
  • If neither fitted_data nor labels are provided, then it is not possible to use pycrostates.metrics.

I don't see any other parameter that might need specific handling

pycrostates/io/fiff.py Outdated Show resolved Hide resolved
pycrostates/io/fiff.py Outdated Show resolved Hide resolved
pycrostates/cluster/array.py Outdated Show resolved Hide resolved
@mscheltienne
Copy link
Collaborator

But if we have fitted_data, we should be able to compute the labels? So in the end, we compute the GEV in all cases except if fitted_data is not provided?

@vferat
Copy link
Owner

vferat commented Mar 27, 2024

But if we have fitted_data, we should be able to compute the labels? So in the end, we compute the GEV in all cases except if fitted_data is not provided?

It could be, but there are some cases where the process is not straight forward:

  • some clustering algorithms assign sample to a cluster even if the distance between the sample and the choosen cluster center is the smallest ( for example a clustering algorithm using another distance definition, or a clsuter algorithm taking into account other parameters such as sample order..)
  • some algorithms can extract maps without labels (i.e. ICA).

These special cases make it difficult to decide whether labels should be automatically calculated. I'm open to discussion, as I have no arguments either for or against.

@mscheltienne
Copy link
Collaborator

Sounds like labels should not be auto-calculated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding a Custom class that is same as ModKMeans
3 participants