
Multi-label classification #267

Open
jayahm opened this issue May 17, 2022 · 10 comments
jayahm commented May 17, 2022

Hi

Can this library and its methods work with multi-label classification algorithms?

Menelau (Collaborator) commented May 24, 2022

@jayahm Hello,

I haven't tested it yet, but it should work well with the multioutput module from scikit-learn, which transforms a general estimator into a multi-label classification (or regression) algorithm: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.multioutput

So, in this case, it would be used together with the ClassifierChain or MultiOutputClassifier methods.
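
For reference, here is a minimal sketch of that wrapping, assuming a plain scikit-learn estimator as the base model (the dataset and the RandomForestClassifier base estimator are only placeholders, not something from this thread):

```python
# Minimal sketch: MultiOutputClassifier fits one clone of the base estimator
# per label, turning a single-label classifier into a multi-label one.
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

X, Y = make_multilabel_classification(n_samples=200, n_classes=4, random_state=0)

clf = MultiOutputClassifier(RandomForestClassifier(random_state=0))
clf.fit(X, Y)                  # Y has shape (n_samples, n_labels)
print(clf.predict(X[:3]))      # one 0/1 prediction per label and sample
```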

jayahm (Author) commented Jun 5, 2022

Hi

I have tested it with ClassifierChain and got the following error.

Here y_self has a shape of (200, 6918), where 6918 is the number of labels (0/1 binarized):

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [62], in <cell line: 3>()
      1 from deslib.dcs import OLA
      2 ola = OLA(pool_classifiers)
----> 3 ola.fit(X_val, y_val)
      4 ola_prediction = ola.predict(X_test, y_test)

File ~\anaconda3\lib\site-packages\deslib\base.py:207, in BaseDS.fit(self, X, y)
    204 self.random_state_ = check_random_state(self.random_state)
    206 # Check if the length of X and y are consistent.
--> 207 X, y = check_X_y(X, y)
    209 # Check if the pool of classifiers is None.
    210 # If yes, use a BaggingClassifier for the pool.
    211 if self.pool_classifiers is None:

File ~\anaconda3\lib\site-packages\sklearn\utils\validation.py:63, in _deprecate_positional_args.<locals>._inner_deprecate_positional_args.<locals>.inner_f(*args, **kwargs)
     61 extra_args = len(args) - len(all_args)
     62 if extra_args <= 0:
---> 63     return f(*args, **kwargs)
     65 # extra_args > 0
     66 args_msg = ['{}={}'.format(name, arg)
     67             for name, arg in zip(kwonly_args[:extra_args],
     68                                  args[-extra_args:])]

File ~\anaconda3\lib\site-packages\sklearn\utils\validation.py:826, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
    823     y = check_array(y, accept_sparse='csr', force_all_finite=True,
    824                     ensure_2d=False, dtype=None)
    825 else:
--> 826     y = column_or_1d(y, warn=True)
    827     _assert_all_finite(y)
    828 if y_numeric and y.dtype.kind == 'O':

File ~\anaconda3\lib\site-packages\sklearn\utils\validation.py:63, in _deprecate_positional_args.<locals>._inner_deprecate_positional_args.<locals>.inner_f(*args, **kwargs)
     61 extra_args = len(args) - len(all_args)
     62 if extra_args <= 0:
---> 63     return f(*args, **kwargs)
     65 # extra_args > 0
     66 args_msg = ['{}={}'.format(name, arg)
     67             for name, arg in zip(kwonly_args[:extra_args],
     68                                  args[-extra_args:])]

File ~\anaconda3\lib\site-packages\sklearn\utils\validation.py:864, in column_or_1d(y, warn)
    858         warnings.warn("A column-vector y was passed when a 1d array was"
    859                       " expected. Please change the shape of y to "
    860                       "(n_samples, ), for example using ravel().",
    861                       DataConversionWarning, stacklevel=2)
    862     return np.ravel(y)
--> 864 raise ValueError(
    865     "y should be a 1d array, "
    866     "got an array of shape {} instead.".format(shape))

ValueError: y should be a 1d array, got an array of shape (200, 6918) instead.

Menelau (Collaborator) commented Jun 6, 2022

Hello,

Can you provide me with a small code example that reproduces this error? Then I can see what can be done.

jayahm (Author) commented Jun 6, 2022

Hi

Thanks for your response.

I have created a simple code example here:

https://www.dropbox.com/s/soaysxi2rhhj388/for_deslib.zip?dl=0

jayahm (Author) commented Jun 17, 2022

Hi

Were you able to run my code?

I really hope there is a way to perform multi-label classification using this library.

Menelau (Collaborator) commented Jun 17, 2022

@jayahm Hello,

According to the example you provided, you want each base model to be a multi-label classifier and to select the best among them for each new sample, correct? If that is the case, there is no support for it. To the best of my knowledge, there is no dynamic ensemble technique that performs classifier selection with multi-label models, so we would first need to develop a new technique and then add it; that would involve adapting multiple steps of the pipeline to this context (region of competence definition, competence estimation, selection scheme, and combination). I spent some time looking for any such technique in the literature but did not find one, so there is huge potential for interesting research there...

If what you want is just a usual, classical DS technique (which works as a single-label classifier) transformed to perform multi-label classification through a classical multi-label decomposition technique (binary relevance or classifier chain), you can use it like this:

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain
from deslib.des import KNORAE

X, Y = make_multilabel_classification(n_samples=1000, n_classes=5, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

knorae = KNORAE(random_state=42)
chain = ClassifierChain(knorae, order='random', random_state=0)
chain.fit(X_train, Y_train).predict(X_test)
chain.predict_proba(X_test)
```

jayahm (Author) commented Jun 17, 2022

> According to the example you provided, you want each base model to be a multi-label classifier and to select the best among them for each new sample, correct?

Yes, very true.

> I spent some time looking for any such technique in the literature but did not find one, so there is huge potential for interesting research there...

Yes, I couldn't find any either, actually. I could feel from the beginning that this task is not straightforward, since we need to define many things in the context of multi-label classification (region of competence definition, competence estimation, selection scheme, combination, etc.). The main reason might be, for example, that one sample can have 3 labels while another sample can have 5 labels, so I am not sure how that can be adapted to this library.

> If what you want is just a usual, classical DS technique (which works as a single-label classifier) transformed to perform multi-label classification through a classical multi-label decomposition technique (binary relevance or classifier chain), you can use it like this:

Do you mean to first train multiple single-label classifiers as base classifiers (pool_classifiers), fit KNORAE on that pool, and then wrap it in a ClassifierChain? Something like this:

```python
pool_classifiers = [model_perceptron,
                    model_svc,
                    model_bayes,
                    model_tree,
                    model_knn]

knorae = KNORAE(pool_classifiers, random_state=42)
knorae.fit(X_dsel, y_dsel)

chain = ClassifierChain(knorae, order='random', random_state=0)
chain.fit(X_train, Y_train).predict(X_test)
chain.predict_proba(X_test)
```

jayahm (Author) commented Jun 23, 2022

Hi

I tried your suggestion, but with a heterogeneous pool of classifiers (the code I wrote above).

It seems that, in order to train each base classifier, a single-label dataset is still needed.

I think the code you suggested previously will generate bagging classifiers, right? Or what are the base classifiers of the KNORAE you suggested?

Menelau (Collaborator) commented Jun 29, 2022

Yeah, it would generate a bagging classifier. Unfortunately, the current implementation does not allow using a heterogeneous pool in this setting, due to some limitations in how scikit-learn clones classifiers (issue #89). I have a workaround in mind, but it will take some time to make everything compatible with both libraries.
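
As an illustration of that default, here is a minimal sketch on a plain single-label problem; note that the `pool_classifiers_` attribute used to inspect the generated pool is an assumption on my part (following scikit-learn's fitted-attribute naming convention), not something stated in this thread:

```python
# Sketch: fitting KNORAE without passing a pool. Per this discussion, DESlib
# then generates a bagging pool internally during fit().
from sklearn.datasets import make_classification
from deslib.des import KNORAE

X, y = make_classification(n_samples=500, random_state=0)

knorae = KNORAE(random_state=42)   # pool_classifiers=None
knorae.fit(X, y)
print(knorae.pool_classifiers_)    # assumed attribute exposing the generated pool
```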

However, I just saw that there is a quite recent paper (published on June 20th) that proposes a DES method for multi-label classification: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4145875

I will see if I can get their original code and add it to this library.

jayahm (Author) commented Jul 3, 2022

Hi @Menelau

That sounds good. I'll check the mentioned paper. Thanks for sharing.

Hopefully, deslib will be capable of handling multi-label classification soon.
