
Multi-label classification #267

Open
jayahm opened this issue May 17, 2022 · 10 comments
jayahm commented May 17, 2022

Hi

Can this library and its methods work with multi-label classification algorithms?

Menelau (Collaborator) commented May 24, 2022

@jayahm Hello,

I haven't tested it yet, but it should work well with the multioutput module from scikit-learn, which transforms a general estimator into a multi-label classification (or regression) algorithm: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.multioutput

So, in this case, it would be used together with the ClassifierChain or MultiOutputClassifier methods.
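
For reference, here is a minimal sketch of that wrapping, assuming a plain scikit-learn estimator as the base model (the dataset and the RandomForestClassifier base estimator are only placeholders, not something from this thread):

```python
# Minimal sketch: MultiOutputClassifier fits one clone of the base estimator
# per label, turning a single-label classifier into a multi-label one.
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

X, Y = make_multilabel_classification(n_samples=200, n_classes=4, random_state=0)

clf = MultiOutputClassifier(RandomForestClassifier(random_state=0))
clf.fit(X, Y)                  # Y has shape (n_samples, n_labels)
print(clf.predict(X[:3]))      # one 0/1 prediction per label and sample
```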

jayahm (Author) commented Jun 5, 2022

Hi

I have tested it with ClassifierChain and got the following error.

Here y_self has a shape of (200, 6918), where 6918 is the number of labels (0/1 binarized):

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [62], in <cell line: 3>()
      1 from deslib.dcs import OLA
      2 ola = OLA(pool_classifiers)
----> 3 ola.fit(X_val, y_val)
      4 ola_prediction = ola.predict(X_test, y_test)

File ~\anaconda3\lib\site-packages\deslib\base.py:207, in BaseDS.fit(self, X, y)
    204 self.random_state_ = check_random_state(self.random_state)
    206 # Check if the length of X and y are consistent.
--> 207 X, y = check_X_y(X, y)
    209 # Check if the pool of classifiers is None.
    210 # If yes, use a BaggingClassifier for the pool.
    211 if self.pool_classifiers is None:

File ~\anaconda3\lib\site-packages\sklearn\utils\validation.py:63, in _deprecate_positional_args.<locals>._inner_deprecate_positional_args.<locals>.inner_f(*args, **kwargs)
     61 extra_args = len(args) - len(all_args)
     62 if extra_args <= 0:
---> 63     return f(*args, **kwargs)
     65 # extra_args > 0
     66 args_msg = ['{}={}'.format(name, arg)
     67             for name, arg in zip(kwonly_args[:extra_args],
     68                                  args[-extra_args:])]

File ~\anaconda3\lib\site-packages\sklearn\utils\validation.py:826, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
    823     y = check_array(y, accept_sparse='csr', force_all_finite=True,
    824                     ensure_2d=False, dtype=None)
    825 else:
--> 826     y = column_or_1d(y, warn=True)
    827     _assert_all_finite(y)
    828 if y_numeric and y.dtype.kind == 'O':

File ~\anaconda3\lib\site-packages\sklearn\utils\validation.py:63, in _deprecate_positional_args.<locals>._inner_deprecate_positional_args.<locals>.inner_f(*args, **kwargs)
     61 extra_args = len(args) - len(all_args)
     62 if extra_args <= 0:
---> 63     return f(*args, **kwargs)
     65 # extra_args > 0
     66 args_msg = ['{}={}'.format(name, arg)
     67             for name, arg in zip(kwonly_args[:extra_args],
     68                                  args[-extra_args:])]

File ~\anaconda3\lib\site-packages\sklearn\utils\validation.py:864, in column_or_1d(y, warn)
    858         warnings.warn("A column-vector y was passed when a 1d array was"
    859                       " expected. Please change the shape of y to "
    860                       "(n_samples, ), for example using ravel().",
    861                       DataConversionWarning, stacklevel=2)
    862     return np.ravel(y)
--> 864 raise ValueError(
    865     "y should be a 1d array, "
    866     "got an array of shape {} instead.".format(shape))

ValueError: y should be a 1d array, got an array of shape (200, 6918) instead.

Menelau (Collaborator) commented Jun 6, 2022

Hello,

Can you provide me with a small code example that reproduces this error? Then I can see what can be done.

jayahm (Author) commented Jun 6, 2022

Hi

Thanks for your response.

I have created a simple code example here:

https://www.dropbox.com/s/soaysxi2rhhj388/for_deslib.zip?dl=0

jayahm (Author) commented Jun 17, 2022

Hi

Were you able to run my code?

I really hope there is a way to perform multi-label classification using this library.

Menelau (Collaborator) commented Jun 17, 2022

@jayahm Hello,

According to the example you provided, you want each base model to be a multi-label classifier and to select the best among them for each new sample, correct? If that is the case, there is no support for it. To the best of my knowledge, there is no dynamic ensemble technique that performs classifier selection with multi-label models, so we would first need to develop a new technique and then add it; that would involve adapting multiple steps of the pipeline to this context (region of competence definition, competence estimation, selection scheme, and combination). I spent some time looking for any such technique in the literature but did not find one, so there is huge potential for interesting research there...

If what you want is just a usual, classical DS technique (which works as a single-label classifier) transformed to perform multi-label classification through a classical multi-label decomposition technique (binary relevance or classifier chain), you can use it like this:

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain
from deslib.des import KNORAE

X, Y = make_multilabel_classification(n_samples=1000, n_classes=5, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

knorae = KNORAE(random_state=42)
chain = ClassifierChain(knorae, order='random', random_state=0)
chain.fit(X_train, Y_train).predict(X_test)
chain.predict_proba(X_test)
```

jayahm (Author) commented Jun 17, 2022

> According to the example you provided, you want each base model to be a multi-label classifier and to select the best among them for each new sample, correct?

Yes, very true.

> I spent some time looking for any such technique in the literature but did not find one, so there is huge potential for interesting research there...

Yes, I couldn't find any either, actually. I could feel from the beginning that this task is not straightforward, since we need to define many things in the context of multi-label classification (region of competence definition, competence estimation, selection scheme, combination, etc.). The main reason might be, for example, that one sample can have 3 labels while another sample can have 5 labels, so I am not sure how that can be adapted to this library.

> If what you want is just a usual, classical DS technique (which works as a single-label classifier) transformed to perform multi-label classification through a classical multi-label decomposition technique (binary relevance or classifier chain), you can use it like this:

Do you mean to first train multiple single-label classifiers as base classifiers (pool_classifiers), fit KNORAE on that pool, and then wrap it in a ClassifierChain? Something like this:

```python
pool_classifiers = [model_perceptron,
                    model_svc,
                    model_bayes,
                    model_tree,
                    model_knn]

knorae = KNORAE(pool_classifiers, random_state=42)
knorae.fit(X_dsel, y_dsel)

chain = ClassifierChain(knorae, order='random', random_state=0)
chain.fit(X_train, Y_train).predict(X_test)
chain.predict_proba(X_test)
```

jayahm (Author) commented Jun 23, 2022

Hi

I tried your suggestion, but with a heterogeneous pool of classifiers (the code I wrote above).

It seems that, in order to train each base classifier, a single-label dataset is still needed.

I think the code you suggested previously will generate bagging classifiers, right? Or what are the base classifiers of the KNORAE you suggested?

Menelau (Collaborator) commented Jun 29, 2022

Yeah, it would generate a bagging classifier. Unfortunately, the current implementation does not allow using a heterogeneous pool in this setting, due to some limitations in how scikit-learn clones classifiers (issue #89). I have a workaround in mind, but it will take some time to make everything compatible with both libraries.
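
As an illustration of that default, here is a minimal sketch on a plain single-label problem; note that the `pool_classifiers_` attribute used to inspect the generated pool is an assumption on my part (following scikit-learn's fitted-attribute naming convention), not something stated in this thread:

```python
# Sketch: fitting KNORAE without passing a pool. Per this discussion, DESlib
# then generates a bagging pool internally during fit().
from sklearn.datasets import make_classification
from deslib.des import KNORAE

X, y = make_classification(n_samples=500, random_state=0)

knorae = KNORAE(random_state=42)   # pool_classifiers=None
knorae.fit(X, y)
print(knorae.pool_classifiers_)    # assumed attribute exposing the generated pool
```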

However, I just saw that there is a quite recent paper (published on June 20th) that proposes a DES method for multi-label classification: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4145875

I will see if I can get their original code and add it to this library.

jayahm (Author) commented Jul 3, 2022

Hi @Menelau

That sounds good. I'll check the mentioned paper. Thanks for sharing.

Hopefully, deslib will be capable of handling multi-label classification soon.
