Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad error messages in ClassifierChain on multioutput multiclass #13339

Open
amueller opened this issue Feb 28, 2019 · 19 comments · May be fixed by #21942
Open

Bad error messages in ClassifierChain on multioutput multiclass #13339

amueller opened this issue Feb 28, 2019 · 19 comments · May be fixed by #21942
Labels
good first issue Easy with clear instructions to resolve module:multioutput

Comments

@amueller
Copy link
Member

trying to run classifier chain on a multiclass problem with a reshaped y (which is interpreted as multioutput multiclass) doesn't work (see #9245), which is fine. But the error messages are really bad making it hard to understand what's going on.

We should detect that we're trying to do multioutput multiclass instead of multilabel and give an informative error.

@amueller amueller added Easy Well-defined and straightforward way to resolve good first issue Easy with clear instructions to resolve help wanted labels Feb 28, 2019
@EdinCitaku
Copy link

Hey Im pretty new, can I take a look at this and try to fix it?

@jnothman
Copy link
Member

jnothman commented Mar 1, 2019

This should reproduce the issue:

from sklearn.multioutput import ClassifierChain
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

Xs, ys = zip(*(make_classification(random_state=i, n_classes=3, n_informative=3) for i in range(3)))
X = np.hstack(Xs)
Y = np.transpose(ys)
ClassifierChain(LogisticRegression()).fit(X, Y).predict(X)

but actually this is running without complaint for me!!

@amueller, please provide a reproducible snippet??

@amueller
Copy link
Member Author

amueller commented Mar 1, 2019

ah, it's only decision_function that's broken.
If predict works, maybe fixing decision_function is reasonable?
I didn't actually read #9245 in detail: it looks like the predict part is already done in master?

from sklearn.multioutput import ClassifierChain
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
import numpy as np

Xs, ys = zip(*(make_classification(random_state=i, n_classes=3, n_informative=3) for i in range(3)))
X = np.hstack(Xs)
Y = np.transpose(ys)
ClassifierChain(LogisticRegression()).fit(X, Y).decision_function(X)

@amueller
Copy link
Member Author

amueller commented Mar 1, 2019

Maybe we should try fixing decision_function and predict_proba and if that's too hard we should raise attribute errors in the multioutput multiclass case?

@EdinCitaku
Copy link

EdinCitaku commented Mar 2, 2019

In case of raising an attribute error, is it enough to add an if statement, that calls the type_of_target function of Y? Something along the line of:
if type_of_target(Y_decision_chain) == 'multiclass-multioutput': raise AttributeError("decision_function currently does not support multiclass-multioutput as target type")

@jnothman
Copy link
Member

jnothman commented Mar 3, 2019 via email

@EdinCitaku
Copy link

So what do we need to add for this support? As I understood decision_function expects a 2d-Array when calling estimator.decision_function(X_aug) but since in our case its a multiclass-multioutput target type it gets a list of 2d-arrays instead.
Is it suitable to just add more collumns to the variable Y_decision_chain and fill it with the collumns from each 2d-array from the list of 2d-arrays we get?

@jnothman
Copy link
Member

jnothman commented Mar 4, 2019 via email

@EdinCitaku
Copy link

EdinCitaku commented Mar 8, 2019

So I looked at it again and made some modifications but Im really not sure if I understood it correctly.
This is the code that I changed in the decision_function.

decision_function_result =  estimator.decision_function(X_aug)
if decision_function_result.shape[1] == 1:
    Y_decision_chain[:, chain_idx] = decision_function_result
else:
    Y_decision_chain[:, chain_idx] = decision_function_result[:,chain_idx]

Basicly each multi-output classifier in the Classifierchain is responsible for one row in the Y_decision_chain matrix. I threw away my previous idea of adding rows to Y_decision_chain , since it still need to conform the output rule of the decision_function.
Is this solving the problem at hand?
I'm sorry if it might be wrong or if I didn't understand you correctly.
I dont know how else to use the decision_function of each estimator to modify the Y_decision_chain, since each class is being predicted multible times.

@michellemroy
Copy link

Hi, I'm new and looking for an issue to work on. Is this issue still open?

@amueller
Copy link
Member Author

@michellemroy there's currently some work being done in #14654

@sethiabhishek
Copy link

I am new to the repo , is this issue still open?

@dhivyasreedhar
Copy link

Hi, can I work on this? I'm new so can someone guide me?

@ankitasankars
Copy link

Hello, Can I use this issue as my first contribution? I'm new so please can someone guide me?

@codecypher
Copy link

codecypher commented Dec 15, 2021

I would like to contribute but we should assign the tickets that are open but appear to be assigned.

@cmarmo cmarmo added module:multioutput and removed Easy Well-defined and straightforward way to resolve labels Feb 22, 2022
@albonec
Copy link

albonec commented Apr 28, 2023

If this issue hasn't been resolved already, iId like to contribute! It would be my first open-source contribution

@YashSaxena21
Copy link

Hi everyone, I'm new to the open source world and I want to contribute, please let me know a easy good first issue to start with. It will be really helpful if someone can guide me.

@santiagoahl
Copy link

I am working on this!

@PragyanTiwari
Copy link

Can anyone confirm is this issue still open or not??

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Easy with clear instructions to resolve module:multioutput
Projects
None yet