
Fix AUC metric #3935

Merged
merged 8 commits into from Apr 28, 2024
Conversation

celestinoxp
Contributor

@celestinoxp commented Mar 4, 2024

Some details have not been updated to support the latest scikit-learn 1.4 code.

  • confirm/update the code
  • fix errors
  • make sure the tests exercise the metrics correctly (do we need more tests?)

Closes #3932

@celestinoxp
Contributor Author

@Yard1 @moezali1 @tvdboom @glemaitre @ogrisel @thomasjpfan @lorentzenchr @adrinjalali

Something is wrong with the AUC metrics... I have no idea how to fix this pull request...


@ogrisel

ogrisel commented Mar 5, 2024

Could you please provide a minimal reproducer on synthetic data that ideally only involves scikit-learn? Working on crafting such a reproducer will likely help you understand what's going on.

@celestinoxp
Contributor Author

> Could you please provide a minimal reproducer on synthetic data that ideally only involves scikit-learn? Working on crafting such a reproducer will likely help you understand what's going on.

from pycaret.datasets import get_data
juice = get_data('juice')
from pycaret.classification import *
exp_name = setup(data = juice,  target = 'Purchase')
best_model = compare_models()

@celestinoxp
Contributor Author

@ngupta23 can you help?

@@ -115,10 +116,11 @@ def __init__(
if scorer
else pycaret.internal.metrics.make_scorer_with_error_score(
score_func,
needs_proba=target == "pred_proba",
needs_threshold=target == "threshold",
response_method=None,

If this is calling scikit-learn's make_scorer under the covers, then you can pass in the response_method directly here.

if target == "pred"
    response_method = "predict"
elif target == "pred_proba":
    response_method = "predict_proba"
else:  # threshold
    response_method = "decision_function"

...

else pycaret.internal.metrics.make_scorer_with_error_score(
    score_func,
    response_method=response_method,
    greater_is_better=greater_is_better,
    error_score=0.0,
)
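The mapping suggested above can be sketched as a small helper; the function name here is hypothetical, not an actual pycaret API:

```python
# Hypothetical helper illustrating the reviewer's suggested mapping from
# pycaret's `target` flag to scikit-learn 1.4's response_method argument.
def target_to_response_method(target: str) -> str:
    mapping = {
        "pred": "predict",
        "pred_proba": "predict_proba",
        "threshold": "decision_function",
    }
    return mapping[target]

print(target_to_response_method("pred_proba"))  # predict_proba
```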

Contributor Author

I tested, but it is still not working.
logs.log shows:

2024-03-12 18:16:58,428:WARNING:C:\Users\celes\anaconda3\lib\site-packages\pycaret\internal\metrics.py:196: FitFailedWarning: Metric 'make_scorer(roc_auc_score, response_method=('decision_function', 'predict_proba'), average=weighted, multi_class=ovr)' failed and error score 0.0 has been returned instead. If this is a custom metric, this usually means that the error is in the metric code. Full exception below:
Traceback (most recent call last):
  File "C:\Users\celes\anaconda3\lib\site-packages\pycaret\internal\metrics.py", line 188, in _score
    return super()._score(
  File "C:\Users\celes\anaconda3\lib\site-packages\sklearn\metrics\_scorer.py", line 345, in _score
    y_pred = method_caller(
  File "C:\Users\celes\anaconda3\lib\site-packages\sklearn\metrics\_scorer.py", line 87, in _cached_call
    result, _ = _get_response_values(
  File "C:\Users\celes\anaconda3\lib\site-packages\sklearn\utils\_response.py", line 210, in _get_response_values
    y_pred = prediction_method(X)
  File "C:\Users\celes\anaconda3\lib\site-packages\pycaret\internal\pipeline.py", line 341, in predict_proba
    Xt = transform.transform(Xt)
  File "C:\Users\celes\anaconda3\lib\site-packages\sklearn\utils\_set_output.py", line 295, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "C:\Users\celes\anaconda3\lib\site-packages\pycaret\internal\preprocess\transformers.py", line 233, in transform
    X = to_df(X, index=getattr(y, "index", None))
  File "C:\Users\celes\anaconda3\lib\site-packages\pycaret\utils\generic.py", line 103, in to_df
    data = pd.DataFrame(data, index, columns)
  File "C:\Users\celes\anaconda3\lib\site-packages\pandas\core\frame.py", line 822, in __init__
    mgr = ndarray_to_mgr(
  File "C:\Users\celes\anaconda3\lib\site-packages\pandas\core\internals\construction.py", line 319, in ndarray_to_mgr
    values = _prep_ndarraylike(values, copy=copy_on_sanitize)
  File "C:\Users\celes\anaconda3\lib\site-packages\pandas\core\internals\construction.py", line 575, in _prep_ndarraylike
    values = np.array([convert(v) for v in values])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

  warnings.warn(

2024-03-12 18:16:58,428:WARNING:C:\Users\celes\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1561: UserWarning: Note that pos_label (set to 'MM') is ignored when average != 'binary' (got 'weighted'). You may use labels=[pos_label] to specify a single positive class.

Contributor Author

@thomasjpfan Can you help investigate whether the problem is in pycaret or in scikit-learn? I'm running tests on my laptop, but I'm not sure where the error is.


I do not have the bandwidth to investigate.

Contributor Author

> I do not have the bandwidth to investigate.

But could you talk to someone on the scikit-learn side for support?


You need to debug to see whether it is a pycaret bug or a scikit-learn bug. If it is a scikit-learn bug, then open an issue with a minimal reproducer that only involves scikit-learn.

#3935 (comment) is not a valid reproducer for scikit-learn because it still uses pycaret.

@celestinoxp
Contributor Author

@Aloqeely can you help fix the bugs in pycaret?

@Aloqeely

Sorry, I am not familiar with PyCaret.
Good luck!

@moezali1 moezali1 requested a review from Yard1 April 25, 2024 19:47
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
@Yard1 Yard1 changed the title [WIP] fix auc metric Fix AUC metric Apr 28, 2024
@Yard1 Yard1 merged commit 9ee0cf4 into pycaret:master Apr 28, 2024
14 of 16 checks passed
Development

Successfully merging this pull request may close these issues.

All result AUC = 0 with compare_model
5 participants