ROC AUC scores don't match those of sklearn #211

Open
abudis opened this issue Feb 5, 2020 · 1 comment

abudis commented Feb 5, 2020

Hey,

I've tried using the package on my dataset, which is a binary classification problem, but the ROC AUC scores produced by HPH don't match the scores produced by a simple CV loop from sklearn.

The difference is about 0.1, which is way too large to be explained by random splits.
I tried multiple packages - XGBoost, LGBM, and CatBoost - and they all produce a similar result: the HPH CV AUC is about 0.1 lower than the CV AUC calculated by sklearn's cross_validate.

I use the same hyperparameters to cross-validate in sklearn and in HPH, so the difference can't be explained by different hyperparameters either.

On the HPH side, I've only used CVExperiment so far.

Could you please point me toward where the difference might be coming from?

Cheers,
Artem

@HunterMcGushion
Owner

Sorry for the delay in responding to you, @abudis!
Thanks for opening this issue. Hopefully we can clear everything up!

Although you say the disparity is too large to be explained by random splits, in my experience random seeds often have a much larger impact on scores than expected. Aside from the random splits made during CV, many algorithms also accept random-state parameters, which HH assigns and records as well, so this may be contributing to the difference between your scores.
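
Just to illustrate how much the split seed alone can move things, here's a quick sketch on a toy dataset with a stand-in model (nothing here assumes your actual data or hyperparameters):

```python
# Illustration only: how much the CV split seed alone can shift mean ROC-AUC.
# Toy dataset and stand-in model; not your actual setup.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

for seed in (0, 1, 42):
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(
        GradientBoostingClassifier(random_state=seed), X, y, scoring="roc_auc", cv=cv
    )
    print(f"seed={seed}: mean fold ROC-AUC = {scores.mean():.4f}")
```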

This issue is rather old, but some of the sample code may help explain how HH does things behind the scenes. Also, keep in mind that HH computes the average ROC-AUC by micro-averaging the actual out-of-fold predictions, rather than averaging the per-fold scores themselves.
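
Here's a small sketch contrasting the two aggregation strategies, again with a toy dataset and a stand-in model rather than your setup. The two numbers are often close, but they are not the same quantity:

```python
# Mean of per-fold ROC-AUC scores vs. ROC-AUC of the pooled out-of-fold
# predictions (the "micro-averaging" described above). Toy data/model only.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=32)

oof_pred = np.zeros(len(y), dtype=float)  # out-of-fold predicted probabilities
fold_scores = []

for train_idx, test_idx in cv.split(X, y):
    model = GradientBoostingClassifier(random_state=32)
    model.fit(X[train_idx], y[train_idx])
    oof_pred[test_idx] = model.predict_proba(X[test_idx])[:, 1]
    fold_scores.append(roc_auc_score(y[test_idx], oof_pred[test_idx]))

print(f"Mean of per-fold AUCs:         {np.mean(fold_scores):.4f}")
print(f"AUC of pooled OOF predictions: {roc_auc_score(y, oof_pred):.4f}")
```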

If the scores still don't seem to make sense, I'd love to see a minimal code example demonstrating the score mismatch you're seeing (preferably with a popular toy dataset for reproducibility). That way, we can determine the cause of the difference between your scores.
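
For example, something along these lines would be perfect. I'm sketching the HH side from memory, so please treat the exact Environment/CVExperiment parameters as assumptions and double-check them against the docs for your installed version:

```python
# Hypothetical side-by-side comparison on a toy dataset. The sklearn half uses
# APIs I'm sure of; the HH half is sketched from memory of the README, so
# verify the parameter names before running.
import pandas as pd
from hyperparameter_hunter import Environment, CVExperiment
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_validate
from xgboost import XGBClassifier

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target

model_params = dict(max_depth=3, n_estimators=200, learning_rate=0.1)
cv_params = dict(n_splits=5, shuffle=True, random_state=32)

# sklearn baseline: mean of per-fold ROC-AUC scores
sk_results = cross_validate(
    XGBClassifier(**model_params),
    df[data.feature_names],
    df["target"],
    scoring="roc_auc",
    cv=StratifiedKFold(**cv_params),
)
print("sklearn mean fold AUC:", sk_results["test_score"].mean())

# HyperparameterHunter side (parameter names assumed, not guaranteed)
env = Environment(
    train_dataset=df,
    results_path="HyperparameterHunterAssets",
    target_column="target",
    metrics=["roc_auc_score"],
    cv_type=StratifiedKFold,
    cv_params=cv_params,
)
experiment = CVExperiment(
    model_initializer=XGBClassifier,
    model_init_params=model_params,
)
```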

Thank you again for opening this issue, and once again, I apologize for taking so long to reply. Please let me know if you're having trouble coming up with some code to reproduce the problem, and we'll figure something else out.
