
MLPRegressor - Validation score wrongly defined #24411

Open
ducvinh-nguyen opened this issue Sep 9, 2022 · 5 comments

Comments

@ducvinh-nguyen

ducvinh-nguyen commented Sep 9, 2022

Describe the bug

In MLPRegressor, if the option early_stopping is set to True, the model is supposed to monitor the loss computed on the validation set instead of the training set, using the same loss formulation, i.e. the mean squared error. However, as implemented in line 719 of the source code:

self.validation_scores_.append(self.score(X_val, y_val))

The function "score", which returns (to confirm) the coefficient of determination, is used. This is not correct. It should be something like:

self.validation_scores_.append(mean_squared_error(y_val, self.predict(X_val)))

Steps/Code to Reproduce

Sorry, I don't have time to write a full reproducer, but the issue is quite clear from the source.
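
A minimal sketch of a reproducer (arbitrary synthetic data; shapes and hyperparameters are only illustrative) showing that validation_scores_ holds R²-like values bounded above by 1 rather than an MSE in squared target units:

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(500, 2))
# Target deliberately put on a large scale so that an MSE would be >> 1.
y = 1000 * np.sin(X[:, 0]) + 100 * X[:, 1] ** 2

model = MLPRegressor(early_stopping=True, random_state=0, max_iter=200).fit(X, y)

# loss_curve_ (training loss) is in squared target units, while
# validation_scores_ is bounded above by 1 (coefficient of determination).
print(model.loss_curve_[-1])
print(model.validation_scores_[-1])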

Expected Results

The validation score should be the mean squared error.

Actual Results

The coefficient of determination (R²) is used.

Versions

1.1.1
@ducvinh-nguyen ducvinh-nguyen added Bug Needs Triage Issue requires triage labels Sep 9, 2022
@ducvinh-nguyen ducvinh-nguyen changed the title MLPRegressor - Validation score wrongly calculated MLPRegressor - Validation score wrongly defined Sep 10, 2022
@MaxwellLZH
Contributor

Hi @ducvinh9, thank you for reporting the issue.

The coefficient of determination is calculated as 1 - u / v, where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum().

Therefore, for a fixed validation set, a larger coefficient of determination is equivalent to a smaller mean squared error.
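
For a fixed validation target this is just an affine relationship, R² = 1 - MSE / Var(y_true), which a quick check on arbitrary synthetic data illustrates:

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.RandomState(0)
y_true = rng.normal(size=100)
y_pred = y_true + rng.normal(scale=0.5, size=100)

# For a fixed y_true, R^2 decreases exactly as the MSE increases.
r2 = r2_score(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(np.isclose(r2, 1 - mse / np.var(y_true)))  # True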

@ducvinh-nguyen
Author

ducvinh-nguyen commented Sep 13, 2022

Hi @MaxwellLZH,

Thanks for your reply. It is true that a larger coefficient of determination is equivalent to a smaller mean squared error, but I think we should not use them interchangeably, for the following reasons:

  • With neural networks, one typically fits a nonlinear model, and R-squared is not a valid metric in that case. See the following links for more details: link1, link2

  • The mean squared error has a unit (the square of the output's unit), whereas R-squared is unitless. Moreover, the self.tol variable is an absolute value compared against the absolute change of the monitored metric, so when using a validation set one has to rethink self.tol, which is not intuitive and easy to overlook.

  • In almost all ML courses, one monitors and plots the learning curve and the validation curve as a function of the number of epochs on the same plot, with the same axis, and uses them to diagnose underfitting and overfitting. I do not understand why we should do it the other way around and make a counter-intuitive change here, instead of simply using the mean squared error for the validation curve (see the sketch at the end of this comment).

Just my thinking. MLPRegressor is an awesome tool that runs faster than TensorFlow on CPU for simple NN models. I heard that people have stopped adding new features to the MLP module, which is sad.
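
A minimal sketch of the manual alternative mentioned in the last point above, tracking an MSE validation curve with partial_fit on an explicit split (the split made internally by early_stopping=True is not exposed), so that the training and validation curves share the same units and axis:

import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(1000, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=1000)

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = MLPRegressor(hidden_layer_sizes=(50,), random_state=0)
train_mse, val_mse = [], []
for epoch in range(50):
    # One pass over the training data per call.
    model.partial_fit(X_train, y_train)
    train_mse.append(mean_squared_error(y_train, model.predict(X_train)))
    val_mse.append(mean_squared_error(y_val, model.predict(X_val)))

# Both curves are in squared target units and can be plotted against the
# epoch number on a single axis to diagnose under/overfitting.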

@glemaitre
Member

I agree that we should usually monitor the loss instead of the final metric. This is indeed the default in other estimators such as gradient boosting. We could think of adding a scoring keyword allowing us to switch to another score, which by default would use the loss, as in HistGradientBoosting.
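
For reference, this is how the keyword already works in HistGradientBoostingRegressor: scoring="loss" (the default) monitors the loss, and any scorer name can be passed instead, e.g.:

from sklearn.ensemble import HistGradientBoostingRegressor

# Default: early stopping monitors the (negative) loss.
hgb_loss = HistGradientBoostingRegressor(early_stopping=True, scoring="loss")

# Alternative: monitor an arbitrary scorer instead, e.g. negated MSE.
hgb_mse = HistGradientBoostingRegressor(
    early_stopping=True, scoring="neg_mean_squared_error"
)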

@ogrisel
Member

ogrisel commented Oct 17, 2022

Note that in scikit-learn we have the naming convention that "score" always means "higher is better" and "loss" means "lower is better" when we speak about performance metrics.

I think our early stopping API is rather poor and not consistent across estimators. One way to make them consistent without introducing too much code duplication would be to go through the future callback API currently being designed by @jeremiedbb in #22000.
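
A small illustration of that convention with the MSE-based scorer (arbitrary data; DummyRegressor is only used to have a fitted estimator):

import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import get_scorer, mean_squared_error

X = np.arange(10).reshape(-1, 1)
y = np.arange(10, dtype=float)
est = DummyRegressor().fit(X, y)

# Scorers follow "higher is better": the MSE-based scorer returns -MSE.
scorer = get_scorer("neg_mean_squared_error")
print(np.isclose(scorer(est, X, y), -mean_squared_error(y, est.predict(X))))  # True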

@ogrisel
Member

ogrisel commented Oct 17, 2022

Note that in HistGradientBoostingClassifier/Regressor we do allow for custom metrics, but by default we use the negative loss, stored in an attribute named validation_score_...

This is not very intuitive either (but it's consistent with the scikit-learn loss/score naming convention).

We need to rethink this. The callback API should allow us both to use one specific loss or scoring to decide when to stop, and to compute many metric values on both the training and validation data at the end of each iteration in fit.
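
For illustration, with the default scoring="loss" the values stored in validation_score_ are the negative loss, so they are non-positive for a squared-error loss and "higher is better" still holds:

from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor

X, y = make_regression(n_samples=1000, random_state=0)
hgb = HistGradientBoostingRegressor(
    early_stopping=True, validation_fraction=0.2, random_state=0
).fit(X, y)

# validation_score_ holds the negative loss on the held-out set at each
# iteration: values are <= 0 and increase as the fit improves.
print(hgb.validation_score_[:5])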
