
MLPRegressor - Validation score wrongly defined #24411

Open
ducvinh-nguyen opened this issue Sep 9, 2022 · 5 comments

Comments

@ducvinh-nguyen

ducvinh-nguyen commented Sep 9, 2022

Describe the bug

In MLPRegressor, if the option early_stopping is set to True, the model is supposed to monitor the loss computed on the validation set instead of the training set, using the same loss formulation, i.e. the mean squared error. However, as implemented in line 719 of the source code:

self.validation_scores_.append(self.score(X_val, y_val))

The function "score", which returns (to confirm) the coefficient of determination, is used. This is not correct. It should be something like:

self.validation_scores_.append(mean_squared_error(y_val, self.predict(X_val)))

Steps/Code to Reproduce

Sorry, I don't have time to write a full reproducer, but the issue is quite clear from the source.
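
A minimal sketch of a reproducer (arbitrary synthetic data; shapes and hyperparameters are only illustrative) showing that validation_scores_ holds R²-like values bounded above by 1 rather than an MSE in squared target units:

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(500, 2))
# Target deliberately put on a large scale so that an MSE would be >> 1.
y = 1000 * np.sin(X[:, 0]) + 100 * X[:, 1] ** 2

model = MLPRegressor(early_stopping=True, random_state=0, max_iter=200).fit(X, y)

# loss_curve_ (training loss) is in squared target units, while
# validation_scores_ is bounded above by 1 (coefficient of determination).
print(model.loss_curve_[-1])
print(model.validation_scores_[-1])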

Expected Results

The validation score should be the mean squared error.

Actual Results

The coefficient of determination (R²) is used.

Versions

1.1.1
@ducvinh-nguyen ducvinh-nguyen added Bug Needs Triage Issue requires triage labels Sep 9, 2022
@ducvinh-nguyen ducvinh-nguyen changed the title MLPRegressor - Validation score wrongly calculated MLPRegressor - Validation score wrongly defined Sep 10, 2022
@MaxwellLZH
Contributor

Hi @ducvinh9, thank you for reporting the issue.

The coefficient of determination is calculated as 1 - u / v, where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum().

Therefore, for a fixed validation set, a larger coefficient of determination is equivalent to a smaller mean squared error.
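
For a fixed validation target this is just an affine relationship, R² = 1 - MSE / Var(y_true), which a quick check on arbitrary synthetic data illustrates:

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.RandomState(0)
y_true = rng.normal(size=100)
y_pred = y_true + rng.normal(scale=0.5, size=100)

# For a fixed y_true, R^2 decreases exactly as the MSE increases.
r2 = r2_score(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(np.isclose(r2, 1 - mse / np.var(y_true)))  # True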

@ducvinh-nguyen
Author

ducvinh-nguyen commented Sep 13, 2022

Hi @MaxwellLZH,

Thanks for your reply. It is true that a larger coefficient of determination is equivalent to a smaller mean squared error, but I think we should not use them interchangeably, for the following reasons:

  • With neural networks, one typically fits a nonlinear model, and R-squared is not a valid metric in that case. See the following links for more details: link1, link2

  • The mean squared error has a unit (the square of the output's unit), whereas R-squared is unitless. Moreover, the self.tol variable is an absolute value compared against the absolute change of the monitored metric, so when using a validation set one has to rethink self.tol, which is not intuitive and easy to overlook.

  • In almost all ML courses, one monitors and plots the learning curve and the validation curve as a function of the number of epochs on the same plot, with the same axis, and uses them to diagnose underfitting and overfitting. I do not understand why we should do it the other way around and make a counter-intuitive change here, instead of simply using the mean squared error for the validation curve (see the sketch at the end of this comment).

Just my thinking. MLPRegressor is an awesome tool that runs faster than TensorFlow on CPU for simple NN models. I heard that people have stopped adding new features to the MLP module, which is sad.
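
A minimal sketch of the manual alternative mentioned in the last point above, tracking an MSE validation curve with partial_fit on an explicit split (the split made internally by early_stopping=True is not exposed), so that the training and validation curves share the same units and axis:

import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(1000, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=1000)

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = MLPRegressor(hidden_layer_sizes=(50,), random_state=0)
train_mse, val_mse = [], []
for epoch in range(50):
    # One pass over the training data per call.
    model.partial_fit(X_train, y_train)
    train_mse.append(mean_squared_error(y_train, model.predict(X_train)))
    val_mse.append(mean_squared_error(y_val, model.predict(X_val)))

# Both curves are in squared target units and can be plotted against the
# epoch number on a single axis to diagnose under/overfitting.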

@glemaitre
Member

I agree that we should usually monitor the loss instead of the final metric. This is indeed the default in other estimators such as gradient boosting. We could think of adding a scoring keyword allowing us to switch to another score, which by default would use the loss, as in HistGradientBoosting.
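
For reference, this is how the keyword already works in HistGradientBoostingRegressor: scoring="loss" (the default) monitors the loss, and any scorer name can be passed instead, e.g.:

from sklearn.ensemble import HistGradientBoostingRegressor

# Default: early stopping monitors the (negative) loss.
hgb_loss = HistGradientBoostingRegressor(early_stopping=True, scoring="loss")

# Alternative: monitor an arbitrary scorer instead, e.g. negated MSE.
hgb_mse = HistGradientBoostingRegressor(
    early_stopping=True, scoring="neg_mean_squared_error"
)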

@ogrisel
Member

ogrisel commented Oct 17, 2022

Note that in scikit-learn we have the naming convention that "score" always means "higher is better" and "loss" means "lower is better" when we speak about performance metrics.

I think our early stopping API is rather poor and not consistent across estimators. One way to make them consistent without introducing too much code duplication would be to go through the future callback API currently being designed by @jeremiedbb in #22000.
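
A small illustration of that convention with the MSE-based scorer (arbitrary data; DummyRegressor is only used to have a fitted estimator):

import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import get_scorer, mean_squared_error

X = np.arange(10).reshape(-1, 1)
y = np.arange(10, dtype=float)
est = DummyRegressor().fit(X, y)

# Scorers follow "higher is better": the MSE-based scorer returns -MSE.
scorer = get_scorer("neg_mean_squared_error")
print(np.isclose(scorer(est, X, y), -mean_squared_error(y, est.predict(X))))  # True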

@ogrisel
Member

ogrisel commented Oct 17, 2022

Note that in HistGradientBoostingClassifier/Regressor we do allow for custom metrics, but by default we use the negative loss, stored in an attribute named validation_score_...

This is not very intuitive either (but it's consistent with the scikit-learn loss/score naming convention).

We need to rethink this. The callback API should allow us both to use one specific loss or scoring to decide when to stop, and to compute many metric values on both the training and validation data at the end of each iteration in fit.
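
For illustration, with the default scoring="loss" the values stored in validation_score_ are the negative loss, so they are non-positive for a squared-error loss and "higher is better" still holds:

from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor

X, y = make_regression(n_samples=1000, random_state=0)
hgb = HistGradientBoostingRegressor(
    early_stopping=True, validation_fraction=0.2, random_state=0
).fit(X, y)

# validation_score_ holds the negative loss on the held-out set at each
# iteration: values are <= 0 and increase as the fit improves.
print(hgb.validation_score_[:5])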
