[BUG] Chapter 2, CV splits are not random as opposed to what is written #105

eranr · 2023-11-08T20:32:04Z

Notebook name: 02_end_to_end_machine_learning_project
Section 4.2 “Better Evaluation Using Cross-Validation”, cell 140
Book Chapter 2, subsection "Better Evaluation Using Cross-Validation".
According to the book, the following code randomly splits the training set into 10 folds:

from sklearn.model_selection import cross_val_score

tree_rmses = -cross_val_score(tree_reg, housing, housing_labels,
                              scoring="neg_root_mean_squared_error", cv=10)
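To see what cv=10 turns into, here is a minimal sketch using sklearn.model_selection.check_cv, the helper that cross_validate (and therefore cross_val_score) uses to convert the "cv" argument into a splitter. The 20-row array below is a synthetic stand-in assumed for illustration; it is not the notebook's housing data. For a regressor, the integer becomes an unshuffled KFold, so each validation fold is a contiguous block of row indices.

```python
import numpy as np
from sklearn.model_selection import KFold, check_cv

# Synthetic stand-in for the training set: 20 rows, 1 feature (illustrative only)
X = np.arange(20).reshape(-1, 1)
y = np.arange(20, dtype=float)

# classifier=False because tree_reg is a regressor, so KFold (not StratifiedKFold) is used
splitter = check_cv(10, y, classifier=False)
print(type(splitter).__name__, splitter.shuffle)  # KFold False

# Each validation fold is a consecutive block of indices in the original row order
val_folds = [val.tolist() for _, val in splitter.split(X, y)]
print(val_folds[:3])  # [[0, 1], [2, 3], [4, 5]]
```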

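According to the documentation of cross_val_score (version 1.3.2), passing an integer as the “cv” parameter makes it use a (Stratified)KFold splitter internally with shuffle=False: StratifiedKFold when the estimator is a classifier and y is binary or multiclass, KFold otherwise. The folds are therefore consecutive blocks of rows taken in their original order. Perhaps stating the obvious: to get randomized folds, one could pass a CV splitter instance instead, for example: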

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline

forest_reg = make_pipeline(preprocessing,
                           RandomForestRegressor(random_state=42))
# Shuffled 10-fold CV; random_state makes the folds reproducible
forest_rmses = -cross_val_score(forest_reg, housing, housing_labels,
                                scoring="neg_root_mean_squared_error",
                                cv=KFold(n_splits=10, shuffle=True,
                                         random_state=42))
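A small sketch comparing the two splitters on a synthetic 20-row array (assumed for illustration; the notebook itself uses housing): the unshuffled KFold that cv=10 amounts to yields consecutive blocks, while KFold(shuffle=True, random_state=42) draws the validation rows at random but reproducibly.

```python
import numpy as np
from sklearn.model_selection import KFold

# Synthetic stand-in for the training set: 20 rows in their original order
X = np.arange(20).reshape(-1, 1)

plain = KFold(n_splits=10)  # what cv=10 amounts to for a regressor
shuffled = KFold(n_splits=10, shuffle=True, random_state=42)

plain_folds = [val.tolist() for _, val in plain.split(X)]
shuffled_folds = [sorted(val.tolist()) for _, val in shuffled.split(X)]

print(plain_folds[:3])  # [[0, 1], [2, 3], [4, 5]]
print(shuffled_folds[:3])

# Shuffled folds still cover every row exactly once
covered = sorted(i for fold in shuffled_folds for i in fold)
print(covered == list(range(20)))  # True

# An integer random_state gives the same folds on every call to split()
repeat = [sorted(val.tolist()) for _, val in shuffled.split(X)]
print(repeat == shuffled_folds)  # True
```

How much this changes the scores depends on the row order: if the rows were already shuffled earlier (for example by a shuffling train/test split), unshuffled folds behave much like random ones; if the rows are sorted or grouped, shuffling can matter.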


eranr changed the title ~~[BUG]~~ [BUG] Chapter 2, CV splits are not random as opposed to what is written Nov 8, 2023
