
[QUESTION] About Cross-Validation in Chapter2 #631

Open
hongxinfu opened this issue Jan 26, 2024 · 1 comment

Comments


hongxinfu commented Jan 26, 2024

In Chapter 2, we made the training set using stratified sampling to guarantee that the test set is representative of the overall population. However, in the "Better Evaluation Using Cross-Validation" section we just use Scikit-Learn's K-fold cross-validation feature to split the training set, which means that every time we train the model, we use folds that might not be representative of the overall population. Why is this okay? Why don't we need to divide the training set into k folds using stratified sampling?
Thank you for your answer.
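For context, the chapter's cross-validation step looks roughly like the sketch below: an integer `cv` argument makes `cross_val_score` use plain (non-stratified) K-fold for a regressor. The data here is synthetic stand-in data, and the variable names (`X`, `y`, `tree_reg`) are illustrative, not the book's exact code.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the prepared housing features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.5, -2.0, 0.3]) + rng.normal(scale=0.1, size=300)

# Plain K-fold cross-validation, as in the chapter: cv=10 splits the
# training set into 10 folds with no stratification.
tree_reg = DecisionTreeRegressor(random_state=42)
scores = cross_val_score(tree_reg, X, y,
                         scoring="neg_mean_squared_error", cv=10)
tree_rmse_scores = np.sqrt(-scores)
print(tree_rmse_scores.mean())
```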

@lvalencip

I have the same feeling as you. I believe the K-fold cross-validation should use the strat_train_set
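If you do want stratified folds, one way to sketch it (not the book's code) is to build a `StratifiedKFold` splitter on the discrete income categories and pass its splits to `cross_val_score` via the `cv` parameter, which also accepts an iterable of (train, test) index pairs. The data and the `income_cat` binning below are synthetic placeholders standing in for the chapter's `strat_train_set`.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data; in the book you would instead
# use strat_train_set and its income_cat column.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=500)
income_cat = np.digitize(X[:, 0], bins=[-1.0, 0.0, 1.0])  # discrete strata

# Stratify the folds on the category column, not the continuous target:
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = DecisionTreeRegressor(random_state=42)
scores = cross_val_score(model, X, y,
                         scoring="neg_mean_squared_error",
                         cv=skf.split(X, income_cat))
rmse_scores = np.sqrt(-scores)
print(rmse_scores.mean())
```

Each fold then preserves the income-category proportions, mirroring what the stratified train/test split did for the held-out test set.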
