
[QUESTION] About Cross-Validation in Chapter2 #631

Open
hongxinfu opened this issue Jan 26, 2024 · 1 comment

Comments


hongxinfu commented Jan 26, 2024

In Chapter 2, we made the training set using stratified sampling to guarantee that the test set is representative of the overall population. However, in the "Better Evaluation Using Cross-Validation" section we just use Scikit-Learn's K-fold cross-validation feature to split the training set, which means that every time we train the model, we use folds that might not be representative of the overall population. Why is this okay? Why don't we need to divide the training set into k folds using stratified sampling?
Thank you for your answer.
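For context, the chapter's cross-validation step looks roughly like the sketch below: an integer `cv` argument makes `cross_val_score` use plain (non-stratified) K-fold for a regressor. The data here is synthetic stand-in data, and the variable names (`X`, `y`, `tree_reg`) are illustrative, not the book's exact code.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the prepared housing features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.5, -2.0, 0.3]) + rng.normal(scale=0.1, size=300)

# Plain K-fold cross-validation, as in the chapter: cv=10 splits the
# training set into 10 folds with no stratification.
tree_reg = DecisionTreeRegressor(random_state=42)
scores = cross_val_score(tree_reg, X, y,
                         scoring="neg_mean_squared_error", cv=10)
tree_rmse_scores = np.sqrt(-scores)
print(tree_rmse_scores.mean())
```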

@lvalencip

I have the same feeling as you. I believe the K-fold cross-validation should use the strat_train_set
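If you do want stratified folds, one way to sketch it (not the book's code) is to build a `StratifiedKFold` splitter on the discrete income categories and pass its splits to `cross_val_score` via the `cv` parameter, which also accepts an iterable of (train, test) index pairs. The data and the `income_cat` binning below are synthetic placeholders standing in for the chapter's `strat_train_set`.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data; in the book you would instead
# use strat_train_set and its income_cat column.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=500)
income_cat = np.digitize(X[:, 0], bins=[-1.0, 0.0, 1.0])  # discrete strata

# Stratify the folds on the category column, not the continuous target:
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = DecisionTreeRegressor(random_state=42)
scores = cross_val_score(model, X, y,
                         scoring="neg_mean_squared_error",
                         cv=skf.split(X, income_cat))
rmse_scores = np.sqrt(-scores)
print(rmse_scores.mean())
```

Each fold then preserves the income-category proportions, mirroring what the stratified train/test split did for the held-out test set.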
