Sorted stratification for Kfold regression problems with sklearn

ARomoH/sorted_stratification

sorted_stratification

When running a grid search in sklearn, there are several mechanisms for stratifying the sample in classification problems, but few utilities exist for regression problems. For that reason I developed a method called sorted stratification, inspired by the following article (link), which can be combined with sklearn's traditional K-fold method.

Method definition

"Let N denote the number of samples, y the target variable for the samples, and k the number of equally sized partitions we wish to create.

With sorted stratification, we first sort the samples based on their target variable, y. Then we step through each consecutive k samples in this order and randomly allocate exactly one of them to one of the partitions. We continue this floor(N/k) times, and the remaining mod(N,k) samples are randomly allocated to one of the k partitions."
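The procedure quoted above can be sketched in plain Python as follows. This is a minimal illustration, not the code in this repository: the function names `sorted_stratification_folds` and `folds_to_splits` and the `seed` parameter are hypothetical, and the sketch assumes that each of the remaining mod(N, k) samples is assigned independently to a random partition, which is one reading of the definition.

```python
import random


def sorted_stratification_folds(y, k, seed=None):
    """Return a list giving the fold (0..k-1) of each sample in y."""
    rng = random.Random(seed)
    n = len(y)
    # Sample indices ordered by target value.
    order = sorted(range(n), key=lambda i: y[i])
    fold_of = [None] * n
    n_full = (n // k) * k  # samples covered by the floor(N/k) full groups

    # Step through consecutive groups of k samples; within each group,
    # every fold receives exactly one sample, in random order.
    for start in range(0, n_full, k):
        group = order[start:start + k]
        folds = list(range(k))
        rng.shuffle(folds)
        for idx, fold in zip(group, folds):
            fold_of[idx] = fold

    # Assumption: each of the remaining mod(N, k) samples goes to a
    # randomly chosen fold, independently of the others.
    for idx in order[n_full:]:
        fold_of[idx] = rng.randrange(k)
    return fold_of


def folds_to_splits(fold_of, k):
    """Turn fold labels into a list of (train_indices, test_indices) pairs."""
    splits = []
    for fold in range(k):
        test = [i for i, f in enumerate(fold_of) if f == fold]
        train = [i for i, f in enumerate(fold_of) if f != fold]
        splits.append((train, test))
    return splits
```

The `cv` parameter of sklearn's `GridSearchCV` accepts an iterable of (train, test) index splits, so the output of `folds_to_splits` could in principle be passed there, possibly after converting the index lists to NumPy arrays.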
