hanfei1986/Impute-missing-data-with-XGBoost

Impute-missing-data-with-XGBoost

When a significant amount of data in highly important features is missing, what can we do? Impute the missing data with the mean or median? In this Jupyter notebook, I demonstrate embedding an XGBoost model in the data transformer to do the imputation.

In this dataset, a lot of the "cost" data is missing, but "cost" is quite important for predicting "price".

image

If we impute the missing "cost" values with the mean or median, there will be a spike in the distribution of the imputed data, because every missing value receives the same number. In contrast, imputing the missing "cost" values with an XGBoost regressor, which is embedded in the data transformer and predicts "cost" from the other features, is very effective.

image
