Handling-Missing-Values

Missing data, also known as missing values, occurs when some of the observations in a data set are blank.

For example:

  • In an age field, some women hesitate to state their age
  • Some men hesitate to disclose their salary
  • The information entered is not valid
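As a first step, it helps to measure how much of each column is blank. The following is a minimal sketch using pandas on hypothetical toy data (the column names and values are illustrative assumptions, not taken from the repository):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN / None mark the blank observations.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, np.nan],
    "Salary": [50000.0, 62000.0, np.nan, 48000.0],
    "Gender": ["F", "M", None, "F"],
})

missing_counts = df.isna().sum()      # number of blanks per column
missing_fraction = df.isna().mean()   # share of blanks per column

print(missing_counts)
print(missing_fraction)
```

Checking the share of blanks per column is a cheap way to decide whether a column is worth imputing at all.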

Types of data

1. Categorical Data:
    String-valued data such as Gender, Sex, Education, etc.

2. Discrete Data:
    Numeric data that takes whole-number values only, such as how many bank accounts you have, how many bikes you have, etc.

3. Continuous Data:
    Numeric data that can take any value within a range, such as Age, Height, Profit, etc.
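The three kinds of data above can be roughly told apart from a column's dtype. The sketch below is only a heuristic on hypothetical toy columns: the helper `kind_of` and the column names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data with one column of each kind.
df = pd.DataFrame({
    "Gender": ["F", "M", "F"],
    "BankAccounts": [1, 3, 2],
    "Height": [162.5, 180.1, 171.0],
})

def kind_of(series):
    """Heuristic: integers are discrete, floats continuous, the rest categorical."""
    if pd.api.types.is_integer_dtype(series):
        return "discrete"
    if pd.api.types.is_float_dtype(series):
        return "continuous"
    return "categorical"

kinds = {col: kind_of(df[col]) for col in df.columns}
print(kinds)
```

Note that with pandas' default dtypes an integer column that contains missing values is stored as float, so a discrete column with blanks may be reported as continuous by this heuristic.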

Different types of missing data

1. Missing Completely at Random (MCAR)
    - A variable is missing completely at random (MCAR) if the probability of being missing is the same for all observations.
    - When data is MCAR, there is no relationship between the missingness and any other values, observed or missing.

2. Missing Data Not at Random (MNAR)
    - There is a systematic relationship between the missingness and the values in the dataset, including the missing values themselves. For example, people with high salaries may be less likely to report them.

3. Missing at Random (MAR)
    - Data is MAR if the propensity for a data point to be missing isn't related to the missing value itself but is related to some of the observed data.
    - When data is MAR, the missing values can often be estimated from other observed information.
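The difference between the three mechanisms is easiest to see by simulating them. The sketch below builds synthetic data (the columns `Gender` and `Salary`, the probabilities, and the seed are all assumptions chosen for illustration) and removes salaries under each mechanism:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Synthetic data: fully observed before any values are removed.
df = pd.DataFrame({
    "Gender": rng.choice(["F", "M"], size=n),
    "Salary": rng.normal(50000, 10000, size=n),
})
median_salary = df["Salary"].median()

# MCAR: every row has the same 20% chance of losing its salary.
mcar_mask = rng.random(n) < 0.2

# MAR: the chance depends on an observed column (Gender), not on Salary itself.
mar_prob = np.where(df["Gender"] == "M", 0.4, 0.1)
mar_mask = rng.random(n) < mar_prob

# MNAR: the chance depends on the value that goes missing (high salaries are hidden more often).
mnar_prob = np.where(df["Salary"] > median_salary, 0.4, 0.1)
mnar_mask = rng.random(n) < mnar_prob

# Series.mask replaces the entries where the condition is True with NaN.
salary_mcar = df["Salary"].mask(mcar_mask)
salary_mar = df["Salary"].mask(mar_mask)
salary_mnar = df["Salary"].mask(mnar_mask)
```

In real data only the observed values are available, so MNAR generally cannot be confirmed from the data alone; the simulation works only because the full salaries are known before masking.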

Techniques for handling missing values

1. Mean/Mode/Median replacement
2. Random sample imputation
3. Capturing NAN values with a new feature
4. End of Distribution imputation
5. Arbitrary imputation
6. Frequent categories imputation   
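The six techniques listed above can each be sketched in a line or two of pandas. This is a minimal illustration on hypothetical toy data, not the repository's own notebooks; the column names, the `mean + 3 * std` tail value, and the arbitrary value `-1` are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with blanks in a numeric and a categorical column.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, 58.0, np.nan, 41.0],
    "Embarked": ["S", "C", None, "S", "S", None],
})
missing = df["Age"].isna()

# 1. Mean/mode/median replacement: fill blanks with a central value (median here).
df["Age_median"] = df["Age"].fillna(df["Age"].median())

# 2. Random sample imputation: draw replacements from the observed values.
sampled = df["Age"].dropna().sample(n=int(missing.sum()), replace=True, random_state=0)
sampled.index = df.index[missing]          # align the draws with the blank rows
df["Age_random"] = df["Age"].fillna(sampled)

# 3. Capturing NaN with a new feature: a 0/1 flag recording where data was missing.
df["Age_was_missing"] = missing.astype(int)

# 4. End-of-distribution imputation: use a value in the far tail (assumed mean + 3 std).
tail_value = df["Age"].mean() + 3 * df["Age"].std()
df["Age_end_of_dist"] = df["Age"].fillna(tail_value)

# 5. Arbitrary imputation: use a fixed value outside the plausible range (assumed -1).
df["Age_arbitrary"] = df["Age"].fillna(-1)

# 6. Frequent category imputation: fill categorical blanks with the most common category.
most_frequent = df["Embarked"].mode()[0]
df["Embarked_frequent"] = df["Embarked"].fillna(most_frequent)

print(df)
```

Each result is written to a new column so the techniques can be compared side by side; in practice the fill statistics would be computed on the training split only and then reused on the test split.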

Note: all these techniques are performed on the 'Titanic' dataset, which you can download from :

About

This repository covers different types of data, types of missing values, and how to handle missing values.
