Manar20575/Data-Science-Project

Binary income classification :

Build a model that predicts whether an individual earns more than $50,000 per year, based on anonymized census data.

Goal :

Understand the factors that influence income inequality, which could help inform targeted social programs.

Data Cleaning or Refinement :

1- Deal with missing values.
2- Work out why the data is missing.
3- Drop variables that are not needed.
4- Remove duplicates.
5- Detect and remove outliers (a box plot can confirm whether the data contains outliers).
6- Scale and normalize the data.
7- Remove blank spaces and fill in missing information (SimpleImputer can be used to handle missing values).
8- Arrange the data logically and sequentially so that it is easy to visualize.
9- Group the data by rows and columns (horizontally and vertically) to help with arrangement and visualization.
10- Fix inconsistent data entries.
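
Several of the cleaning steps above can be sketched in a few lines of pandas and scikit-learn. This is a minimal illustration, not the project's actual code: the toy table, the column names (`age`, `hours_per_week`, `workclass`), the `"?"` missing-value marker, and the helper `clean_census` are all assumptions made for the example, and the 1.5 × IQR rule is just one common convention for flagging outliers.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data standing in for the census table; column names are hypothetical.
raw = pd.DataFrame({
    "age": [25, 38, 38, 52, 41, 29, 33, 95],
    "hours_per_week": [40, 50, 50, np.nan, 45, 40, 38, 40],
    "workclass": [" Private", "Private ", "Private ", "?",
                  "private", "Private", "Self-emp", "Private"],
})


def clean_census(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper covering steps 1, 4, 5, 7 and 10."""
    df = raw.copy()

    # Inconsistent entries: trim blanks, unify case, treat "?" as missing.
    workclass = df["workclass"].str.strip().str.lower()
    df["workclass"] = workclass.mask(workclass == "?")

    # Duplicates only become visible once the text has been normalized.
    df = df.drop_duplicates().reset_index(drop=True)

    # Missing values: median for numeric columns, most frequent for text.
    num_cols = ["age", "hours_per_week"]
    df[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])
    df["workclass"] = df["workclass"].fillna(df["workclass"].mode()[0])

    # Outliers: keep ages within 1.5 * IQR of the quartiles.
    q1, q3 = df["age"].quantile([0.25, 0.75])
    iqr = q3 - q1
    keep = df["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[keep].reset_index(drop=True)


clean = clean_census(raw)
```

Normalizing the text before calling `drop_duplicates` matters here: `" Private"` and `"Private "` would otherwise count as different values and hide duplicate rows.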

Exploratory data analysis (EDA):

How is one variable related to another? What sort of relationship exists between two different variables? What kind of trend does the data follow? Can the dataset be divided into smaller parts?
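
As a rough sketch of how these questions translate into pandas, the snippet below computes pairwise correlations, compares the target rate between groups, and splits the data into age bands. The toy data, the column names, the 0/1 target `income_over_50k`, and the bin edges are assumptions for illustration only.

```python
import pandas as pd

# Toy data; column names and values are hypothetical.
df = pd.DataFrame({
    "age": [22, 35, 47, 51, 29, 60],
    "education_num": [9, 13, 14, 16, 10, 13],
    "sex": ["F", "M", "M", "F", "F", "M"],
    "income_over_50k": [0, 0, 1, 1, 0, 1],
})

# How are the variables related? Pairwise Pearson correlations.
corr = df[["age", "education_num", "income_over_50k"]].corr()

# Relationship between a categorical variable and the target:
# share of high earners within each group.
rate_by_sex = df.groupby("sex")["income_over_50k"].mean()

# Can the dataset be divided into smaller parts? Split by age band.
age_band = pd.cut(df["age"], bins=[0, 30, 50, 100])
rate_by_age = df.groupby(age_band, observed=True)["income_over_50k"].mean()
```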

Visualization:

Basic visualizations were built with Plotly and Cufflinks rather than Matplotlib and Seaborn:
1- Line plots.
2- Area plots.
3- Histograms.
4- Bar charts.
5- Pie charts.
6- Box plots.
7- Scatter plots.
8- Bubble plots.

Feature Engineering:

Dimensionality reduction (PCA) / Encoding (one-hot and normal) / Scaling
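
One possible way to chain these three steps with scikit-learn is sketched below: numeric columns are standardized, categorical columns are one-hot encoded, and PCA is applied to the combined matrix. The column names, the choice of `StandardScaler`, and `n_components=2` are assumptions for the example rather than the settings used in this project.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy feature table; column names and values are hypothetical.
X = pd.DataFrame({
    "age": [25, 38, 52, 41, 29, 33],
    "hours_per_week": [40, 50, 60, 45, 40, 38],
    "workclass": ["private", "self-emp", "private", "gov", "private", "gov"],
    "sex": ["F", "M", "M", "F", "F", "M"],
})
num_cols = ["age", "hours_per_week"]
cat_cols = ["workclass", "sex"]

# Scale numeric columns and one-hot encode categorical ones.
# sparse_threshold=0.0 forces a dense array as output.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ],
    sparse_threshold=0.0,
)
encoded = preprocess.fit_transform(X)  # 2 scaled + 3 + 2 one-hot columns

# Reduce the encoded matrix to two principal components.
features = Pipeline([("preprocess", preprocess), ("pca", PCA(n_components=2))])
Z = features.fit_transform(X)
```

Wrapping the steps in a `Pipeline` means the scaler, encoder, and PCA are fitted on training data only and then reused unchanged on test data.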

Build model:

Seven models were evaluated using several metrics (accuracy, precision, recall, ROC): [image]
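
A comparison loop of this kind might look like the sketch below. The seven models used in this project are not named in the README, so the two classifiers here are placeholders, and the data is synthetic (`make_classification`) rather than the census data. ROC is summarized as the area under the ROC curve.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data standing in for the census features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Placeholder models; swap in whichever classifiers are being compared.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)               # hard class labels
    proba = model.predict_proba(X_test)[:, 1]  # score for the positive class
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Accuracy alone can be misleading when one income class is much more common than the other, which is why precision, recall, and ROC are reported alongside it.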