Machine Learning, EDA, Classification tasks, Regression tasks for customer churn

sondosaabed/Customer-Churn-Dataset-Analysis

Customer-Churn-Dataset-Analysis

This project was created as part of the Machine Learning course at BZU. The results of the analysis of the provided customer churn data can be found in the accompanying document. The conclusions drawn from the analysis are intended to support better decisions about reducing customer churn and improving customer value.

Data correlation using a heat map:

image
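As a rough illustration of how a correlation heat map like this can be produced, the sketch below computes a Pearson correlation matrix with pandas and draws it with matplotlib. The data is synthetic and the column names (`seconds_of_use`, `frequency_of_use`, `frequency_of_sms`, `customer_value`) are assumptions chosen to resemble the attributes named in this README. It is not the project's actual code or dataset.

```python
import numpy as np
import pandas as pd
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

# Synthetic stand-in data; the real churn dataset is NOT used here.
rng = np.random.default_rng(0)
n = 500
seconds_of_use = rng.gamma(shape=2.0, scale=2000.0, size=n)
frequency_of_use = seconds_of_use / 60.0 + rng.normal(0.0, 10.0, size=n)
frequency_of_sms = rng.poisson(lam=40, size=n).astype(float)
customer_value = (
    0.05 * seconds_of_use + 4.0 * frequency_of_sms + rng.normal(0.0, 20.0, size=n)
)

df = pd.DataFrame(
    {
        "seconds_of_use": seconds_of_use,
        "frequency_of_use": frequency_of_use,
        "frequency_of_sms": frequency_of_sms,
        "customer_value": customer_value,
    }
)

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

# Draw the matrix as a heat map and annotate each cell with its value.
fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1.0, vmax=1.0)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
for i in range(corr.shape[0]):
    for j in range(corr.shape[1]):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
fig.colorbar(im, ax=ax)
fig.tight_layout()
# Call fig.savefig("heatmap.png") or switch backends and use plt.show() to view it.
```

Cells close to 1 or -1 mark strongly related attribute pairs, while cells near 0 mark pairs with little linear relationship.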

Linear regression models to predict customer value:

image
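A minimal sketch of fitting a linear regression model for customer value is shown below, assuming scikit-learn and three numeric input attributes. The data is generated synthetically and the coefficients used to generate it are arbitrary, so the numbers it prints say nothing about the real dataset or the models reported in the project document.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic features standing in for the three attributes (assumed, not real data).
rng = np.random.default_rng(1)
n = 400
X = np.column_stack(
    [
        rng.poisson(40, n),         # stand-in for "Frequency of SMS"
        rng.poisson(60, n),         # stand-in for "Frequency of use"
        rng.gamma(2.0, 2000.0, n),  # stand-in for "Seconds of use"
    ]
).astype(float)

# Hypothetical target: a linear combination of the features plus noise.
y = 4.0 * X[:, 0] + 0.5 * X[:, 1] + 0.05 * X[:, 2] + rng.normal(0.0, 20.0, n)

# Hold out a test set so the score reflects unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))

print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
print(f"R^2 on held-out data: {r2:.3f}")
```

Evaluating on a held-out split, rather than on the training rows, gives a fairer picture of how well the model predicts customer value for new customers.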

Trained models:

KNN classifier:

High variance (overfitting)

image

Naive bayes classifier:

image

Logistic regression:

High bias (underfitting)

image
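One common way to check for overfitting or underfitting is to compare each model's accuracy on the training set with its accuracy on held-out data. The sketch below does this for the three classifier families listed above, using scikit-learn on a synthetic dataset from `make_classification`. The hyperparameters (for example `n_neighbors=1`) and the choice of `GaussianNB` as the Naive Bayes variant are assumptions made for illustration. The README does not state which settings the project used.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data standing in for the churn dataset.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Assumed hyperparameters; k=1 makes the high-variance behaviour of KNN easy to see.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=1),
    "Naive Bayes": GaussianNB(),
    "Logistic regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    results[name] = (train_acc, test_acc)
    print(
        f"{name:20s} train={train_acc:.3f} test={test_acc:.3f} "
        f"gap={train_acc - test_acc:+.3f}"
    )
```

A large positive gap between training and test accuracy points to high variance (overfitting), while low accuracy on both sets points to high bias (underfitting). Which model shows which pattern depends on the data, so results on this synthetic set need not match the project's findings.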

Conclusions:

The analysis of the customer churn dataset showed that the ID and Age group attributes do not contribute to predicting Customer value or to classifying the Churn attribute, so they were dropped.

It also showed that selecting the attributes used to predict “customer value” is more accurate when based on the correlation matrix rather than on the analyst's own judgment. The selected attributes were:

1- Frequency of SMS

2- Frequency of use

3- Seconds of use
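Correlation-based attribute selection of this kind can be sketched as ranking each candidate attribute by the absolute value of its correlation with the target and keeping the top few. The example below uses pandas on synthetic data. The column names, the generated relationships, and the choice to keep exactly three attributes are assumptions for illustration, not the project's actual procedure.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; "age_group" is generated independently of the target.
rng = np.random.default_rng(2)
n = 500
seconds_of_use = rng.gamma(2.0, 2000.0, n)
df = pd.DataFrame(
    {
        "seconds_of_use": seconds_of_use,
        "frequency_of_use": seconds_of_use / 60.0 + rng.normal(0.0, 10.0, n),
        "frequency_of_sms": rng.poisson(40, n).astype(float),
        "age_group": rng.integers(1, 6, n).astype(float),
    }
)
df["customer_value"] = (
    0.05 * df["seconds_of_use"]
    + 10.0 * df["frequency_of_sms"]
    + rng.normal(0.0, 20.0, n)
)

target = "customer_value"

# Absolute Pearson correlation of every attribute with the target, strongest first.
corr_with_target = (
    df.corr()[target].drop(target).abs().sort_values(ascending=False)
)
print(corr_with_target)

# Keep the three attributes most strongly correlated with the target.
selected = corr_with_target.head(3).index.tolist()
print("selected attributes:", selected)
```

Ranking by absolute correlation keeps strongly negative relationships as well as strongly positive ones, although it only captures linear association with the target.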

As for the classification models, the Naive Bayes classifier performed best among the three classification algorithms tested, because it neither overfits nor underfits.

The results of this analysis can be used by your company to reduce customer churn. One option is to adopt the Naive Bayes classifier as a tool for predicting customer churn so that it can be prevented. The company should also focus on improving the attributes listed above to increase customer value.