Balakishan77/Spam-Email-Classifier

Spam-Email-Classification

This project analyzes an email dataset of more than 5,000 samples, each labeled as spam or not spam. We built a model that classifies a given email as spam (junk email) or ham (good email) using the Naive Bayes classification algorithm, with an accuracy score of ~99%.

#Naive Bayes Classifier Introduction

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features, given the class label.

#Checking the distribution of data

with outlier

We can see some extreme outliers, so we set a threshold on the text length (here 10000; this threshold is not applied in the algorithm implementation) and plot the histogram again.

with outlier
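The length threshold described above can be sketched roughly as follows. This is only an illustration on made-up messages, not the repository's code: `messages`, `filter_by_length`, and `length_histogram` are hypothetical names, and the binning uses plain Python counts instead of a plotting library.

```python
from collections import Counter

# Hypothetical toy messages standing in for the email dataset.
messages = [
    "Win a free prize now",
    "Are we still meeting at noon?",
    "x" * 12000,  # an extreme outlier, longer than the threshold
    "Lowest prices on meds",
]

LENGTH_THRESHOLD = 10000  # the threshold value mentioned in the text


def filter_by_length(texts, threshold=LENGTH_THRESHOLD):
    """Keep only texts whose character length is below the threshold."""
    return [t for t in texts if len(t) < threshold]


def length_histogram(texts, bin_width=10):
    """Count texts per length bin; each key is the lower edge of a bin."""
    return Counter((len(t) // bin_width) * bin_width for t in texts)


kept = filter_by_length(messages)
hist = length_histogram(kept)
for lower_edge in sorted(hist):
    print(f"{lower_edge}-{lower_edge + 9} chars: {hist[lower_edge]}")
```

Dropping the outlier before binning keeps the histogram's range readable; as noted above, the threshold is used only for the plot and not in the classifier itself.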

Below are metrics about the results:

#Confusion Matrix

image
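For readers unfamiliar with the confusion matrix, here is a minimal sketch of how one can be tallied for the spam/ham labels. The label lists are invented toy values, not the project's results, and `confusion_matrix` and `accuracy` here are hypothetical helper functions written for illustration.

```python
def confusion_matrix(y_true, y_pred, labels=("ham", "spam")):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Toy labels for illustration only.
y_true = ["ham", "ham", "spam", "spam", "ham", "spam"]
y_pred = ["ham", "spam", "spam", "spam", "ham", "ham"]

cm = confusion_matrix(y_true, y_pred)
print(cm)
print(accuracy(cm))
```

The off-diagonal cells separate the two kinds of error: ham flagged as spam (top right) and spam that slipped through as ham (bottom left).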

We achieved a mean accuracy of about 98.84% with a standard variance of 0.4%. The model is in the low-bias, low-variance region, as the learning curve plot below shows.

learning curve
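To make the Naive Bayes idea from the introduction concrete, here is a minimal from-scratch sketch of a multinomial-style classifier with Laplace smoothing. It is an assumption-laden illustration, not the model used in this repository: the class `NaiveBayesSketch`, whitespace tokenization, the smoothing value `alpha=1.0`, and the four training messages are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSketch:
    """Word-count Naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength, an assumed default

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus a sum of per-word log likelihoods: the sum
            # is where the "naive" independence assumption is used.
            score = math.log(n_docs / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + self.alpha) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training data for illustration only.
train_texts = [
    "win free prize now",
    "free money offer",
    "meeting at noon tomorrow",
    "see you at lunch",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayesSketch().fit(train_texts, train_labels)
print(model.predict("free prize offer"))
print(model.predict("see you at the meeting"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing term keeps a word that was never seen in one class from driving that class's probability to zero.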
