Spam-Email-Classification

Analyzing the content of an email dataset that contains over 5,000 email samples, each labeled as spam or not. We have built a model that classifies a given email as spam (junk email) or ham (good email) using the Naive Bayes classification algorithm, with an accuracy score of ~99%.

# Naive Bayes Classifier Introduction

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes' theorem with the "naive" assumption of conditional independence between every pair of features, given the class.

# Checking the distribution of data

We can see some extreme outliers in text length, so we set a threshold on the length of the text (here, 10000) and plot the histogram again. This threshold is used only for the plot; it is not applied in the algorithm implementation.
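The README does not show how text length was measured or how the second histogram was produced. The following is a minimal standard-library sketch, assuming "length" means the number of characters in a message and that messages above the threshold are simply excluded from the second histogram. The helper names are hypothetical, and a dictionary of bin counts stands in for the plotted histogram.

```python
# Illustrative sketch, not code from spam_ham.py.
# Assumption: "length" means the number of characters in each message.


def text_lengths(messages):
    """Return the character length of each message."""
    return [len(m) for m in messages]


def apply_threshold(lengths, threshold=10000):
    """Drop extreme outliers: keep only lengths at or below the threshold."""
    return [n for n in lengths if n <= threshold]


def histogram(values, bin_width):
    """Count values per bin; keys are bin start points in ascending order."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Toy messages; the 25000-character string plays the role of an extreme outlier.
    messages = ["hi", "win a prize now", "meeting at 10", "x" * 25000]
    lengths = text_lengths(messages)
    print(max(lengths))                  # 25000, which would stretch the x-axis
    kept = apply_threshold(lengths, threshold=10000)
    print(histogram(kept, bin_width=5))  # {0: 1, 10: 1, 15: 1}
```

Because the threshold only filters what is plotted, the classifier still sees every message, which matches the note that the threshold is not applied in the algorithm itself.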

Below are metrics about the results:

# Confusion Matrix
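A confusion matrix counts how each actual label (rows) was predicted (columns), and accuracy, precision and recall all follow from its four cells. The sketch below is a standard-library illustration, not code from spam_ham.py. The function names are hypothetical, spam is assumed to be the positive class, and the `[[tn, fp], [fn, tp]]` layout is the one scikit-learn's `confusion_matrix` produces when the labels sort as ham, spam.

```python
def confusion_matrix(y_true, y_pred, positive="spam"):
    """Return [[tn, fp], [fn, tp]]: rows are actual labels, columns are
    predicted labels, negative class (ham) first."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1  # spam correctly flagged
            else:
                fn += 1  # spam that slipped through as ham
        elif predicted == positive:
            fp += 1      # ham wrongly flagged as spam
        else:
            tn += 1      # ham correctly kept
    return [[tn, fp], [fn, tp]]


def summarize(matrix):
    """Derive accuracy, precision and recall from a 2x2 confusion matrix."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        # Of the messages flagged as spam, how many really were spam?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the real spam messages, how many were caught?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    actual = ["spam", "ham", "ham", "spam", "ham"]
    predicted = ["spam", "ham", "spam", "ham", "ham"]
    cm = confusion_matrix(actual, predicted)
    print(cm)             # [[2, 1], [1, 1]]
    print(summarize(cm))  # {'accuracy': 0.6, 'precision': 0.5, 'recall': 0.5}
```

For a spam filter, false positives (good email sent to junk) are usually the costlier error, so precision is worth reading alongside overall accuracy.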

We achieved a mean accuracy of 98.84% with a standard deviation of 0.4%. This suggests the model is in the low-bias, low-variance region; see the plot of the learning curve below.
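The README does not say how the mean and standard deviation were obtained. A common approach is k-fold cross-validation, which reports the mean and spread of the per-fold accuracies. The sketch below assumes that setup and pairs it with a from-scratch multinomial Naive Bayes (the variant usually applied to word counts), so the whole pipeline runs on the standard library alone. It is not the code in spam_ham.py. The class and function names, the whitespace tokenizer, the smoothing value `alpha=1.0`, and the unshuffled interleaved folds are all illustrative assumptions. Each class c is scored as log P(c) plus the sum of log P(w | c) over the message's words, and treating those word probabilities as independent is the "naive" assumption.

```python
import math
import statistics
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (a simplification; punctuation is kept)."""
    return text.lower().split()


class MultinomialNB:
    """Minimal multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.log_prior, self.word_counts, self.total_words = {}, {}, {}
        self.vocab = set()
        for c in self.classes:
            docs = [tokenize(t) for t, y in zip(texts, labels) if y == c]
            # Prior P(c): fraction of training messages labeled c.
            self.log_prior[c] = math.log(len(docs) / len(texts))
            counts = Counter(w for doc in docs for w in doc)
            self.word_counts[c] = counts
            self.total_words[c] = sum(counts.values())
            self.vocab.update(counts)
        return self

    def log_score(self, text, c):
        """Unnormalized log posterior: log P(c) + sum of log P(w | c)."""
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for w in tokenize(text):
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][w] + self.alpha) / denom)
        return score

    def predict_one(self, text):
        return max(self.classes, key=lambda c: self.log_score(text, c))


def k_fold_accuracy(texts, labels, k=5, alpha=1.0):
    """Return (mean, population std) of per-fold accuracy; assumes k <= len(texts)."""
    n = len(texts)
    fold_scores = []
    for i in range(k):
        test_idx = set(range(i, n, k))  # interleaved, unshuffled folds
        train_x = [t for j, t in enumerate(texts) if j not in test_idx]
        train_y = [y for j, y in enumerate(labels) if j not in test_idx]
        model = MultinomialNB(alpha).fit(train_x, train_y)
        correct = sum(model.predict_one(texts[j]) == labels[j] for j in test_idx)
        fold_scores.append(correct / len(test_idx))
    return statistics.mean(fold_scores), statistics.pstdev(fold_scores)


if __name__ == "__main__":
    texts = ["win cash prize now", "free prize claim now", "cheap cash offer",
             "meeting at noon", "lunch tomorrow at noon", "project meeting notes"]
    labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
    model = MultinomialNB().fit(texts, labels)
    print(model.predict_one("claim your free cash prize"))  # spam
    print(k_fold_accuracy(texts, labels, k=3))  # (1.0, 0.0) on this toy data
```

On the real data you would pass the message text and label columns of spamham.csv (their names are not given in the README). In practice, scikit-learn's `CountVectorizer`, `MultinomialNB` and `cross_val_score` cover the same steps with shuffling, stratification and sparse matrices.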

Repository files: README.md, spam_ham.py, spamham.csv

Repository: Balakishan77/Spam-Email-Classifier
