spam-email-detector

Project Description

This project documents the steps I took to build a spam email classifier that labels an email as spam or non-spam (ham) based on a set of its features.

I obtained the dataset for this project from Kaggle: https://www.kaggle.com/datasets/nitishabharathi/email-spam-dataset?resource=download. Specifically, I used the 'completeSpamAssassin.csv' dataset. It contains a serial number column that can be used as the index, a body column that holds the text content of each email, and a label column that is 0 for ham emails and 1 for spam emails, as seen in the figure below.
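To make the three-column layout concrete, here is a minimal sketch of reading such a file with Python's standard csv module. It uses a tiny in-memory sample rather than the real file, and the header names ("Body", "Label", and an unnamed first column for the serial number) are assumptions for illustration; the actual headers in 'completeSpamAssassin.csv' may differ.

```python
import csv
import io
from collections import Counter

# Tiny in-memory stand-in for completeSpamAssassin.csv.
# The header names ("Body", "Label") are assumptions; the unnamed
# first column plays the role of the serial number.
SAMPLE_CSV = """,Body,Label
0,"Hi team, the meeting is moved to 3pm.",0
1,"WIN a FREE prize now!!! Click http://example.com",1
2,"Lunch tomorrow?",0
"""

def load_emails(handle):
    """Return a list of (serial, body, label) tuples from a CSV file object."""
    rows = []
    for record in csv.DictReader(handle):
        serial = int(record[""])       # unnamed first column = serial number
        body = record["Body"] or ""    # treat a missing body as empty text
        label = int(record["Label"])   # 0 = ham, 1 = spam
        rows.append((serial, body, label))
    return rows

emails = load_emails(io.StringIO(SAMPLE_CSV))

# Count how many ham (0) and spam (1) emails were loaded.
label_counts = Counter(label for _, _, label in emails)
print(label_counts)
```

To read the real file, the same function could be given an open file handle, for example `open("completeSpamAssassin.csv", newline="", encoding="utf-8")`; the encoding is also an assumption. Checking the label counts early is useful because class imbalance affects how the classifier should be evaluated.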

Given this dataset, the straightforward approach is to use a vectorizer (count or TF-IDF) to extract features from the email body column. For this project, however, I also applied feature engineering to produce additional features (on top of those extracted by a TF-IDF vectorizer) in an attempt to improve the performance of the spam email classifier. I also implemented several Machine Learning algorithms to identify the best-performing one.
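As an illustration of what engineered features beyond TF-IDF might look like, the sketch below derives a few simple numeric signals from an email body. The specific features (character count, word count, URL count, exclamation marks, uppercase ratio) and the function name `handcrafted_features` are hypothetical examples, not necessarily the ones used in the notebook.

```python
import re

# Simple pattern for http/https links; an illustrative choice, not exhaustive.
URL_PATTERN = re.compile(r"https?://\S+")

def handcrafted_features(body):
    """Map an email body to a dict of simple numeric features (illustrative only)."""
    letters = [ch for ch in body if ch.isalpha()]
    upper = sum(1 for ch in letters if ch.isupper())
    return {
        "num_chars": len(body),
        "num_words": len(body.split()),
        "num_urls": len(URL_PATTERN.findall(body)),
        "num_exclamations": body.count("!"),
        # Share of alphabetic characters that are uppercase; 0.0 for empty text.
        "upper_ratio": upper / len(letters) if letters else 0.0,
    }

features = handcrafted_features("WIN a FREE prize now!!! Click http://example.com")
print(features)
```

Features like these could then be placed alongside the TF-IDF matrix as extra columns (for instance with `scipy.sparse.hstack`) before training, so that each candidate model sees both the text-derived and the hand-built signals.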

Topics covered in this project are:

Supervised Machine Learning
Binary Classification
Natural Language Processing
Data Visualisation
Exploratory Data Analysis & Manipulation
Feature Engineering & Extraction
Model Evaluation
