spam-classifier

Spam classifier built using CountVectorizer and TF-IDF Vectorizer. Dataset source: https://www.kaggle.com/uciml/sms-spam-collection-dataset

We used upsampling and cross-validation in our project, and built the following models:

Naive Bayes model with imbalanced dataset, using CountVectorizer
Naive Bayes model with imbalanced dataset, using Tf-idf Vectorizer
Naive Bayes model with cross-validation, using CountVectorizer
Naive Bayes model with cross-validation, using Tf-idf Vectorizer
Decision Tree models with the imbalanced dataset, cross-validation, and upsampled data (6 models in total)
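
The vectorizer and classifier combinations listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn, assumes `MultinomialNB` as the Naive Bayes variant, and uses a handful of invented messages rather than the Kaggle dataset. The helper names `build_models` and `cross_validate` are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy messages invented for illustration; NOT taken from the Kaggle dataset.
texts = [
    "Win a free prize now, claim your cash reward",
    "Free entry in a weekly prize draw, text WIN now",
    "Urgent: claim your free cash prize today",
    "You have won a free holiday, call now to claim",
    "Free ringtone offer, reply now to claim",
    "Cash prize waiting, call now for free entry",
    "Are we still meeting for lunch today?",
    "I'll call you when I get home",
    "Can you pick up some milk on the way back?",
    "See you at the gym tomorrow morning",
    "Thanks for the help with my homework",
    "Running late, be there in ten minutes",
    "Happy birthday! Hope you have a great day",
    "Did you watch the match last night?",
    "Let me know when you are free to talk",
]
labels = [1] * 6 + [0] * 9  # 1 = spam, 0 = ham (imbalanced on purpose)


def build_models():
    """Return one pipeline per classifier/vectorizer combination."""
    vectorizers = {"count": CountVectorizer, "tfidf": TfidfVectorizer}
    classifiers = {
        "naive_bayes": MultinomialNB,  # assumed Naive Bayes variant
        "decision_tree": lambda: DecisionTreeClassifier(random_state=0),
    }
    return {
        f"{clf_name}_{vec_name}": make_pipeline(make_vec(), make_clf())
        for clf_name, make_clf in classifiers.items()
        for vec_name, make_vec in vectorizers.items()
    }


def cross_validate(models, texts, labels, folds=3):
    """Mean F1 score per model over stratified folds."""
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=0)
    return {
        name: cross_val_score(model, texts, labels, cv=cv, scoring="f1").mean()
        for name, model in models.items()
    }


scores = cross_validate(build_models(), texts, labels)
for name, score in sorted(scores.items()):
    print(f"{name}: mean F1 = {score:.2f}")
```

Putting the vectorizer inside the pipeline means the vocabulary is fitted on the training folds only, so no information from the held-out fold leaks into the features.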

For EDA, we created the following:

Histogram of the most commonly occurring words in the ham and spam messages
Word clouds of the most commonly occurring words in the ham and spam messages
Bar chart showing the number of spam and ham messages
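
The counts behind charts like these can be computed with the standard library alone. The sketch below is only an illustration under assumptions: the messages are invented, the tokenizer is a simple regular expression, stop words are not removed, and the helper name `top_words` is hypothetical.

```python
import re
from collections import Counter

# Toy messages invented for illustration; NOT taken from the Kaggle dataset.
spam_messages = ["free prize now", "free entry win prize", "claim free cash"]
ham_messages = ["see you at lunch", "call me after lunch", "lunch was great"]


def top_words(messages, n=2):
    """Return the n most frequent lowercase word tokens across the messages."""
    counts = Counter()
    for message in messages:
        counts.update(re.findall(r"[a-z']+", message.lower()))
    return counts.most_common(n)


# Word frequencies per class (input for a histogram or word cloud).
spam_top = top_words(spam_messages)
ham_top = top_words(ham_messages, n=1)

# Number of messages per class (input for the spam vs. ham bar chart).
class_counts = {"spam": len(spam_messages), "ham": len(ham_messages)}

print(spam_top, ham_top, class_counts)
```

The resulting word/count pairs and class counts can then be passed to any plotting library to draw the histogram and bar chart.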

We reported the F-measure and accuracy scores of each model as part of our findings in our PowerPoint presentation, which is also uploaded to this repository (AI-Project.pptx).
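
As a rough sketch of how upsampling and the two reported metrics fit together, the example below upsamples the minority (spam) class in the training split only and then computes accuracy and F-measure on an untouched test split. It assumes scikit-learn and `MultinomialNB`, uses invented messages, and the helper `upsample_minority` is hypothetical; the scores it prints say nothing about the project's actual results.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.utils import resample

# Toy messages invented for illustration; NOT taken from the Kaggle dataset.
spam = [
    "free prize claim now", "win cash now", "free entry prize draw",
    "claim your free cash", "urgent prize call now", "free holiday call now",
]
ham = [
    "lunch today?", "call you later", "see you at the gym",
    "thanks for the help", "running late sorry", "happy birthday",
    "did you watch the match", "pick up some milk", "talk tomorrow",
]
texts = spam + ham
labels = [1] * len(spam) + [0] * len(ham)  # 1 = spam, 0 = ham


def upsample_minority(texts, labels, minority=1, seed=0):
    """Resample the minority class with replacement up to the majority size."""
    minority_texts = [t for t, y in zip(texts, labels) if y == minority]
    majority_texts = [t for t, y in zip(texts, labels) if y != minority]
    majority_labels = [y for y in labels if y != minority]
    extra = list(resample(minority_texts, replace=True,
                          n_samples=len(majority_texts), random_state=seed))
    return majority_texts + extra, majority_labels + [minority] * len(extra)


# Split first, then upsample only the training data, so duplicated
# spam messages cannot appear in both the training and test sets.
train_x, test_x, train_y, test_y = train_test_split(
    texts, labels, test_size=0.4, stratify=labels, random_state=0
)
bal_x, bal_y = upsample_minority(train_x, train_y)

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(bal_x, bal_y)
predictions = model.predict(test_x)

accuracy = accuracy_score(test_y, predictions)
f_measure = f1_score(test_y, predictions)  # F1 for the spam class (label 1)
print(f"accuracy = {accuracy:.2f}, F-measure = {f_measure:.2f}")
```

On an imbalanced dataset, accuracy alone can look high for a model that rarely predicts spam, which is why the F-measure is worth reporting alongside it.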

samimakhan/Spam-Classification-Project
