sms_spam_classification

This repository contains a classification model that identifies spam messages in a database of SMS messages.

Spam Classification

This project aims to classify a set of messages into spam or not spam (also called ham) using a Naive Bayes Classifier algorithm.

The open-source data set is the SMS Spam Collection from the UCI Machine Learning Repository.

This interactive Python notebook walks through the Cross-Industry Standard Process for Data Mining (CRISP-DM) as used to create, test, and optimize a Naive Bayes classification algorithm.

This notebook section discusses the end-to-end methodology used to create this classifier.

EDA : Exploratory Data Analysis
Cross Validation : Splitting Training and Test Data
Data Cleaning : Natural Language Processing
Model Development : Naive Bayes Classifier
Testing Model Performance
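
The splitting and cleaning steps listed above can be sketched in a few lines of standard-library Python. This is only an illustration, not the notebook's actual code: the helper names `clean` and `split_train_test`, the 80/20 split, the seed, and the sample rows are all assumptions made for the example.

```python
import random
import re

def clean(sms):
    """Lowercase the text, replace non-word characters with spaces, split into tokens."""
    return re.sub(r"\W", " ", sms.lower()).split()

def split_train_test(rows, test_fraction=0.2, seed=1):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Made-up (label, sms) rows standing in for the real data set.
rows = [
    ("ham", "See you at lunch?"),
    ("spam", "WIN a FREE prize now!!"),
    ("ham", "Running late, start without me"),
    ("ham", "Call me later"),
    ("spam", "Free entry: text WIN to claim"),
]

train, test = split_train_test(rows)
train_tokens = [(label, clean(sms)) for label, sms in train]
print(len(train), len(test))
print(clean("WIN a FREE prize now!!"))
```

Splitting before cleaning keeps any vocabulary built later restricted to the training messages, so the test set stays unseen.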

EDA : Exploring the Dataset

Key Highlights

The data set is the SMS Spam Collection from the UCI Machine Learning Repository, made available by Tiago Almeida and José María Gómez Hidalgo. It is a classic use case for learning the Naive Bayes Classifier from scratch.
The dataset contains two key pieces of information for each of 5572 text messages: the text message itself (SMS) and an indicator variable identifying whether it is spam or ham (Label).
Label is our outcome variable and SMS is our only predictor variable. We have to use some standard cleaning and feature engineering practices to extract more information from that single predictor.
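
As a rough illustration of the two columns, the sketch below reads label/message pairs and counts the labels. It assumes the raw `SMSSpamCollection` file is tab-separated with the label first; the rows in `SAMPLE` are made up and stand in for the real file, and `read_messages` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for the real file (assumed layout: label<TAB>message).
SAMPLE = (
    "ham\tSee you at lunch?\n"
    "spam\tWIN a free prize now! Text WIN to 80082\n"
    "ham\tRunning late, start without me\n"
)

def read_messages(handle):
    """Return a list of (label, sms) tuples from a tab-separated file handle."""
    # QUOTE_NONE keeps quote characters inside messages as plain text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row[0], row[1]) for row in reader if len(row) == 2]

messages = read_messages(io.StringIO(SAMPLE))
label_counts = Counter(label for label, _ in messages)
print(label_counts)
```

To read the real file instead, the same function could be passed `open("SMSSpamCollection", encoding="utf-8", newline="")`, assuming the file sits in the working directory.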

Given the above data, we want to predict whether a given SMS is spam or not (also called ham). One way to think about this problem is using Bayes' Theorem. In terms of probability, we want to find $P(Spam | SMS)$:

$$P(Spam | SMS) = \frac{P(Spam) \, P(SMS | Spam)}{P(Spam) \, P(SMS | Spam) + P(Ham) \, P(SMS | Ham)}$$

where $P(Spam)$ and $P(Ham)$ are the prior probabilities of an SMS being spam or ham, respectively. Each can be calculated directly as the count of spam (or ham) messages divided by the count of total messages. This is calculated below.
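
The formula can be tried on a toy example. The sketch below computes the priors from label counts and then the posterior $P(Spam | SMS)$. It rests on assumptions not stated above: the naive independence assumption is used to write $P(SMS | Spam)$ as a product of per-word probabilities, add-one (Laplace) smoothing is applied so unseen words do not zero out the product, and the training messages are invented.

```python
from collections import Counter

# Invented, already-tokenized training messages.
train = [
    ("spam", ["win", "free", "prize"]),
    ("spam", ["free", "money", "now"]),
    ("ham", ["see", "you", "now"]),
    ("ham", ["lunch", "at", "noon"]),
    ("ham", ["are", "you", "free"]),
    ("ham", ["call", "me", "later"]),
]

# Priors: count of each label divided by the total number of messages.
labels = Counter(label for label, _ in train)
total = sum(labels.values())
prior = {label: n / total for label, n in labels.items()}

# Per-label word counts and the shared vocabulary.
word_counts = {label: Counter() for label in labels}
for label, tokens in train:
    word_counts[label].update(tokens)
vocabulary = {w for counts in word_counts.values() for w in counts}

def likelihood(tokens, label, alpha=1):
    """P(SMS | label) as a product of smoothed per-word probabilities (assumed model)."""
    n_words = sum(word_counts[label].values())
    p = 1.0
    for w in tokens:
        p *= (word_counts[label][w] + alpha) / (n_words + alpha * len(vocabulary))
    return p

def posterior_spam(tokens):
    """P(Spam | SMS) via Bayes' Theorem, normalised over spam and ham."""
    spam = prior["spam"] * likelihood(tokens, "spam")
    ham = prior["ham"] * likelihood(tokens, "ham")
    return spam / (spam + ham)

print(prior)
print(posterior_spam(["free", "prize"]))
print(posterior_spam(["see", "you"]))
```

For long messages the product of many small probabilities can underflow, so a practical version would usually sum log-probabilities instead.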

debopriyobhowmick/sms_spam_classification
