This project classifies text messages as spam or ham (non-spam) using machine learning models (Naive Bayes, Random Forest, Decision Tree, K-Nearest Neighbors) and natural language processing techniques.
- Installation
- Dataset
- Data Preprocessing
- Word Cloud for Spam and Ham Messages
- Data Transformation
- Results
- Contributing
- License
1. Clone the repository:

   `git clone https://github.com/Elilora/spam-text-classification.git`
   `cd spam-text-classification`

2. Install the required libraries:

   `pip install numpy pandas seaborn matplotlib scikit-learn xgboost shap`
The dataset was obtained from Kaggle (https://www.kaggle.com/team-ai/spam-text-message-classification) and contains 5,157 unique messages.
Data preprocessing steps include calculating the length of messages, exploring basic statistics, and visualizing the distribution of message categories.
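As an illustration, the length column and per-category statistics can be computed with pandas. The three-row frame below is a hypothetical stand-in for the Kaggle CSV, and the column names `Category` and `Message` are assumptions:

```python
import pandas as pd

# Hypothetical sample standing in for the Kaggle CSV
df = pd.DataFrame({
    "Category": ["ham", "spam", "ham"],
    "Message": ["Are we still on for lunch?",
                "WINNER!! Claim your free prize now",
                "ok see you then"],
})

# Add a message-length column and inspect basic statistics per category
df["Length"] = df["Message"].str.len()
print(df.groupby("Category")["Length"].describe())
```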
Word clouds are generated to visualize the most common words in spam and ham messages.
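A word cloud is driven by word frequencies. A minimal sketch of that counting step, using a hypothetical two-message spam subset (the real project would feed the full spam and ham corpora separately):

```python
from collections import Counter

# Hypothetical spam messages standing in for the dataset's spam subset
spam_messages = [
    "free prize claim your free cash now",
    "winner claim your free prize",
]

# Word frequencies across all spam messages; a word cloud renders these,
# sizing each word by its count
spam_freqs = Counter(" ".join(spam_messages).split())
print(spam_freqs.most_common(3))
```

The resulting counter can then be handed to the `wordcloud` package via `WordCloud().generate_from_frequencies(spam_freqs)` and displayed with matplotlib.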
Text data is preprocessed by removing special characters, converting text to lowercase, tokenizing, stemming, removing stopwords, and expanding contractions.
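A dependency-free sketch of those steps is shown below. The tiny stopword and contraction tables are placeholders for the full lists the project presumably uses, and stemming (e.g., via NLTK's `PorterStemmer`) is omitted for brevity:

```python
import re

# Placeholder stopword list; the project presumably uses a full list
STOPWORDS = {"a", "an", "the", "is", "to", "have", "you", "your", "now"}

# A few common contractions; the project expands a fuller set
CONTRACTIONS = {"won't": "will not", "can't": "cannot", "you've": "you have"}

def preprocess(text: str) -> list[str]:
    text = text.lower()                      # lowercase
    for short, full in CONTRACTIONS.items(): # expand contractions
        text = text.replace(short, full)
    text = re.sub(r"[^a-z\s]", " ", text)    # drop special characters/digits
    tokens = text.split()                    # whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]  # remove stopwords

print(preprocess("WINNER!! You've won, claim your FREE prize now!"))
```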
The following models were used to classify messages as spam or ham:
- Naive Bayes Classifier
- K-Nearest Neighbors Classifier
- Decision Tree Classifier
- Random Forest Classifier
- Split the data into training and testing sets (70% train, 30% test).
- Train the models on the training set.
- Evaluate the models on the testing set using accuracy and F1-score metrics.
- Generate classification reports and confusion matrices.
Various machine learning models are trained and evaluated using the processed data.
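The steps above can be sketched end-to-end with scikit-learn. The twelve inline messages are fabricated stand-ins for the Kaggle data, and the bag-of-words `CountVectorizer` is an assumption about the feature representation:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Fabricated mini-corpus standing in for the Kaggle dataset
messages = [
    "win a free prize now", "free entry claim your cash prize",
    "urgent winner claim free cash", "congratulations you won a free trip",
    "free cash prize urgent claim now", "win cash now free entry",
    "are we still meeting for lunch", "see you at the gym later",
    "can you pick up milk on the way", "meeting moved to three today",
    "call me when you get home", "thanks for the ride yesterday",
]
labels = ["spam"] * 6 + ["ham"] * 6

# Bag-of-words features (assumed representation)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

# 70% train / 30% test split, as in the project
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.30, stratify=labels, random_state=42)

model = MultinomialNB().fit(X_train, y_train)
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, pos_label="spam")
print(f"Accuracy: {acc:.2f}  F1 (spam): {f1:.2f}")
print(classification_report(y_test, y_pred, zero_division=0))
print(confusion_matrix(y_test, y_pred))
```

Swapping `MultinomialNB()` for `KNeighborsClassifier(n_neighbors=2)`, `DecisionTreeClassifier()`, or `RandomForestClassifier()` reproduces the other three experiments with the same pipeline.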
The Multinomial Naive Bayes classifier is trained and evaluated for text classification.
- Accuracy: 97%
- F1-score: 99%
The K-Nearest Neighbors classifier with `n_neighbors=2` is trained and evaluated for text classification.
- Accuracy: 93%
- F1-score: 96%
The Decision Tree classifier is trained and evaluated for text classification.
- Accuracy: 97%
- F1-score: 98%
The Random Forest classifier is trained and evaluated for text classification.
- Accuracy: 98%
- F1-score: 99%
Contributions are welcome! Please fork the repository and submit a pull request for any enhancements or bug fixes.