Sentiment Analysis and Spam Classification

📔 Table of Contents

ABOUT THE PROJECT
DATASETS
DEEP LEARNING
FLASK
- Database
- Multi-Language
HOW TO RUN

🌟 About the Project

This is a Natural Language Processing (NLP) project, one of the application areas of artificial intelligence. It analyzes the sentiment of Twitter comments and detects whether a text message (SMS) received on a phone is unsolicited (spam). The models were then integrated into a web application with a simple, easy-to-understand graphical interface.

📷 Screenshots

👾 Tech Stack

Client

HTML
CSS
JavaScript

Server

Python - Flask

Database

MySQL

🎯 Features

Prediction of the sentiment of the given sentences
Classification of SMS as spam or ham
You can create a new dataset (via User Sentences)
Recording the messages sent from the user to the database
Vanilla language switcher
Searching for a specific word in datasets

💿 Datasets

Two datasets were used in the project. The first is Sentiment140, used for sentiment analysis. Sentiment140 consists of 1.6 million tweets labelled as "positive" or "negative". The second is the SMS Spam Collection Dataset, used for SMS classification. It contains almost 5.6k English SMS messages, labelled with two classes (Spam and Ham). The ham class contains about 4.8k messages and the spam class about 750.
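As a rough illustration of working with the first dataset, the sketch below reads rows in the Sentiment140 CSV layout. The column names and the `read_sentiment140` helper are assumptions made for this example (the file itself has no header row), and the two sample rows are made up; the label encoding of 0 for negative and 4 for positive follows the dataset's published format.

```python
import csv
import io

# Column order of the Sentiment140 CSV as commonly distributed (no header row).
COLUMNS = ["target", "id", "date", "flag", "user", "text"]
# Sentiment140 encodes polarity as 0 (negative) and 4 (positive).
LABELS = {"0": "negative", "4": "positive"}


def read_sentiment140(handle):
    """Yield (label, text) pairs from an open Sentiment140-style CSV file."""
    for row in csv.DictReader(handle, fieldnames=COLUMNS):
        yield LABELS[row["target"]], row["text"]


# Two made-up rows in the same layout, used only to demonstrate the function.
sample = io.StringIO(
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","my phone died again"\n'
    '"4","2","Mon Apr 06 22:19:49 PDT 2009","NO_QUERY","user_b","what a lovely morning"\n'
)
pairs = list(read_sentiment140(sample))
print(pairs)
```

With the real file, the same function could be given a handle from `open(path, encoding="latin-1", newline="")`, where `path` points to wherever the CSV was placed in the datasets folder.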

⚠️ If you want to examine the datasets, do not forget to add them to the datasets folder.

🤖 Deep Learning

This section covers preprocessing and model training. The Sentiment140 dataset was cleaned of special characters and patterns such as "@", "http", and digits (0-9), and stop words were removed. A Word2vec model was then trained on the resulting tokens. Next, the texts were padded with pad_sequences to a maximum length of 300. After the embedding layer was created, a vanilla LSTM model was built. The final accuracy of the model is 79.10%. The model architecture can be seen in the figure below.

Model Architecture (Image by Author)
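The cleaning and padding steps described in this section can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's actual code: the `clean_tweet` and `pad_left` helpers, the regular expression, and the tiny stop-word list are all assumptions. `pad_left` mimics the default behaviour of Keras `pad_sequences` (padding and truncating at the front); the project may use different settings.

```python
import re

# Small illustrative stop-word list; the project presumably uses a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "am", "to", "and", "of", "in", "at", "my", "i"}

# Matches @mentions, URLs, and any character that is not a letter or whitespace
# (so digits and punctuation are removed as well).
NOISE = re.compile(r"@\S+|https?://\S+|www\.\S+|[^A-Za-z\s]")


def clean_tweet(text):
    """Lower-case a tweet, strip mentions/URLs/digits, and drop stop words."""
    text = NOISE.sub(" ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def pad_left(ids, maxlen=300, value=0):
    """Pad (or truncate) at the front, as Keras pad_sequences does by default."""
    ids = ids[-maxlen:]
    return [value] * (maxlen - len(ids)) + ids


tokens = clean_tweet("@bob I am at the gym 2day http://t.co/xyz feeling GREAT!!!")
print(tokens)
print(pad_left([5, 9], maxlen=4))
```

In the real pipeline the tokens would first be mapped to integer ids by a tokenizer (the repository ships a `tokenizer.pkl`) before padding.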

The Spam dataset was used to train a Multinomial Naive Bayes classifier, a Bayesian learning approach popular in Natural Language Processing (NLP).
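To show what Multinomial Naive Bayes does under the hood, here is a small from-scratch sketch with Laplace (add-one) smoothing. The README does not say which implementation the project uses, so the `TinyMultinomialNB` class is purely hypothetical, and the four training messages are made up for the example.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # per-class word frequencies
        self.class_counts = Counter(labels)      # number of documents per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(class) + sum of log P(word | class); unseen words are skipped
            score = math.log(n_docs / total_docs)
            for word in doc.lower().split():
                if word in self.vocab:
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training messages, only to exercise the classifier.
docs = [
    "win a free prize now",
    "free cash claim now",
    "are we meeting for lunch",
    "see you at home tonight",
]
labels = ["spam", "spam", "ham", "ham"]
model = TinyMultinomialNB().fit(docs, labels)
print(model.predict("claim your free prize"))
print(model.predict("see you for lunch"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the smoothing term keeps a single unseen word-class pair from forcing a probability of zero.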

💻 Flask

The web application consists of 5 pages, which can be seen in the GIF above: Home, Project, About, Contact, and Dataset.

🗂️ Database

Users can submit their opinions, suggestions or problems about the project after filling out the form on the Contact page. Some information in the form is recorded in the database.

SQL statement that creates the contact table in the MySQL database:

CREATE TABLE contact (
	id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
	name VARCHAR(30) NOT NULL,
	email VARCHAR(30) NOT NULL,
	company_name VARCHAR(50),
	message VARCHAR(200),
	reg_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
	);
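A form submission could then be written to this table with a parameterized INSERT. The sketch below only prepares the statement and its parameters; the `build_contact_insert` helper and the validation rules are assumptions for illustration, not the application's actual code. The `%s` placeholders follow the style used by common MySQL drivers for Python.

```python
# Column size limits taken from the CREATE TABLE statement.
MAX_LENGTHS = {"name": 30, "email": 30, "company_name": 50, "message": 200}

INSERT_CONTACT = (
    "INSERT INTO contact (name, email, company_name, message) "
    "VALUES (%s, %s, %s, %s)"
)


def build_contact_insert(form):
    """Validate a contact-form dict and return (sql, params) for cursor.execute()."""
    params = []
    for column, limit in MAX_LENGTHS.items():
        value = (form.get(column) or "").strip()
        if column in ("name", "email") and not value:
            raise ValueError(f"{column} is required")  # NOT NULL columns
        if len(value) > limit:
            raise ValueError(f"{column} exceeds {limit} characters")
        params.append(value or None)  # store empty optional fields as NULL
    return INSERT_CONTACT, tuple(params)


sql, params = build_contact_insert(
    {"name": "Ada", "email": "ada@example.com", "message": "Nice project"}
)
print(sql)
print(params)
```

The pair would be passed to a DB-API cursor as `cursor.execute(sql, params)`, followed by a commit on the connection. The `id` and `reg_date` columns are left out because MySQL fills them in automatically.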

🗺️ Multi-Language

The web app supports two languages: English and Turkish. The language switcher is written in vanilla JavaScript and is open to further development.

🏃 How to Run

1. Clone this repository.

git clone https://github.com/MelihGulum/Sentiment-Analysis-and-Spam-Classification.git

2. Install the project dependencies.

pip install -r requirements.txt

3. Now you can run the project.

flask --app app.py --debug run
