Sentiment analysis of women's clothes reviews

General info

The project concerns sentiment analysis of clothes reviews to determine whether the product is recommended or not. Moreover it includes data analysis, data preparation, text mining and build model by using virtue different machine learning and deep learning algorithms. The project also includes EDA analysis and sentiment analysis by using Vader and TextBlob methods.

I have created the web application in Flask based on this project and code for this app is available here.

Dataset

The dataset comes from Woman Clothing Review that can be find at Kaggle.

Motivation

The text classification relies to assigned documents into one or many categories. One of the most useful text classification is Sentiment analysis. Is aimed to determine the user's point of view about given product, topic or service. The reviews of products users play a very important role in the e-commerce industry. The product quality can be measured by review given by customer. A new client can decided whether it buy or not given product based on previously reviews.

Project includes:

EDA analysis - EDA_sentiment.ipynb
Sentiment analysis with ML algorithms - Sentiment_analysis.ipynb
Sentiment analysis with Glove embeddings and LSTM - Sentiment_glove.ipynb
Sentiment analysis with Vader and TextBlob - Sentiment_vader.ipynb
Sentiment analysis based on customer rating - Sentiment_two.ipynb
Python script to clean data - clean_data.py
Python script to use sentiment model with smote method - sentiment_model.py
data - data used in the project.

Summary

The goal of the project was sentiment analysis of clothes reviews by using the various methods and determine possible recommendation. I built a model to predict if the review is positive or negative. I have started with the data analysis, data pre-processing and text mining, which cover change text into tokens, remove punctuation, numbers, stop words and normalization them by using lemmatization. I have also used SMOTE method to resolve the imbalance problem in data. In the first approach I used bag of words model to convert the text into numerical feature vectors. To get more accurate predictions I have applied five different classification algorithms like: Logistic Regression, Naive Bayes, Stochastic Gradient Descent, Random Forest and Ada Boosting as well. Finally I got the best accuracy of 89 % for Logistic Regression model.

In the second I have used a a pre-trained Glove word embeddings with Bidirectional LSTM Neural Network to get more accurate predictions. I achieved an accuracy on the test set equal to 88 % and it is a very good result. From the experiments one can see that the both tested approaches give an overall high accuracy and similar results for the problem.

Model	Embeddings	Accuracy
Logistic Regression	BOW +TF-IDF	0.89
Bidirectional LSTM	Glove embeddings	0.88
Random Forest	BOW +TF-IDF	0.87
Naive Bayes	BOW +TF-IDF	0.87
Stochastic Gradient Descent	BOW +TF-IDF	0.86
AdaBoosting	BOW +TF-IDF	0.84

In another approach I also used The Vader and TextBlob methods to analyze the sentiment of reviews and to compare their relevance.

Technologies

The project is created with:

Python 3.6
libraries: pandas, numpy, scikit-learn, keras, tensorflow, nltk, imbalanced-learn, wordcloud, Vader, TextBlob.

Running the project:

To run this project use Jupyter Notebook or Google Colab.

You can run the scripts in the terminal:

clean_data.py
sentiment_model.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data

data

EDA_sentiment.ipynb

EDA_sentiment.ipynb

README.md

README.md

Sentiment_analysis.ipynb

Sentiment_analysis.ipynb

Sentiment_glove.ipynb

Sentiment_glove.ipynb

Sentiment_two.ipynb

Sentiment_two.ipynb

Sentiment_vader.ipynb

Sentiment_vader.ipynb

clean_data.py

clean_data.py

requirements.txt

requirements.txt

sentiment_model.py

sentiment_model.py

Repository files navigation

Sentiment analysis of women's clothes reviews

General info

Dataset

Motivation

Project includes:

Summary

Technologies

The project is created with:

Running the project:

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
data		data
EDA_sentiment.ipynb		EDA_sentiment.ipynb
README.md		README.md
Sentiment_analysis.ipynb		Sentiment_analysis.ipynb
Sentiment_glove.ipynb		Sentiment_glove.ipynb
Sentiment_two.ipynb		Sentiment_two.ipynb
Sentiment_vader.ipynb		Sentiment_vader.ipynb
clean_data.py		clean_data.py
requirements.txt		requirements.txt
sentiment_model.py		sentiment_model.py

aniass/Sentiment-analysis-reviews

Folders and files

Latest commit

History

Repository files navigation

Sentiment analysis of women's clothes reviews

General info

Dataset

Motivation

Project includes:

Summary

Technologies

The project is created with:

Running the project:

About

Topics

Resources

Stars

Watchers

Forks

Languages