GitHub - rafaelaqfc/Sentiment-Analysis-of-Medication-Reviews-Project: This is one of my projects about NLP and sentiment analysis of medication reviews.

Sentiment Analysis of Medication Reviews

Introduction

Sentiment analysis is a Natural Language Processing (NLP) application that classifies the emotional or sentimental tone, language, expression or point of view of a text document or corpus. The emotions or attitudes expressed are usually categorized as positive, negative, somewhat positive or negative, mixed, and so on. Sentiment analysis can therefore help us pick up and interpret the discursive patterns found in language in order to understand and predict how people evaluate and represent things such as customer support, a purchased item, or a medication they have been taking, with applications in feedback analysis, market research, etc. In addition, classification tasks like this one can also give us clues about the audience by analyzing the demographics of the users.

Project Goals

The main goal of this project is to explore a dataset of medication reviews by analyzing the relationship between the reviews, the ratings given by their users, and the medications' popularity over time, and by hypothesis testing on the dataset distribution, among others. A second goal is to create a machine learning model to predict the emotion or sentiment expressed in the users' reviews or comments. For that, NLP techniques and different machine learning algorithms, such as Random Forest Classifier, Naive Bayes Classifier and Long Short-Term Memory (LSTM), were used to create different models.

Hypothesis

Project Steps

Data gathering/loading
Data exploration (EDA)
Text preprocessing
Feature engineering
Model building, evaluation and hyperparameter tuning
Model deployment
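
As an illustration of the text preprocessing step, here is a minimal sketch using only the Python standard library. The stopword list, the token pattern and the rating thresholds are assumptions made for this example and are not taken from the notebooks; the helper names preprocess and rating_to_label are hypothetical.

```python
import re
from html import unescape
from typing import List

# Hypothetical stopword list; a real project would likely use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "i", "my", "for", "of", "to", "this", "was"}


def preprocess(review: str) -> List[str]:
    """Decode HTML entities, lowercase the review, and return non-stopword tokens."""
    text = unescape(review).lower()
    # Keep runs of letters (optionally with one internal apostrophe) as tokens.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    return [tok for tok in tokens if tok not in STOPWORDS]


def rating_to_label(rating: int) -> str:
    """Map a numeric rating to a coarse sentiment label (thresholds are assumptions)."""
    if rating >= 7:
        return "positive"
    if rating <= 4:
        return "negative"
    return "neutral"


tokens = preprocess("This medication worked great for my migraines &amp; nausea!")
```

The resulting token lists can then be passed to whichever feature engineering approach is chosen, for example a bag-of-words or TF-IDF representation.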

Expectations from this Project

This project is organized in modules and notebooks, which are supplemented with theory, comments and code cells. The repository is divided into the modules below:

Notebook 1 about data exploration (EDA) called notebook_1_data_exploration.ipynb;
Notebook 2 about data preprocessing called notebook_2_data_preprocessing.ipynb;
Notebooks 3, 4 and 5 about feature engineering called notebook_3_feature_engineering_1.ipynb, notebook_4_feature_engineering_2.ipynb and notebook_5_feature_engineering_3.ipynb;
Notebooks 6 and 7 about Random Forest Classifier modeling and testing called notebook_6_data_modeling_with_random_forest_classifier.ipynb and notebook_7_data_testing_with_random_forest_classifier.ipynb;
Notebooks 8 and 9 about Naive Bayes Classifier modeling and testing called notebook_8_data_modeling_with_multinomial_naive_bayes.ipynb and notebook_9_data_testing_with_multinomial_naive_bayes.ipynb;
Notebook 10 about an ensemble model composed of Random Forest Classifier and Naive Bayes Classifier called notebook_10_data_modeling_with_an_ensemble_model.ipynb;
Notebook 11 about data modeling with Word2Vec called notebook_11_data_modeling_with_word2vec.ipynb (in progress);
Notebook 12 about data modeling with Long Short-Term Memory (LSTM) called notebook_12_data_modeling_with_LSTM.ipynb (in progress);
Under the models folder, the models mnbc_model.joblib and rfc_model.pkl;
Under the app folder, the model ensemble_model.pkl, the Flask API app that serves the deployed model, called app.py, and the model testing file called reviews_test_app_with_python.ipynb.
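
To give an idea of how an ensemble of a Random Forest Classifier and a Multinomial Naive Bayes classifier can be put together, the sketch below uses scikit-learn's VotingClassifier on top of TF-IDF features. This is an assumption about one possible setup, not necessarily what notebook 10 does: the toy reviews, the TF-IDF step, the soft-voting choice and the hyperparameters are all made up for illustration.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy reviews invented for illustration; not taken from the project's dataset.
reviews = [
    "worked great no side effects",
    "relief within days very happy",
    "terrible nausea and headaches",
    "made my symptoms worse awful",
]
labels = ["positive", "positive", "negative", "negative"]

ensemble = make_pipeline(
    TfidfVectorizer(),  # turns raw text into TF-IDF feature vectors
    VotingClassifier(
        estimators=[
            ("rfc", RandomForestClassifier(n_estimators=50, random_state=0)),
            ("mnbc", MultinomialNB()),
        ],
        voting="soft",  # average the predicted class probabilities
    ),
)
ensemble.fit(reviews, labels)

predictions = ensemble.predict(["great relief", "awful headaches"])
```

A fitted pipeline like this one can be saved to disk with joblib.dump or pickle and loaded again inside a serving app.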

Challenges

During the execution of this project, many challenges were faced, starting with the dataset. As we know, unbalanced training data can lead a machine learning algorithm to make biased classifications. Since the dataset was not balanced, several strategies had to be employed in order to balance the training data.
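
The balancing strategies are not detailed here, so the following is only a sketch of one common option, random oversampling, in which minority-class examples are duplicated at random until every class has as many examples as the largest one. The function name random_oversample and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class examples at random until all classes match the largest."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    # Size of the largest class; every other class is padded up to it.
    target = max(len(items) for items in by_label.values())

    out_texts, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy, deliberately unbalanced data invented for illustration.
texts = ["good", "great", "fine", "helped", "bad"]
labels = ["positive"] * 4 + ["negative"]

balanced_texts, balanced_labels = random_oversample(texts, labels)
counts = Counter(balanced_labels)
```

Oversampling is normally applied to the training split only, so that duplicated examples do not leak into the data used for evaluation.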

