Improving the Yelp Review Experience by Standardizing Reviewer Sentiment

Team:

Angela Detweiler
Hee Kang
Alexander Lam
Behesteh Mostaghni

Dataset link: Yelp Dataset on Kaggle, with a focus on restaurants: https://www.kaggle.com/yelp-dataset/yelp-dataset

Problem: When you are researching restaurants on Yelp, do you look at the star rating or do you read the review? Do you look at both? Given that reviews are highly subjective, and star ratings can be influenced by various aspects of business performance, can we use machine learning to standardize the interpretation of reviews?

Goal: Our goal is to feed Natural Language Processing (NLP) features and other features from Yelp reviews into a model that outputs a new 5-star rating, reducing the discrepancy between review text and star ratings. To make the model more robust, we will also incorporate new star ratings from users who read a review but did not write it (that is, ratings based on the review text alone), so that the model better reflects review sentiment.

Hypothesis: We hypothesize that automating star ratings based on NLP of restaurant reviews will improve the Yelp review experience by normalizing reviewer sentiment.

ML algorithms:

Naive Bayes
k-NN
K-Means
LSTM
N-Gram
TF-IDF
Linear Regression
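
As a rough illustration of how two of the items above could fit together, here is a minimal sketch of a multinomial Naive Bayes classifier over unigram and bigram counts. It is written in plain Python so it runs without any of the libraries listed below; it is not the project's notebook code. The class name `NaiveBayes`, the helper `ngrams`, and the sample reviews are all assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Lowercase, split on whitespace, and return all 1-grams up to n_max-grams."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (hypothetical helper)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per star label
        self.feature_counts = defaultdict(Counter)   # n-gram counts per star label
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngrams(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)    # log prior
            for feat in ngrams(text):
                if feat in self.vocab:               # skip n-grams unseen in training
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up reviews and star labels, for illustration only.
train_texts = [
    "terrible food and rude service",
    "rude staff terrible experience",
    "great food and friendly service",
    "friendly staff great experience",
]
train_stars = [1, 1, 5, 5]

model = NaiveBayes().fit(train_texts, train_stars)
print(model.predict("the service was rude"))
print(model.predict("great friendly place"))
```

With real data, the same idea would more likely be expressed with a library vectorizer and classifier; the hand-rolled version only makes the counting and smoothing steps visible.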

Libraries:

NumPy
SciPy
scikit-learn
Pandas
Matplotlib
NLTK
PySpark
Keras
HTML/CSS/Bootstrap
Tableau

Sentiment Analysis Lexicons:

AFINN
VADER
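
Both lexicons assign valence scores to words, and a review can be scored by combining the scores of the words it contains. The sketch below shows that idea in its simplest, AFINN-like form (a plain sum of integer word scores). The lexicon entries are illustrative assumptions, not values copied from AFINN, and the linear `score_to_stars` mapping is a hypothetical choice for this example, not the project's method.

```python
import re

# Toy valence lexicon in the AFINN style (integer scores, negative to positive).
# These entries are illustrative assumptions, not actual AFINN values.
TOY_LEXICON = {
    "great": 3, "delicious": 3, "friendly": 2, "good": 2,
    "slow": -1, "bland": -2, "rude": -3, "terrible": -3,
}


def sentiment_score(text, lexicon=TOY_LEXICON):
    """Sum the valence of every lexicon word found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


def score_to_stars(score, low=-6, high=6):
    """Linearly map a clamped score onto 1-5 stars (hypothetical mapping)."""
    clamped = max(low, min(high, score))
    return round(1 + 4 * (clamped - low) / (high - low))


review = "Delicious food and friendly staff, but slow service."
score = sentiment_score(review)
print(score, score_to_stars(score))
```

A plain sum ignores negation and intensity ("not good", "very good"), which is one reason a rule-based lexicon such as VADER or a trained model may be preferable for review text.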

Project components, steps, analyses, and final products:

Components and final products
- ML algorithms
- Game (user rates reviews)/HTML page
- Database with game data to be reincorporated into model
- Model output/visualizations in Jupyter Notebook
Steps and analyses
- Select and clean restaurant/food category data from Yelp
- Cluster reviews into 5 categories (one per star rating)
- Use NLP to train model
- Test Yelp rating/review data (user inputs both)
- Incorporate new user star-rating from game into the model
- Other...
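
The first step above (selecting and cleaning restaurant data) might look roughly like the following sketch. It assumes the dataset's business and review files are JSON-lines records with `business_id`, `categories`, `stars`, and `text` fields, and that `categories` is a comma-separated string that may be null. The records, and the helper names `restaurant_ids` and `clean_reviews`, are made up for illustration.

```python
import json

# Made-up records; field names are assumed to match the dataset's JSON-lines files.
business_lines = [
    '{"business_id": "b1", "name": "Taco Spot", "categories": "Mexican, Restaurants"}',
    '{"business_id": "b2", "name": "Quick Lube", "categories": "Automotive, Oil Change Stations"}',
    '{"business_id": "b3", "name": "No Category", "categories": null}',
]
review_lines = [
    '{"review_id": "r1", "business_id": "b1", "stars": 5, "text": "  Great tacos!  "}',
    '{"review_id": "r2", "business_id": "b2", "stars": 2, "text": "Slow service."}',
    '{"review_id": "r3", "business_id": "b1", "stars": 4, "text": ""}',
]


def restaurant_ids(lines):
    """Return the IDs of businesses whose category list includes 'Restaurants'."""
    ids = set()
    for line in lines:
        record = json.loads(line)
        categories = record.get("categories") or ""  # treat null as no categories
        if "Restaurants" in [c.strip() for c in categories.split(",")]:
            ids.add(record["business_id"])
    return ids


def clean_reviews(lines, keep_ids):
    """Keep non-empty reviews of the selected businesses, with whitespace trimmed."""
    cleaned = []
    for line in lines:
        record = json.loads(line)
        text = record["text"].strip()
        if record["business_id"] in keep_ids and text:
            cleaned.append({"stars": record["stars"], "text": text})
    return cleaned


ids = restaurant_ids(business_lines)
reviews = clean_reviews(review_lines, ids)
print(reviews)
```

For the full dataset, the same filter would be applied while streaming the files line by line, or with a DataFrame library, rather than on in-memory lists.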

Questions/Topics of Interest:

(ML) Are Yelp reviews highly correlated with restaurant quality (based on star rating)? In other words, are the reviews useful?
What percentage of reviews talk about the quality of the food versus the quality of the service?
Correlate photo captions to reviews.
(ML) Is there consistency in review style for a particular user?
Distribution of ratings (stars): is it a bell curve, or does it peak at one or both extremes (1- and/or 5-star ratings)?
(ML) Is there a pattern to Yelp Elite status? Elite vs non-elite.
Patterns in ratings/review sentiment correlated to business attributes? (Outdoor seating, live music, etc.)
Patterns in 'useful' reviews?
Use NLP to train and test a model, then have humans rate the same reviews and compare the difference.
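
One simple way to compare model ratings with human ratings, as in the last item above, is to measure how far apart the two sets of stars are on the same reviews. The sketch below uses mean absolute difference and exact-match rate. The three rating lists are made-up numbers, not project results, and the function names are assumptions for this example.

```python
def mean_absolute_difference(ratings_a, ratings_b):
    """Average absolute gap, in stars, between two equal-length rating lists."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("rating lists must have the same length")
    return sum(abs(a - b) for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)


def exact_agreement(ratings_a, ratings_b):
    """Fraction of reviews on which both raters gave the same star rating."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("rating lists must have the same length")
    matches = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b)
    return matches / len(ratings_a)


# Made-up star ratings for the same six reviews, for illustration only.
original_stars = [5, 1, 4, 3, 5, 2]   # stars given by the review's author
model_stars = [4, 1, 4, 2, 5, 3]      # stars predicted from the text
human_stars = [4, 2, 4, 3, 5, 3]      # stars given by game players

print(mean_absolute_difference(model_stars, human_stars))
print(mean_absolute_difference(original_stars, human_stars))
print(exact_agreement(model_stars, human_stars))
```

If the model's gap to human readers is smaller than the original authors' gap, that would be some evidence that the model's stars track the review text more closely than the authors' own ratings do.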
