stat-learning/group-2

Exploring Yelp Data

Group Members

Josh Dey, Emmett Powers, and Gio Ramirez

Project Overview & Questions

Yelp provides the public with access to a massive dataset including millions of reviews and hundreds of thousands of businesses in 10 metropolitan areas. In our project, we plan to explore the Yelp dataset and construct models to address one of the following questions:

Proposal 1:

Can we build a model to predict a restaurant's Yelp rating from its price range, cuisine, restaurant type, and similar attributes? Since ratings are numeric (stars on a 1 to 5 scale), this would be a predictive regression model.
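As a rough illustration of what such a regression could look like, here is a minimal sketch that one-hot encodes cuisine, keeps price range as a number, and fits a linear model by gradient descent. The toy records, the `CUISINES` list, and the function names are all hypothetical; the real dataset's fields and our eventual model may differ.

```python
# Hypothetical toy data: (price_range 1-4, cuisine, stars). Not real Yelp records.
restaurants = [
    (1, "Mexican", 3.5), (2, "Mexican", 4.0),
    (2, "Chinese", 3.5), (3, "Chinese", 4.0),
    (1, "Thai", 4.0), (3, "Thai", 4.5),
]
CUISINES = ["Mexican", "Chinese", "Thai"]  # assumed category list


def encode(price, cuisine):
    """Feature vector: numeric price plus one indicator per cuisine."""
    return [float(price)] + [1.0 if cuisine == c else 0.0 for c in CUISINES]


def predict(weights, features):
    """Linear prediction: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def fit(rows, targets, learning_rate=0.05, steps=5000):
    """Least-squares fit by batch gradient descent on mean squared error."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    for _ in range(steps):
        errors = [predict(weights, r) - t for r, t in zip(rows, targets)]
        for j in range(len(weights)):
            grad = 2.0 / n * sum(e * r[j] for e, r in zip(errors, rows))
            weights[j] -= learning_rate * grad
    return weights


def mse(predictions, targets):
    """Mean squared error between predictions and observed ratings."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


X = [encode(price, cuisine) for price, cuisine, _ in restaurants]
y = [stars for _, _, stars in restaurants]

weights = fit(X, y)
model_mse = mse([predict(weights, r) for r in X], y)

# Baseline for comparison: always predict the mean rating.
mean_rating = sum(y) / len(y)
baseline_mse = mse([mean_rating] * len(y), y)

thai_prediction = predict(weights, encode(2, "Thai"))
print("price coefficient:", round(weights[0], 3))
print("model MSE:", round(model_mse, 4), "baseline MSE:", round(baseline_mse, 4))
```

Because each cuisine indicator acts as its own intercept, no separate intercept term is included. On real data we would evaluate on held-out restaurants rather than on the training rows.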

Proposal 2:

We are interested in examining the language used in Yelp reviews. For this, we would construct a natural language processing model that reads the words in each review and classifies the review's attitude. We would first train the model on a vocabulary of sentiment, and it could then apply this to test data to classify each review as positive, negative, or neutral.
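One possible way to do this is a bag-of-words Naive Bayes classifier with add-one smoothing. The sketch below is only an illustration under that assumption: the training reviews are made up, the labels are assigned by hand, and the function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical hand-labeled reviews, not taken from the Yelp dataset.
training_reviews = [
    ("The tacos were delicious and the staff was friendly", "positive"),
    ("Amazing food, great service, will return", "positive"),
    ("Terrible service and the food was cold", "negative"),
    ("Rude staff, awful experience, never again", "negative"),
    ("The restaurant is on Main Street and opens at noon", "neutral"),
    ("We ordered noodles and soup for lunch", "neutral"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_reviews):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_reviews:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log posterior score."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, doc_count in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(doc_count / total_docs)  # log prior
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(training_reviews)
print(classify("great food and friendly staff", model))
```

With real reviews, the star rating attached to each review could serve as a proxy label for training, though that is an assumption we would need to check.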

Proposal 3:

Can we find evidence of bias in Yelp reviews that is potentially racialized (for example, toward Mexican, Chinese, or Thai restaurants) and/or based on other restaurant characteristics? This would be an inferential, exploratory data analysis. We could run a regression that controls for other variables to test for the presence of bias, or we could use sentiment analysis to identify whether these trends exist.
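To make "controlling for other variables" concrete, the sketch below uses a stratified comparison as a simple stand-in for regression adjustment: it compares mean ratings between a cuisine and a reference group within each price tier, then averages the gaps. The records, the "Other" reference label, and the `adjusted_gap` helper are hypothetical, and a nonzero gap on its own would not establish bias.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (cuisine, price_tier, stars). Not real Yelp data.
records = [
    ("Mexican", 1, 3.5), ("Mexican", 1, 4.0), ("Mexican", 2, 4.0),
    ("Other", 1, 4.0), ("Other", 1, 4.5), ("Other", 2, 4.5), ("Other", 2, 4.0),
]


def adjusted_gap(records, group, reference):
    """Average within-tier difference in mean stars (group minus reference).

    Each tier's gap is weighted by the number of records it contributes,
    so price tier is held fixed when comparing the two cuisines.
    """
    by_tier = defaultdict(lambda: defaultdict(list))
    for cuisine, tier, stars in records:
        by_tier[tier][cuisine].append(stars)

    weighted_sum = 0.0
    total_weight = 0
    for cuisines in by_tier.values():
        # Only tiers containing both groups allow a like-for-like comparison.
        if group in cuisines and reference in cuisines:
            gap = mean(cuisines[group]) - mean(cuisines[reference])
            weight = len(cuisines[group]) + len(cuisines[reference])
            weighted_sum += gap * weight
            total_weight += weight
    if total_weight == 0:
        raise ValueError("no price tier contains both groups")
    return weighted_sum / total_weight


gap = adjusted_gap(records, "Mexican", "Other")
print("price-adjusted rating gap:", round(gap, 3))
```

A full regression would extend this idea to several controls at once (for example location and review count) and would report uncertainty around the estimated gap.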

Project dates

  1. Pre-proposal: Sunday 11/3 11:59 pm
  2. Group proposal: Thursday 11/7 11:59 pm
  3. Technical Report (Exploratory Data Analysis section only): Wednesday 11/20 before class
  4. Technical Report: Monday 12/2 before class
  5. Final Presentations: Wednesday 12/4 and Monday 12/9 in class
