samsaara/yelp-analysis

yelp-analysis

Yelp data analysis. This repository accompanies my multipart blog post series, where you can find more about the analysis and its implementation.

There's also a dockerized version available.

With this dataset, we shall explore these 4 (10 actually 😉) questions.

  1. How did generosity change over time? How does it compare with the growth in reviews?
  • How does it vary by region / sex?
  2. Is there any relationship between the reviews and tips left by a given user?
  • Is it different when viewed from a business's perspective?
  3. How did gender diversity change over time?
  • How is it related to the contribution of reviews & tips?
  4. Predict the rating given by a user just from his/her review.
  • In other words, perform a fine-grained sentiment classification.

Install & Setup

  1. Install the packages: pandas, seaborn, keras, wordcloud, scikit-learn, pyspark
  2. Download the Yelp dataset to your local system.
  • Keep the extracted JSON files in the folder named ./data/yelp_dataset.
  3. Download the GloVe embeddings to your local system and extract them.
  • Only glove.6B.100d.txt is needed, though you can use others if you wish. Move them to ./data/embeddings.
  4. Download the baby names dataset.
  • Extract it and place the names folder under the ./data/ directory.
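
Before running the notebook, it can help to confirm that the files ended up where the steps above put them. The following is a minimal sketch, not part of the repository: the function name `find_missing_inputs` is hypothetical, and it only assumes the layout described above (JSON files under ./data/yelp_dataset, glove.6B.100d.txt under ./data/embeddings, and a names folder under ./data/). It does not check the names of the individual Yelp JSON files, since those are not listed here.

```python
from pathlib import Path


def find_missing_inputs(data_dir="./data"):
    """Return a list of human-readable problems with the data layout.

    Hypothetical helper: it checks only the folder layout described in
    the setup steps, not the contents of the files.
    """
    root = Path(data_dir)
    problems = []

    # The extracted Yelp JSON files are expected directly in yelp_dataset/.
    yelp_dir = root / "yelp_dataset"
    if not yelp_dir.is_dir() or not any(yelp_dir.glob("*.json")):
        problems.append(f"no JSON files found in {yelp_dir}")

    # Only the 100-dimensional GloVe file is required by the setup steps.
    glove_file = root / "embeddings" / "glove.6B.100d.txt"
    if not glove_file.is_file():
        problems.append(f"missing embeddings file {glove_file}")

    # The baby names dataset is expected as a folder called names/.
    names_dir = root / "names"
    if not names_dir.is_dir():
        problems.append(f"missing baby names folder {names_dir}")

    return problems


if __name__ == "__main__":
    for problem in find_missing_inputs():
        print("WARNING:", problem)
```

Running the script from the repository root prints one warning per missing input and prints nothing when the layout matches.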

Explore

Run the notebook and/or follow the blog post.

(The visualizations should appear in the above notebook if it renders correctly. If they don't, open it with nbviewer.)
