yelp-analysis

Yelp Data Analysis. This is part of my multipart blog post series. You can find more about the data analysis and its implementation there.

There's also a dockerized version available.

With this dataset, we shall explore these 4 (10 actually 😉) questions.

  1. How did generosity change over time? How does it compare with the growth in reviews?
     • How does it vary by region / sex?
  2. Is there any relationship between the reviews and tips left by any given user?
     • Does it look different from a business's perspective?
  3. How did gender diversity change over time?
     • How is it related to the contribution of reviews & tips?
  4. Predict the rating given by a user just from his/her review.
     • In other words, perform a fine-grained sentiment classification.
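One way to read the last question is as a five-class problem, with each star value as its own class. The snippet below is only a sketch of that framing, not the pipeline used in the notebook. It assumes each review record is a JSON object with `stars` and `text` fields; `to_example` is a hypothetical helper name.

```python
import json


def to_example(json_line):
    """Turn one JSON-encoded review into a (text, class_index) pair.

    Assumption: the record has a `stars` value from 1 to 5 and a `text` field.
    Class indices run from 0 (1 star) to 4 (5 stars).
    """
    review = json.loads(json_line)
    stars = int(review["stars"])
    if not 1 <= stars <= 5:
        raise ValueError(f"unexpected star rating: {stars}")
    return review["text"], stars - 1


# Made-up sample record, for illustration only.
sample = '{"stars": 4, "text": "Great tacos, slow service."}'
text, label = to_example(sample)
print(label, text)
```

Any classifier that accepts text and an integer label can then be trained on these pairs.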

Install & Setup

  1. Install the packages: pandas, seaborn, keras, wordcloud, scikit-learn, pyspark
  2. Download the Yelp dataset to your local system.
     • Keep the extracted JSON files in the folder named ./data/yelp_dataset.
  3. Download the GloVe embeddings to your local system and extract them.
     • Only glove.6B.100d.txt is needed, though you can use the others if you wish. Move them to ./data/embeddings.
  4. Download the baby names dataset from here.
     • Extract it and place the names folder under the ./data/ directory.
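Before running the notebook, it can help to confirm that the folders from the steps above are where the code expects them. The following is a minimal sketch, not part of the repository. It assumes the GloVe file sits directly at ./data/embeddings/glove.6B.100d.txt and that each line of that file is a word followed by space-separated floats. `missing_paths`, `parse_glove_line`, and `load_glove` are hypothetical helper names.

```python
from pathlib import Path

DATA_DIR = Path("./data")

# Locations taken from the setup steps; the exact GloVe file path is an assumption.
EXPECTED = [
    DATA_DIR / "yelp_dataset",
    DATA_DIR / "embeddings" / "glove.6B.100d.txt",
    DATA_DIR / "names",
]


def missing_paths(paths=EXPECTED):
    """Return the expected paths that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]


def parse_glove_line(line):
    """Split one GloVe text line into (word, vector as a list of floats)."""
    parts = line.rstrip().split(" ")
    return parts[0], [float(x) for x in parts[1:]]


def load_glove(path, limit=None):
    """Read up to `limit` lines of a GloVe text file into a word -> vector dict."""
    vectors = {}
    with open(path, encoding="utf-8") as fh:
        for i, line in enumerate(fh):
            if limit is not None and i >= limit:
                break
            word, vec = parse_glove_line(line)
            vectors[word] = vec
    return vectors


if __name__ == "__main__":
    for p in missing_paths():
        print(f"missing: {p}")
```

Passing a small `limit` to `load_glove` is a quick way to check the file format without reading the whole embedding file into memory.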

Explore

Run the notebook and/or follow the blog post.

(The visualizations should appear in the notebook above if it renders correctly. If it doesn't, open it with nbviewer here.)