yelp-analysis

Yelp data analysis. This is part of my multipart blog post series; more details about the analysis and its implementation are available there.

There's also a dockerized version available.

With this dataset, we shall explore these 4 (10 actually 😉) questions.

How did generosity change over time? How does it compare with the growth in reviews?

How does it vary by region / sex?

Is there any relationship between the reviews and tips left by any given user?

Is it different when viewed from a business's perspective?

How did gender diversity change over time?

How is it related to the contribution of reviews & tips?

Predict the rating given by a user just from their review.

In other words, perform fine-grained sentiment classification.

Install & Setup

Install the packages: pandas, seaborn, keras, wordcloud, scikit-learn, pyspark
Download the Yelp dataset to your local system

Keep the extracted JSON files in the folder named ./data/yelp_dataset.
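The extracted files are large, so it can help to stream them instead of loading each one whole. The following is a minimal sketch, assuming each file holds one JSON object per line; the helper name `iter_json_lines` is hypothetical and is not part of this repository.

```python
import json
from pathlib import Path
from typing import Iterator, Optional


def iter_json_lines(path, limit: Optional[int] = None) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON-lines file.

    Assumption: the file stores one JSON object per line.
    """
    count = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if limit is not None and count >= limit:
                break  # stop early once `limit` records were yielded
            yield json.loads(line)
            count += 1
```

Passing a small `limit` is a cheap way to inspect the fields of a file under ./data/yelp_dataset before running a full analysis. The exact file names depend on the dataset release you downloaded.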

Download glove embeddings to your local system and extract them.

Only glove.6B.100d.txt is needed, though you can use other embedding files if you wish. Move them to ./data/embeddings.
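GloVe text files store one token per line, followed by its space-separated vector components. As a rough sketch of how such a file could be read into memory (the function name `load_glove` is hypothetical, and the notebook may load the embeddings differently):

```python
from pathlib import Path
from typing import Dict, List


def load_glove(path) -> Dict[str, List[float]]:
    """Map each token to its vector, given one 'token v1 v2 ...' entry per line."""
    vectors: Dict[str, List[float]] = {}
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip empty or malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors
```

Calling `load_glove("./data/embeddings/glove.6B.100d.txt")` should give vectors with 100 components each. The plain lists can be converted to arrays when building an embedding matrix for a model.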

Download the baby names dataset from here.

Extract it and place the names folder under the ./data/ directory.
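One way a names dataset can support the gender questions is a lookup from first name to the most common recorded sex. The sketch below assumes the names folder contains files named like yobYYYY.txt with `name,sex,count` rows; that layout, and the helper name `build_gender_lookup`, are assumptions and may not match what the notebook actually does.

```python
import csv
from collections import defaultdict
from pathlib import Path
from typing import Dict


def build_gender_lookup(names_dir) -> Dict[str, str]:
    """Return {lowercased name: 'F' or 'M'} by majority count across all files.

    Assumption: each yob*.txt file has rows of the form name,sex,count.
    """
    counts = defaultdict(lambda: {"F": 0, "M": 0})
    for path in sorted(Path(names_dir).glob("yob*.txt")):
        with path.open(encoding="utf-8", newline="") as handle:
            for row in csv.reader(handle):
                if len(row) != 3 or row[1] not in ("F", "M"):
                    continue  # skip malformed rows
                counts[row[0].lower()][row[1]] += int(row[2])
    # Ties are resolved as 'F' here; this is an arbitrary choice.
    return {name: ("F" if c["F"] >= c["M"] else "M") for name, c in counts.items()}
```

A lookup like `build_gender_lookup("./data/names").get(first_name.lower())` returns `None` for names that are not in the dataset, so unknown names can be left out of gender-based aggregates rather than guessed.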

Explore

Run the notebook and/or follow the blog post.

(The visualizations should be visible in the notebook above if it renders correctly. If it doesn't, open it with nbviewer here.)