Yelp Data Analysis. This is part of my multipart blog post series, which covers the data analysis and its implementation in more detail.
There's also a dockerized version available.
With this dataset, we shall explore these 4 (10 actually 😉) questions.
- How did generosity change over time? How does it compare with the growth in reviews?
  - How does it vary by region or sex?
- Is there any relationship between the reviews and tips left by a given user?
  - Is it different when viewed from a business's perspective?
- How did gender diversity change over time?
  - How is it related to the contribution of reviews and tips?
- Predict the rating given by a user just from their review.
  - In other words, perform fine-grained sentiment classification.
- Install the packages: `pandas`, `seaborn`, `keras`, `wordcloud`, `scikit-learn`, `pyspark`.
- Download the Yelp dataset to your local system.
  - Keep the extracted JSON files in the folder named `./data/yelp_dataset`.
- Download the GloVe embeddings to your local system and extract them.
  - We need just `glove.6B.100d.txt`, though you can use others if you wish. Move them to `./data/embeddings`.
- Download the baby names dataset from here.
  - Extract it and place the `names` folder under the `./data/` directory.
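
Before opening the notebook, it may help to confirm that the files ended up where the steps above expect them. The following is a minimal sketch, not part of the project itself: the helper names `missing_inputs` and `read_glove_vectors` are hypothetical, and the paths assume you run it from the directory that contains `./data`. The parser relies only on GloVe's plain-text layout, where each line holds a token followed by its space-separated vector components.

```python
from pathlib import Path

# Paths taken from the setup steps; assumed to be relative to the working directory.
DATA_DIR = Path("./data")
EXPECTED = {
    "Yelp JSON files": DATA_DIR / "yelp_dataset",
    "GloVe embeddings": DATA_DIR / "embeddings" / "glove.6B.100d.txt",
    "baby names": DATA_DIR / "names",
}


def missing_inputs(expected=EXPECTED):
    """Return the labels of the expected paths that do not exist (hypothetical helper)."""
    return [label for label, path in expected.items() if not path.exists()]


def read_glove_vectors(path, limit=None):
    """Parse GloVe's text format into a dict of token -> list of floats.

    `limit` stops after that many lines, which is handy for a quick sanity check
    without loading the whole file.
    """
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        for i, line in enumerate(handle):
            if limit is not None and i >= limit:
                break
            word, *values = line.rstrip().split(" ")
            vectors[word] = [float(v) for v in values]
    return vectors


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing inputs:", ", ".join(missing))
    else:
        # Peek at a few vectors to confirm the embedding file parses.
        sample = read_glove_vectors(EXPECTED["GloVe embeddings"], limit=5)
        dims = {len(vector) for vector in sample.values()}
        print("All inputs found; sampled embedding dimensions:", dims)
```

If you chose a different GloVe file, change the entry in `EXPECTED` to match before running the check.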
Run the notebook, follow the blog post, or both.
(The visualizations should be visible in the above notebook when it renders correctly. If they are not, open it with nbviewer here.)