dishaumarwani/Yelp_Data_Analysis

Yelp Data Analysis

This project analyzes Yelp data in Spark using PySpark and Spark SQL.

All the analyses were performed on the Databricks platform and can be run elsewhere by changing the import statements.

Please find the data here: https://www.yelp.com/dataset/challenge

Download the .json files. The folder contains business.json and reviews.json, along with other files that are used for analysis in this project.
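As a rough illustration of what the downloaded files look like, here is a minimal plain-Python sketch (not the Spark code from the notebooks). It assumes the files are newline-delimited JSON with nested fields; the sample records, field names, and the helpers `read_json_lines` and `flatten` are made up for this example.

```python
import io
import json


def read_json_lines(handle):
    """Yield one dict per non-empty line of a newline-delimited JSON stream."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical sample standing in for business.json (assumed layout).
sample = io.StringIO(
    '{"business_id": "b1", "name": "Cafe A", '
    '"attributes": {"WiFi": "free", "Parking": {"lot": true}}}\n'
    '{"business_id": "b2", "name": "Diner B", "attributes": {"WiFi": "no"}}\n'
)

rows = [flatten(record) for record in read_json_lines(sample)]
print(sorted(rows[0].keys()))
```

To try it on a real file, replace `sample` with `open("business.json", encoding="utf-8")`.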

There are four notebooks:

  1. YelpAnalysisSpark.ipynb addresses loading JSON data, working with hierarchical data, and performing geospatial analysis.
  2. YelpAnalysisSpark.ipynb attempts to combine two files that are part of a relational database.
  3. YelpAnalysisMongoDB1 attempts to analyze the data in MongoDB using the PyMongo API.
  4. GeoSpacialAnalysisMongoDB.ipynb prepares data for geospatial analysis in MongoDB.
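To give a sense of the kind of geospatial query the notebooks work toward, here is a small plain-Python sketch that finds businesses within a radius of a point using the haversine formula. It is not taken from the notebooks: the business records, coordinates, and the helpers `haversine_km` and `within_radius` are hypothetical, and the `latitude`/`longitude` field names are an assumption about the data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(businesses, center, radius_km):
    """Return the ids of businesses no farther than radius_km from center."""
    lat0, lon0 = center
    return [
        b["business_id"]
        for b in businesses
        if haversine_km(lat0, lon0, b["latitude"], b["longitude"]) <= radius_km
    ]


# Hypothetical records; field names are assumed, not confirmed by the dataset.
businesses = [
    {"business_id": "b1", "latitude": 36.18, "longitude": -115.15},
    {"business_id": "b2", "latitude": 33.45, "longitude": -112.07},
]

nearby = within_radius(businesses, center=(36.17, -115.14), radius_km=5)
print(nearby)
```

This brute-force scan is fine for a small sample; on the full dataset a database-side spatial index would likely be a better fit.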

Please email me at dhu200@nyu.edu if you have any questions.
