happylittlebunny/Yelp-User-Pattern-And-Recommender-System

Yelp Toronto User Pattern Analysis and Recommender System

In this project, we used the Yelp Challenge dataset. Our goal is to analyze user patterns and build a recommender system for users. We focus on users who have rated businesses in the Greater Toronto Area. We processed the data and built a web application, using big data technologies, to demonstrate the results.

Demo URL:


Repository Description

  • data/db
    Contains the MongoDB data used in this application
  • data_preprocess_code
    Contains code that accomplishes the following tasks:
    • Collect data for the Greater Toronto Area
    • Convert string ids to int ids
    • Process data to build the user-business relationship
    • Process data to find user compliments and votes
    • Process data to analyze user reviews using TF*IDF
    • Process data to gather user-rated businesses, identify new categories and re-assign businesses to the new categories
    • Train the recommender system
  • yelpserver
    Contains the visualization web application that shows all our results.
    The web application performs the following tasks:
    • Query data from MongoDB (yelpserver/app.py, db.py)
    • Populate data to the web frontend (yelpserver/app.py)
    • Visualize data in the web frontend (yelpserver/static/js/recommend.js, vs.js; yelpserver/templates/index.html, user.html, recommend.html)
    • Construct the user id and new business ids to feed to the pre-trained recommender model (yelpserver/app.py)
    • Invoke Spark to load the pre-trained model and make predictions (yelpserver/recommenderSystem.py)
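
The "convert string ids to int ids" step exists because Spark MLlib's ALS recommender works with integer user and item ids, while Yelp ids are strings. The following is a minimal sketch of one way to do such a mapping in plain Python. It is not the code from data_preprocess_code; the function names and the sample ids (u_abc, b_xyz, ...) are hypothetical and only illustrate the idea.

```python
def build_id_map(string_ids):
    """Assign consecutive ints (0, 1, 2, ...) to ids in first-seen order."""
    id_map = {}
    for sid in string_ids:
        if sid not in id_map:
            id_map[sid] = len(id_map)
    return id_map


def encode_ratings(ratings, user_map, business_map):
    """Replace string ids in (user, business, stars) tuples with int ids."""
    return [
        (user_map[user], business_map[business], float(stars))
        for user, business, stars in ratings
    ]


# Hypothetical sample data: (user id, business id, star rating).
ratings = [
    ("u_abc", "b_xyz", 4),
    ("u_def", "b_xyz", 5),
    ("u_abc", "b_qrs", 2),
]

user_map = build_id_map(user for user, _, _ in ratings)
business_map = build_id_map(business for _, business, _ in ratings)
encoded = encode_ratings(ratings, user_map, business_map)
```

The same maps would need to be kept (for example, stored alongside the data) so that the web application can translate a Yelp user id into the int id the trained model expects, and translate predicted business ids back.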

Application Setup:

  • Install MongoDB server locally
  • Clone the repository with git
  • In the repository folder, start MongoDB with the bundled data:
mongod --dbpath data/db
  • Change into the yelpserver folder
  • Start the web server with the command:
spark-submit server.py
  • In a browser, open 0.0.0.0:5000
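
If the page does not load, it can help to check that both processes are actually listening. The sketch below uses only the Python standard library. It assumes mongod is on its default port 27017 and the web server on port 5000 (as in the steps above); the helper name port_open is hypothetical and not part of this repository.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed ports: 27017 is the mongod default, 5000 is the web server.
    for name, port in (("MongoDB", 27017), ("web server", 5000)):
        status = "listening" if port_open("127.0.0.1", port) else "not reachable"
        print(f"{name} on port {port}: {status}")
```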

Technologies we used:

  • Data Processing: Spark, Spark SQL, Spark MLlib
  • Web backend: Flask, Spark, CherryPy
  • Web frontend: D3.js, DC.js, crossfilter.js, Leaflet.js, keen.js, Bootstrap v4
  • Data Storage: MongoDB
  • Other tools/technologies: Gephi for the user-business relationship graph, Yelp GraphQL for additional data queries
