Authors: Zihe Wang (zw2624), Di Ye (dy2404), Ziyao Zhang (zz2583), Yinhe Lu (yl4372)
Please look at the requirements file to learn about dependencies and useful packages to reproduce our results.
Directory Code contains our development code as Jupyter notebooks and Python files. The Spark folder inside this directory contains the algorithm we implemented with PySpark.
Directory Notebooks contains PDFs generated from the Jupyter notebooks we used to produce our report.
Directory Data contains the small dataset that we subsampled from the full dataset.
Our business objective is to recommend movies to users. We focus on users who have already watched at least a few movies on the platform, and we want to make sure that the movies our algorithm recommends are ones those users are actually interested in.
The full dataset can be found here: Full MovieLens Dataset
We then subsampled a smaller dataset from the full dataset to work on.
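As a rough illustration of this kind of subsampling, the sketch below keeps only users with a minimum number of ratings and then draws a random subset of them. It is not the procedure used in the report: the function name `subsample_ratings`, the threshold `min_ratings`, the sample size `n_users`, and the `(user_id, movie_id, rating)` row layout are all assumptions made for this example.

```python
import random
from collections import Counter


def subsample_ratings(rows, min_ratings=5, n_users=100, seed=0):
    """Return the ratings of a random subset of sufficiently active users.

    rows is an iterable of (user_id, movie_id, rating) tuples.
    min_ratings, n_users and seed are illustrative defaults, not the
    values used in the report.
    """
    rows = list(rows)
    # Count how many ratings each user has given.
    counts = Counter(user_id for user_id, _, _ in rows)
    # Keep only users who rated at least min_ratings movies.
    eligible = sorted(u for u, c in counts.items() if c >= min_ratings)
    # Draw a reproducible random sample of eligible users.
    rng = random.Random(seed)
    keep = set(rng.sample(eligible, min(n_users, len(eligible))))
    # Return every rating that belongs to a sampled user.
    return [row for row in rows if row[0] in keep]


# Toy data: users 1 and 3 are active enough, user 2 has a single rating.
toy_rows = (
    [(1, movie, 4.0) for movie in range(6)]
    + [(2, 10, 3.0)]
    + [(3, movie, 5.0) for movie in range(5)]
)
toy_sample = subsample_ratings(toy_rows, min_ratings=5, n_users=10)
```

Sampling whole users rather than individual ratings keeps each selected user's full rating history intact, which matters when the goal is to recommend to users who have already watched several movies.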
Please see final_report.pdf
Recommender Systems: The Textbook, By Charu C. Aggarwal
Lecture Notes from IEOR E4571, by Dr. Brett Vintch, Columbia University
Spark MLlib Tutorial: https://spark.apache.org/docs/latest/index.html
Scikit-Surprise User Guide: https://surprise.readthedocs.io/en/stable/getting_started.html
Evan Casey's Github Repo: https://github.com/evancasey/spark-knn-recommender/