GoodReads

This project builds a recommendation system for GoodReads. The dataset is available at https://sites.google.com/eng.ucsd.edu/ucsdbookgraph/home. In our work, we use only the graphic subset of the data, as it is comparatively small.

Currently, we have done the following:

We use matrix factorization, a model in the spirit of singular value decomposition (https://datajobs.com/data-science-repo/Recommender-Systems-[Netflix].pdf), as the baseline. We implement it with the Keras functional API and train it by gradient descent. The model includes bias and regularization terms for both books and users.
We use the word2vec skip-gram scheme (https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) to build user and book embeddings (100 dimensions each) in Keras. For each user, we take one of the books they rated as the input and try to predict the other books they read, and we repeat this for every book the user read. In other words, the set of books a user rated plays the role of a 'sentence' in the original paper. The book vectors agree well with reality: for example, the books most similar to a Japanese manga are also Japanese manga.
We use a wide and deep neural network architecture (https://arxiv.org/abs/1606.07792), which includes interactions between user and book, to predict the rating score.
Our proposed model reduces the MSE from 2.6 to 1.2.
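To make the baseline above more concrete, here is a minimal sketch of matrix factorization with user and book bias terms and L2 regularization. It is not the Keras implementation in this repository: it uses plain stochastic gradient descent in standard-library Python, and the function name `train_biased_mf`, the hyperparameters, and the toy ratings are all assumptions made for illustration.

```python
import random


def train_biased_mf(ratings, n_users, n_books, dim=8, lr=0.05, reg=0.02,
                    epochs=200, seed=0):
    """Fit rating ~ mu + b_user + b_book + p_user . q_book with SGD.

    ratings: list of (user_index, book_index, rating) triples.
    Returns a predict(user, book) function and the per-epoch training MSE.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    P = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_books)]
    user_bias = [0.0] * n_users
    book_bias = [0.0] * n_books

    def predict(u, b):
        dot = sum(pu * qb for pu, qb in zip(P[u], Q[b]))
        return mu + user_bias[u] + book_bias[b] + dot

    def mse():
        return sum((r - predict(u, b)) ** 2 for u, b, r in ratings) / len(ratings)

    history = [mse()]  # training MSE before any update
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, b, r in order:
            err = r - predict(u, b)
            # Gradient steps on the squared error plus the L2 penalty.
            user_bias[u] += lr * (err - reg * user_bias[u])
            book_bias[b] += lr * (err - reg * book_bias[b])
            for k in range(dim):
                pu, qb = P[u][k], Q[b][k]
                P[u][k] += lr * (err * qb - reg * pu)
                Q[b][k] += lr * (err * pu - reg * qb)
        history.append(mse())
    return predict, history


# Hypothetical toy data: 3 users, 3 books, ratings on a 1 to 5 scale.
toy_ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
predict, history = train_biased_mf(toy_ratings, n_users=3, n_books=3)
print(f"training MSE: {history[0]:.3f} -> {history[-1]:.3f}")
```

The bias terms absorb effects such as a user who rates everything highly or a book that is generally well liked, so the latent factors only have to model the remaining user and book interaction.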
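The way training pairs are formed for the skip-gram embeddings described above can be sketched as follows. This is only an illustration under assumptions: the function name `skipgram_pairs` and the sample user and book ids are hypothetical, and the sketch pairs every rated book with every other book of the same user, treating the whole rated set as one 'sentence' with no window limit.

```python
from itertools import permutations


def skipgram_pairs(user_books):
    """Build (input_book, target_book) pairs from each user's rated books.

    user_books: dict mapping a user id to an iterable of book ids.
    Each book a user rated is used in turn as the input, and every other
    book rated by the same user becomes a target.
    """
    pairs = []
    for books in user_books.values():
        unique_books = sorted(set(books))  # drop repeats, fix the order
        pairs.extend(permutations(unique_books, 2))
    return pairs


# Hypothetical example: two users and the books they rated.
user_books = {
    "u1": ["b1", "b2", "b3"],
    "u2": ["b2", "b4"],
}
pairs = skipgram_pairs(user_books)
print(len(pairs))
```

A user with n distinct rated books contributes n * (n - 1) pairs, so very active users dominate the training set unless their pairs are subsampled.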

Next to do:

Use part-of-speech tagging to extract all adjectives from the reviews.
Use topic modeling to find patterns across different books and detect whether there are semantic groups.
