Netflix Recommendation System

1. Business Problem

1.1 Problem Description

Netflix is all about connecting people to the movies they love. To help customers find those movies, Netflix developed a world-class movie recommendation system: CinematchSM. Its job is to predict whether someone will enjoy a movie based on how much they liked or disliked other movies. Netflix uses those predictions to make personal movie recommendations based on each customer’s unique tastes. And while Cinematch is doing pretty well, it can always be made better.

Now there are a lot of interesting alternative approaches to how Cinematch works that Netflix hasn’t tried. Some are described in the literature, some aren’t. We’re curious whether any of these can beat Cinematch by making better predictions. Because, frankly, if there is a much better approach, it could make a big difference to our customers and our business.

Credits: https://www.netflixprize.com/rules.html

1.2 Problem Statement

Netflix provided a lot of anonymous rating data, and a prediction accuracy bar that is 10% better than what Cinematch can do on the same training data set. (Accuracy is a measurement of how closely predicted ratings of movies match subsequent actual ratings.)
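
As a rough illustration of what a "10% better" bar means, here is a minimal sketch. It assumes accuracy is measured by RMSE (the metric this project minimizes) and uses a hypothetical baseline value of 1.0, not Cinematch's real score. The function names are illustrative only.

```python
def target_rmse(baseline_rmse: float, improvement: float = 0.10) -> float:
    """RMSE a model must reach to beat the baseline by the given fraction."""
    return baseline_rmse * (1.0 - improvement)


def improvement_pct(baseline_rmse: float, model_rmse: float) -> float:
    """Percentage by which model_rmse improves on baseline_rmse (lower RMSE is better)."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


# Hypothetical baseline RMSE, chosen only to keep the arithmetic easy to follow.
baseline = 1.0
goal = target_rmse(baseline)
print(f"target RMSE: {goal:.4f}")
print(f"improvement at RMSE 0.95: {improvement_pct(baseline, 0.95):.1f}%")
```

A model with an RMSE at or below the target would clear the bar under these assumptions.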

1.3 Sources

https://www.netflixprize.com/rules.html
https://www.kaggle.com/netflix-inc/netflix-prize-data
Netflix blog: https://medium.com/netflix-techblog/netflix-recommendations-beyond-the-5-stars-part-1-55838468f429 (very nice blog)
surprise library: http://surpriselib.com/ (we use many models from this library)
surprise library doc: http://surprise.readthedocs.io/en/stable/getting_started.html (we use many models from this library)
installing surprise: https://github.com/NicolasHug/Surprise#installation
Research paper: http://courses.ischool.berkeley.edu/i290-dm/s11/SECURE/a1-koren.pdf (most of our work was inspired by this paper)
SVD Decomposition : https://www.youtube.com/watch?v=P5mlg91as1c

1.4 Real world/Business Objectives and constraints

Objectives:

Predict the rating that a user would give to a movie that they have not yet rated.
Minimize the difference between the predicted and actual ratings (RMSE and MAPE).

Constraints:

Some form of interpretability.

2. Machine Learning Problem

2.1 Data

2.1.1 Data Overview

Get the data from: https://www.kaggle.com/netflix-inc/netflix-prize-data/data

Data files:

combined_data_1.txt
combined_data_2.txt
combined_data_3.txt
combined_data_4.txt
movie_titles.csv

Each of the files [combined_data_1.txt, combined_data_2.txt, combined_data_3.txt, combined_data_4.txt] holds ratings for many movies. The block for each movie starts with a line containing the movie ID followed by a colon. Each subsequent line in that block corresponds to a rating from a customer and its date, in the following format:

CustomerID,Rating,Date

MovieIDs range from 1 to 17770 sequentially. CustomerIDs range from 1 to 2649429, with gaps. There are 480189 users. Ratings are on a five-star (integer) scale from 1 to 5. Dates have the format YYYY-MM-DD.
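
The file layout described above can be read with a small parser. The following is a minimal sketch, not the notebook's actual loading code: `parse_ratings` is a hypothetical helper name, and the sample lines are illustrative data in the documented format.

```python
import io
from typing import Iterator, TextIO, Tuple


def parse_ratings(lines: TextIO) -> Iterator[Tuple[int, int, int, str]]:
    """Yield (movie_id, customer_id, rating, date) from a combined_data-style file."""
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A "MovieID:" header starts a new block of ratings.
            movie_id = int(line[:-1])
            continue
        if movie_id is None:
            raise ValueError("rating line found before any 'MovieID:' header")
        customer_id, rating, date = line.split(",")
        yield movie_id, int(customer_id), int(rating), date


# Illustrative sample in the documented format (two movie blocks).
sample = io.StringIO(
    "1:\n"
    "1488844,3,2005-09-06\n"
    "822109,5,2005-05-13\n"
    "2:\n"
    "2059652,4,2005-09-05\n"
)
rows = list(parse_ratings(sample))
for row in rows:
    print(row)
```

To read a real file, the same function can be given an open file handle, for example `open("combined_data_1.txt")`. Because it is a generator, the full file does not need to fit in memory at once.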

2.2 Mapping the real world problem to a Machine Learning Problem

2.2.1 Type of Machine Learning Problem

For a given movie and user, we need to predict the rating that the user would give to the movie.
The given problem is a recommendation problem. It can also be seen as a regression problem, because the target is a numeric rating.

2.2.2 Performance metric

Mean Absolute Percentage Error: https://en.wikipedia.org/wiki/Mean_absolute_percentage_error
Root Mean Square Error: https://en.wikipedia.org/wiki/Root-mean-square_deviation
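
Both metrics can be computed in a few lines. This is a minimal standard-library sketch with made-up ratings; the function names `rmse` and `mape` are illustrative and not taken from the notebook. MAPE divides by the actual value, which is safe here because ratings range from 1 to 5 and are never zero.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between actual and predicted ratings."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Made-up ratings for illustration only.
actual = [4, 3, 5, 2]
predicted = [3.5, 3.0, 4.0, 2.5]
print(f"RMSE: {rmse(actual, predicted):.4f}")
print(f"MAPE: {mape(actual, predicted):.3f}%")
```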

2.2.3 Machine Learning Objective and Constraints

Minimize RMSE.
Try to provide some interpretability.

3. Getting Started

Start by downloading the project, then run the "Netflix_Recommendation.ipynb" file in an IPython (Jupyter) notebook.

4. Prerequisites

You need to have the following software and libraries installed before running this project.

Python 3: https://www.python.org/downloads/
Anaconda: installs IPython Notebook and most of the required libraries, such as sklearn, pandas, seaborn, matplotlib, numpy and scipy: https://www.anaconda.com/download/

5. Libraries

scikit-learn: scikit-learn is a Python module for machine learning built on top of SciPy.
- pip install scikit-learn
- conda install -c anaconda scikit-learn
scipy: SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering.
- pip install scipy
- conda install -c anaconda scipy

6. Authors

• Manish Vishwakarma - Complete work
