Movie Rating and Prediction Model

Objective

The objective of this project is to use the IMDb dataset to generate meaningful insights and to build a movie rating model based on average IMDb ratings and a sentiment analysis score of user tweets. A second goal is to build an accurate machine learning model that predicts a movie's average rating from a few key features, to make the system scalable by using big data technologies for data processing, and to host the system on Google Cloud.

Technologies Used

Spark
Zeppelin
Jupyter
Twitter API
Google Cloud Platform
Sentiment Analysis (TextBlob)
Python
HTML5
CSS3
JavaScript

Data Processing

The first step is to process the IMDb dataset. Data preprocessing consists of the following steps:

Cleaning
Normalization
Transformation
Feature extraction
Selection

The required data is extracted from the IMDb dataset as follows:

Read the IMDb dataset.
Filter the data to keep only movies, since the dataset also contains TV series and other title types.
Filter the movies by release year (2000 to 2017).
Read the directors data.
Extract and flatten the directors, then merge them with the movies dataset from the previous steps.
Read the writers data.
Extract and flatten the writers, then merge them with the same movies dataset.
Sort the dataset by release year in descending order.
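
The extraction steps above can be sketched in plain Python. This is a minimal illustration, not the project's Zeppelin/Spark code: it assumes tab-separated input with the column names used in the public IMDb files (tconst, titleType, primaryTitle, startYear, directors, writers), uses tiny in-memory samples instead of the real files, and interprets "flatten" as splitting the comma-separated id field into a list. The function names are hypothetical.

```python
import csv
import io

# Tiny in-memory samples standing in for the real IMDb TSV files (assumption).
BASICS_TSV = (
    "tconst\ttitleType\tprimaryTitle\tstartYear\n"
    "tt1\tmovie\tFilm A\t2005\n"
    "tt2\ttvSeries\tShow B\t2010\n"
    "tt3\tmovie\tFilm C\t1995\n"
    "tt4\tmovie\tFilm D\t2016\n"
)
CREW_TSV = (
    "tconst\tdirectors\twriters\n"
    "tt1\tnm1,nm2\tnm3\n"
    "tt4\tnm4\t\\N\n"
)


def read_tsv(text):
    """Parse tab-separated text into a list of dicts, without quote handling."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def flatten(value):
    """Split a comma-separated id field; IMDb marks missing values as \\N."""
    return [] if value in ("", "\\N") else value.split(",")


def build_movies(basics_text, crew_text, first_year=2000, last_year=2017):
    """Keep movies in the year range, attach directors and writers, sort by year."""
    crew = {row["tconst"]: row for row in read_tsv(crew_text)}
    movies = []
    for row in read_tsv(basics_text):
        if row["titleType"] != "movie":        # drop series and other title types
            continue
        if not row["startYear"].isdigit():     # skip rows with a missing year
            continue
        year = int(row["startYear"])
        if not first_year <= year <= last_year:
            continue
        people = crew.get(row["tconst"], {})
        movies.append({
            "tconst": row["tconst"],
            "title": row["primaryTitle"],
            "year": year,
            "directors": flatten(people.get("directors", "\\N")),
            "writers": flatten(people.get("writers", "\\N")),
        })
    # Newest movies first.
    movies.sort(key=lambda m: m["year"], reverse=True)
    return movies


movies = build_movies(BASICS_TSV, CREW_TSV)
```

With the sample data, only the two movies released between 2000 and 2017 remain, ordered from newest to oldest. On the full dataset the same filter, join and sort would normally be expressed as Spark DataFrame operations.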

Sentiment Analysis

Here we rank the top 10 movies for every year by performing sentiment analysis on the tweets collected for them. Each movie receives a score, the scores are normalised, and the movies are then ranked by their normalised scores.

Data processing is the initial step: we filter the movies for every year from 2010 to 2017, keep only those with more than 10,000 votes, and take the top 10 movies for each year. We then perform sentiment analysis on this list of movies and rank them by the scores obtained.

The sentiment analysis and all related scripts are run on the Google Cloud Platform.
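
The selection and ranking steps described in this section can be sketched as follows. Several things here are assumptions rather than the project's actual code: the top movies per year are ordered by average rating (the text does not say which metric is used), the normalisation is min-max scaling to the range 0 to 1, and toy_polarity is a toy word-list scorer standing in for a real sentiment library. With TextBlob, the per-tweet score would typically come from TextBlob(text).sentiment.polarity instead.

```python
from collections import defaultdict
from statistics import mean

# Toy word lists, purely illustrative (assumption, not the project's scorer).
POSITIVE = {"great", "loved", "brilliant"}
NEGATIVE = {"boring", "awful", "hated"}


def toy_polarity(text):
    """Return a score in [-1, 1] from positive and negative word counts."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / max(pos + neg, 1)


def top_per_year(movies, min_votes=10000, per_year=10,
                 first_year=2010, last_year=2017):
    """Keep movies with more than min_votes votes and take the best per year."""
    by_year = defaultdict(list)
    for m in movies:
        if first_year <= m["year"] <= last_year and m["votes"] > min_votes:
            by_year[m["year"]].append(m)
    return {
        year: sorted(group, key=lambda m: m["rating"], reverse=True)[:per_year]
        for year, group in by_year.items()
    }


def rank_by_sentiment(tweets_by_title, scorer=toy_polarity):
    """Average tweet scores per movie, min-max normalise, sort best first."""
    raw = {
        title: mean(scorer(t) for t in tweets)
        for title, tweets in tweets_by_title.items()
        if tweets
    }
    if not raw:
        return []
    lo, hi = min(raw.values()), max(raw.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    normalised = {title: (score - lo) / span for title, score in raw.items()}
    return sorted(normalised.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sample data for illustration only.
sample_movies = [
    {"title": "Film A", "year": 2016, "votes": 50000, "rating": 8.0},
    {"title": "Film B", "year": 2016, "votes": 20000, "rating": 7.1},
    {"title": "Film C", "year": 2016, "votes": 15000, "rating": 6.5},
    {"title": "Film D", "year": 2016, "votes": 900, "rating": 9.0},
]
sample_tweets = {
    "Film A": ["Loved it, brilliant!", "Great film"],
    "Film B": ["Boring and awful"],
    "Film C": ["Great start, boring ending"],
}

shortlist = top_per_year(sample_movies, per_year=3)[2016]
ranking = rank_by_sentiment({m["title"]: sample_tweets[m["title"]] for m in shortlist})
```

Film D is dropped by the vote threshold despite its high rating, which is the purpose of that filter: a handful of votes is not a reliable basis for a "top 10" list.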

Machine Learning

Here we use a linear regression model, built with the Spark ML library, to predict the average rating of a movie from the following key features:

Director name
Writer name
Runtime of the movie
Genre of the movie
Year of release
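
Most of the features listed above are categorical, so they have to be turned into numbers before a linear model can use them. The sketch below shows one common way to do that, one-hot encoding, followed by a least-squares fit using batch gradient descent. It is plain Python standing in for Spark ML, not the project's notebook code: the sample rows, the single genre per movie, the min-max scaling of runtime and year, and all function names are assumptions. In Spark ML the corresponding building blocks would typically be StringIndexer, OneHotEncoder, VectorAssembler and LinearRegression.

```python
def build_encoder(rows, categorical, numeric):
    """Return a function that maps a row dict to a numeric feature vector."""
    # One vocabulary per categorical column, in a fixed (sorted) order.
    vocab = {c: sorted({r[c] for r in rows}) for c in categorical}
    # Minimum and range per numeric column, used to rescale into [0, 1].
    stats = {}
    for c in numeric:
        values = [r[c] for r in rows]
        stats[c] = (min(values), (max(values) - min(values)) or 1.0)

    def encode(row):
        vec = [1.0]  # bias term
        for c in categorical:
            # One-hot: unseen categories encode as all zeros.
            vec += [1.0 if row[c] == v else 0.0 for v in vocab[c]]
        for c in numeric:
            lo, span = stats[c]
            vec.append((row[c] - lo) / span)
        return vec

    return encode


def fit_linear(X, y, lr=0.1, epochs=3000):
    """Fit weights by batch gradient descent on the mean squared error."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        errors = [sum(wj * xj for wj, xj in zip(w, x)) - t for x, t in zip(X, y)]
        for j in range(d):
            grad = sum(e * x[j] for e, x in zip(errors, X)) / n
            w[j] -= lr * grad
    return w


def predict(w, x):
    """Linear prediction: dot product of weights and features."""
    return sum(wj * xj for wj, xj in zip(w, x))


# Made-up training rows for illustration only.
rows = [
    {"director": "nm1", "writer": "nm3", "genre": "Drama", "runtime": 120, "year": 2005, "rating": 7.5},
    {"director": "nm1", "writer": "nm4", "genre": "Comedy", "runtime": 95, "year": 2010, "rating": 6.8},
    {"director": "nm2", "writer": "nm3", "genre": "Drama", "runtime": 140, "year": 2015, "rating": 8.1},
    {"director": "nm2", "writer": "nm4", "genre": "Action", "runtime": 110, "year": 2017, "rating": 6.2},
]

encode = build_encoder(rows, ["director", "writer", "genre"], ["runtime", "year"])
X = [encode(r) for r in rows]
y = [r["rating"] for r in rows]
weights = fit_linear(X, y)
```

With only four rows and ten weights the model reproduces its training ratings almost exactly, which says nothing about accuracy on unseen movies. A real evaluation would hold out part of the data and report an error metric such as RMSE on it.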

Files:

The Zeppelin notebook for the movie rating model is: Movie Rating Model.json
The Zeppelin notebook for sentiment analysis is: Sentiment Analysis.json
The Jupyter notebook for the machine learning model is: Machine Learning - Movie Rating Prediction.ipynb
The Python program that extracts tweets and runs sentiment analysis is: twittersearch.py
The Output folder contains all the movies extracted using Movie Rating Model.json

The datasets used are from the website: https://www.imdb.com/interfaces/
