Movies Data Analysis

Project Description

The aim my project was to use Sqoop to collect data and to store the data to HDFS. Process various insights while analyzing the data using Hive. The database consist of movies_dataset table, I will import movies_dataset table, which contains about movies id, names, year, rating and duration.

Technologies Used

HDP 2.6.5
Hive 2.1.0
Hadoop 2.7.3
Sqoop 1.4.6
MySQL 10.5
Git/GitHub

Features

List of features ready and TODOs for future development

1.Find the number of movies released between 1950 and 1960.

hive>SELECT count(*) FROM moviedata > WHERE year BETWEEN 1950 AND 1960;

OUTPUT - 547

2.Find the number of movies having rating more than 4.

hive> SELECT count(*) FROM moviedata > WHERE rating > 4.0;

OUTPUT - 897

3.Find the movies whose rating are between 3 and 4.

hive> SELECT name FROM moviedata > where rating > 3.0 and rating < 4.0;

OUTPUT - # Long List of Movies

4.Find the number of movies with duration more than 2 hours (7200 second).

hive> SELECTcount (*) FROM moviedata > WHERE duration > 7200;

OUTPUT - 641

5.Find the list of years and number of movies released each year.

hive> SELECT year, count (*) FROM moviedata > GROUP BY (year);

OUTPUT - Year Number_of_movies xxxx xxxxxxxx xxxx xxxxxxxx xxxx xxxxxxxx xxxx xxxxxxxx xxxx xxxxxxxx xxxx xxxxxxxx xxxx xxxxxxxx xxxx xxxxxxxx xxxx xxxxxxxx

6.Find the total number of movies in the dataset.

hive> SELECT count (*) FROM moviedata;

OUTPUT - 49590

Getting Started

All the operations below are for Windows OS.

Make sure that the virtualization is enabled for your system from the BIOS.
Install VMware Workshation Player.
Download and insatll Hortonworks HDP 2.6.5.
Get everything up and runninng.
Connect to the system either through the webshell/OpenSSH/PuTTY.
Upload all the required files into the local system or clone this repo..

Usage

All the queries used in thge project can be fount here

To run the hive queries, use the Hive shell

Use the Sqoop commands to load the data into MySQL sqoop export --connect jdbc:mysql://localhost:3306/<DB Name> -username <user> -password <pass> -table <table_name> -m 1 --export-dir /path/to/dir --input-fields-terminated-by ',';

To run the MySQL queries, use the MySQL shell mysql -u<username> -p<password>

License

This project uses the MIT license.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
Images		Images
LICENSE		LICENSE
MOVIE DATA ANALYSIS.pptx		MOVIE DATA ANALYSIS.pptx
README.md		README.md
commands.docx		commands.docx
movies_dataset.txt		movies_dataset.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Images

Images

LICENSE

LICENSE

MOVIE DATA ANALYSIS.pptx

MOVIE DATA ANALYSIS.pptx

README.md

README.md

commands.docx

commands.docx

movies_dataset.txt

movies_dataset.txt

Repository files navigation

Movies Data Analysis

Project Description

Technologies Used

Features

Getting Started

Usage

License

About

Releases

Packages

License

Niroshamurugan/Revature-project--01

Folders and files

Latest commit

History

Repository files navigation

Movies Data Analysis

Project Description

Technologies Used

Features

Getting Started

Usage

License

About

Topics

Resources

License

Stars

Watchers

Forks