vineetdcunha/Hadoop_Ecosystem

Hadoop_Ecosystem

WordCount_Python - Counts the occurrences of each word in a text file using Python.
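The WordCount_Python source is not reproduced here. Below is a minimal sketch of the usual Hadoop Streaming word count. It assumes whitespace tokenization and lower-casing, and both choices are assumptions. The mapper emits `word\t1` records. The reducer sums consecutive records for the same word, relying on Hadoop sorting mapper output by key before the reduce phase. The script name `wordcount.py` and its `map`/`reduce` arguments are hypothetical.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' record per whitespace-separated, lower-cased token."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each word; input must be sorted by word, as Hadoop guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical entry point: "python3 wordcount.py map" or "python3 wordcount.py reduce".
    stage = mapper if sys.argv[1] == "map" else reducer
    for record in stage(sys.stdin):
        print(record)
```

You can check it locally without a cluster: `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`. On a cluster, the same command strings go into the streaming jar's `-mapper` and `-reducer` options, and the script is shipped with `-files`.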

Hive_Vehicle_data - HiveQL queries on vehicle data.

Hive_transform - HiveQL queries that use a Python script to transform and load the data.
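Hive's `TRANSFORM` clause sends each selected row to a script as one tab-separated line on stdin, and it reads tab-separated lines back from stdout. By default, NULL arrives as the literal `\N`. Below is a minimal sketch of such a script. The column layout (an id, a name, then pass-through columns) and the title-casing step are hypothetical stand-ins for whatever Hive_transform actually does.

```python
import sys
from typing import Iterable, Iterator, List

NULL = r"\N"  # Hive's default text encoding of NULL in TRANSFORM streams


def transform_row(fields: List[str]) -> List[str]:
    """Hypothetical transformation: tidy the second column, pass the rest through."""
    row_id, name, *rest = fields
    clean_name = name if name == NULL else name.strip().title()
    return [row_id, clean_name, *rest]


def transform(lines: Iterable[str]) -> Iterator[str]:
    """Read tab-separated rows from Hive and write tab-separated rows back."""
    for line in lines:
        yield "\t".join(transform_row(line.rstrip("\n").split("\t")))


if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1] == "run":
    for row in transform(sys.stdin):
        print(row)
```

You would register the script with `ADD FILE transform.py;` and invoke it as `SELECT TRANSFORM (id, name, amount) USING 'python3 transform.py run' AS (id, name, amount) FROM some_table;`. The table and column names are placeholders.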

Hive_Sum_transform - Sums and transformations of the Lineorder data using Hive and Python.

Hadoop_Stream_Average - Calculates an average using Hadoop Streaming and a Python script.
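A common pitfall when averaging with MapReduce is averaging partial averages, which gives the wrong result when groups differ in size. The sketch below avoids this by carrying `(sum, count)` pairs and dividing only at the end. Because the reducer accepts `(sum, count)` records, a combiner that emits pre-summed pairs in the same format would not change the result. The sketch assumes hypothetical `key,value` CSV input, since the actual input format of Hadoop_Stream_Average is not shown.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Turn each hypothetical 'key,value' line into a partial (sum, count) = (value, 1)."""
    for line in lines:
        line = line.strip()
        if line:
            key, value = line.split(",", 1)
            yield f"{key}\t{float(value)}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Add up partial sums and counts per key, dividing only once at the end."""
    records = (line.rstrip("\n").split("\t") for line in lines)
    for key, group in groupby(records, key=lambda r: r[0]):
        total, count = 0.0, 0
        for _, partial_sum, partial_count in group:
            total += float(partial_sum)
            count += int(partial_count)
        yield f"{key}\t{total / count}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical entry point: pass "map" or "reduce".
    stage = mapper if sys.argv[1] == "map" else reducer
    for record in stage(sys.stdin):
        print(record)
```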

Hadoop_Stream_Std_Dev - Calculates the standard deviation using Hadoop Streaming and a Python script.
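You can compute the standard deviation in one pass. The mappers carry n, sum(x) and sum(x^2), and the reducer uses sigma = sqrt(sum(x^2)/n - (sum(x)/n)^2). The sketch below assumes one number per input line and a single constant key (`all`), so that one reducer sees every partial sum. It returns the population standard deviation. For the sample version, rescale the variance by n/(n - 1).

```python
import math
import sys
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit partial sums (x, x^2, 1) under one constant key so a single reducer sees them all."""
    for line in lines:
        line = line.strip()
        if line:
            x = float(line)
            yield f"all\t{x}\t{x * x}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Combine the partial sums and emit the population standard deviation."""
    n, s, s2 = 0, 0.0, 0.0
    for line in lines:
        _, x, x2, c = line.rstrip("\n").split("\t")
        s += float(x)
        s2 += float(x2)
        n += int(c)
    if n:
        mean = s / n
        variance = max(s2 / n - mean * mean, 0.0)  # clamp tiny negative rounding error
        yield f"all\t{math.sqrt(variance)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical entry point: pass "map" or "reduce".
    stage = mapper if sys.argv[1] == "map" else reducer
    for record in stage(sys.stdin):
        print(record)
```

The sum(x^2) - n*mean^2 form can lose precision when values are large relative to their spread. Welford's algorithm, or its parallel merge variant, is more numerically stable if that matters.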

Hadoop_Stream_Join - Joins the Employee and Customer datasets using Hadoop Streaming.
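Hadoop Streaming has no built-in join, so a common approach is a reduce-side join. The mapper tags each record with its source and keys it by the join column, and the reducer pairs records that share a key. The sketch below assumes hypothetical CSV layouts: Employee is `emp_id,emp_name` and Customer is `cust_id,cust_name,emp_id`. It tells the two sources apart by field count. A real job might instead distinguish its inputs by file name.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Key every record by emp_id and tag it with its source ('E' or 'C').

    Hypothetical layouts: Employee = emp_id,emp_name; Customer = cust_id,cust_name,emp_id.
    """
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) == 2:
            emp_id, emp_name = fields
            yield f"{emp_id}\tE\t{emp_name}"
        elif len(fields) == 3:
            cust_id, cust_name, emp_id = fields
            yield f"{emp_id}\tC\t{cust_id},{cust_name}"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Inner join: pair every Employee with every Customer that shares its emp_id."""
    records = (line.rstrip("\n").split("\t", 2) for line in lines)
    for emp_id, group in groupby(records, key=lambda r: r[0]):
        employees, customers = [], []
        for _, tag, payload in group:
            (employees if tag == "E" else customers).append(payload)
        for employee in employees:
            for customer in customers:
                yield f"{emp_id}\t{employee}\t{customer}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical entry point: pass "map" or "reduce".
    stage = mapper if sys.argv[1] == "map" else reducer
    for record in stage(sys.stdin):
        print(record)
```

Buffering both sides per key keeps the reducer simple, but it holds an entire key group in memory. A secondary sort on the tag, so that one side always arrives first, avoids that cost.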

Hadoop_Stream_Join_Agg - Joins and aggregates the Lineorder and Customer datasets using Hadoop Streaming.
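When one side of a join is small, as a Customer dimension table usually is next to Lineorder, a map-side (replicated) join avoids shuffling it. Every mapper loads the small table into a dictionary and looks up each streamed row. This sketch illustrates that technique, and the repository's job may well use a reduce-side join instead. The pipe-delimited layout, the column positions, the `customer.tbl` file name and the choice to aggregate revenue by nation are all assumptions. Adjust them to the real data.

```python
import sys
from itertools import groupby
from typing import Dict, Iterable, Iterator

# Hypothetical column positions; adjust to the real Lineorder/Customer layouts.
LO_CUSTKEY, LO_REVENUE = 2, 12
C_CUSTKEY, C_NATION = 0, 4


def load_customers(lines: Iterable[str]) -> Dict[str, str]:
    """Build an in-memory custkey -> nation lookup from the small Customer table."""
    lookup = {}
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) > C_NATION:
            lookup[fields[C_CUSTKEY]] = fields[C_NATION]
    return lookup


def mapper(lines: Iterable[str], customers: Dict[str, str]) -> Iterator[str]:
    """Map-side join: look up each Lineorder row's customer and emit 'nation<TAB>revenue'."""
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= LO_REVENUE:
            continue
        nation = customers.get(fields[LO_CUSTKEY])
        if nation is not None:  # inner join: drop rows whose customer is unknown
            yield f"{nation}\t{fields[LO_REVENUE]}"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the (assumed integer) revenue per nation; input is sorted by nation."""
    records = (line.rstrip("\n").split("\t") for line in lines)
    for nation, group in groupby(records, key=lambda r: r[0]):
        yield f"{nation}\t{sum(int(rev) for _, rev in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "map":
        with open("customer.tbl") as f:  # hypothetical file shipped to every task
            out = mapper(sys.stdin, load_customers(f))
            for record in out:
                print(record)
    else:
        for record in reducer(sys.stdin):
            print(record)
```

Ship the lookup file with the streaming jar's `-files customer.tbl` option so that it lands in each task's working directory.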

Hadoop_Stream_Cluster - Clustering using Hadoop Streaming.
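The README does not say which clustering algorithm Hadoop_Stream_Cluster uses. Assuming k-means, one iteration maps naturally onto streaming. The mapper assigns each point to its nearest current centroid, and the reducer averages each cluster's points into a new centroid. Points are assumed to be comma-separated coordinates, one per line. `centroids.txt` is a hypothetical side file that lists the current centroids in the same format.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, List

Point = List[float]


def parse(text: str) -> Point:
    return [float(v) for v in text.split(",")]


def nearest(point: Point, centroids: List[Point]) -> int:
    """Index of the centroid with the smallest squared Euclidean distance to point."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def mapper(lines: Iterable[str], centroids: List[Point]) -> Iterator[str]:
    """Assign each point to its nearest centroid: emit 'centroid_index<TAB>point'."""
    for line in lines:
        line = line.strip()
        if line:
            yield f"{nearest(parse(line), centroids)}\t{line}"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Average the points assigned to each centroid to obtain its updated position."""
    records = (line.rstrip("\n").split("\t") for line in lines)
    for cid, group in groupby(records, key=lambda r: r[0]):
        points = [parse(p) for _, p in group]
        centre = [sum(coords) / len(points) for coords in zip(*points)]
        yield f"{cid}\t{','.join(str(v) for v in centre)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "map":
        with open("centroids.txt") as f:  # hypothetical side file, one centroid per line
            current = [parse(line) for line in f if line.strip()]
        out = mapper(sys.stdin, current)
    else:
        out = reducer(sys.stdin)
    for record in out:
        print(record)
```

A driver script reruns the job and feeds the reducer's coordinates back in as the next set of centroids, until they stop moving. A centroid that attracts no points produces no output line, so the driver also has to decide how to handle empty clusters.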

HBase - Creating an HBase system for Employee data.

lo_pig - lo_discount_count and lo_revenue_sum: counts and sums over the Lineorder data using Pig.

Pig_Join_Agg - Joins and aggregates the Lineorder and Customer datasets using Pig.

Hadoop_Multi_Node_WordCount - Counts the occurrences of each word in a file using Python on a multi-node Hadoop cluster.

Mahout_Page_Rank - Implementation of the PageRank algorithm using Mahout.
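This README does not reproduce the Mahout job or its configuration. As a reference for the quantity it computes, PageRank can be sketched with plain power iteration. Each page's rank is (1 - d)/N plus d times the rank flowing in from the pages that link to it. Under one common convention, also used here, the rank of pages without out-links is spread evenly over all N pages. This is a single-machine illustration, not Mahout's implementation. The damping factor d = 0.85 and the fixed iteration count are conventional choices, not values taken from the repository.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power iteration over an adjacency list {page: [pages it links to]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new
    return rank
```

Consider a three-page graph where `a` links to `b` and `c`, `b` links to `c`, and `c` links to `a`. Page `c` ends up ranked above `b` because it receives links from both other pages.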

Mahout_Kmeans_Matrix_Fact - Implementation of k-means clustering and matrix factorization for MovieLens data using Mahout.
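Matrix factorization for MovieLens-style ratings approximates each rating r_ui by the dot product of a user vector p_u and a movie vector q_i. The sketch below fits those vectors with stochastic gradient descent on squared error plus L2 regularization. SGD is a substitute chosen for brevity, not Mahout's own factorizer. The factor count, learning rate, regularization and epoch count are illustrative defaults. Ratings are taken as `(user, movie, rating)` triples, as they would look after parsing a MovieLens ratings file.

```python
import math
import random
from typing import Dict, List, Tuple

Rating = Tuple[int, int, float]
Factors = Dict[int, List[float]]


def factorize(ratings: List[Rating], n_factors: int = 2, lr: float = 0.01,
              reg: float = 0.02, epochs: int = 200, seed: int = 0) -> Tuple[Factors, Factors]:
    """Fit user/movie factor vectors with SGD on squared error plus L2 regularization."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.gauss(0.0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.gauss(0.0, 0.1) for _ in range(n_factors)] for i in items}
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            pu, qi = P[u], Q[i]
            err = r - sum(a * b for a, b in zip(pu, qi))
            for f in range(n_factors):
                puf, qif = pu[f], qi[f]
                # Gradient step on (r - p.q)^2 + reg * (|p|^2 + |q|^2), constants folded into lr.
                pu[f] += lr * (err * qif - reg * puf)
                qi[f] += lr * (err * puf - reg * qif)
    return P, Q


def predict(P: Factors, Q: Factors, user: int, item: int) -> float:
    """Predicted rating: dot product of the user and movie factor vectors."""
    return sum(a * b for a, b in zip(P[user], Q[item]))


def rmse(P: Factors, Q: Factors, ratings: List[Rating]) -> float:
    """Root-mean-square error of the predictions over the given ratings."""
    return math.sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))
```

After training, `predict(P, Q, user, movie)` can score a pair that was not among the training triples, provided both the user and the movie appeared somewhere in training.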