Hadoop Examples

Some simple, kinda introductory projects based on Apache Hadoop, meant as guides to make the MapReduce model look a bit less weird or boring.

Preparations & Prerequisites

  • The latest stable version of Hadoop, or at least the one used here (3.3.0).
  • A single-node setup is enough. You can also run the applications on a local cluster or a cloud service, with the needed changes to the map splits and the number of reducers, of course (see the driver sketch after this list).
  • And, of course, a somewhat recent version of Java. I have openjdk 11.0.5 installed on my 32-bit Ubuntu 16.04 system, and if I can do it, so can you.
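
For reference, both of those knobs live in the job driver. Here's a minimal sketch, with placeholder class and job names rather than anything from this repo:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ExampleDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // cap the size of a map split (and so raise the number of mappers)
        conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 64L * 1024 * 1024);

        Job job = Job.getInstance(conf, "example job");
        job.setJarByClass(ExampleDriver.class);
        // job.setMapperClass(...) and job.setReducerClass(...) go here, per project
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setNumReduceTasks(1); // bump this up on a real cluster

        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```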

Projects

Each project comes with its very own:

  • input data (.csv, .tsv, or plain text files in a folder, ready to be copied to HDFS).
  • an execution guide (found in the source code of each project, but heavily dependent on your setup of Java and environment variables, so if the guide doesn't work you can always google/yahoo/bing/altavista your way to execution).

The projects featured in this repo are:

Calculating the average price of houses for sale by zipcode.
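
The gist of the average-by-key pattern, as a rough sketch (the column layout below is an assumption, not the repo's actual schema):

```java
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class AveragePriceExample {
    // assumed CSV layout: zipcode in column 0, price in column 1
    public static class PriceMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] cols = value.toString().split(",");
            context.write(new Text(cols[0]), new DoubleWritable(Double.parseDouble(cols[1])));
        }
    }

    public static class AvgReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text zipcode, Iterable<DoubleWritable> prices, Context context)
                throws IOException, InterruptedException {
            double sum = 0;
            int count = 0;
            for (DoubleWritable p : prices) {
                sum += p.get();
                count++;
            }
            context.write(zipcode, new DoubleWritable(sum / count));
        }
    }
}
```

The same sum-and-count skeleton, minus the final division, covers the bank-transfer and medal-tally projects below.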

A typical "sum-it-up" example where for each bank we calculate the number and the sum of its transfers.

The typical case of finding the max recorded temperature for every city.
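
The reduce side boils down to a running max per key. A sketch, with the exact types being an assumption:

```java
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTempReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text city, Iterable<DoubleWritable> temps, Context context)
            throws IOException, InterruptedException {
        double max = Double.NEGATIVE_INFINITY;
        for (DoubleWritable t : temps) {
            max = Math.max(max, t.get()); // keep the largest temperature seen for this city
        }
        context.write(city, new DoubleWritable(max));
    }
}
```

Swap the max for a min (comparing planting dates, say) and the same shape handles the oldest-tree project further down.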

An interesting application of working on Olympic game stats in order to count the total gold, silver, and bronze medal wins of every athlete.

Just a plain old normalization example for a bunch of students and their grades.

Finding the oldest tree per city district. Child's play.

A bit more challenging than the rest. Every key character (A-E) has 3 numbers as values: two negatives and one positive. We just calculate the score for every character based on the expression character_score = pos / (-1 * (neg_1 + neg_2)).
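
A sketch of how the reduce step could apply that expression (how the values are told apart is an assumption):

```java
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ScoreReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text character, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        double pos = 0;
        double negSum = 0;
        for (DoubleWritable v : values) {
            if (v.get() >= 0) {
                pos = v.get();     // the single positive value
            } else {
                negSum += v.get(); // accumulate the two negatives
            }
        }
        // character_score = pos / (-1 * (neg_1 + neg_2))
        context.write(character, new DoubleWritable(pos / (-1 * negSum)));
    }
}
```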

A simple way to calculate the symmetric difference between the records of two files, based on each record's ID.
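
One way to sketch it: key every record by its ID and keep only the records whose ID shows up in one of the two files but not both (this assumes an ID appears at most once per file):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class SymmetricDifferenceExample {
    public static class IdMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String record = value.toString();
            String id = record.split(",")[0]; // assume the ID is the first column
            context.write(new Text(id), new Text(record));
        }
    }

    public static class DiffReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text id, Iterable<Text> records, Context context)
                throws IOException, InterruptedException {
            List<String> seen = new ArrayList<>();
            for (Text r : records) {
                seen.add(r.toString());
            }
            if (seen.size() == 1) { // present in only one of the two files
                context.write(id, new Text(seen.get(0)));
            }
        }
    }
}
```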

Filtering out patient records whose PatientCycleNum column equals 1 and whose Counseling column equals No.
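
As a rough sketch, this can be a map-only job (run with job.setNumReduceTasks(0)); the column positions below are made up for illustration:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PatientFilterMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] cols = value.toString().split(",");
        // assumed positions: PatientCycleNum in column 3, Counseling in column 4
        boolean filteredOut = cols[3].equals("1") && cols[4].equals("No");
        if (!filteredOut) {
            context.write(value, NullWritable.get()); // keep the record as-is
        }
    }
}
```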

Reading a number of files with multiple lines and converting them into key-value pairs, with each file's name as the key and each file's content as the value.
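
One way to get that effect without a custom RecordReader is to make the text input non-splittable, buffer each file's lines in the mapper, and emit a single pair in cleanup(). A sketch, not necessarily the approach used in the project:

```java
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class FileToPairExample {
    // Prevent splitting so one map task handles exactly one whole file.
    public static class NonSplittableTextInputFormat extends TextInputFormat {
        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            return false;
        }
    }

    public static class FileToPairMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final StringBuilder content = new StringBuilder();

        @Override
        protected void map(LongWritable offset, Text line, Context context) {
            content.append(line.toString()).append('\n'); // buffer every line of the file
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
            context.write(new Text(fileName), new Text(content.toString()));
        }
    }
}
```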

The most challenging yet. Term frequency (TF) is calculated from 5 input documents. The goal is to find the document with the max TF for each word, along with how many documents contain that word.

A simple merge of WordCount and TopN examples to find the 10 most used words in 5 input documents.
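
The TopN half usually hinges on a reducer that only keeps the best entries it has seen. A sketch, assuming a single reducer fed (word, count) pairs by a preceding WordCount pass:

```java
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TopTenReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // sorted by count; in this simplified sketch, words with equal counts overwrite each other
    private final TreeMap<Integer, String> top = new TreeMap<>();

    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context) {
        int total = 0;
        for (IntWritable c : counts) {
            total += c.get();
        }
        top.put(total, word.toString());
        if (top.size() > 10) {
            top.remove(top.firstKey()); // evict the smallest count
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // emit from highest to lowest count
        for (Map.Entry<Integer, String> e : top.descendingMap().entrySet()) {
            context.write(new Text(e.getValue()), new IntWritable(e.getKey()));
        }
    }
}
```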


Check out the equivalent Spark Examples here.
