
Implementation of the Hadoop "WordCount" MapReduce example in Java.


benseddikmo/Shavadoop


Shavadoop

This project is a Java implementation of MapReduce for word counting.

Objective

Design and implementation of a parallel and distributed system that computes the number of occurrences of each word present in a file.

Architecture description

The computing framework uses multiple computers on Telecom ParisTech's local network to run the word count procedure on text files. No file transfer between remote computers is required, since all files are stored centrally on Telecom ParisTech's file system. The architecture consists of a master node that sends tasks to multiple slave nodes through SSH commands.
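The master-to-slave SSH call could look roughly like the sketch below. It is a minimal illustration, not the project's code: the class RemoteRunner, its methods, and any slave command passed to it are hypothetical. It uses only the JDK's ProcessBuilder plus the standard OpenSSH client options BatchMode and ConnectTimeout, and it assumes key-based (passwordless) SSH between the machines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the master runs one command on a slave over SSH and
// reads back what the slave prints on its console (standard output).
class RemoteRunner {

    // Builds the ssh argument list. BatchMode=yes makes ssh fail instead of
    // prompting for a password; ConnectTimeout=5 bounds the connection attempt
    // to 5 seconds so an unreachable host cannot stall the master for long.
    static List<String> sshCommand(String host, String remoteCommand) {
        return List.of("ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
                host, remoteCommand);
    }

    // Starts ssh, collects its output line by line, waits for it to exit,
    // and throws if the exit status is non-zero.
    static List<String> run(String host, String remoteCommand)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(sshCommand(host, remoteCommand));
        // Merge stderr into stdout so an unread stderr pipe cannot fill up and block ssh.
        pb.redirectErrorStream(true);
        Process p = pb.start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("ssh " + host + " exited with status " + exit + ": " + lines);
        }
        return lines;
    }
}
```

For example, treating a host as available when RemoteRunner.run(host, "hostname") returns without an exception is one plausible way to implement a network discovery step; the project's own check may differ.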

Map Reduce Word Count

Procedure

The Master node manages the procedure and remotely launches the programs that the slave nodes run.

I) The Master node:

  • finds the available hosts of the network (networkDiscovery function).
  • splits the input file on which the word count will be performed, by lines or by blocks of lines, into multiple subfiles Sx (splitting function).
  • distributes the generated splits to the available hosts via threads.
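The splitting and distribution steps of part I can be sketched as follows, assuming the input file has already been read into a list of lines. SplitAndDistribute, splitLines, and distribute are hypothetical names; the task function stands in for the SSH call to a slave, and the round-robin assignment of splits to hosts is an assumption, since the description does not say how splits are matched to hosts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;

class SplitAndDistribute {

    // Groups the input lines into blocks of at most linesPerSplit lines.
    // Each block plays the role of one split Sx (linesPerSplit = 1 splits by line).
    static List<List<String>> splitLines(List<String> lines, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += linesPerSplit) {
            int end = Math.min(i + linesPerSplit, lines.size());
            splits.add(new ArrayList<>(lines.subList(i, end)));
        }
        return splits;
    }

    // Sends split i to host (i mod hosts.size()), one thread per split, and
    // returns the task results in split order. `task` stands in for the remote call.
    static <R> List<R> distribute(List<String> hosts, List<List<String>> splits,
                                  BiFunction<String, List<String>, R> task) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("no available hosts");
        }
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, splits.size()));
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (int i = 0; i < splits.size(); i++) {
                String host = hosts.get(i % hosts.size());
                List<String> split = splits.get(i);
                futures.add(pool.submit(() -> task.apply(host, split)));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) {
                results.add(f.get()); // blocks until this split's task has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for map tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a map task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Passing the task as a function keeps the threading logic testable without a network; in a real deployment it would wrap the remote SSH invocation.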

II) The Slave node:

  • counts each word in the split it receives and writes the output to the console (mapping function, mode SXUMX, in which the UMx files are generated).
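A local sketch of the slave's mapping step, using the hypothetical names MapSketch, map, and toLines. The tokenization rule (lower-casing, then splitting on characters that are neither letters nor digits) and the "word count" line layout are assumptions; the README specifies neither, nor the exact content of the UMx files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class MapSketch {

    // Counts every word in one split. Assumed tokenization: lower-case the
    // text, then split on runs of characters that are neither letters nor digits.
    static Map<String, Integer> map(List<String> splitLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : splitLines) {
            for (String token : line.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty()) { // a leading separator yields one empty token
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Renders the counts as "word count" lines, a hypothetical format for the
    // UMx content that the slave writes and echoes to its console.
    static List<String> toLines(Map<String, Integer> counts) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.add(e.getKey() + " " + e.getValue());
        }
        return out;
    }
}
```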

III) The Master node:

  • collects the console output to build the <word, List(UM)> dictionary. For each word (key) in the dictionary, the Master launches a thread that calls the slave's reduce method (mode UMXSMX: UMx -> SMx, then SMx -> RMx) and retrieves the corresponding result from the console.
  • finally, waits until all threads have finished and assembles all RMx files into a final output file.
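Part III amounts to a shuffle followed by a reduce. The sketch below simulates it in a single JVM: each UMx output is represented as an in-memory map of word counts (an assumption standing in for the files and console output), the <word, List(UM)> dictionary is built from those maps, and each key is reduced as a separate task. All names are hypothetical, and a bounded thread pool replaces one thread per key so that a large vocabulary does not create thousands of threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ShuffleReduceSketch {

    // Shuffle: builds the <word, List(UM)> dictionary, mapping each word to the
    // names of the UMx outputs in which it occurs.
    static Map<String, List<String>> buildDictionary(Map<String, Map<String, Integer>> umOutputs) {
        Map<String, List<String>> dict = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> um : umOutputs.entrySet()) {
            for (String word : um.getValue().keySet()) {
                dict.computeIfAbsent(word, w -> new ArrayList<>()).add(um.getKey());
            }
        }
        return dict;
    }

    // Reduce for one key: sums the word's partial counts over the UMx outputs
    // listed for it (a local stand-in for the UMx -> SMx -> RMx chain).
    static int reduce(String word, List<String> ums, Map<String, Map<String, Integer>> umOutputs) {
        int total = 0;
        for (String um : ums) {
            total += umOutputs.get(um).getOrDefault(word, 0);
        }
        return total;
    }

    // Submits one reduce task per key, waits for all of them, and assembles the
    // per-key results into one map sorted by word (the final output).
    static Map<String, Integer> run(Map<String, Map<String, Integer>> umOutputs, int threads) {
        Map<String, List<String>> dict = buildDictionary(umOutputs);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<Integer>> pending = new TreeMap<>();
            for (Map.Entry<String, List<String>> e : dict.entrySet()) {
                pending.put(e.getKey(), pool.submit(() -> reduce(e.getKey(), e.getValue(), umOutputs)));
            }
            Map<String, Integer> result = new TreeMap<>();
            for (Map.Entry<String, Future<Integer>> p : pending.entrySet()) {
                result.put(p.getKey(), p.getValue().get());
            }
            return result;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for reduce tasks", ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException("a reduce task failed", ex.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the result is a TreeMap, the assembled output is sorted by word, mirroring the single final file built from all RMx results.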
