hadoop-projects

Add the soc-LiveJournal1Adj.txt and userdata.txt files to HDFS, export JAR files from the projects, and run them using the commands listed at the end of this README.
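For example, the input files can be copied to HDFS like this (the paths match the example commands at the end of this README; adjust them for your cluster):

hdfs dfs -put soc-LiveJournal1Adj.txt /user/soc-LiveJournal1Adj.txt

hdfs dfs -put userdata.txt /user/userdata.txt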

Input files:

  1. soc-LiveJournal1Adj.txt
    The input contains the adjacency list and has multiple lines in the following format:

    <userid><TAB><comma-separated list of friends' userids>
    where <userid> is a unique integer ID corresponding to a unique user.

  2. userdata.txt
    The userdata.txt file contains dummy user data consisting of
    column1 : userid
    column2 : firstname
    column3 : lastname
    column4 : address
    column5 : city
    column6 : state
    column7 : zipcode
    column8 : country
    column9 : username
    column10 : date of birth

Program 1: A MapReduce program in Hadoop that implements a simple "Mutual/Common friend list of two friends". This program finds the mutual friends between two friends.

Logic :

Let's take an example with the friend lists of A, B, and C.

Friends of A are B, C, D, E, F.
Friends of B are A, C, F.
Friends of C are A, B, E
So A and B have C, F as their mutual friends. A and C have B, E as their mutual friends. B and C have only A as their mutual friend.

Map Phase :

In the map phase we split each user's friend list and create a pair of that user with each of their friends.

Let's process A's friend list
(Friends of A are B, C, D, E , F)
Key | Value
A,B | B, C, D, E, F
A,C | B, C, D, E, F
A,D | B, C, D, E, F
A,E | B, C, D, E, F
A,F | B, C, D, E, F

Let's process B's friend list
(Friends of B are A, C, F)
Key | Value
A,B | A, C, F
B,C | A, C, F
B,F | A, C, F
We have created a pair of B with each of its friends and sorted each pair alphabetically, so the key (B,A) becomes (A,B).
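
The mapper for this phase might look roughly like the sketch below (class and method names are illustrative, not the repository's actual MutualFriends code):

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper for the pairing step described above.
public class FriendPairMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Input line assumed to be: <userid><TAB><comma-separated friend list>
        String[] parts = value.toString().split("\t");
        if (parts.length != 2 || parts[1].isEmpty()) {
            return; // skip users with no friends or malformed lines
        }
        String user = parts[0];
        String[] friends = parts[1].split(",");

        // Emit one record per (user, friend) pair, ordering the two IDs
        // consistently so that (B,A) and (A,B) land on the same reducer key.
        for (String friend : friends) {
            String pairKey = user.compareTo(friend) < 0
                    ? user + "," + friend
                    : friend + "," + user;
            context.write(new Text(pairKey), new Text(parts[1]));
        }
    }
}
```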

Reducer Phase :

After the map phase, the shuffle groups data items by key; records with the same key go to the same reducer.
A,B | B, C, D, E, F
A,B | A, C, F

These are shuffled into the {A,B} group and sent to the same reducer:
A,B | {B, C, D, E , F}, {A, C, F}

So, finally at the reducer we have 2 lists corresponding to 2 people. Now, we need to find the intersection to get the mutual friends.
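
A matching reducer could intersect the two lists with a hash set, as in the sketch below; names are again illustrative, and the merge-based variant described under Optimization is shown afterwards:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative reducer for the intersection step.
public class MutualFriendReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Each pair key receives exactly two friend lists, one per user.
        Set<String> firstList = null;
        StringBuilder mutual = new StringBuilder();

        for (Text value : values) {
            String[] friends = value.toString().split(",");
            if (firstList == null) {
                firstList = new HashSet<>(Arrays.asList(friends)); // remember the first list
            } else {
                for (String friend : friends) {                    // keep only the overlap
                    if (firstList.contains(friend)) {
                        if (mutual.length() > 0) {
                            mutual.append(",");
                        }
                        mutual.append(friend);
                    }
                }
            }
        }
        context.write(key, new Text(mutual.toString()));
    }
}
```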

Optimization

To make the intersection faster, I have used a concept similar to the merge operation in merge sort. The friend lists are sorted in the map phase, so the reducer receives two sorted lists; a merge-like pass can then pick out only the matching values instead of checking all possible combinations in O(N²).
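
A merge-style intersection of the two sorted lists could look roughly like this helper, which could replace the hash-set loop in the reducer sketch above (a sketch, assuming both lists arrive sorted the same way):

```java
// Merge-style intersection of two sorted friend lists, in the spirit of the
// merge step of merge sort: O(N + M) comparisons instead of all combinations.
public class SortedIntersection {

    static String intersectSorted(String[] a, String[] b) {
        StringBuilder mutual = new StringBuilder();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            // Both lists are assumed to be sorted the same way in the map phase.
            int cmp = a[i].compareTo(b[j]);
            if (cmp == 0) {
                if (mutual.length() > 0) {
                    mutual.append(",");
                }
                mutual.append(a[i]);
                i++;
                j++;
            } else if (cmp < 0) {
                i++;
            } else {
                j++;
            }
        }
        return mutual.toString();
    }
}
```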

Please make sure that the keys are sorted alphabetically so that both friend lists for a pair of users end up on the same reducer.

Output

The program will output the mutual friends for the following pairs.
(0,1), (20, 28193), (1, 29826), (6222, 19272), (28041, 28056)

The code can easily be changed to find mutual friends for all pairs by removing the loop that checks for the keys given above.

<User_A>,<User_B><Mutual/Common Friend List>
where <User_A> and <User_B> are the unique IDs corresponding to users A and B (A and B are friends), and
<Mutual/Common Friend List> is a comma-separated list of unique IDs corresponding to the mutual friends of users A and B.

Code : MutualFriends

Program 2: Find the friend pairs whose number of common (mutual) friends is within the top 10 across all pairs, and output them in decreasing order.

Two MapReduce jobs are used, chained together.
The first MapReduce job finds the mutual friends and outputs each friend pair together with its mutual friend list.
The second MapReduce job reads the previous job's output and sends every record to the same reducer by using a constant key. The mapper value in this phase keeps the format <friend pair><TAB><mutual friend list>, and we send this complete line to the reducer and process it there.
In the second reducer, we split the received value by the tab character first, which gives the friend pair as the first part and the comma-separated mutual friend list as the second.
We then split the mutual friend list again, store each pair's count in a Java map, and use a custom comparator to sort the map by count. Once the map is sorted in descending order, we take the top 10 values.
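
A rough sketch of this second job, assuming the first job's output is tab-separated (Hadoop's default text output) and with class names invented for illustration:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Second job: funnel every "<pair><TAB><mutual friend list>" line from the
// first job to a single reducer, then keep the ten largest counts.
public class TopTenJob {

    public static class TopTenMapper extends Mapper<LongWritable, Text, Text, Text> {
        private static final Text CONSTANT_KEY = new Text("all");

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(CONSTANT_KEY, value); // forward the complete line
        }
    }

    public static class TopTenReducer extends Reducer<Text, Text, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Count mutual friends per pair.
            Map<String, Integer> counts = new HashMap<>();
            for (Text value : values) {
                String[] parts = value.toString().split("\t");
                if (parts.length == 2 && !parts[1].isEmpty()) {
                    counts.put(parts[0], parts[1].split(",").length);
                }
            }
            // Sort pairs by count in descending order and emit the top 10.
            List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
            sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
            for (int i = 0; i < Math.min(10, sorted.size()); i++) {
                context.write(new Text(sorted.get(i).getKey()),
                        new IntWritable(sorted.get(i).getValue()));
            }
        }
    }
}
```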

Output Format:

<User_A>, <User_B><Mutual/Common Friend Number>

Code : MutualFriendsCount

Program 3: Given any two users (who are friends) as input, output the names and cities of their mutual friends.

We need to use userdata.txt to get the extra user information, and an in-memory join to attach the required details. The idea is to load the userdata.txt dataset into memory in every mapper, using a hash map data structure to give random access to tuples by the join key (userid). For this purpose, you can override the setup method (mapper initialization) inside the Map class and load the hash map there.
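
A sketch of that setup-time load (the configuration key userdata.path, the field indices, and the class name are assumptions, not the repository's actual code):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Replicated (in-memory) join: load userdata.txt into a HashMap keyed by
// userid during setup(), then look details up during map().
public class MutualFriendInfoMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> userDetails = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException {
        Configuration conf = context.getConfiguration();
        // e.g. set to /user/userdata.txt when the job is configured
        Path userDataPath = new Path(conf.get("userdata.path"));
        FileSystem fs = FileSystem.get(conf);
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(fs.open(userDataPath)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cols = line.split(",");
                if (cols.length >= 5) {
                    // userid -> "firstname: city", the fields needed in the output
                    userDetails.put(cols[0], cols[1] + ": " + cols[4]);
                }
            }
        }
    }

    // map() would build friend pairs exactly as in Program 1 and, when writing
    // the mutual friends, replace each ID with userDetails.get(id).
}
```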

Output format:

UserA id, UserB id, list of [name: city] of their mutual friends.

Sample Output:

0, 41 [Evangeline: Loveland, Agnes: Marietta]

Code : MutualFriendsInformation

Program 4: Calculate the average age of the direct friends of each user and output the 15 users with the lowest averages.

Step 1: Calculate the average age of the direct friends of each user.
Step 2: Sort the users by the average age from step 1 in descending order.
Step 3: Output the tail 15 users (the 15 lowest averages) from step 2 with their addresses and the calculated average ages.
We need to use a reduce-side join here; a rough sketch follows.
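
One common shape for such a reduce-side join is to tag records by source in two mappers (wired into one job with MultipleInputs) and combine them on userid in the reducer. The sketch below uses invented tags, field indices, and class names:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Reduce-side join: pair the computed average age with the user's address
// on the shared userid key.
public class AverageAgeJoin {

    // Reads step 1 output, assumed to be "<userid><TAB><average age>".
    public static class AgeMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split("\t");
            if (parts.length == 2) {
                context.write(new Text(parts[0]), new Text("AGE\t" + parts[1]));
            }
        }
    }

    // Reads userdata.txt: userid,firstname,lastname,address,city,...
    public static class AddressMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] cols = value.toString().split(",");
            if (cols.length >= 5) {
                context.write(new Text(cols[0]),
                        new Text("ADDR\t" + cols[3] + ", " + cols[4]));
            }
        }
    }

    public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String age = null;
            String address = null;
            for (Text value : values) {
                String[] tagged = value.toString().split("\t", 2);
                if ("AGE".equals(tagged[0])) {
                    age = tagged[1];
                } else if ("ADDR".equals(tagged[0])) {
                    address = tagged[1];
                }
            }
            // Emit only users that appear in both inputs.
            if (age != null && address != null) {
                context.write(key, new Text(address + "\t" + age));
            }
        }
    }
}
```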

Code : MutualFriendsAverageAge

Use the following commands to run the jar files :

hadoop jar Part1.jar MutualFriends MutualFriends /user/soc-LiveJournal1Adj.txt /user/mfriendsout

hadoop jar Part2.jar MutualFriendsCount MutualFriendsCount /user/soc-LiveJournal1Adj.txt /user/mfc1 /user/mfc2

hadoop jar Part3.jar MutualFriendsInformation MutualFriendsInformation /user/soc-LiveJournal1Adj.txt /user/mfc /user/userdata.txt

hadoop jar Part4.jar MutualFriends MutualFriends /user/soc-LiveJournal1Adj.txt /user/mfc /user/userdata.txt /user/finaloutput

About

This repository contains programs that solve the given problems using Hadoop MapReduce.
