GitHub - peloyeje/map543-dijkstra-mapreduce-spark: [MAP543] Hadoop Streaming (MapReduce) and Spark implementations of the Dijkstra shortest path algorithm

Dijkstra algorithm

[MAP543 Database Management] Final project

Authors

Overview

The Dijkstra algorithm is an algorithm that enables to find the shortest/lowest-cost path between two nodes in a graph. In its more common variant, it allows to compute a shortest path tree, i.e. the shortest paths to a source node for each node of the graph. Distance can be considered as the number of hops between nodes or as the sum of the weights/costs of the edges between nodes. It is a restricted application of the best-first search algorithm.

The idea of the algorithm is to iteratively compute each graph node’s distance to the source node, and stop when all nodes have been visited. At each iteration, starting from the source node, each children distance to the selected node is updated with (parent distance + parent-child distance). The node selection is done according to the lowest distance to the previously selected node.

Data

twitter-formatted.txt : formatted Twitter graph data from the SNAP Project

Generated graph data where nodes may not have outgoing vertices

shallow-weighted-graph-1000.txt
shallow-weighted-graph-10000.txt
shallow-weighted-graph-100000.txt

Generated complete graph data

weighted-graph-1000.txt
weighted-graph-10000.txt
weighted-graph-100000.txt

MapReduce (Hadoop Streaming)

In a local environment

Structure of an iteration

cd mapreduce
cat my-preprocessed-graph.txt | ./mapper.py | sort -k1n | ./reducer.py > step1.txt

Bash script to automate iterations

./launch_local.sh <my-graph-file> [-v]

It performs unlimited iterations. Convergence is detected if the sum of distances between all nodes and the source node stops decreasing in iteration n+1

On a Hadoop cluster

Structure of an iteration

Input folder : input/
- Contains input graph or the reducer output of the last iteration
Output folder : output/

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
-D mapreduce.job.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
-D mapreduce.partition.keycomparator.options=-n \
-input /user/hadoop/input -output /user/hadoop/output \
-file mapreduce/mapper.py -mapper mapreduce/mapper.py \
-file mapreduce/reducer.py -reducer mapreduce/reducer.py

Bash script to automate iterations

./launch_hadoop.sh <hdfs-input-path> <hdfs-output-path> <local-mapper-path> <local-reducer-path> [nb_iterations]

Name		Name	Last commit message	Last commit date
Latest commit History 48 Commits
graph-data		graph-data
graph-generation		graph-generation
graph-preparation		graph-preparation
mapreduce		mapreduce
spark		spark
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

graph-data

graph-data

graph-generation

graph-generation

graph-preparation

graph-preparation

mapreduce

mapreduce

spark

spark

.gitignore

.gitignore

README.md

README.md

Repository files navigation

Dijkstra algorithm

Overview

Data

MapReduce (Hadoop Streaming)

In a local environment

Structure of an iteration

Bash script to automate iterations

On a Hadoop cluster

Structure of an iteration

Bash script to automate iterations

About

Releases

Packages

Contributors 2

Languages

peloyeje/map543-dijkstra-mapreduce-spark

Folders and files

Latest commit

History

Repository files navigation

Dijkstra algorithm

Overview

Data

MapReduce (Hadoop Streaming)

In a local environment

Structure of an iteration

Bash script to automate iterations

On a Hadoop cluster

Structure of an iteration

Bash script to automate iterations

About

Topics

Resources

Stars

Watchers

Forks

Languages