Skip to content

Engrima18/Jellyfish-vs-FatTree

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

43 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Jellyfish-vs-FatTree

#1 Homework for the course "Networking for Big Data and Data Centers" at La Sapienza University of Rome

Used technologies

Python NumPy Plotly Pandas Visual Studio Code PyCharm

Content

  • main.py: python script that calls the functions and the classes from the modules below and saves the resulting plots in images (png);
  • functions.py: python script file which contains all the functions for building graphs, make simulations and evaluating performances;
  • topologies.py: python module where we implemented the classes for the used topologies;
  • report.pdf: pdf which contains a complete reports with our conclusions.

Brief description

In the first part of the homework we define functions for creating the graph randomly by two techniques:

  • r-regular graph;
  • Erdòs-Rényi.

We then implement several techniques to evaluate whether the created random graphs are connected:

  1. we verify the reachability of all nodes with a breadth-first search algorithm;
  2. we verify that the Laplacian matrix associated with the graph has a null eigenvalue of multiplicity 1;
  3. we verify that the irreducibility of the adjacency matrix (A) associated with the graph by solving the following inequality : $$I + A+ A^2+ ... + A^n > 0$$

Finally, we implement from scratch the Jellyfish and Fat-tree topologies and compare their scalability performance.

Fat-tree Jellyfish
fat jelly

Simulation and parformance evaluation

We evaluate the complexity, and thus the level of efficiency, of the methods listed above for studying the connectivity of a graph by considering two possible metrics.

Running time Bytes occupied in RAM for execution
complex_comparis complex_comparis2

Then, let $p_c(G)$ denote the probability that a graph G is connected. By running Monte Carlo simulations, we estimate $p_c(G)$ and produce two curve plots:

  1. $p_c(G)$ vs. $p$ for Erdòs-Rényi graphs with K = 100 number of nodes.

ERgraph

  1. $p_c(G)$ vs. $K$, for K ≤ 100, for r-regular random graphs with r = 2 and r = 8.

Rregular


Consider that if the job is split into N parallel tasks, that are run over N servers, then:

  • each task takes a time $T_0 + X_i$, where $X_i \sim Exp( \lambda= \frac N {E[X]})$ , X r.V. for the baseline job running time
  • each server receives an amount of input data $L_f /N$ ($L_f$: lenght of the baseline input file).
  • amount of output data produced by each server is $L_{o,i} \sim Unif(0, 2L_o/N)$ ($L_o$: lenght of the baseline output file)
  • Data is transferred to and from server i via a TCP connection between server A and server i, having average throughput given by: $$\theta_i = C \cdot \frac {1/T_i} {\sum_j T_j}$$ where $T_i=2 \tau h_i$ is the RTT between the origin server A and server i, $h_i$ is the number of hops between server A and server i and C is the capacity of each link of the DC network.

In the end we plot:

  • the mean response time $E[R]$ as a function of $N$ for $1 \leq N \leq 10000$. Let $R_baseline$ be the response time in case only server A is used, we normalize w.r.t. $ E[R_baseline]$
  • the Job running cost $S$ as a function of $N$ for $1 \leq N \leq 10000$ (normalize $S$ with respect to $S_baseline$).

TimeAndCost


Team

About

Project aimed to simulate two different Data Center topoligies and compare their perfomances in a controlled environment

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages