SURF Automatic Collection Engine (SURFace)

This repository contains several scripts to analyze and visualize data collected from SURF's Lisa cluster. The data can be found on Zenodo at https://zenodo.org/record/4459519.

Usage

1. Download the dataset from the Zenodo link above.
2. Clone this repository to a local folder.
3. In each script, modify the paths as required. "/path/to/surfsara-jobdata/", "path to machine metric dataset", and similar placeholders should point to the dataset downloaded in step 1. "./cache" should point to a location where scratch data can be stored.
4. Run the notebooks on a machine with 64GB or more RAM, as some analyses keep large amounts of data in memory. Some scripts require a Spark cluster due to the sheer amount of data and processing involved; if a script uses Koalas, you will most likely need to set up a small Spark cluster. 4 to 10 machines, each with 64GB or more RAM, will suffice. For correlation_plot_koalas.py, we use 5 machines (1 master, 4 workers), each with 64GB of RAM.
5. Figures are written to the folder containing the notebook, or to wherever you point the Matplotlib/Seaborn output paths. Tables are printed in the notebook as strings.
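The scripts use hard-coded placeholder paths that must be edited by hand. As an alternative, a small helper could resolve both locations once per session. The helper below is a minimal sketch and is not code from the repository. The environment variable names SURFACE_DATASET_DIR and SURFACE_CACHE_DIR and the function resolve_paths are hypothetical.

```python
from __future__ import annotations

import os
from pathlib import Path

# Hypothetical variable names; the repository's scripts use hard-coded
# placeholders such as "/path/to/surfsara-jobdata/" instead.
DATASET_ENV = "SURFACE_DATASET_DIR"
CACHE_ENV = "SURFACE_CACHE_DIR"


def resolve_paths(default_cache: str = "./cache") -> tuple[Path, Path]:
    """Return (dataset_dir, cache_dir), creating the cache directory if needed."""
    dataset_value = os.environ.get(DATASET_ENV)
    if dataset_value is None:
        raise RuntimeError(f"Set {DATASET_ENV} to the directory of the downloaded Zenodo dataset")
    dataset_dir = Path(dataset_value).expanduser().resolve()
    if not dataset_dir.is_dir():
        raise FileNotFoundError(f"Dataset directory not found: {dataset_dir}")
    # The cache only holds scratch data, so it is created on demand.
    cache_dir = Path(os.environ.get(CACHE_ENV, default_cache)).expanduser().resolve()
    cache_dir.mkdir(parents=True, exist_ok=True)
    return dataset_dir, cache_dir
```

A notebook would then call `dataset_dir, cache_dir = resolve_paths()` once at the top and build every file path from these two values.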

Scripts

The scripts generally have self-describing names. Below we provide more details per script.

Script	Explanation
!LSTM V2.ipynb	Investigates the effect of different sampling intervals on predictions of metric values.
!Network data analysis.ipynb	Performs various analyses related to network IO.
!Z-Score.ipynb	Investigates whether anomalies can be detected using z-scores.
!jobdata_analysis_new.ipynb	Performs various analyses related to the jobs executed on Lisa.
Full_Cluster_bottleneck_analysis.ipynb	Creates a holistic normalized overview of the dataset by aligning job arrivals with various machine metrics.
Generic_outline_dataset.ipynb	Computes various generic properties of the dataset. The dataset outline table below in this README was constructed using this script.
correlation_single_rack_one_day.ipynb	Computes the Pearson, Spearman, and Kendall correlation coefficients for all pairs of metrics within the dataset on individual days.
analysis_coefficient_separate_days.ipynb	Visualizes the output of correlation_single_rack_one_day.ipynb in various ways.
correlation_plot_koalas.py	Computes a dense correlation plot of normalized histograms, scatterplots with linear regression lines per metric pair, and visualizes the Pearson, Spearman, and Kendall correlation coefficients per metric pair.
koalas_correlation_plot_data_only.ipynb	Creates an improved visualization of the plot from correlation_plot_koalas.py using a variant of Seaborn's PairGrid.
daily_weekly_trend_load.ipynb	Creates several weekly and diurnal trend visualizations.
file_sizes_different_granularities.ipynb	Computes the storage overhead for different sampling frequencies using a selection of metrics.
generate_barplots.py	Generates barplots of metric values in covid vs non-covid periods.
generate_boxplots.py	Generates boxplots of metric values in covid vs non-covid periods.
job_arrival_characterization.ipynb	Creates several visualizations and performs different kinds of analyses based on job arrivals.
mean_memory_utilization_nodes.ipynb	Analyzes different aspects of node RAM usage and creates several visualizations.
power_consumption_analysis.ipynb	Performs several analyses of racks and their power consumption and creates several visualizations.
rack_temp_noenc.py	Analyzes the node temperatures of various racks and creates visualizations for them.
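Several scripts above compute the Pearson, Spearman, and Kendall coefficients per metric pair. The code below illustrates the three coefficients in plain Python for small in-memory series. It is not the notebooks' implementation, which works on far larger data using the libraries listed under "Libraries used". Kendall's coefficient is computed here as tau-b, which corrects for ties. Whether the notebooks use tau-b is an assumption. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant series has zero spread, which raises ZeroDivisionError here.
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b over all sample pairs (O(n^2), fine for small series)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both series: not counted anywhere in tau-b
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    scored = concordant + discordant
    return (concordant - discordant) / sqrt((scored + tied_x_only) * (scored + tied_y_only))


def correlation_table(metrics):
    """metrics maps a metric name to an equal-length list of samples.

    Returns {(name_a, name_b): (pearson, spearman, kendall_tau_b)} for every pair.
    """
    table = {}
    for a, b in combinations(sorted(metrics), 2):
        x, y = metrics[a], metrics[b]
        table[(a, b)] = (pearson(x, y), spearman(x, y), kendall_tau_b(x, y))
    return table
```

For a monotonic but non-linear relation, such as a metric against its square, Spearman and Kendall both report 1 while Pearson stays below 1. The pairwise Kendall loop is quadratic in the number of samples. For full-resolution data, a library routine such as `scipy.stats.kendalltau` is the practical choice.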

Outline of the dataset

The dataset spans from 2019-12-29 to 2020-08-07.

Element	Value
Sampling interval	15 seconds
Max. samples per metric per node	1,258,646
Number of metrics	327
Number of measurements	66,541,895,243
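The sampling interval and date range bound the number of samples a single metric on a single node can have, which gives a sanity check on the table. The check below assumes that both boundary dates are covered completely and that sampling never pauses. Under those assumptions, the maximum is 223 days × 5,760 samples per day = 1,284,480 samples. The observed maximum of 1,258,646 therefore corresponds to roughly 98% coverage, or about 218.5 days of 15-second samples. The function name is hypothetical.

```python
from datetime import date

SECONDS_PER_DAY = 24 * 60 * 60


def max_possible_samples(start: date, end: date, interval_s: int = 15) -> int:
    """Upper bound on samples per metric per node between two dates.

    Assumes both boundary dates are fully covered and sampling never pauses.
    """
    days = (end - start).days + 1  # inclusive of both the start and end date
    return days * SECONDS_PER_DAY // interval_s


bound = max_possible_samples(date(2019, 12, 29), date(2020, 8, 7))
observed_max = 1_258_646  # "Max. samples per metric per node" from the outline table
coverage = observed_max / bound
print(f"upper bound: {bound:,} samples, observed maximum covers {coverage:.1%}")
```

Passing a larger `interval_s` shows how the sample count shrinks at coarser granularities. For example, a 60-second interval yields a quarter of the samples of the 15-second default.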

Libraries used

Most tools in this repository were created and tested using the following libraries and their versions:

Library	Version
Pandas	1.2.0
NumPy	1.19.4
SciPy	1.5.3
statsmodels	0.12.1
pytz	2020.4
scikit-learn	0.24.0
TensorFlow	2.3.1
pyarrow	3.0.0
Dask	2021.03.0
Matplotlib	3.4.1
Seaborn	0.11.1
Koalas	1.5.0
Spark	3.0.0
Hadoop	2.7.7

