filipkrasniqi/QoSML
Article

This repository refers to the paper: End-to-end Delay Prediction Based on Traffic Matrix Sampling.

Filip Krasniqi, Jocelyne Elias, Jeremie Leguay, Alessandro E. C. Redondi. IEEE INFOCOM WKSHPS - NI: The 3rd International Workshop on Network Intelligence. Toronto, July 2020.

Overview

This repository contains the code to train, test, and validate models based on scikit-learn's random forests and on neural networks (PyTorch). It is to be combined with the companion repository containing the datasets. The repository contains two different programs: one for random forests (RF) and one for neural networks (NN).

For simple use, follow these steps to run the code with the already generated datasets.

  1. Pull the repository related to datasets
  2. Set the machine-specific dependency, i.e., the directory describing the root folder. It MUST be the same as the one used in the datasets repository. Change the variable dir_datasets in the corresponding DatasetContainer. The variable is always set in the function init_base_directory, but it is overridden by each subclass of DatasetContainer.
  3. If your dataset folder doesn't contain any dataset, either download the dataset folder or generate the simulations.
  4. Install the dependencies from requirements.txt, together with PyTorch. For any PyTorch installation problem, refer to its quick start guide. The code is built to run on CPU; porting it to GPU is left to the user.
  5. Execute the learning code as explained in the RF / NN part.

Dataset container

An abstract class defining the standard procedure to import the raw data and build the datasets. It sequentially:

  1. initializes the directory variables. These must be changed, as they contain the base directory of the datasets.
  2. initializes information regarding the simulation, i.e., num_periods and num_nodes
  3. initializes variables related to the subclass (requires overriding init_variables_dataset)
  4. computes, if required, the adjacency and spectral convolution matrices
  5. initializes the columns for raw data, dataframes (i.e., input for RF), and tensors
  6. initializes the files related to the cache
  7. builds (from cache or raw data) the dataframes and optionally writes them to the cache
  8. builds the tensors (from cache or raw data)
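The sequence above is essentially a template method. A minimal sketch follows; only DatasetContainer, init_base_directory, init_variables_dataset, and dir_datasets come from the repository, while the remaining method names and the path are illustrative assumptions:

```python
from abc import ABC, abstractmethod

class DatasetContainer(ABC):
    """Sketch of the abstract import/build pipeline described above."""

    def __init__(self, use_cache=True):
        self.init_base_directory()        # 1. directory variables (dir_datasets)
        self.init_simulation_info()       # 2. num_periods, num_nodes
        self.init_variables_dataset()     # 3. subclass-specific variables
        self.build_dataframes(use_cache)  # 7. dataframes (input for RF)
        self.build_tensors(use_cache)     # 8. tensors (input for NN)

    def init_base_directory(self):
        # overridden by each subclass to point at the datasets root
        self.dir_datasets = "/path/to/datasets"   # hypothetical path

    def init_simulation_info(self):
        self.num_periods, self.num_nodes = 0, 0

    @abstractmethod
    def init_variables_dataset(self):
        ...

    def build_dataframes(self, use_cache):
        pass  # read from cache if present, else parse the raw data

    def build_tensors(self, use_cache):
        pass  # same idea for the tensors
```

Each subclass (NS3, Routenet, Understanding) then only overrides the hooks it needs.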

A UML diagram of the dataset container structure is included in the repository; it shows which functions each class overrides.

The next subsections give some details on the inheriting classes.

NS3 Dataset

D = F(T, C), where D is the delay matrix, T the features extracted from the traffic matrix, and C the capacity vector. Given a window size W, the rolling mean, standard deviation, and quantiles are computed for each OD flow in the traffic matrix to obtain these features (lag = 1).
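For a single OD flow, the rolling features could be computed with pandas as sketched below; the exact quantiles and settings used in the repository may differ:

```python
import pandas as pd

def rolling_features(traffic: pd.Series, W: int) -> pd.DataFrame:
    """Rolling mean, std, and quantiles over a window of size W,
    shifted by lag = 1 so features at time t use only t-W, ..., t-1."""
    roll = traffic.rolling(window=W)
    feats = pd.DataFrame({
        "mean": roll.mean(),
        "std": roll.std(),
        "q25": roll.quantile(0.25),
        "q50": roll.quantile(0.50),
        "q75": roll.quantile(0.75),
    })
    return feats.shift(1)  # lag = 1

# Toy traffic series for one OD flow
traffic = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
f = rolling_features(traffic, W=3)
```

The shift guarantees that the delay at time t is predicted from strictly past traffic samples.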

Routenet Dataset

D = F(T), where D is the delay matrix and T the traffic matrix.

Understanding Dataset

D = F(T), where D is the delay matrix and T the traffic matrix.

Every time a dataset container is executed and asked to combine the data into a single dataset for a scenario, both tensors and dataframes are written to a cache directory, which speeds up subsequent executions by skipping this step.
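The caching logic amounts to a load-or-build pattern, sketched here with a pickle-based cache; the repository's actual cache layout and file names may differ:

```python
import os
import pandas as pd

def load_or_build(cache_dir: str, name: str, build_fn):
    """Return a cached dataframe if present; otherwise build it from
    raw data via build_fn and write it to the cache directory."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{name}.pkl")
    if os.path.exists(path):
        return pd.read_pickle(path)   # cache hit: skip the build
    df = build_fn()                   # cache miss: build from raw data
    df.to_pickle(path)
    return df
```

On the second run with the same cache directory, build_fn is never called.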

Models

The current implementation expects a fixed topology. The scenarios on which we want to learn in the ns3 dataset can differ, and are identified by the Scenario enum in NS3Dataset. Regardless of the model, it is possible to switch the scenario by changing the scenario variable. The scenarios differ essentially in how the train and test sets are split. A brief explanation of each possible scenario follows. Given a topology, routing is always fixed.

  1. LEVEL_1: given a fixed environment (i.e., a topology and a network, so a fixed distribution of both capacities and propagation delays), try to generalize delay prediction to unseen traffic. The traffic distribution varies with the intensity. For each intensity, we define a traffic-rate distribution and draw S traffic-rate matrices. The test dataset is made of unseen traffic-rate matrices.

  2. LEVEL_2: given a fixed topology and a fixed propagation delay distribution, we consider different capacity distributions, and split traffic in the same way as in LEVEL_1.

  3. LEVEL_3: given a fixed topology, we consider different propagation delay and capacity distributions. The test set is made of traffic generated under unseen propagation delay and capacity distributions.
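The three levels could be modeled as below. The Scenario enum itself exists in NS3Dataset, but the member names of the sample dictionary and the split_key helper are hypothetical, used only to illustrate which attributes drive the train/test split at each level:

```python
from enum import Enum

class Scenario(Enum):
    LEVEL_1 = 1  # same environment, unseen traffic-rate matrices in test
    LEVEL_2 = 2  # same topology and delays, unseen capacity distributions
    LEVEL_3 = 3  # same topology, unseen delay and capacity distributions

def split_key(scenario: Scenario, sample: dict) -> tuple:
    """Illustrative: the attributes of a sample that decide whether
    it falls in train or test under each scenario level."""
    if scenario is Scenario.LEVEL_1:
        return (sample["traffic_matrix_id"],)
    if scenario is Scenario.LEVEL_2:
        return (sample["capacity_dist"],)
    return (sample["capacity_dist"], sample["delay_dist"])
```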

In all scenarios, the user can configure the following parameters at execution time:

  1. Cache directory: name of the directory where the combined datasets are cached
  2. Dataset name: ns3 | understanding | routenet
  3. Model name: name of the model. It identifies the directory where the model and the score are saved
  4. Topology: name of the directory, associated to a topology, containing the datasets (e.g., abilene, nsf, ...)
  5. Identifier: only for ns3. It selects which simulation (i.e., execution of generate_simulations.py) to refer to
  6. Intensity: only for ns3. If the user assigns "only_low" to this parameter, only one intensity (the lowest one) is considered
  7. Test less intensities (bool): only for ns3. Whether to assign all the intensities to the test set. Default: False, in which case only three intensities are used for testing.
  8. Scenario: only for ns3. Integer defining the scenario: 1, 2, or 3 to identify, respectively, LEVEL_1, LEVEL_2, or LEVEL_3.

General command for RF and NN:

python <script> <cache> <dataset> <model> <topology> <identifier> <intensity> <less_intensities> <scenario>
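A minimal sketch of mapping the eight positional arguments to named parameters; the repository's actual argument handling may differ:

```python
import sys

def parse_args(argv):
    """Map the positional arguments of the general command above
    to named parameters (illustrative sketch)."""
    cache, dataset, model, topology, identifier, intensity, less, scenario = argv[1:9]
    return {
        "cache": cache,
        "dataset": dataset,            # ns3 | understanding | routenet
        "model": model,
        "topology": topology,          # e.g. abilene, nsf
        "identifier": identifier,      # ns3 only
        "intensity": intensity,        # ns3 only, e.g. "only_low" or "all"
        "less_intensities": less == "True",
        "scenario": int(scenario),     # 1, 2 or 3
    }

# Mirrors the example invocation given below for rf.py
args = parse_args(["rf.py", "cache_v1", "ns3", "rf_v1", "abilene",
                   "v1_fixed_capacity", "all", "True", "1"])
```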

Once the dataset is loaded, the model is run with cross-validation, and the best model, together with its score on the test dataset, is output in the model-name directory.

General command:

python rf.py <cache> <dataset> <model> <topology> <identifier> <intensity> <less_intensities> <scenario>

Example:

python rf.py cache_v1 ns3 rf_v1 abilene v1_fixed_capacity all True 1

Details on implementation

The RF script (rf.py) will:

  1. build the dataset and split it into train and test according to the input scenario. In the case of the ns3 dataset, the dataset is related to the simulation run on the given topology
  2. execute a RandomizedSearch with KFold CV (change the hyperparameters if needed). A RandomizedSearch needs K = the number of cross-validation folds and S = the number of hyperparameter combinations to attempt
  3. select the best model from the RandomizedSearch and test it
  4. output the best model in base_dir_proj/exported/crossvalidation/, together with the score. The model name is random_forest.model; the scores go under scores.txt
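The RF search can be sketched with scikit-learn as follows; the toy data and the hyperparameter ranges are illustrative, not the repository's:

```python
import numpy as np
from scipy.stats import randint
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, KFold, train_test_split

# Toy regression data standing in for the (features, delay) pairs
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = X.sum(axis=1) + 0.1 * rng.randn(200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

K, S = 3, 4  # K = CV folds, S = hyperparameter combinations to attempt
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={"n_estimators": randint(10, 50),
                         "max_depth": randint(2, 10)},
    n_iter=S,
    cv=KFold(n_splits=K),
    random_state=0,
)
search.fit(X_tr, y_tr)

best = search.best_estimator_       # best model from the random search
score = best.score(X_te, y_te)      # R^2 on the held-out test split
```

In the repository, best and score would then be serialized to the model-name directory.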

The NN script will:

  1. build the dataset and split it into train and test according to the input scenario. In the case of the ns3 dataset, the dataset is related to the simulation run on the given topology
  2. execute a GridSearch with CV (change the hyperparameters if needed). The GridSearch needs the entire exploration space and tries each combination K times
  3. select the best model from the GridSearch and test it
  4. output the best model in base_dir_proj/exported/crossvalidation/, together with the score. The model name is NN.model; the scores (all of them!) go under r2_test_scores.txt. For each search, we provide the best model inside the corresponding directory, together with the results and model for each CV execution.
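The grid search over the entire exploration space, each combination tried K times (once per fold), can be sketched as below. The repository trains PyTorch models; here sklearn's MLPRegressor stands in as a dependency-light neural network, and the grid values are illustrative:

```python
import numpy as np
from sklearn.model_selection import ParameterGrid, KFold
from sklearn.neural_network import MLPRegressor

# Toy (features, delay) data
rng = np.random.RandomState(0)
X = rng.rand(120, 4)
y = X @ np.array([1.0, 2.0, 3.0, 4.0])

K = 3  # each hyperparameter combination is evaluated K times (K folds)
grid = ParameterGrid({"hidden_layer_sizes": [(8,), (16,)],
                      "alpha": [1e-4, 1e-3]})  # entire exploration space

best_params, best_score = None, -np.inf
for params in grid:                               # every combination...
    scores = []
    for tr, va in KFold(n_splits=K).split(X):     # ...tried K times
        model = MLPRegressor(max_iter=500, random_state=0, **params)
        model.fit(X[tr], y[tr])
        scores.append(model.score(X[va], y[va]))  # R^2 on the fold
    mean_score = float(np.mean(scores))
    if mean_score > best_score:
        best_params, best_score = params, mean_score
```

Per the description above, the repository additionally saves the model and results of each CV execution, not only the best one.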
