Continuous Certification of Non-Functional Properties Across System Changes

Existing certification schemes implement continuous verification techniques aimed at proving non-functional (e.g., security) properties of software systems over time. These schemes provide different re-certification techniques for managing the certificate life cycle, though their strong assumptions make them ineffective against modern service-based distributed systems. Re-certification techniques are in fact built on static system models, which do not properly represent the system evolution, and on static detection of system changes, which results in inaccurate planning of re-certification activities. In this paper, we propose a continuous certification scheme that departs from static certificate life cycle management and provides a dynamic approach built on the modeling of the system behavior, reducing the amount of unnecessary re-certification. The quality of the proposed scheme is experimentally evaluated using an ad hoc dataset built on publicly available datasets.

This repository contains the source code, input dataset, and detailed results of our experimental evaluation.

1. Overview

The code is written in Python 3 and tested in a macOS environment (virtualenv) with Python 3.10; dependencies are listed in requirements.txt.

The aim of our experimental evaluation is to compare our scheme with a scheme representing the state of the art, covering all the scenarios described in our paper. For this reason, our experimental data are based on the dataset available at https://doi.org/10.13012/B2IDB-6738796_V1. It measures the response time of a set of microservices along a given execution path in normal and anomalous conditions. Specifically, the creators of the dataset executed three distributed systems, well known in the literature, with and without injecting anomalies (for details, see the corresponding paper: https://www.usenix.org/conference/osdi20/presentation/qiu).
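For readers unfamiliar with the original dataset, the snippet below shows one way to inspect a response-time trace with pandas. The file path and the column name used here are purely hypothetical placeholders; the actual layout is described by the dataset creators at the DOI above.

```python
# Illustrative only: inspecting one response-time trace with pandas.
# The path and the column name "latency_ms" are hypothetical placeholders;
# consult the dataset documentation for the actual file layout.
import pandas as pd

trace = pd.read_csv("Input/system_a/normal/service_1.csv")
print(trace.describe())                                   # overall distribution
print(trace["latency_ms"].quantile([0.50, 0.95, 0.99]))   # tail latencies
```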

We defined our experimental settings by extracting normal and anomalous data from the above dataset, generating a dataset that includes environmental changes and code changes with and without impact on the behavior. Each data point of the dataset is also annotated with additional information (e.g., presence of critical components affected by the change). We then applied our scheme and the state-of-the-art scheme to the generated dataset.

Each row of the table below represents an experimental setting driving the generation of our datasets. The process of dataset generation and scheme application was repeated 10 times for each distributed system and experimental setting. The original paper and Supp1-Exp_Settings.pdf contain details on the meaning of each column.

| Name | $\Delta_b$ | $\Delta_c$ | $\Delta_c$ with cascading | non critical | minor | $n(comp)_b$ | $n(comp)_{\text{min}}$ | $n(comp)_{\text{maj}}$ |
|------|------------|------------|---------------------------|--------------|-------|-------------|------------------------|------------------------|
| P1.1 | $0.3\overline{3}$ | $0.3\overline{3}$ | $0.3\overline{3}$ | $0.25$ | $0.25$ | $0.25$ | $0.25$ | $0.25$ |
| P1.2 | $0.3\overline{3}$ | $0.3\overline{3}$ | $0.3\overline{3}$ | $0.5$ | $0.5$ | $0.5$ | $0.5$ | $0.5$ |
| P1.3 | $0.3\overline{3}$ | $0.3\overline{3}$ | $0.3\overline{3}$ | $0.75$ | $0.75$ | $0.75$ | $0.75$ | $0.75$ |
| P2.1 | $0.5$ | $0.25$ | $0.25$ | $0.25$ | $0.25$ | $0.25$ | $0.25$ | $0.25$ |
| P2.2 | $0.5$ | $0.25$ | $0.25$ | $0.5$ | $0.5$ | $0.5$ | $0.5$ | $0.5$ |
| P2.3 | $0.5$ | $0.25$ | $0.25$ | $0.75$ | $0.75$ | $0.75$ | $0.75$ | $0.75$ |
| P3.1 | $0.25$ | $0.5$ | $0.25$ | $0.25$ | $0.25$ | $0.25$ | $0.25$ | $0.25$ |
| P3.2 | $0.25$ | $0.5$ | $0.25$ | $0.5$ | $0.5$ | $0.5$ | $0.5$ | $0.5$ |
| P3.3 | $0.25$ | $0.5$ | $0.25$ | $0.75$ | $0.75$ | $0.75$ | $0.75$ | $0.75$ |
| P4.1 | $0.25$ | $0.25$ | $0.5$ | $0.25$ | $0.25$ | $0.25$ | $0.25$ | $0.25$ |
| P4.2 | $0.25$ | $0.25$ | $0.5$ | $0.5$ | $0.5$ | $0.5$ | $0.5$ | $0.5$ |
| P4.3 | $0.25$ | $0.25$ | $0.5$ | $0.75$ | $0.75$ | $0.75$ | $0.75$ | $0.75$ |
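As a purely illustrative sketch of how the probabilities in the table can drive dataset generation, the snippet below samples one change type per data point using the $\Delta$ values of setting P2.1. The labels and the interpretation of the columns as sampling weights are assumptions made for illustration; the actual generation logic lives in Code/dataset_generator.py and is described in Supp1-Exp_Settings.pdf.

```python
# Illustrative sketch only: sampling a change type per data point from the
# probabilities of setting P2.1 (delta_b = 0.5, delta_c = 0.25,
# delta_c with cascading = 0.25). Labels are hypothetical; the repository's
# actual generation code is in Code/dataset_generator.py.
import random

CHANGE_TYPES = ["environmental", "code", "code_cascading"]  # hypothetical labels
PROBS = [0.5, 0.25, 0.25]                                   # P2.1 from the table

def sample_change(rng: random.Random) -> str:
    return rng.choices(CHANGE_TYPES, weights=PROBS, k=1)[0]

rng = random.Random(42)
changes = [sample_change(rng) for _ in range(1000)]
print({c: changes.count(c) for c in CHANGE_TYPES})
```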

2. Organization

The repository is organized in the following directories:

  • Code: contains the Python code to run the experiments.
  • Input: contains the raw data (response times) of the microservices, taken from https://doi.org/10.13012/B2IDB-6738796_V1.
  • Output: contains the results of our experiments, including generated datasets, with results aggregated at different levels.
  • input.json: contains the actual input to run the code and reproduce our results.

2.1. Details: Code Organization

The code consists of the following files:

  • Code/base.py: contains utility classes and functions used in the rest of the code, including the code to train the system model based on isolation forest (see the illustrative sketch at the end of this subsection).
  • Code/const.py: contains constants.
  • Code/dataset_generator.py: contains the code to generate the experimental dataset starting from data in the provided input directory.
  • Code/entrypoint.py: contains the main entrypoint.
  • Code/exp_quality_scheme.py: contains the code running the core function that applies our scheme and the state of the art to a set of experimental settings and evaluates the results.
  • Code/exp_quality_scheme_support.py: contains support code, including the code to export data.
  • Code/situation.py: contains the code selecting the scenario that applies to each detected change.

Each file has its own set of tests executable with pytest.
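As a rough illustration of the kind of system model trained in Code/base.py, the sketch below fits an isolation forest on response times collected under normal conditions and scores new observations. It assumes scikit-learn and uses synthetic data; it is not the repository's actual implementation.

```python
# Minimal sketch of an isolation-forest system model over response times,
# assuming scikit-learn. Data and parameters are illustrative; the
# repository's actual model is implemented in Code/base.py.
import numpy as np
from sklearn.ensemble import IsolationForest

# toy training data: response times (ms) collected under normal conditions
normal_rt = np.random.normal(loc=120.0, scale=10.0, size=(500, 1))

model = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
model.fit(normal_rt)

# score new observations: predict() returns +1 (normal) or -1 (anomalous)
new_rt = np.array([[118.0], [310.0]])
print(model.predict(new_rt))            # e.g. [ 1 -1 ]
print(model.decision_function(new_rt))  # higher score = more normal
```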

2.2. Details: Input

The code requires two inputs: initial data from https://doi.org/10.13012/B2IDB-6738796_V1, and experimental settings as reported in the table above.

Initial data: the code works on the data as is, as long as each distributed system has its own directory. Experimental settings: these must be provided as a JSON file containing the different probabilities as well as the path to the input data. This repository contains the input data we used during our experiments; the paths therefore need to be adjusted to your local setup.
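As a purely illustrative sketch of what such a settings file could look like, the snippet below writes a JSON file with hypothetical field names; the authoritative schema is the input.json shipped with this repository.

```python
# Illustrative sketch of writing an experimental-settings file.
# All field names below are hypothetical; refer to the input.json shipped
# with the repository for the actual schema.
import json

settings = {
    "input_directory": "/path/to/Input",  # adjust to your local checkout
    "settings": [
        {"name": "P2.1", "delta_b": 0.5, "delta_c": 0.25,
         "delta_c_cascading": 0.25, "non_critical": 0.25, "minor": 0.25},
    ],
}

with open("my-input.json", "w") as fh:
    json.dump(settings, fh, indent=2)
```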

2.3. Details: Output

The code produces aggregated and detailed results in two formats: Excel and CSV. Both formats are always generated; there is no option to choose between them.

During execution, the code requires the base directory where output data should be placed. In the hope that they may be useful, we include all our data: our results as well as the generated datasets.

In particular, we generate two types of datasets:

More in detail, the generated output is contained in the following directories.

Individual file names are mostly self-explanatory: files whose name contains scheme hold data evaluating the two certification schemes, while those whose name contains model hold data evaluating the system model.

The data in our paper are mostly derived from Output/Aggregated_Datasets/by_config_scheme_stripped.xlsx for the certification schemes and Output/Aggregated_Datasets/by_dataset_model_stripped.xlsx for the system model.
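The aggregated spreadsheets can be inspected, for instance, with pandas; the snippet below assumes pandas and an Excel engine such as openpyxl are installed and is not part of the repository code.

```python
# Minimal sketch of loading the aggregated results for inspection,
# assuming pandas (with openpyxl) is available.
import pandas as pd

scheme = pd.read_excel("Output/Aggregated_Datasets/by_config_scheme_stripped.xlsx")
model = pd.read_excel("Output/Aggregated_Datasets/by_dataset_model_stripped.xlsx")
print(scheme.head())
print(model.head())
```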

Note: our experiments were executed with the option --include-strip-down True. This means that for each output file there is a stripped-down version reporting only the columns with averaged data. As a result, some files are essentially empty, because this filter removes all of their data. We nevertheless include these files, since our experiments were run with this option.

3. Example Execution

To reproduce our results, you need to:

  • modify input.json according to the path where you placed directory Input
  • run the following command, replacing path-to-input.json and path-to-output with the desired paths.
python entrypoint.py \
	--config-file-name path-to-input.json \
	--output-directory path-to-output \
	--include-strip-down True

Execution is parallelized; the experiments should complete in a few minutes.

4. Appendix of the Paper

We provide two supplements to our paper.

  • Supp1-Exp_Settings.pdf contains a detailed description of the experimental process extending its discussion in the paper.
  • Supp2-Walkthrough.pdf contains a detailed walkthrough of the certification scheme presented in the paper.
