MPS-Methodology

The MPS methodology is a project that proposes applying the Multiple Predictors System (MPS) to forecast time series extracted from Microservice-Based Applications (MBAs). In the literature, several works have applied time series forecasting to predict performance degradation in MBAs. However, these studies use a single forecasting model, which increases the risk of inaccurate estimates and can lead the application into undesirable scenarios such as unavailability. MPS emerges as an alternative to this problem because it uses multiple models in the forecast. The basic idea of the MPS ensemble is to combine the strengths of different learning algorithms to build a more accurate and reliable forecasting system.
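To illustrate the ensemble idea, the sketch below combines the forecasts of several scikit-learn regressors by simple averaging. It is a minimal example of the concept, not the repository's implementation; the models, lag size, and toy data are placeholders.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor

def make_lagged(series, lag):
    # Turn a 1-D series into (samples, lag) features and next-step targets.
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return X, y

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 500)) + rng.normal(0, 0.1, 500)  # toy CPU-like signal
X, y = make_lagged(series, lag=10)
X_train, y_train, X_test = X[:-50], y[:-50], X[-50:]

# A pool of heterogeneous predictors; their mean is the combined forecast.
pool = [MLPRegressor(max_iter=2000, random_state=0),
        SVR(),
        RandomForestRegressor(random_state=0)]
for model in pool:
    model.fit(X_train, y_train)
combined = np.mean([m.predict(X_test) for m in pool], axis=0)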

More reliable and accurate forecasting systems are essential in proactive microservice auto-scaling systems. They improve the decision-making process of these systems by estimating microservice trends more reliably while mitigating incorrect adaptations triggered by inaccurate estimates. Consequently, microservices operate at lower cost and customer satisfaction is maintained.

Installation

How to install the software?

$ virtualenv venv
$ source venv/bin/activate
$ pip3 install -r requirements.txt

What do the generated results folders contain?

All processed results are stored in the results folder.

The description of each folder and its respective content is given below:

| Folder | Content description |
| --- | --- |
| Increasing | Accuracy of the monolithic models on the Increasing workload. |
| Decreasing | Accuracy of the monolithic models on the Decreasing workload. |
| Periodic | Accuracy of the monolithic models on the Periodic workload. |
| Random | Accuracy of the monolithic models on the Random workload. |
| Series 1 | Accuracy of the monolithic models on the Series 1 workload. |
| Series 2 | Accuracy of the monolithic models on the Series 2 workload. |
| Series 3 | Accuracy of the monolithic models on the Series 3 workload. |
| Series 4 | Accuracy of the monolithic models on the Series 4 workload. |
| Summary | Summary of the accuracy of all approaches (monolithic, homogeneous, and heterogeneous). |
| Summary/better_lags | Best lag for each monolithic model. |
| Summary/better_acurracy | Best accuracy values for each monolithic model. |
| Summary/better_pool_values | MPS accuracy values. |
| Summary/better_pool_values_aggregate | Aggregated data from better_pool_values and better_acurracy. |
| Summary/pool_size_homogeneous_analisys | Data from the optimal bagging-size analysis for each time series. |
| Multiple Metrics | Summary of results for additional metrics beyond RMSE. |
| DM test | Data from the Diebold-Mariano (DM) statistical test. |
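For reference, the sketch below shows the core of a Diebold-Mariano test for comparing the errors of two one-step-ahead forecasting models, assuming squared-error loss and the asymptotic normal approximation (no autocovariance correction). It is an illustration, not necessarily the exact test implementation used in the repository.

import numpy as np
from scipy import stats

def diebold_mariano(e1, e2):
    # Loss differential under squared-error loss.
    d = np.asarray(e1) ** 2 - np.asarray(e2) ** 2
    n = len(d)
    dm = d.mean() / np.sqrt(d.var(ddof=1) / n)   # DM statistic
    p_value = 2 * (1 - stats.norm.cdf(abs(dm)))  # two-sided normal approximation
    return dm, p_value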

How to regenerate paper results using saved models?

Download and extract the models

Download the models from OneDrive and save them inside the MPS-Methodology folder. The models were uploaded externally due to their size.

$ unzip models.zip

Regenerate all results

Be patient. This process can take a while: between 15 and 30 minutes per performance metric.

$ rm results/ -r; mkdir results/
$ python3 generate_initial_results.py --competence_measure rmse --deployment frontend --lags 10 20 30 40 50 60 --learning_algorithms arima lstm xgboost svr rf mlp --metrics cpu memory responsetime traffic  --workloads decreasing increasing random periodic
$ python3 generate_pool_results.py --competence_measure rmse --deployment frontend --lags 10 20 30 40 50 60 --learning_algorithms arima lstm xgboost svr rf mlp --metric cpu memory traffic responsetime --workloads decreasing increasing random periodic

For the real-world time series, you have to execute the commands once per series:

$ python3 generate_initial_results.py --competence_measure rmse --deployment microservice1 --lags 10 20 30 40 50 60 --learning_algorithms arima lstm xgboost svr rf mlp --metrics cpu memory responsetime traffic  --workloads microservice1;
$ python3 generate_pool_results.py --competence_measure rmse --deployment microservice1 --lags 10 20 30 40 50 60 --learning_algorithms arima lstm xgboost svr rf mlp --metric cpu memory traffic responsetime --workloads microservice1;

and 

$ python3 generate_initial_results.py --competence_measure rmse --deployment microservice2 --lags 10 20 30 40 50 60 --learning_algorithms arima lstm xgboost svr rf mlp --metrics cpu memory responsetime traffic  --workloads microservice2;
$ python3 generate_pool_results.py --competence_measure rmse --deployment microservice2 --lags 10 20 30 40 50 60 --learning_algorithms arima lstm xgboost svr rf mlp --metric cpu memory traffic responsetime --workloads microservice2;

and so on.
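To avoid repeating the commands by hand, a small Python loop such as the one below can run them per series. The list of deployment names is an assumption based on the four Series folders; adjust it to the series you actually have.

import subprocess

COMMON = ["--competence_measure", "rmse",
          "--lags", "10", "20", "30", "40", "50", "60",
          "--learning_algorithms", "arima", "lstm", "xgboost", "svr", "rf", "mlp"]

# Assumed deployment names; adjust to match your series.
for name in ["microservice1", "microservice2", "microservice3", "microservice4"]:
    subprocess.run(["python3", "generate_initial_results.py", *COMMON,
                    "--deployment", name,
                    "--metrics", "cpu", "memory", "responsetime", "traffic",
                    "--workloads", name], check=True)
    subprocess.run(["python3", "generate_pool_results.py", *COMMON,
                    "--deployment", name,
                    "--metric", "cpu", "memory", "traffic", "responsetime",
                    "--workloads", name], check=True)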

If desired, you can generate results for a specific metric by changing the command. For example, to generate results for just the memory metric, run:

$ python3 generate_initial_results.py --competence_measure rmse --deployment frontend --lags 10 20 30 40 50 60 --learning_algorithms arima lstm xgboost svr rf mlp --metrics memory  --workloads decreasing increasing random periodic;
$ python3 generate_pool_results.py --competence_measure rmse --deployment frontend --lags 10 20 30 40 50 60 --learning_algorithms arima lstm xgboost svr rf mlp --metric memory --workloads decreasing increasing random periodic;

or

$ python3 generate_initial_results.py --competence_measure rmse --deployment microservice1 --lags 10 20 30 40 50 60 --learning_algorithms arima lstm xgboost svr rf mlp --metrics memory  --workloads microservice1;
$ python3 generate_pool_results.py --competence_measure rmse --deployment microservice1 --lags 10 20 30 40 50 60 --learning_algorithms arima lstm xgboost svr rf mlp --metric memory --workloads microservice1;

Summary of other information presented throughout the paper

Learning algorithm parameters

| Algorithm | Hyper-parameters | Source |
| --- | --- | --- |
| ARIMA | Selected automatically by the auto-arima library | - |
| LSTM | batch_size: [64, 128], epochs: [1, 2, 4, 8, 10], hidden_layers: [2, 3, 4, 5, 6], learning_rate: [0.05, 0.01, 0.001] | Coulson et al. |
| MLP | hidden_layer_sizes: [5, 10, 15, 20], activation: [tanh, relu, logistic], solver: [lbfgs, sgd, adam], max_iter: [100, 500, 1000, 2000, 3000], learning_rate: [constant, adaptive] | Rubak |
| RF | min_samples_leaf: [1, 5, 10], min_samples_split: [2, 5, 10, 15], n_estimators: [100, 500, 1000] | Espinosa et al. |
| SVR | gamma: [0.001, 0.01, 0.1, 1], kernel: [rbf, sigmoid], epsilon: [0.1, 0.001, 0.0001], C: [0.1, 1, 10, 100, 1000, 10000] | de Oliveira et al. |
| XGBoost | col_sample_by_tree: [0.4, 0.6, 0.8], gamma: [1, 5, 10], learning_rate: [0.01, 0.1, 1], max_depth: [3, 6, 10], n_estimators: [100, 150, 200], reg_alpha: [0.01, 0.1, 10], reg_lambda: [0.01, 0.1, 10], subsample: [0.4, 0.6, 0.8] | Mohamed and El-Gayar |
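As an illustration of how such grids are typically searched, the sketch below runs a grid search over the MLP grid from the table with time-series-aware cross-validation. The data are placeholders, the hidden-layer sizes are written as single-layer tuples for scikit-learn, and the scoring and split strategy are assumptions, not necessarily the exact procedure used in the paper.

import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.neural_network import MLPRegressor

# Placeholder lagged features/targets; replace with a real series.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(300, 10)), rng.normal(size=300)

param_grid = {
    "hidden_layer_sizes": [(5,), (10,), (15,), (20,)],
    "activation": ["tanh", "relu", "logistic"],
    "solver": ["lbfgs", "sgd", "adam"],
    "max_iter": [100, 500, 1000, 2000, 3000],
    "learning_rate": ["constant", "adaptive"],
}
# Note: this grid is large (360 combinations), so the search is slow.
search = GridSearchCV(MLPRegressor(), param_grid,
                      scoring="neg_root_mean_squared_error",
                      cv=TimeSeriesSplit(n_splits=5), n_jobs=-1)
search.fit(X, y)
print(search.best_params_)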

Lags and Bagging Size

The results of the homogeneous pool-size selection are summarised here. The best monolithic model per dataset and its lag size are available here and here, respectively.

The following table summarises both results for synthetic series.

| Time Series | Workload | Best Monolithic | Monolithic Lag | Homogeneous Pool Size |
| --- | --- | --- | --- | --- |
| CPU Usage | Decreasing | SVR | 30 | 20 |
| CPU Usage | Increasing | MLP | 30 | 40 |
| CPU Usage | Periodic | SVR | 10 | 50 |
| CPU Usage | Random | MLP | 10 | 90 |
| Memory | Decreasing | SVR | 50 | 20 |
| Memory | Increasing | SVR | 20 | 10 |
| Memory | Periodic | MLP | 10 | 20 |
| Memory | Random | SVR | 10 | 10 |
| Response Time | Decreasing | LSTM | 20 | 30 |
| Response Time | Increasing | MLP | 60 | 30 |
| Response Time | Periodic | MLP | 60 | 110 |
| Response Time | Random | MLP | 20 | 140 |
| Traffic | Decreasing | MLP | 50 | 110 |
| Traffic | Increasing | MLP | 20 | 130 |
| Traffic | Periodic | SVR | 60 | 50 |
| Traffic | Random | MLP | 10 | 10 |
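To give a concrete reading of the table, the first row says that SVR with a lag of 30 and a homogeneous pool of 20 models performed best on the decreasing CPU series. A homogeneous pool of that size can be built with scikit-learn's bagging, as in the sketch below; the data are placeholders and this is only an illustration of the idea.

import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.svm import SVR

# Placeholder lagged features/targets with lag 30; replace with the real series.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(400, 30)), rng.normal(size=400)

# Homogeneous pool: 20 SVR models trained on bootstrap samples.
# (On scikit-learn < 1.2, pass base_estimator= instead of estimator=.)
pool = BaggingRegressor(estimator=SVR(), n_estimators=20, random_state=0)
pool.fit(X, y)
prediction = pool.predict(X[-1:])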

The following table summarises both results for real-world series.

| Time Series | Workload | Best Monolithic | Monolithic Lag | Homogeneous Pool Size |
| --- | --- | --- | --- | --- |
| CPU Usage | Decreasing | SVR | 10 | 20 |
| CPU Usage | Increasing | SVR | 40 | 20 |
| CPU Usage | Periodic | SVR | 20 | 20 |
| CPU Usage | Random | MLP | 10 | 10 |
| Memory | Decreasing | SVR | 40 | 150 |
| Memory | Increasing | SVR | 10 | 110 |
| Memory | Periodic | MLP | 60 | 10 |
| Memory | Random | SVR | 10 | 130 |
| Response Time | Decreasing | SVR | 10 | 20 |
| Response Time | Increasing | MLP | 50 | 30 |
| Response Time | Periodic | XGBoost | 20 | 30 |
| Response Time | Random | RF | 40 | 100 |
| Traffic | Decreasing | RF | 20 | 30 |
| Traffic | Increasing | MLP | 10 | 60 |
| Traffic | Periodic | SVR | 50 | 90 |
| Traffic | Random | LSTM | 20 | 30 |

Time Series

The time series used in the research can be found here. We also plot all time series and provide a description file.
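For a quick look at any of the series, a plot like the one below can be produced with pandas and matplotlib; the file name and column are placeholders, since the actual layout of the published series files may differ.

import pandas as pd
import matplotlib.pyplot as plt

# Placeholder path/column; adjust to the actual series file.
series = pd.read_csv("timeseries/cpu_decreasing.csv")["value"]

series.plot(title="CPU usage - decreasing workload")
plt.xlabel("Time (minutes)")
plt.ylabel("CPU usage")
plt.tight_layout()
plt.show()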

The following table describes the synthetic series.

| Metric | Series | Trend | Stationary | Frequency | Mean | Median | Std | Size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CPU Usage | Decreasing | | | Minutes | 244.278 | 260.606 | 44.418 | 4320 |
| CPU Usage | Increasing | | | Minutes | 148.470 | 160.590 | 33.122 | 4321 |
| CPU Usage | Periodic | | | Minutes | 221.668 | 272.340 | 89.298 | 4322 |
| CPU Usage | Random | | | Minutes | 233.277 | 237.980 | 34.262 | 4323 |
| Memory | Decreasing | | | Minutes | 1.34E+08 | 1.29E+08 | 1.29E+07 | 4324 |
| Memory | Increasing | | | Minutes | 8.75E+07 | 8.64E+07 | 2.35E+07 | 4325 |
| Memory | Periodic | | | Minutes | 1.04E+08 | 1.04E+08 | 7.88E+06 | 4326 |
| Memory | Random | | | Minutes | 9.72E+07 | 9.74E+07 | 1.92E+06 | 4327 |
| Response Time | Decreasing | | | Minutes | 514.907 | 561.575 | 162.394 | 4328 |
| Response Time | Increasing | | | Minutes | 557.310 | 624.800 | 194.021 | 4329 |
| Response Time | Periodic | | | Minutes | 561.310 | 691.467 | 296.526 | 4330 |
| Response Time | Random | | | Minutes | 476.822 | 454.164 | 150.803 | 4331 |
| Traffic | Decreasing | | | Minutes | 3046.082 | 3450.782 | 1147.120 | 4332 |
| Traffic | Increasing | | | Minutes | 3226.507 | 3679.959 | 1338.634 | 4333 |
| Traffic | Periodic | | | Minutes | 3169.468 | 3803.333 | 1865.883 | 4334 |
| Traffic | Random | | | Minutes | 2378.145 | 2132.667 | 968.229 | 4335 |

The following table describes the real-world series.

| Metric | Series | Trend | Stationary | Frequency | Mean | Median | Std | Size | Communication |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CPU Usage | 1 | | | Seconds | 0.34 | 0.33 | 0.05 | 1,426 | RI, IC, IPC |
| CPU Usage | 2 | | | Seconds | 0.34 | 0.40 | 0.11 | 1,427 | RI |
| CPU Usage | 3 | | | Seconds | 0.18 | 0.15 | 0.07 | 1,420 | RI, IC, IPC |
| CPU Usage | 4 | | | Seconds | 0.32 | 0.31 | 0.04 | 1,421 | RI, IC |
| Memory | 1 | | | Seconds | 0.53 | 0.52 | 0.04 | 1,427 | RI, IC |
| Memory | 2 | | | Seconds | 0.51 | 0.50 | 0.03 | 1,426 | RI |
| Memory | 3 | | | Seconds | 0.52 | 0.52 | 0.00 | 1,426 | RI, IPC |
| Memory | 4 | | | Seconds | 0.45 | 0.45 | 0.02 | 1,424 | RI, IC |
| Response Time | 1 | | | Minutes | 1.00 | 1.02 | 0.20 | 720 | RI |
| Response Time | 2 | | | Minutes | 23.75 | 23.95 | 3.06 | 720 | RI |
| Response Time | 3 | | | Minutes | 59.94 | 58.24 | 14.79 | 721 | IC |
| Response Time | 4 | | | Minutes | 470.38 | 371.96 | 255.56 | 715 | IC |
| Traffic | 1 | | | Minutes | 222.39 | 220.58 | 12.42 | 721 | RI |
| Traffic | 2 | | | Minutes | 50.75 | 54.37 | 11.69 | 721 | RI |
| Traffic | 3 | | | Minutes | 111.60 | 44.29 | 87.34 | 713 | IC |
| Traffic | 4 | | | Minutes | 258.10 | 255.12 | 40.43 | 721 | IPC |
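Statistics like those in the two tables above can be reproduced with pandas from the raw series; the file name and column below are placeholders.

import pandas as pd

# Placeholder path/column; point this at one of the published series files.
series = pd.read_csv("timeseries/cpu_series1.csv")["value"]

print("Mean:  ", round(series.mean(), 2))
print("Median:", round(series.median(), 2))
print("Std:   ", round(series.std(), 2))
print("Size:  ", series.size)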
