marialonsogar/COVID-Dynamics-Model-Comparison

COVID-Dynamics-Model-Comparison

Introduction

Comparative study of the main techniques used for COVID modeling when the only information available is the infected curve. The objective is to identify the univariate techniques that produce the best results and to analyze whether more complex models really provide better predictions.

Since COVID-19 was declared a pandemic, the urgency to obtain accurate predictive methods that help institutions decide which measures to apply, together with the uncertainty surrounding the virus, has driven the publication and application of many different techniques. The motivation of this study is to compare them, in particular compartmental epidemiological models, linear regression models, ARIMA family models and recurrent neural networks.

Data

Daily COVID-19 cases in Spain, reported by province (see source).

EDA and Data processing

Exploratory Data Analysis was conducted to explore time series patterns (global and local trends, structural changes, seasonalities...), data inconsistencies, outliers, etc.

Data processing:

  • Aggregate data from province level to national level, since the point of interest lies at the national level
  • Remove the variables for hospitalized individuals and ICU inpatients (not relevant for this analysis) and rename columns for ease of analysis
  • Add the population (total population of Spain, held constant) and recovered cases, both required for the SIR model study (the whole population is treated as susceptible)
  • Smooth the data with a 7-day moving average: the series exhibits seasonal fluctuations with period 7, caused by the communities not reporting data on weekends
  • Outliers were identified during the summer and Christmas seasons, but they are inherent to the series
  • Set the forecasting horizon to 14 days ahead
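
The aggregation and smoothing steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the code from the notebooks: the column names (`date`, `province`, `cases`), the `preprocess` helper, the trailing (rather than centered) window and the population figure are all assumptions made for the example.

```python
import pandas as pd


def preprocess(df: pd.DataFrame, population: int, window: int = 7) -> pd.DataFrame:
    """Aggregate province-level daily counts to national level and smooth them.

    Column names ("date", "cases") are hypothetical placeholders.
    """
    # Sum the counts of all provinces for each date.
    national = df.groupby("date")[["cases"]].sum().sort_index()
    # Trailing 7-day moving average to damp the weekly reporting cycle
    # (assumption: the original study may have used a different alignment).
    national["cases_smooth"] = national["cases"].rolling(window=window).mean()
    # Constant total population, used as the susceptible pool.
    national["population"] = population
    # The first window-1 rows have no complete window, so drop them.
    return national.dropna(subset=["cases_smooth"])


# Toy data: two made-up provinces with constant daily counts.
dates = pd.date_range("2021-01-01", periods=14, freq="D")
raw = pd.DataFrame(
    {
        "date": list(dates) * 2,
        "province": ["A"] * 14 + ["B"] * 14,
        "cases": [10] * 14 + [4] * 14,
    }
)
# 47_000_000 is a rough illustrative figure for the population of Spain.
national = preprocess(raw, population=47_000_000)
print(national.head())
```

A trailing window only uses past observations, which keeps the smoothed value at each date free of future information when the series is later split for forecasting.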

Modeling

A set of models was fitted and evaluated (MAE, RMSE, MAPE, RMSLE) on different windows using time series cross-validation (see walk-forward schema and expanding walk-forward schema). Implementation and mathematical details are given in each notebook.
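
As a rough sketch of the evaluation scheme described above, the snippet below generates walk-forward (sliding) and expanding walk-forward splits and computes the four metrics with NumPy. The function names, the `initial_train` and `step` parameters and the naive last-value baseline are assumptions for illustration; the notebooks may organise this differently.

```python
import numpy as np


def walk_forward_splits(n_obs, initial_train, horizon, step=None, expanding=True):
    """Yield (train_idx, test_idx) index arrays for time series cross-validation.

    expanding=True  -> the training window starts at 0 and grows each fold.
    expanding=False -> the training window keeps a fixed length and slides.
    """
    step = horizon if step is None else step
    train_end = initial_train
    while train_end + horizon <= n_obs:
        train_start = 0 if expanding else train_end - initial_train
        yield np.arange(train_start, train_end), np.arange(train_end, train_end + horizon)
        train_end += step


def mae(y, yhat):
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    return float(np.mean(np.abs(y - yhat)))


def rmse(y, yhat):
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))


def mape(y, yhat):
    # Percentage error; undefined when any true value is zero.
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    return float(np.mean(np.abs((y - yhat) / y)) * 100.0)


def rmsle(y, yhat):
    # Requires non-negative values; log1p handles zeros.
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(yhat) - np.log1p(y)) ** 2)))


# Toy series and a naive "repeat the last observed value" forecast per fold.
y = np.arange(1.0, 57.0)
for train_idx, test_idx in walk_forward_splits(len(y), initial_train=28, horizon=14):
    forecast = np.full(len(test_idx), y[train_idx[-1]])
    print(len(train_idx), mae(y[test_idx], forecast), rmse(y[test_idx], forecast))
```

Because every test block lies strictly after its training block, no fold evaluates a model on observations that precede its training data, which is the property that distinguishes these schemes from ordinary shuffled cross-validation.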

Results

  • All metrics increase with the forecast horizon for every model, which is reasonable: the farther a future point is from the known observations, the greater the uncertainty.
  • The best model on every metric is ARIMA(2,1,5). The RNN considered fails to capture the dynamics of the virus, which shows in its insufficiently accurate predictions. The SIS model and linear regression evolve similarly except on RMSLE, where the linear regression error increases drastically from time horizon 8 onwards. The similarity may arise because the series studied does not satisfy the assumptions of the SIS model, so the fit cannot provide parameters with epidemiological significance. Consequently, the SIS model has no epidemiological interpretation here and becomes a mere regression fit.
  • Finally, it should be recalled that the initial assumptions are not satisfied for any of the models studied. Therefore, the results might be improved by exploring other types of methods.
  • More details
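
To make the remark about epidemiological interpretation more concrete, here is a minimal sketch of the textbook SIS dynamics, dI/dt = βSI/N − γI with S = N − I, integrated with a simple Euler step. It is a generic formulation and not necessarily the one fitted in the notebooks; the `simulate_sis` name and all parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_sis(beta, gamma, population, infected0, n_days, dt=1.0):
    """Simulate the infected curve of a textbook SIS model with Euler steps.

    beta  : transmission rate (hypothetical value supplied by the caller)
    gamma : recovery rate; recovered individuals become susceptible again
    """
    infected = np.empty(n_days + 1)
    infected[0] = infected0
    for t in range(n_days):
        i = infected[t]
        s = population - i  # everyone who is not infected is susceptible
        new_infections = beta * s * i / population
        recoveries = gamma * i
        infected[t + 1] = i + dt * (new_infections - recoveries)
    return infected


# Illustrative parameters only: beta > gamma gives an endemic plateau.
curve = simulate_sis(beta=0.3, gamma=0.1, population=1000, infected0=1, n_days=400)
print(curve[-1])
```

In this formulation β and γ carry meaning only if the data behave like the model: when β > γ the infected curve settles at N(1 − γ/β), and when β < γ it dies out. A series with repeated waves matches neither regime, so fitted values of β and γ act as free curve-fitting coefficients rather than as rates.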
