GitHub - epiforecasts/multi-model-information: Characterising information loss due to aggregating epidemic model outputs

Characterising information gains and losses when collecting multiple epidemic model outputs

Katharine Sherratt (1), Ajitesh Srivastava (2), Kylie Ainslie (3), David E. Singh (4), Aymar Cublier (4), Miguel Guzman Merino (4), Maria Cristina Marinescu (5), Jesus Carretero (4), Alberto Cascajo Garcia (4), Nicolas Franco (6), Lander Willem (7), Steven Abrams (8), Christel Faes (8), Philippe Beutels (8), Niel Hens (8), Sebastian Müller (9), Billy Charlton (9), Ricardo Ewert (9), Sydney Paltra (9), Christian Rakow (9), Jakob Rehmann (9), Tim Conrad (10), Christof Schütte (10), Kai Nagel (9), Sam Abbott (1), Rok Grah (11), Rene Niehus (11), Bastian Prasse (11), Frank Sandmann (11), Sebastian Funk (1)

1 London School of Hygiene and Tropical Medicine, London, UK; 2 University of Southern California, Los Angeles, USA; 3 RIVM, Bilthoven, Netherlands; 4 Universidad Carlos III de Madrid, Madrid, Spain; 5 Barcelona Supercomputing Center, Barcelona, Spain; 6 University of Namur (Belgium), Namur, Belgium; 7 University of Antwerp (Belgium), Antwerp, Belgium; 8 University of Hasselt (Belgium), Hasselt, Belgium; 9 TU Berlin, Berlin, Germany; 10 ZIB Berlin, Berlin, Germany; 11 ECDC, Stockholm, Sweden

Now published at: https://www.sciencedirect.com/science/article/pii/S1755436524000264

Summary

We compared methods of collecting information from multiple infectious disease models. We found that key epidemic characteristics were less well represented by a quantile-summary method, while collecting modelled trajectories enabled continuous evaluation against newly observed data. The significance of information gain or loss varies with each collaboration’s aims.

Find out more:
- Read the abstract below, or the full paper
- Explore code and data, and reproduce results in the fully documented results Rmarkdown

Reproducibility and contributions

- Results and supplement are generated from an Rmarkdown document.
- Your comments, feedback, and questions are very welcome. Please open an Issue or contact Kath Sherratt.

Note: full commit history for this project prior to 16 April 2023 is available as a branch of the Scenario hub

Abstract

Background. Collaborative comparisons and combinations of multiple epidemic models are used as policy-relevant evidence during epidemic outbreaks. In the process of collecting multiple projections of the future, such collaborations may gain or lose relevant information. Typically, each modeller contributes their own probabilistic summary using descriptive statistics at each modelled time step. We compare this method to directly collecting simulated trajectories from each of the models. We aimed to explore information on key epidemic quantities, ensemble uncertainty, and performance against data, in order to investigate the potential to continuously gain information from only a single cross-sectional collection of model results.
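To illustrate why a per-time-step summary can lose information about epidemic characteristics such as peaks, here is a minimal Python sketch using only the standard library. It is not taken from this repository's R code: the triangular curve shape, the helper `make_trajectory`, and all numbers are hypothetical, chosen only to make the effect easy to see.

```python
from statistics import median


def make_trajectory(peak_week, n_weeks=12, peak_size=100.0):
    """Hypothetical triangular epidemic curve that peaks at `peak_week`."""
    return [
        max(0.0, peak_size - 25.0 * abs(week - peak_week))
        for week in range(n_weeks)
    ]


# Five simulated trajectories that differ only in the timing of their peak
trajectories = [make_trajectory(peak_week) for peak_week in (2, 4, 6, 8, 10)]

# Quantile-style collection: summarise across trajectories at each time step
pointwise_median = [median(values) for values in zip(*trajectories)]

# Trajectory-style collection: measure each trajectory's own peak, then summarise
peak_sizes = [max(trajectory) for trajectory in trajectories]
peak_weeks = [trajectory.index(max(trajectory)) for trajectory in trajectories]

median_peak_size = median(peak_sizes)
peak_of_median_curve = max(pointwise_median)

print("Median of trajectory peaks:", median_peak_size)
print("Peak of the pointwise median curve:", peak_of_median_curve)
print("Peak weeks across trajectories:", peak_weeks)
```

In this toy case every trajectory peaks at 100, but the pointwise median curve never rises above 50, and the spread of peak timing is only recoverable from the trajectories themselves.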

Methods. We compared July 2022 projections from the European COVID-19 Scenario Modelling Hub. Using shared scenario assumptions, five modelling teams each contributed up to 400 simulated trajectories projecting incidence in each of Belgium, the Netherlands, and Spain. We compared epidemic characteristics including incidence, peaks, and cumulative totals. We also created a probabilistic ensemble drawn from all available trajectories at each time step, and compared this to two common ensemble methods: a median across each model’s quantiles, and a linear opinion pool. We then measured the predictive accuracy of each individual trajectory against later observations, and used this to create a weighted ensemble combining all simulations. We repeated this sequentially against increasing weeks of observed data, and evaluated how the performance of these ensembles varied with the amount of observed data.
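The difference between summarising each model before combining and pooling samples before summarising can be sketched as follows. This is an illustrative Python example under stated assumptions, not the analysis code from this repository: the model names, sample values, and the `quantile` helper are hypothetical, and equal-sized pooling of samples is used here as a stand-in for an equally weighted linear opinion pool.

```python
from statistics import median


def quantile(samples, prob):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(samples)
    position = prob * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Hypothetical incidence samples at a single time step from three models
model_samples = {
    "model_a": [10, 12, 14, 16, 18],
    "model_b": [11, 13, 15, 17, 19],
    "model_c": [20, 40, 80, 160, 320],  # a right-skewed model
}
probs = [0.05, 0.5, 0.95]

# Quantile median: take each model's quantiles first, then the median across models
quantile_median = {
    prob: median(quantile(samples, prob) for samples in model_samples.values())
    for prob in probs
}

# Pooled samples: combine all samples first, then take quantiles of the pool.
# With equal sample counts per model this matches an equally weighted mixture.
pooled = [value for samples in model_samples.values() for value in samples]
pooled_quantiles = {prob: quantile(pooled, prob) for prob in probs}

for prob in probs:
    print(prob, quantile_median[prob], pooled_quantiles[prob])
```

In this toy example the upper quantile of the pooled samples is far higher than the quantile median, because the median across models discards the long right tail contributed by a single model.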

Results. By collecting models’ simulated trajectories, we were able to show more policy-relevant epidemic characteristics, and evaluate performance against data, as opposed to collecting models’ quantiles at each time point. Sampled trajectories contained a right-skewed distribution which was poorly captured by an ensemble of models’ quantile intervals but well represented by a linear opinion pool. Ensembles weighted by predictive performance typically retained the range of plausible incidence over time, and in some cases narrowed this range by excluding some epidemic shapes altogether.
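Weighting trajectories by their fit to newly observed data can be sketched as below. This is a minimal Python illustration, not the method implemented in this repository: the use of inverse mean absolute error as the weight is an assumption made for the example, and the trajectories, observations, and helper names are all hypothetical.

```python
def mean_absolute_error(trajectory, observed):
    """Mean absolute error over the weeks for which observations exist."""
    errors = [abs(t - o) for t, o in zip(trajectory, observed)]
    return sum(errors) / len(errors)


def inverse_error_weights(trajectories, observed, floor=1e-9):
    """Normalised weights proportional to 1 / MAE (an assumed weighting rule)."""
    raw = [
        1.0 / max(mean_absolute_error(t, observed), floor) for t in trajectories
    ]
    total = sum(raw)
    return [weight / total for weight in raw]


def weighted_quantile(values, weights, prob):
    """Smallest value whose cumulative weight reaches `prob`."""
    pairs = sorted(zip(values, weights))
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= prob:
            return value
    return pairs[-1][0]


# Four hypothetical trajectories of weekly incidence over four weeks
trajectories = [
    [10, 20, 40, 80],
    [10, 15, 20, 25],
    [10, 12, 13, 14],
    [10, 9, 8, 7],
]
observed = [10, 14, 19]  # hypothetical data for the first three weeks

# Re-weight the same collection of trajectories as each week of data arrives
weights_by_week = {
    n_weeks: inverse_error_weights(trajectories, observed[:n_weeks])
    for n_weeks in (1, 2, 3)
}

# Weighted median of projected incidence in the final, not yet observed, week
final_week = [trajectory[-1] for trajectory in trajectories]
weighted_median_final = weighted_quantile(final_week, weights_by_week[3], 0.5)

for n_weeks, weights in weights_by_week.items():
    print(n_weeks, [round(weight, 3) for weight in weights])
print("Weighted median in final week:", weighted_median_final)
```

With one week of data all trajectories fit equally well and receive equal weight. As more weeks are observed, weight concentrates on the trajectory closest to the data, while poorly fitting shapes are down-weighted rather than removed outright.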

Conclusions. We observed several information gains from collecting modelled trajectories rather than summarised quantile distributions, highlighting the potential to create continuous new information from a single collection of model output. The value of different information gains and losses may vary with the aims of each collaborative effort, depending on the needs of projection users and the flexibility they require. Understanding the differing information potential of methods to collect model projections can support the accuracy, sustainability, and communication of collaborative infectious disease modelling efforts.
