
scavany/covid19-forecast-hub

 
 


COVID-19 Forecast Hub

The goal of this repository is to create a standardized set of data on forecasts from teams making projections of cumulative and incident deaths and incident hospitalizations due to COVID-19 in the United States. This repository is the data source for the official CDC COVID-19 Forecasting page. This project to collect, standardize, visualize and synthesize forecast data has been led by the CDC-funded UMass-Amherst Influenza Forecasting Center of Excellence based at the Reich Lab, with contributions from many others.

This README provides an overview of the project. Additional specific links can be found in the list below:


Data license and reuse

We are grateful to the teams who have generated these forecasts. They have invested a huge amount of effort in a short period of time to operationalize these important real-time forecasts. The groups have graciously and courageously made their data publicly available under different terms and licenses. You will find the licenses (when provided) within the model-specific folders in the data-processed directory. Please consult these licenses before using these data to ensure that you follow the terms under which they were released.

All source code that is specific to this project, along with our d3-foresight visualization tool, is available under an open-source MIT license. We note that this license does NOT cover model code from the various teams (which may be available from them under other licenses) or model forecast data (available under specified licenses as described above).

What forecasts we are tracking, and for which locations

Different groups are making forecasts at different times, and for different geographic scales. The specifications below were created by consulting with collaborators at CDC and looking at what models forecasting teams were already building.

What do we consider to be "gold standard" data? We will use the daily reports containing case and death data from the JHU CSSE group as the gold standard reference data for deaths in the US. We use the time-series version of the JHU data, which occasionally contains "revisions" of previous daily reports. Note that there are non-negligible differences (especially in daily incident death data) between the JHU data and another commonly used source, the New York Times. The team at UTexas-Austin is tracking this issue on a separate GitHub repository.

When will forecast data be updated? We will store new forecasts from each group as they are provided to us directly via pull requests. Teams are encouraged to submit data as often as they have it available, although we only support one upload per day. In general, "updates" to forecasts will not be permitted. Teams are responsible for checking that their forecasts are ready for public viewing upon submission. This can be done locally using our interactive visualization tool.

What locations will have forecasts? Currently, forecasts may be submitted for any state and county in the US and the US at the national level.

How will probabilistic forecasts be represented? Forecasts will be represented in a standard format using quantile-based representations of predictive distributions. We encourage all groups to make available the following 23 quantiles for each distribution: c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99). One goal of this effort is to create probabilistic ensemble forecasts, and having high-resolution component distributions will provide data to create better ensembles.
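As a rough illustration, the recommended quantile levels can be reproduced outside R. The Python sketch below mirrors the R expression above; the function names `recommended_quantiles` and `missing_quantiles` are hypothetical helpers, not part of the hub's tooling.

```python
def recommended_quantiles():
    """Return the 23 recommended quantile levels, mirroring the R expression
    c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99)."""
    lower_tail = [0.01, 0.025]
    # 0.05, 0.10, ..., 0.95 (19 levels); round to avoid floating-point drift
    middle = [round(0.05 * k, 2) for k in range(1, 20)]
    upper_tail = [0.975, 0.99]
    return lower_tail + middle + upper_tail


def missing_quantiles(submitted, tol=1e-9):
    """List the recommended levels absent from `submitted` (hypothetical helper)."""
    return [
        q for q in recommended_quantiles()
        if not any(abs(q - s) <= tol for s in submitted)
    ]


if __name__ == "__main__":
    levels = recommended_quantiles()
    print(len(levels), levels)
```

A team could run a check like `missing_quantiles` on its own output before submitting, to see which recommended levels it has not yet provided.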

What forecast targets will be stored? We will store forecasts for 1 through 20 week-ahead incident and cumulative deaths, 0 through 130 day-ahead incident hospitalizations, and 1 through 8 week-ahead incident reported cases. Please refer to the technical README for details on aligning targets with forecast dates.

Ensemble model

Every Monday at 6pm ET, we will update our COVID Forecast Hub ensemble forecast and interactive visualization using the most recent forecast from each team, as long as it was submitted before 6pm ET on Monday and has a forecast_date of any day since the previous Tuesday. All models meeting these criteria will be considered for the ensemble. For inclusion in the ensemble, we additionally require that forecasts include the full set of 23 quantiles (see technical README for details), and that the 10th quantile of the predictive distribution for a 1 week ahead forecast is not below the most recently observed data. We also perform manual visual inspection checks to ensure that forecasts are in alignment with the ground truth data. Details on which models were included in the ensemble each week are available in the ensemble metadata folder.
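The automated parts of these criteria can be sketched in Python as follows. This is a simplified illustration, not the hub's validation code: the function names and the dictionary layout are assumptions, the 6pm ET cutoff is reduced to a date comparison, and the manual visual inspection step is not represented.

```python
from datetime import date, timedelta

# The 23 required quantile levels (see the quantile specification above).
REQUIRED_LEVELS = (
    [0.01, 0.025] + [round(0.05 * k, 2) for k in range(1, 20)] + [0.975, 0.99]
)


def in_submission_window(forecast_date, ensemble_monday):
    """True if forecast_date falls between the previous Tuesday and the
    ensemble Monday, inclusive. Time of day (6pm ET) is ignored here."""
    if ensemble_monday.weekday() != 0:  # Monday is 0
        raise ValueError("ensemble_monday must be a Monday")
    previous_tuesday = ensemble_monday - timedelta(days=6)
    return previous_tuesday <= forecast_date <= ensemble_monday


def passes_quantile_checks(one_week_ahead, last_observed):
    """one_week_ahead maps quantile level -> predicted value (assumed layout).

    Requires all 23 levels, and that the 0.1 quantile is not below the
    most recently observed value."""
    normalized = {round(q, 3): v for q, v in one_week_ahead.items()}
    if set(normalized) != set(REQUIRED_LEVELS):
        return False
    return normalized[0.1] >= last_observed


if __name__ == "__main__":
    monday = date(2020, 7, 13)
    forecast = {q: 1000 + 100 * q for q in REQUIRED_LEVELS}
    print(in_submission_window(date(2020, 7, 12), monday))
    print(passes_quantile_checks(forecast, last_observed=1000))
```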

Depending on how the project evolves, we may add additional weekly builds for the ensemble and visualization. Currently, our ensemble is created by taking the arithmetic average of each quantile for all models that submit 1- through 4-week ahead cumulative death targets for a given location. Ensemble methods and inclusion criteria may evolve as more data becomes available.
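The quantile-averaging step described here can be sketched as below, for a single location and target. The function name `ensemble_quantiles`, the nested-dictionary input, and the model names are assumptions made for illustration; this is not the hub's ensemble code.

```python
def ensemble_quantiles(model_forecasts):
    """Average each quantile across models (equal weights).

    model_forecasts maps model name -> {quantile level: predicted value}
    for one location and one target. All models must share the same levels.
    """
    if not model_forecasts:
        raise ValueError("need at least one model forecast")
    forecasts = list(model_forecasts.values())
    levels = set(forecasts[0])
    if any(set(f) != levels for f in forecasts):
        raise ValueError("all models must provide the same quantile levels")
    n_models = len(forecasts)
    return {q: sum(f[q] for f in forecasts) / n_models for q in sorted(levels)}


if __name__ == "__main__":
    # Hypothetical 1-week-ahead cumulative death forecasts from two models
    example = {
        "model_a": {0.1: 100, 0.5: 200, 0.9: 320},
        "model_b": {0.1: 300, 0.5: 400, 0.9: 480},
    }
    print(ensemble_quantiles(example))
```

Averaging level by level keeps the ensemble in the same quantile format as its components, so it can be stored and visualized alongside the individual models.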

Forecast files

Participating teams provide their forecasts in a quantile-based format. We have developed specifications that can be used to represent all of the forecasts in a simple, long-form data format. For details about these file format specifications, please see the technical README.

Teams and models

The teams whose forecasts are currently standardized and in the repository are (with data reuse license):

Participating teams must provide a metadata file.

The COVID Forecast Hub Team

Carefully curating these datasets into a standard format has taken a Herculean team effort. The following lists those who have helped out, in reverse alphabetical order:

  • Nutcha Wattanachit (ensemble model, data processing)
  • Serena Wang (data curation)
  • Nicholas Reich (project lead, ensemble model, data processing)
  • Evan Ray (ensemble model)
  • Jarad Niemi (data processing and organization)
  • Khoa Le (validation, automation)
  • Ayush Khandelwal (architecture, data curation)
  • Abdul Hannan Kanji (architecture, data curation)
  • Katie House (visualization, validation, project management)
  • Estee Cramer (data curation, ensemble model)
  • Matt Cornell (validation, Zoltar integration)
  • Andrea Brennen (metadata curation)
  • Johannes Bracher (evaluation, data processing)
