Data submission instructions

This page is intended to provide teams with all the information they need to submit forecasts. All forecasts should be submitted directly to the data-processed/ folder. Data in this directory should be added to the repository through a pull request so that automatic data validation checks are run.

These instructions provide detail about the data format as well as validation that you can do prior to this pull request. In addition, we describe meta-data that each model should provide.

Table of Contents

ground truth data
data formatting
data validation
metadata format

Ground truth data

There are several different sources for death data. Currently, all forecasts will be compared to the daily reports containing death data from the JHU CSSE group as the gold standard reference data for deaths in the US. Note that there are significant differences (especially in daily incident death data) between the JHU data and another commonly used source, from the New York Times. The team at UTexas-Austin is tracking this issue on a separate GitHub repository.

We may add additional sources of ground-truth data at a future time.

Data formatting

The automatic check validates both the filename and file contents to ensure the file can be used in the visualization and ensemble forecasting.

Subdirectory

Each subdirectory within the data-processed/ directory has the format

team-model

where

team is the teamname and
model is the name of your model.

Both team and model should be less than 15 characters and not include hyphens.

Within each subdirectory, there should be a metadata file, a license file (optional), and a set of forecasts.

Metadata

Participating teams must provide a metadata file (see example), including methodological detail about their approach and a link to a file (or a file itself) describing the methods used.

Note that the information in the methods field in the metadata is what will be shown on the interactive visualization when a user hovers on your team name. For this reason, we request that the description be brief, around 200 characters (although at the moment this is not strictly enforced).

The metadata file should have the following format

metadata-team-model.txt

License (optional)

If you would like to include a license file, please use the following format

LICENSE-team.txt

Forecasts

Each forecast file within the subdirectory should have the following format

YYYY-MM-DD-team-model.csv

where

YYYY is the 4 digit year,
MM is the 2 digit month,
DD is the 2 digit day,
team is the teamname, and
model is the name of your model.

The date YYYY-MM-DD is the forecast_date.

The team and model in this file must match the team and model in the directory this file is in. Both team and model should be less than 15 characters, alpha-numeric and underscores only, with no spaces or hyphens.
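The naming rules above can be sketched as a small check. This is a minimal illustration, not the official validator: the function name `parse_forecast_filename` and the regular expression are hypothetical, and the length limit assumes the "less than 15 characters" rule stated for subdirectories.

```python
import re
from datetime import date

# Hypothetical pattern derived from the naming rules above; the official
# checks may differ. Names: letters, digits, underscores, fewer than 15 chars.
NAME = r"[A-Za-z0-9_]{1,14}"
FILENAME_RE = re.compile(
    rf"(\d{{4}})-(\d{{2}})-(\d{{2}})-({NAME})-({NAME})\.csv"
)

def parse_forecast_filename(filename, directory):
    """Return (forecast_date, team, model) or raise ValueError."""
    match = FILENAME_RE.fullmatch(filename)
    if match is None:
        raise ValueError(f"bad forecast filename: {filename!r}")
    year, month, day, team, model = match.groups()
    # date() raises ValueError for impossible dates such as month 13.
    forecast_date = date(int(year), int(month), int(day))
    # The team and model must match the enclosing team-model directory.
    if directory != f"{team}-{model}":
        raise ValueError(f"{filename!r} does not belong in {directory!r}")
    return forecast_date, team, model
```

For example, `parse_forecast_filename("2020-05-04-IHME-CurveFit.csv", "IHME-CurveFit")` returns the forecast date together with the team and model names, while a file placed in the wrong directory raises `ValueError`.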

Forecast file format

The file must be a comma-separated value (csv) file with the following columns (in any order):

forecast_date
target
target_end_date
location
type
quantile
value

Additional columns are allowed, but ignored.

Each row in the file is either a point or quantile forecast for a location on a particular date for a particular target.
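As a rough illustration of the layout, the sketch below checks a CSV header for the required columns. The helper name `missing_columns` is hypothetical, and the rows are made-up values for illustration only, not real forecasts.

```python
import csv
import io

REQUIRED_COLUMNS = {
    "forecast_date", "target", "target_end_date",
    "location", "type", "quantile", "value",
}

def missing_columns(csv_text):
    """Return the set of required columns absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return REQUIRED_COLUMNS - set(reader.fieldnames or [])

# Made-up rows for illustration only; the numbers are not real forecasts.
example = (
    "forecast_date,target,target_end_date,location,type,quantile,value\n"
    "2020-05-04,1 wk ahead cum death,2020-05-09,US,point,NA,70000\n"
    "2020-05-04,1 wk ahead cum death,2020-05-09,US,quantile,0.025,68000\n"
)
print(missing_columns(example))  # prints set()
```

Because the check uses a set difference, column order and extra columns do not affect the result.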

`forecast_date`

Values in the forecast_date column must be a date in the format

YYYY-MM-DD

This is the date on which the submitted forecasts were available. This will typically be the date on which the computation finishes running and produces the standard formatted file. forecast_date should correspond to, and be redundant with, the date in the filename, but is included here by request from some analysts. We will enforce that the forecast_date for a file must be either the date on which the file was submitted to the repository or the previous day. Exceptions will be made for legitimate extenuating circumstances.
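The submission-date rule can be expressed in a few lines. This is only a sketch of the rule as described; `forecast_date_is_acceptable` is a hypothetical name, and how the repository actually determines the submission date is not covered here.

```python
from datetime import date

def forecast_date_is_acceptable(forecast_date, submission_date):
    """True if forecast_date is the submission date or the day before it."""
    age_in_days = (submission_date - forecast_date).days
    return age_in_days in (0, 1)
```

A forecast dated Monday would pass if submitted on Monday or Tuesday, and fail if submitted on Wednesday or before its own forecast_date.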

`target`

Values in the target column must be a character (string) and be one of the following specific targets:

"N day ahead cum death" where N is a number between 0 and 130
"N day ahead inc death" where N is a number between 0 and 130
"N wk ahead cum death" where N is a number between 1 and 20
"N wk ahead inc death" where N is a number between 1 and 20
"N day ahead inc hosp" where N is a number between 0 and 130
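A minimal sketch of checking a target string against the list above. It assumes the stated ranges for N are inclusive, and the names `TARGET_RANGES` and `is_valid_target` are hypothetical.

```python
import re

# (unit, quantity) -> inclusive range of N, assuming the bounds are inclusive.
TARGET_RANGES = {
    ("day", "cum death"): (0, 130),
    ("day", "inc death"): (0, 130),
    ("wk", "cum death"): (1, 20),
    ("wk", "inc death"): (1, 20),
    ("day", "inc hosp"): (0, 130),
}
TARGET_RE = re.compile(r"(\d+) (day|wk) ahead (cum death|inc death|inc hosp)")

def is_valid_target(target):
    """True if target is one of the listed targets with N in range."""
    match = TARGET_RE.fullmatch(target)
    if match is None:
        return False
    n, unit, quantity = int(match.group(1)), match.group(2), match.group(3)
    bounds = TARGET_RANGES.get((unit, quantity))
    if bounds is None:
        # Combination not listed above, e.g. "wk ahead inc hosp".
        return False
    low, high = bounds
    return low <= n <= high
```

Keeping the ranges in a lookup table means an unlisted combination is rejected even when its wording matches the general pattern.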

For week-ahead forecasts, we will use the specification of epidemiological weeks (EWs) defined by the US CDC. There are standard software packages to convert from dates to epidemic weeks and vice versa, for example MMWRweek for R, and pymmwr and epiweeks for Python.

We have created a csv file describing forecast collection dates and the dates to which forecasts refer.

N day ahead cum death

This target is the cumulative number of deaths predicted by the model for N days after forecast_date.

As an example, for day-ahead forecasts with a forecast_date of a Monday, a 1 day ahead cum death forecast corresponds to cumulative deaths by the end of Tuesday, 2 day ahead to Wednesday, etc.

N day ahead inc death

This target is the incident (daily) number of deaths predicted by the model on day N after forecast_date.

As an example, for day-ahead forecasts with a forecast_date of a Monday, a 1 day ahead inc death forecast corresponds to incident deaths on Tuesday, 2 day ahead to Wednesday, etc.

N wk ahead cum death

This target is the cumulative number of deaths predicted by the model up to and including N weeks after forecast_date.

For week-ahead forecasts with forecast_date of Sunday or Monday of EW12, a 1 week ahead forecast corresponds to EW12 and should have target_end_date of the Saturday of EW12. For week-ahead forecasts with forecast_date of Tuesday through Saturday of EW12, a 1 week ahead forecast corresponds to EW13 and should have target_end_date of the Saturday of EW13.

A week-ahead forecast should represent the cumulative number of deaths reported on the Saturday of a given epiweek.

N wk ahead inc death

This target is the incident (weekly) number of deaths predicted by the model during the week that is N weeks after forecast_date.

For week-ahead forecasts with forecast_date of Sunday or Monday of EW12, a 1 week ahead forecast corresponds to EW12 and should have target_end_date of the Saturday of EW12. For week-ahead forecasts with forecast_date of Tuesday through Saturday of EW12, a 1 week ahead forecast corresponds to EW13 and should have target_end_date of the Saturday of EW13.

A week-ahead forecast should represent the total number of incident deaths within a given epiweek (from Sunday through Saturday, inclusive).

N day ahead inc hosp

This target is the number of new daily hospitalizations predicted by the model on day N after forecast_date.

As an example, for day-ahead forecasts with a forecast_date of a Monday, a 1 day ahead inc hosp forecast corresponds to the number of incident hospitalizations on Tuesday, 2 day ahead to Wednesday, etc.

`target_end_date`

Values in the target_end_date column must be a date in the format

YYYY-MM-DD

This is the date for the forecast target. For "N day" targets, target_end_date will be N days after forecast_date. For "N wk" targets, target_end_date will be the Saturday at the end of the week time period.
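The date arithmetic for both kinds of target can be sketched with the standard library alone. The function names are hypothetical. The week rule follows the EW12 example given for the week-ahead targets: a forecast_date of Sunday or Monday counts the current epiweek as 1 week ahead, and Tuesday through Saturday counts the following epiweek.

```python
from datetime import date, timedelta

def day_target_end_date(forecast_date, n):
    """target_end_date for an "N day ahead" target."""
    return forecast_date + timedelta(days=n)

def week_target_end_date(forecast_date, n):
    """target_end_date (a Saturday) for an "N wk ahead" target."""
    # date.weekday() is Monday=0 ... Sunday=6; epiweeks start on Sunday.
    days_since_sunday = (forecast_date.weekday() + 1) % 7
    # Saturday that closes the epiweek containing forecast_date.
    saturday = forecast_date + timedelta(days=6 - days_since_sunday)
    # Tuesday through Saturday: "1 wk ahead" is the following epiweek.
    if days_since_sunday > 1:
        saturday += timedelta(days=7)
    return saturday + timedelta(weeks=n - 1)
```

For instance, Monday 2020-03-16 falls in EW12 of 2020, so `week_target_end_date(date(2020, 3, 16), 1)` gives Saturday 2020-03-21, whereas Tuesday 2020-03-17 gives Saturday 2020-03-28.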

`location`

Values in the location column must be

"US" or
a two-digit number representing the US state, territory, or district FIPS numeric code.

This location identifies the geographical location for the forecast.

A file with FIPS codes for states in the US is available through the fips_code dataset in the tigris R package, and saved as a public CSV file. Please note that when reading in FIPS codes, they should be read in as characters to preserve any leading zeroes.
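For teams working in Python, the sketch below shows one way to keep leading zeroes. The column names `location` and `location_name` and the helper `normalize_location` are assumptions for illustration and may not match the public CSV file.

```python
import csv
import io

# Illustrative rows; the column names are assumed, not taken from the real file.
sample = io.StringIO(
    "location,location_name\n01,Alabama\n06,California\n36,New York\n"
)
# The csv module returns every field as a string, so "01" stays "01".
codes = [row["location"] for row in csv.DictReader(sample)]

def normalize_location(value):
    """Return "US" or a zero-padded two-digit FIPS string."""
    text = str(value).strip()
    if text == "US":
        return text
    if text.isdigit() and len(text) <= 2:
        # Repairs codes that lost their leading zero, e.g. 6 -> "06".
        return text.zfill(2)
    raise ValueError(f"not a valid location: {value!r}")
```

Tools that infer column types may turn "06" into the integer 6, and `normalize_location` is one way to repair that after the fact.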

`type`

Values in the type column are either

"point" or
"quantile".

This value indicates whether that row corresponds to a point forecast or a quantile forecast. Point forecasts are used in visualization while quantile forecasts are used in visualization and in ensemble construction.

Forecasts must include exactly 1 "point" forecast for every location-target pair.
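The "exactly 1 point forecast" rule can be checked by counting rows per location-target pair. This is a minimal sketch with a hypothetical helper name. It assumes the rows have already been read as dictionaries, for example with `csv.DictReader`.

```python
from collections import Counter

def point_forecast_problems(rows):
    """Map each (location, target) pair to its point count when it is not 1."""
    pairs = set()
    points = Counter()
    for row in rows:
        key = (row["location"], row["target"])
        pairs.add(key)
        if row["type"] == "point":
            points[key] += 1
    # A Counter returns 0 for missing keys, so pairs with only quantile
    # rows are reported with a count of 0.
    return {key: points[key] for key in pairs if points[key] != 1}
```

An empty result means every location-target pair in the file has exactly one point forecast.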

`quantile`

Values in the quantile column are either "NA" (if type is "point") or a quantile in the format

0.###

For quantile forecasts, this value indicates the quantile for the value in this row.

Teams should provide the following 23 quantiles:

c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99)

##  [1] 0.010 0.025 0.050 0.100 0.150 0.200 0.250 0.300 0.350 0.400 0.450 0.500 0.550
## [14] 0.600 0.650 0.700 0.750 0.800 0.850 0.900 0.950 0.975 0.990
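For teams working in Python, an equivalent list can be built as follows. This is a sketch, with integer steps used so that each value equals its decimal literal.

```python
# 0.05, 0.10, ..., 0.95 built from integers to avoid accumulated rounding error.
middle = [k / 100 for k in range(5, 100, 5)]
quantiles = [0.01, 0.025] + middle + [0.975, 0.99]

# Format with three decimals, matching the 0.### layout described above.
formatted = [f"{q:.3f}" for q in quantiles]
print(len(quantiles))  # prints 23
```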

`value`

Values in the value column are numeric indicating the "point" or "quantile" prediction for this row. For a "point" prediction, value is simply the value of that point prediction for the target and location associated with that row. For a "quantile" prediction, value is the inverse of the cumulative distribution function (CDF) for the target, location, and quantile associated with that row.

An example inverse CDF is shown in example_inverse_cdf-1.png.
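To make the inverse CDF concrete, the sketch below evaluates it for a purely hypothetical predictive distribution, a normal with mean 1000 and standard deviation 100. Real models will have their own distributions, and using the mean as the point forecast is only one possible choice.

```python
from statistics import NormalDist

# Hypothetical predictive distribution, chosen only for illustration.
predictive = NormalDist(mu=1000, sigma=100)

# Each tuple is (type, quantile, value), mirroring the columns described above.
rows = [("point", "NA", predictive.mean)]
for q in (0.025, 0.5, 0.975):
    # inv_cdf(q) is the value v such that P(X <= v) = q.
    rows.append(("quantile", f"{q:.3f}", predictive.inv_cdf(q)))

for row in rows:
    print(row)
```

The 0.500 quantile is the median of the predictive distribution, and the values increase with the quantile level, which is a useful sanity check on any submitted file.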

Forecast validation

To ensure proper data formatting, pull requests for new data in data-processed/ will be automatically validated.

Pull request forecast validation

When a pull request is submitted, the data are validated through Travis CI, which runs the tests in test-formatting.py. The intent of these tests is to validate the requirements above, which are specifically enumerated on the wiki. Please let us know if the wiki is inaccurate.

If the pull request fails, please follow these instructions for details on how to troubleshoot.

Run checks locally

To run these checks locally rather than waiting for the results from a pull request, follow these instructions.

R validation checks

If you cannot get the python checks to run, you can use these instructions to run some checks in R. These checks are no longer maintained, but may still be of use to teams working with R.

Data visualization

If you want to visualize your forecasts, you can use our R shiny app to visualize your forecast by running

source("explore_processed_data.R")
shinyApp(ui = ui, server = server)

from within the data-processed/ folder. This is mainly an internal tool we use to help us know what forecasts are in the repository. Thus, it is provided as-is with no warranty.
