Commit

Merge pull request #60 from Infectious-Disease-Modeling-Hubs/br-v2.0.0
Release v2.0.0
annakrystalli committed Jul 14, 2023
2 parents 455fbc3 + 95e1985 commit ac1fae8
Showing 3 changed files with 12 additions and 10 deletions.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -62,7 +62,7 @@
# -- Options for EPUB output
epub_show_urls = 'footnote'

schema_version = "v1.0.0"
schema_version = "v2.0.0"
# Use the schema_branch variable to specify the branch in the schemas repository from which the config schema will be sourced, especially for docson widgets.
# Useful if the schema being documented hasn't been released to the `main` branch in the schemas repo yet. If version has been released already, set this to "main".
schema_branch = "br-"+schema_version
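
As a rough sketch of the rule described in the comment above, the branch name could be derived from a flag instead of being edited by hand. The `schema_released` flag is hypothetical and is not part of the actual `conf.py`; it only illustrates the "released means `main`" convention.

```python
schema_version = "v2.0.0"

# Hypothetical flag (an assumption, not in the original conf.py): set to True
# once this schema version has been released to `main` in the schemas repo.
schema_released = False

# Released versions are read from "main"; unreleased ones from their "br-" branch.
schema_branch = "main" if schema_released else "br-" + schema_version
```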
16 changes: 8 additions & 8 deletions docs/source/format/model-output.md
@@ -19,25 +19,25 @@ Model outputs are contributed by teams, and are represented in a rectangular for
* Task ids: A set of columns specifying the model task, as described [here](task_id_vars). The columns used as task ids will vary across different Hubs.

* Model output representation: A set of three columns specifying how the model outputs are represented. All three of these columns will be used by all Hubs:
1. `type` specifies the type of representation of the predictive distribution
2. `type_id` specifies more identifying information specific to the output type
1. `output_type` specifies the type of representation of the predictive distribution
2. `output_type_id` specifies more identifying information specific to the output type
3. `value` contains the model’s prediction
These are described more in the following table:

```{margin}
Note on `pmf` model output type: Values are required to sum to 1 across all `type_id` values within each combination of values of task id variables. This representation should only be used if the outcome variable is truly discrete; if the categories would represent a binned discretization of an underlying continuous variable a CDF representation is preferred.
Note on `pmf` model output type: Values are required to sum to 1 across all `output_type_id` values within each combination of values of task id variables. This representation should only be used if the outcome variable is truly discrete; if the categories would represent a binned discretization of an underlying continuous variable a CDF representation is preferred.
```

```{margin}
Note on `sample` model output type: Depending on the Hub specification, samples with the same sample index (specified by the `type_id`) may be assumed to correspond to a joint distribution across multiple levels of the task id variables. This is discussed more below.
Note on `sample` model output type: Depending on the Hub specification, samples with the same sample index (specified by the `output_type_id`) may be assumed to correspond to a joint distribution across multiple levels of the task id variables. This is discussed more below.
```
(output_type_table)=
| `type` | `type_id` | `value` |
| `output_type` | `output_type_id` | `value` |
| ------ | ------ | ------ |
| `mean` | NA (not used for mean predictions) | Numeric: the mean of the predictive distribution |
| `median` | NA (not used for median predictions) | Numeric: the median of the predictive distribution |
| `quantile` | Numeric between 0.0 and 1.0: a probability level | Numeric: the quantile of the predictive distribution at the probability level specified by the type_id |
| `cdf` | Numeric within the support of the outcome variable: a possible value of the target variable | Numeric between 0.0 and 1.0: the value of the cumulative distribution function of the predictive distribution at the value of the outcome variable specified by the type_id |
| `quantile` | Numeric between 0.0 and 1.0: a probability level | Numeric: the quantile of the predictive distribution at the probability level specified by the output_type_id |
| `cdf` | Numeric within the support of the outcome variable: a possible value of the target variable | Numeric between 0.0 and 1.0: the value of the cumulative distribution function of the predictive distribution at the value of the outcome variable specified by the output_type_id |
| `pmf` | String naming a possible category of a discrete outcome variable | Numeric between 0.0 and 1.0: the value of the probability mass function of the predictive distribution when evaluated at a specified level of a categorical outcome variable. |
| `sample` | Positive integer sample index | Numeric: a sample from the predictive distribution. |
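
The constraints in the table above can be sketched as a small checker over long-format rows. This is a minimal illustration, not the hub's own validation tooling: the function name `check_rows`, the use of `None` to stand for NA, and the `peak week` example rows are all assumptions made for the sketch.

```python
import math
from collections import defaultdict


def check_rows(rows, task_id_cols):
    """Return a list of problems found in long-format model output rows.

    Each row is a dict with the task id columns plus `output_type`,
    `output_type_id` and `value`. NA is represented as None (an assumption).
    """
    problems = []
    pmf_totals = defaultdict(float)
    for i, row in enumerate(rows):
        otype, oid, value = row["output_type"], row["output_type_id"], row["value"]
        if otype in ("mean", "median") and oid is not None:
            problems.append(f"row {i}: output_type_id must be NA for {otype}")
        elif otype == "quantile" and not 0.0 <= oid <= 1.0:
            problems.append(f"row {i}: quantile level {oid} is outside [0, 1]")
        elif otype in ("cdf", "pmf") and not 0.0 <= value <= 1.0:
            problems.append(f"row {i}: {otype} value {value} is outside [0, 1]")
        elif otype == "sample" and not (isinstance(oid, int) and oid > 0):
            problems.append(f"row {i}: sample index must be a positive integer")
        if otype == "pmf":
            # pmf values must sum to 1 within each combination of task id values
            pmf_totals[tuple(row[c] for c in task_id_cols)] += value
    for key, total in pmf_totals.items():
        if not math.isclose(total, 1.0, abs_tol=1e-8):
            problems.append(f"task {key}: pmf values sum to {total}, expected 1")
    return problems


task_ids = ["origin_epiweek", "target", "horizon"]
example_rows = [
    {"origin_epiweek": "EW202242", "target": "weekly rate", "horizon": 1,
     "output_type": "mean", "output_type_id": None, "value": 5},
    {"origin_epiweek": "EW202242", "target": "weekly rate", "horizon": 1,
     "output_type": "quantile", "output_type_id": 0.25, "value": 2},
    # Made-up pmf rows for illustration only; their values sum to 0.9, not 1.
    {"origin_epiweek": "EW202242", "target": "peak week", "horizon": None,
     "output_type": "pmf", "output_type_id": "EW202250", "value": 0.6},
    {"origin_epiweek": "EW202242", "target": "peak week", "horizon": None,
     "output_type": "pmf", "output_type_id": "EW202251", "value": 0.3},
]
print(check_rows(example_rows, task_ids))
```

Grouping the `pmf` totals by the tuple of task id values mirrors the margin note: the sum-to-1 requirement applies within each task, not across the whole file.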

@@ -53,7 +53,7 @@ Hubs should specify the collection of task id variables for which samples are ex
Here is an example for a Hub that collects mean and quantile forecasts for one-week-ahead incidence, but probabilities for the timing of a season peak:


| `origin_epiweek` | `target` | `horizon` | `type` | `type_id` | `value` |
| `origin_epiweek` | `target` | `horizon` | `output_type` | `output_type_id` | `value` |
| ------ | ------ | ------ | ------ | ------ | ------ |
| EW202242 | weekly rate | 1 | mean | NA | 5 |
| EW202242 | weekly rate | 1 | quantile | 0.25 | 2 |
4 changes: 3 additions & 1 deletion docs/source/format/tasks.md
@@ -37,10 +37,12 @@ Some task ID variables serve specific purposes. For example, every hub must have
In general, there are no restrictions on what task ID variables may be named; however, when appropriate, we suggest that Hubs adopt the following standard task ID or column names and definitions:

* `origin_date`: the starting point that can be used for calculating a target_date via the formula target_date = origin_date + horizon * time_units_per_horizon (e.g., with weekly data, target_date is calculated as origin_date + horizon * 7 days).
* `forecast_date`: usually defines the date that a model is run to produce a forecast.
* `scenario_id`: a unique identifier for a scenario
* `location`: a unique identifier for a location
* `target`: a unique identifier for the target. It is recommended, although not required, that hubs set up a single variable to define the target (i.e., as a target key), with additional detail specified in the `target_metadata` section of the [tasks metadata](tasks-metadata).
* `target_date`: for short-term forecasts, the target_date specifies the date of occurrence of the outcome of interest. For instance, if models are requested to forecast the number of hospitalizations that will occur on 2022-07-15, the target_date is 2022-07-15.
* `target_variable`/`target_outcome`: task IDs making up unique identifiers of a two-part target. These task IDs can be used in hubs that want to split up the definition of a target across two variables. In this situation, both task IDs will be specified as target keys in the `target_metadata` section of the [tasks metadata](tasks-metadata).
* `target_date`/`target_end_date`: for short-term forecasts, the synonymous task IDs `target_date`/`target_end_date` specify the date of occurrence of the outcome of interest. For instance, if models are requested to forecast the number of hospitalizations that will occur on 2022-07-15, the target_date is 2022-07-15.
* `horizon`: The difference between the target_date and the origin_date in time units specified by the hub (e.g., may be days, weeks, or months)
* `age_group`: a unique identifier for an age group
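
The relationship between `origin_date`, `horizon` and `target_date` given in the list above can be sketched in a few lines. The function names and the `days_per_horizon` parameter are illustrative assumptions; the sketch only covers time units that are a fixed number of days (days or weeks), not calendar months.

```python
from datetime import date, timedelta


def target_date(origin_date, horizon, days_per_horizon=7):
    """target_date = origin_date + horizon * time_units_per_horizon."""
    return origin_date + timedelta(days=horizon * days_per_horizon)


def horizon_between(origin_date, target, days_per_horizon=7):
    """Inverse relation: the horizon is the difference in hub time units."""
    return (target - origin_date).days // days_per_horizon


# Weekly data: a forecast made from 2022-07-01 with horizon 2
print(target_date(date(2022, 7, 1), 2))  # 2022-07-15
```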

