This unified COVID-19 dataset is designed to meet the following objectives:
- Mapping all geospatial units globally into a unique standardized ID.
- Standardizing administrative names and codes at all levels.
- Standardizing dates, data types, and formats.
- Unifying variable names, types, and categories.
- Merging data from all credible sources at all levels.
- Cleaning the data and fixing confusing entries.
- Integrating hydrometeorological variables at all levels.
- Integrating population-weighted hydrometeorological variables.
- Integrating air quality, comorbidities, WorldPop, and other static data.
- Integrating policy data from the Oxford COVID-19 Government Response Tracker (OxCGRT).
- Integrating an augmented version from all sources (future releases).
- Optimizing the data for machine learning applications.
Note that the global daily reports from the Johns Hopkins University (JHU) Center for Systems Science and Engineering (CSSE) give COVID-19 data for some European countries at the province level. These records will be replaced by higher-resolution data at NUTS 0-3 levels.
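The population-weighted hydrometeorological variables listed among the objectives can be illustrated with a small sketch. This README does not describe the actual weighting procedure, so the code below only assumes the usual definition: each grid cell inside a geospatial unit contributes in proportion to its population. The function name, the zero-population fallback, and the sample numbers are all hypothetical.

```python
def population_weighted_mean(values, population):
    """Average gridded values over one unit, weighting each cell by its population.

    Illustrative only: the dataset's own processing may differ.
    """
    if not values or len(values) != len(population):
        raise ValueError("values and population must be non-empty and equal in length")
    total_population = sum(population)
    if total_population == 0:
        # Assumption: with no residents in any cell, fall back to the plain mean.
        return sum(values) / len(values)
    weighted_sum = sum(v * p for v, p in zip(values, population))
    return weighted_sum / total_population


# Hypothetical unit covered by three grid cells: temperature and resident counts.
temperature = [10.0, 20.0, 30.0]
residents = [0, 1000, 3000]

weighted = population_weighted_mean(temperature, residents)
simple = sum(temperature) / len(temperature)
print(weighted, simple)
```

In this made-up example the weighted value is pulled toward the most populated cell, which is the reason a population-weighted variable can differ from a simple spatial average over the same unit.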
Column | Type | Description |
---|---|---|
ID | Character | Geospatial ID, unique identifier |
Date | Date | Date of data record |
Cases | Integer | Number of cumulative cases |
Cases_New | Integer | Number of new daily cases |
Type | Character | Type of the reported cases |
Age | Character | Age group of the reported cases |
Sex | Character | Sex/gender of the reported cases |
Source | Character | Data source: JHU, CTP, NYC, NYT, SES, DPC, RKI, JRC |
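The relationship between `Cases` (cumulative) and `Cases_New` (daily) in the table above can be sketched as a first difference within each series. This is a minimal sketch and not the dataset's own processing code: the ID `XX01`, the `Total` placeholders for `Age` and `Sex`, the function name, and the choice to treat the first record's new count as equal to its cumulative count are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def add_new_cases(rows):
    """Return copies of rows with Cases_New set to the day-over-day change in Cases."""
    # One series per combination of the identifying columns.
    series = defaultdict(list)
    for row in rows:
        key = (row["ID"], row["Type"], row["Age"], row["Sex"], row["Source"])
        series[key].append(row)

    result = []
    for group in series.values():
        previous = 0  # Assumption: nothing was reported before the first record.
        for row in sorted(group, key=lambda r: r["Date"]):
            result.append({**row, "Cases_New": row["Cases"] - previous})
            previous = row["Cases"]
    return result


# Hypothetical records following the column layout above (deliberately unsorted).
base = {"ID": "XX01", "Age": "Total", "Sex": "Total", "Source": "JHU"}
rows = [
    {**base, "Date": date(2020, 3, 3), "Cases": 9, "Type": "Confirmed"},
    {**base, "Date": date(2020, 3, 1), "Cases": 2, "Type": "Confirmed"},
    {**base, "Date": date(2020, 3, 2), "Cases": 5, "Type": "Confirmed"},
    {**base, "Date": date(2020, 3, 2), "Cases": 1, "Type": "Deaths"},
]

new_by_key = {(r["Type"], r["Date"]): r["Cases_New"] for r in add_new_cases(rows)}
print(new_by_key)
```

Grouping by `ID`, `Type`, `Age`, `Sex`, and `Source` before differencing keeps counts from different case types or providers from being subtracted from one another.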
Type | Description |
---|---|
Active | Active cases |
Confirmed | Confirmed cases |
Deaths | Deaths |
Home_Confinement | Home confinement / isolation |
Hospitalized | Total hospitalized cases excluding intensive care units |
Hospitalized_Now | Currently hospitalized cases excluding intensive care units |
Hospitalized_Sym | Symptomatic hospitalized cases excluding intensive care units |
ICU | Total cases in intensive care units |
ICU_Now | Currently in intensive care units |
Negative | Negative tests |
Pending | Pending tests |
Positive | Positive tests, including hospitalized cases and home confinement |
Positive_Dx | Positive cases emerging from clinical activity / diagnostics |
Positive_Sc | Positive cases emerging from surveys and tests |
Recovered | Recovered cases |
Tested | Cases tested = Tests - Pending |
Tests | Total performed tests |
Ventilator | Total cases receiving mechanical ventilation |
Ventilator_Now | Currently receiving mechanical ventilation |
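The case types above lend themselves to a simple validation step, and the `Tested` row states an arithmetic relation (`Tested = Tests - Pending`) that can be checked directly. The sketch below is illustrative only: the helper names and the sample counts are hypothetical, and the set of labels is copied from the table rather than from any published schema file.

```python
# Case type labels copied from the table above.
CASE_TYPES = frozenset({
    "Active", "Confirmed", "Deaths", "Home_Confinement",
    "Hospitalized", "Hospitalized_Now", "Hospitalized_Sym",
    "ICU", "ICU_Now", "Negative", "Pending",
    "Positive", "Positive_Dx", "Positive_Sc", "Recovered",
    "Tested", "Tests", "Ventilator", "Ventilator_Now",
})


def counts_by_type(rows):
    """Collect {Type: Cases} for rows that share one ID and date, rejecting unknown labels."""
    counts = {}
    for row in rows:
        if row["Type"] not in CASE_TYPES:
            raise ValueError(f"unknown case type: {row['Type']!r}")
        counts[row["Type"]] = row["Cases"]
    return counts


def derive_tested(counts):
    """Apply the relation from the table: Tested = Tests - Pending."""
    return counts["Tests"] - counts["Pending"]


# Hypothetical counts for a single unit on a single day.
sample = [
    {"Type": "Tests", "Cases": 1200},
    {"Type": "Pending", "Cases": 50},
]
tested = derive_tested(counts_by_type(sample))
print(tested)
```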
Source | Description | Level |
---|---|---|
JHU | Johns Hopkins University CSSE | Global & County/State, United States |
CTP | The COVID Tracking Project | State, United States |
NYC | New York City Department of Health and Mental Hygiene | ZCTA/Borough, New York City |
NYT | The New York Times | County/State, United States |
SES | Monitoring COVID-19 Cases and Deaths in Brazil | Municipality/State/Country, Brazil |
DPC | Italian Civil Protection Department | NUTS 0-3, Italy |
RKI | Robert Koch-Institut, Germany | NUTS 0-3, Germany |
JRC | Joint Research Centre | Global & NUTS 0-3, Europe |
ERA5 | The fifth generation of ECMWF reanalysis | All levels |
NLDAS | North American Land Data Assimilation System | County/State, United States |
CIESIN | Center for International Earth Science Information Network | Global gridded population |
OxCGRT | Oxford COVID-19 Government Response Tracker | National (global) & subnational (US, UK) |
This work is supported by NASA Health & Air Quality project 80NSSC18K0327, under a COVID-19 supplement, and by National Institutes of Health (NIH) project 3U19AI135995-03S1 ("Consortium for Viral Systems Biology (CViSB)"; a collaboration with The Scripps Research Institute and UCLA).
To cite this dataset:
Badr, H. S., B. F. Zaitchik, G. H. Kerr, J. M. Colston, P. Hinson, Y. Chen, N. H. Nguyen, M. Kosek, H. Du, E. Dong, M. Marshall, K. Nixon, and L. M. Gardner, 2021: Unified COVID-19 Dataset.