
ccodwg/Covid19CanadaData


Covid19CanadaData: Download Canadian COVID-19 Data

The goal of Covid19CanadaData is to facilitate the acquisition of Canadian COVID-19 data from the following sources:

  • Live versions of Canadian COVID-19 datasets available on the Internet
  • The Canadian COVID-19 Data Archive, which provides daily snapshots of COVID-19 data from various Canadian government sources (and select non-governmental sources), via live URLs (for current versions) and Amazon S3 (for archived versions). All datasets are catalogued in datasets.json.

Covid19CanadaData is part of Covid19CanadaETL, which is used to assemble the Covid19Canada dataset from the COVID-19 Canada Open Data Working Group. It is also used in the Timeline of COVID-19 in Canada, one component of the What Happened? COVID-19 in Canada project.

Installation

You can install the development version of Covid19CanadaData from GitHub with:

# install.packages("devtools")
devtools::install_github("ccodwg/Covid19CanadaData")
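If you run a setup script repeatedly, you may want to skip the GitHub installation when the package is already present. The sketch below assumes a hypothetical helper, `ensure_installed()`, which is not part of Covid19CanadaData; it only combines base R's `requireNamespace()` and `install.packages()` with the `devtools::install_github()` call shown above.

```r
# Hypothetical helper (not part of Covid19CanadaData): install a GitHub
# package only if it cannot already be loaded.
ensure_installed <- function(pkg = "Covid19CanadaData",
                             repo = "ccodwg/Covid19CanadaData") {
  # Package already available: nothing to do
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))
  }
  # devtools provides install_github()
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  devtools::install_github(repo)
  invisible(TRUE)
}
```

Calling ensure_installed() returns TRUE (invisibly) when an installation was attempted and FALSE when the package was already installed.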

Note that downloading webpages that require JavaScript to render their contents depends on Docker: Docker must be installed, and the Docker daemon must be running and available. See the installation instructions for Docker Desktop on Windows and Mac. On Linux, install rootless Docker by running the command below and following the instructions:

curl -sSL https://get.docker.com/rootless | sh
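Before requesting a JavaScript-rendered page, it can help to confirm from R that Docker is actually reachable. This is a minimal sketch using a hypothetical helper, `docker_available()`, that is not part of Covid19CanadaData; it assumes the `docker` CLI is on the PATH and treats a zero exit status from `docker info` as "daemon running".

```r
# Hypothetical helper (not part of Covid19CanadaData): check whether the
# docker CLI is on the PATH and the daemon answers `docker info`.
docker_available <- function() {
  # No docker executable found on the PATH
  if (!nzchar(Sys.which("docker"))) {
    return(FALSE)
  }
  # `docker info` exits with status 0 only if the daemon is reachable;
  # output is discarded because only the exit status matters here
  status <- system2("docker", "info", stdout = FALSE, stderr = FALSE)
  identical(as.integer(status), 0L)
}
```

For example, wrap a Docker-dependent download in if (docker_available()) { ... } so that a script can skip it gracefully when the daemon is not running. With rootless Docker, `docker info` may also need the environment variables printed by the installer.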

On Windows, a Python installation with the packages docker and pypiwin32, as well as the R package reticulate, is also required; see here for more details.
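The Windows-only prerequisites can be checked from R through reticulate. The sketch below is a hypothetical helper, `missing_windows_prereqs()`, not part of Covid19CanadaData. It assumes that pypiwin32 is importable as the `win32api` module and that `reticulate::py_module_available()` reflects the Python installation reticulate is configured to use.

```r
# Hypothetical helper (not part of Covid19CanadaData): return the names of
# Windows-only prerequisites that appear to be missing.
missing_windows_prereqs <- function() {
  # The extra Python requirement only applies on Windows
  if (.Platform$OS.type != "windows") {
    return(character(0))
  }
  # Without reticulate, the Python modules cannot be checked from R
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return("reticulate")
  }
  # Assumption: pypiwin32 is importable as the `win32api` module
  modules <- c(docker = "docker", pypiwin32 = "win32api")
  ok <- vapply(modules, reticulate::py_module_available, logical(1))
  names(modules)[!ok]
}
```

An empty result means nothing extra appears to be missing; on non-Windows platforms the helper always returns character(0).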

Examples

Live Canadian COVID-19 datasets

Below are some example commands for downloading the live versions of data catalogued in the Canadian COVID-19 Data Archive. Datasets are referenced by their UUIDs, as listed in datasets.json in Covid19CanadaArchive.

# download live versions of datasets catalogued in the Canadian COVID-19 Data Archive

## get PHAC epidemiology update CSV
d1 <- Covid19CanadaData::dl_dataset("314c507d-7e48-476e-937b-965499f51e8e")

## get Ontario hospitalizations CSV
d2 <- Covid19CanadaData::dl_dataset("4b214c24-8542-4d26-a850-b58fc4ef6a30")

## get summary page of Alberta respiratory virus dashboard
d3 <- Covid19CanadaData::dl_dataset("2a11bbcc-7b43-47d1-952d-437cdc9b2ffb")
rvest::html_table(d3) # extract tables

## get BC COVID-19 situation report (requires Docker)
d4 <- Covid19CanadaData::dl_dataset("b85ca9d5-3a88-403d-9444-cac73ffb2d3f")
rvest::html_table(d4) # extract tables
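When fetching several live datasets at once, a single failed download (for example, a source that is temporarily offline) would otherwise stop the whole script. The sketch below assumes a hypothetical helper, `dl_many()`, that is not part of Covid19CanadaData. It calls `Covid19CanadaData::dl_dataset()` for each UUID and records `NULL` on failure. The downloader is a parameter only so that it can be swapped out for testing.

```r
# Hypothetical helper (not part of Covid19CanadaData): download several
# datasets by UUID, continuing past failures instead of stopping.
dl_many <- function(uuids, dl = Covid19CanadaData::dl_dataset) {
  # lapply() keeps the names of `uuids`; failed downloads become NULL
  lapply(uuids, function(uuid) {
    tryCatch(
      dl(uuid),
      error = function(e) {
        message("Download failed for ", uuid, ": ", conditionMessage(e))
        NULL
      }
    )
  })
}

# UUIDs taken from the examples above; the names are arbitrary labels
uuids <- c(
  phac_epi = "314c507d-7e48-476e-937b-965499f51e8e",
  on_hosp  = "4b214c24-8542-4d26-a850-b58fc4ef6a30"
)
```

Running res <- dl_many(uuids) gives a list with elements res$phac_epi and res$on_hosp, where NULL marks a dataset that could not be downloaded.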

Archived Canadian COVID-19 datasets

# load most recent archived PHAC epidemiology update CSV
# and current live version into R
# returns a list of data frames named according to date
Covid19CanadaData::dl_archive(
  uuid = "314c507d-7e48-476e-937b-965499f51e8e",
  date = "latest" # latest archived version
)
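Because dl_archive() returns a list named according to date, you may want to pull out only the newest element. The sketch below uses a hypothetical helper, `newest_by_name_date()`, that is not part of Covid19CanadaData. It assumes that each name contains a date formatted as YYYY-MM-DD; the exact naming scheme is an assumption, so check names() on a real result first.

```r
# Hypothetical helper (not part of Covid19CanadaData): return the element of
# a list whose name contains the most recent date.
# Assumption: names contain dates formatted as YYYY-MM-DD; elements whose
# names contain no such date are ignored.
newest_by_name_date <- function(x) {
  nm <- names(x)
  if (is.null(nm)) stop("`x` must be a named list")
  # Locate the first YYYY-MM-DD substring in each name (-1 = no match)
  m <- regexpr("[0-9]{4}-[0-9]{2}-[0-9]{2}", nm)
  dates <- rep(as.Date(NA), length(x))
  dates[m > 0] <- as.Date(regmatches(nm, m), format = "%Y-%m-%d")
  if (all(is.na(dates))) stop("no element name contains a YYYY-MM-DD date")
  # which.max() skips NA dates
  x[[which.max(dates)]]
}
```

For example, assign the result of the dl_archive() call above to archive, then run newest_by_name_date(archive).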

# download BC Regional Health Authority cumulative summary JSON files
# from December 2021 to a temporary directory
# saves files to local drive rather than loading into R
temp_dir <- tempdir() # define temporary directory
Covid19CanadaData::dl_archive(
  uuid = "91367e1d-8b79-422c-b314-9b3441ba4f42",
  after = "2021-12-01",
  before = "2021-12-31",
  path = temp_dir,
  remove_duplicates = TRUE # don't download duplicate files (default = TRUE)
)
list.files(temp_dir) # list files
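Once the files are saved locally, they can be read back into R in one step. The sketch below assumes a hypothetical helper, `read_downloaded_json()`, that is not part of Covid19CanadaData. It also assumes that the downloaded files end in `.json` and defaults to `jsonlite::fromJSON()`, which accepts a file path, as the parser.

```r
# Hypothetical helper (not part of Covid19CanadaData): read every .json file
# in a directory into a named list, using a pluggable parser.
read_downloaded_json <- function(dir, parser = jsonlite::fromJSON) {
  # Assumption: downloaded files have a .json extension; filtering matters
  # because tempdir() is shared by everything else in the R session
  files <- list.files(dir, pattern = "\\.json$", full.names = TRUE)
  out <- lapply(files, parser)
  names(out) <- basename(files)
  out
}
```

For example, bc <- read_downloaded_json(temp_dir) returns one parsed object per downloaded file, named by file name.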

Citing this package

A citation for Covid19CanadaData may be generated by running citation("Covid19CanadaData").
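If you manage references in a .bib file, the same citation can be converted to BibTeX with base R's utils package. The wrapper name `cite_bibtex()` is hypothetical and not part of Covid19CanadaData; it only calls `utils::citation()` and `utils::toBibtex()`.

```r
# Hypothetical wrapper (not part of Covid19CanadaData): format a package
# citation as BibTeX using utils::citation() and utils::toBibtex()
cite_bibtex <- function(pkg = "Covid19CanadaData") {
  utils::toBibtex(utils::citation(pkg))
}
```

For example, writeLines(cite_bibtex(), "references.bib") writes the entry to a file.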