/
quickstart.Rmd
772 lines (660 loc) · 23.3 KB
/
quickstart.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
---
title: "C3.ai COVID-19 Data Lake Quickstart in R"
output: html_notebook
---
# C3.ai COVID-19 Data Lake Quickstart in R
Version 5.0 (August 11, 2020)
This R notebook shows some examples of how to access and use each of the [C3.ai COVID-19 Data Lake](https://c3.ai/covid/) APIs. These examples show only a small piece of what you can do with the C3.ai COVID-19 Data Lake, but will get you started with performing your own exploration. See the [API documentation](https://c3.ai/covid-19-api-documentation/) for more details.
Please contribute your questions, answers and insights on [Stack Overflow](https://www.stackoverflow.com). Tag `c3ai-datalake` so that others can view and help build on your contributions. For support, please send email to: [covid@c3.ai](mailto:covid@c3.ai).
To view an outline of this notebook, use the RStudio keyboard shortcut Control + Shift + O on Windows or Command + Shift + O on Mac.
Import the tidyverse (version >= 1.3.0), httr, jsonlite, and fitdistrplus libraries to use this notebook.
```{r}
if (!require(tidyverse)) install.packages('tidyverse')
if (!require(httr)) install.packages('httr')
if (!require(jsonlite)) install.packages('jsonlite')
if (!require(fitdistrplus)) install.packages('fitdistrplus')
library(tidyverse)
library(httr)
library(jsonlite)
library(fitdistrplus)
```
## Helper methods for accessing the API
The helper methods in `c3aidatalake.R` convert a JSON response from the C3.ai APIs to a Tibble. You may wish to view the code in `c3aidatalake.R` before using the quickstart examples.
```{r}
source("c3aidatalake.R")
```
## Access OutbreakLocation data
OutbreakLocation stores location data such as countries, provinces, cities, where COVID-19 outbeaks are recorded. See the [API documentation](https://c3.ai/covid-19-api-documentation/#tag/OutbreakLocation) for more details and for a list of available locations.
```{r}
# Fetch facts about Germany
locations <- fetch(
"outbreaklocation",
list(
spec = list(
filter = "id == 'Germany'"
)
)
)
locations %>%
dplyr::select(-location) %>%
unnest_wider(fips, names_sep = ".")
```
### Case counts
A variety of sources provide counts of cases, deaths, recoveries, and other statistics for counties, provinces, and countries worldwide.
```{r}
# Total number of confirmed cases, deaths, and recoveries in Santa Clara, California
today <- Sys.Date()
casecounts <- evalmetrics(
"outbreaklocation",
list(
spec = list(
ids = list("SantaClara_California_UnitedStates"),
expressions = list("JHU_ConfirmedCases", "JHU_ConfirmedDeaths", "JHU_ConfirmedRecoveries"),
start = "2020-01-01",
end = today,
interval = "DAY"
)
)
)
casecounts
```
Plot these counts.
```{r}
casecounts %>%
ggplot(aes(dates, data, color = value_id)) +
geom_line() +
facet_wrap(vars(name)) +
labs(
x = "Date",
y = "Count",
color = "Metric"
)
```
Export case counts as a .csv file.
```{r}
# Uncomment the line below to export the Tibble as a .csv file
# casecounts %>% write_csv("casecounts.csv")
```
### Demographics
Demographic and economic data from the US Census Bureau and The World Bank allow demographic comparisons across locations.
```{r}
# Fetch global demographic data
population <- fetch(
"populationdata",
list(
spec = list(
filter = "!contains(parent, '_') && (populationAge == '>=65' || populationAge == 'Total') && gender == 'Male/Female' && year == '2018' && estimate == 'False' && percent == 'False'"
)
),
get_all = TRUE
)
population_age_distribution <-
population %>%
mutate(location = map_chr(parent, ~.[[1]])) %>%
drop_na(value) %>%
dplyr::select(location, populationAge, value) %>%
pivot_wider(names_from = populationAge, values_from = value) %>%
mutate(proportion_over_65 = `>=65` / Total) %>%
arrange(desc(proportion_over_65))
population_age_distribution
```
Access global death counts.
```{r}
# Retrieve global deaths as of May 1
global_deaths <- evalmetrics(
"outbreaklocation",
list(
spec = list(
ids = population_age_distribution$location,
expressions = list("JHU_ConfirmedDeaths"),
start = "2020-05-01",
end = "2020-05-01",
interval = "DAY"
)
),
get_all = TRUE
)
global_deaths
```
Plot the results.
```{r}
global_deaths %>%
dplyr::select(name, total_deaths = data) %>%
# Bring the two datasets together
inner_join(population_age_distribution, by = c("name" = "location")) %>%
mutate(deaths_per_million = 1e6 * total_deaths / Total) %>%
# Plot the data
ggplot(aes(proportion_over_65, deaths_per_million)) +
geom_point() +
geom_text(
aes(label = name),
data = . %>% top_frac(0.05, wt = deaths_per_million),
hjust = -0.2,
vjust = -0.1,
size = 2
) +
scale_x_continuous(labels = scales::percent) +
labs(
title = "Several countries have high COVID-19 death rates and older populations",
subtitle = "Using data from The World Bank and Johns Hopkins University",
x = "Proportion of population over 65",
y = "Confirmed COVID-19 deaths\nper million people"
)
```
### Mobility
Mobility data from Apple and Google provide a view of the impact of COVID-19 and social distancing on mobility trends.
```{r}
mobility_trends <- evalmetrics(
"outbreaklocation",
list(
spec = list(
ids = list("DistrictofColumbia_UnitedStates"),
expressions = list(
"Apple_WalkingMobility",
"Apple_DrivingMobility",
"Google_ParksMobility",
"Google_ResidentialMobility"
),
start = "2020-03-01",
end = "2020-04-01",
interval = "DAY"
)
),
get_all = TRUE
)
mobility_trends
```
Plot these mobility trends.
```{r}
mobility_trends %>%
ggplot(aes(dates, data/100, color = value_id)) +
geom_hline(aes(yintercept = 1), linetype = "dashed") +
geom_line() +
scale_y_continuous(labels = scales::percent) +
facet_wrap(vars(name)) +
labs(
x = "Date",
y = "Mobility compared to baseline",
color = "Metric"
)
```
### Projections
Use the GetProjectionHistory API to retrieve versioned time series projections for specific metrics made at specific points in time.
```{r}
# Retrieve projections made between April 13 and May 1 of mean total cumulative deaths in Spain from April 13 to May 13
projections <- getprojectionhistory(
list(
outbreakLocation = "Spain",
metric = "UniversityOfWashington_TotdeaMean_Hist",
metricStart = "2020-04-13",
metricEnd = "2020-05-13",
observationPeriodStart = "2020-04-13",
observationPeriodEnd = "2020-05-01"
)
)
projections
```
```{r}
# Retrieve actual total cumulative deaths in Spain from April 1 to May 13
deaths <- evalmetrics(
"outbreaklocation",
list(
spec = list(
ids = list("Spain"),
expressions = list("JHU_ConfirmedDeaths"),
start = "2020-04-01",
end = "2020-05-13",
interval = "DAY"
)
)
)
deaths
```
Plot the results.
```{r}
ggplot() +
geom_line(
aes(dates, data),
data = deaths,
color = "black"
) +
geom_line(
aes(dates, data, color = expr),
data = projections %>%
filter(dates >= projection_date)
) +
facet_wrap(vars(name)) +
labs(
x = "Date",
y = "Count",
color = "Metric",
title = "Cumulative death count projections versus actual count"
)
```
### Economic indicators
GDP and employment statistics by business sector from the US Bureau of Economic Analysis enable comparisons of the drivers of local economies.
```{r}
# Real GDP for AccommodationAndFoodServices and FinanceAndInsurance in Alameda County, California
realgdp <- evalmetrics(
"outbreaklocation",
list(
spec = list(
ids = list("Alameda_California_UnitedStates"),
expressions = list(
"BEA_RealGDP_AccommodationAndFoodServices_2012Dollars",
"BEA_RealGDP_FinanceAndInsurance_2012Dollars"
),
start = "2000-01-01",
end = "2020-01-01",
interval = "YEAR"
)
),
get_all = TRUE
)
realgdp
```
High frequency spending and earnings data from Opportunity Insights allow tracking of near real-time economic trends.
```{r}
# Access consumer spending in healthcare and low income earnings in the healthcare and social assistance sector in California
opportunityinsights <- evalmetrics(
"outbreaklocation",
list(
spec = list(
ids = list("California_UnitedStates"),
expressions = list(
"OIET_Affinity_SpendHcs",
"OIET_LowIncEmpAllBusinesses_Emp62"
),
start = "2020-01-01",
end = "2020-06-01",
interval = "DAY"
)
),
get_all = TRUE
)
opportunityinsights
```
Plot the results.
```{r}
opportunityinsights %>%
ggplot() +
geom_line(aes(x = dates, y = data, color = value_id)) +
labs(
x = 'Date',
y = 'Change relative to January 4-31',
title = 'California low-income earnings and consumer spending in healthcare',
color = ''
) +
scale_y_continuous(
breaks = seq(-1, 1, 0.2),
labels = scales::percent_format(1)
) +
scale_x_datetime(date_breaks = "1 month", date_labels = "%b") +
theme(legend.position = "bottom")
```
## Access LocationExposure data
LocationExposure stores information based on the movement of people's mobile devices across locations over time. It stores the following:
- Location exposure index (LEX) for a pair of locations (`locationTarget`, `locationVisited`): the fraction of mobile devices that pinged in `locationTarget` on a date that also pinged in `locationVisited` at least once during the previous 14 days. The pair (`locationTarget`, `locationVisited`) can be two county locations or two state locations.
- Device count: the number of distinct mobile devices that pinged at `locationTarget` on the date.
See the [API documentation](https://c3.ai/covid-19-api-documentation/#tag/LocationExposures) for more details.
```{r}
exposure <- read_data_json(
"locationexposure",
"getlocationexposures",
list(
spec = list(
locationTarget = "California_UnitedStates",
locationVisited = "Nevada_UnitedStates",
start = "2020-01-20",
end = "2020-04-25"
)
)
)
# Access daily LEX where `locationTarget` is California and `locationVisited` is Nevada with the the `locationExposures` field.
location_exposures <-
content(exposure)$locationExposures$value %>%
enframe() %>%
dplyr::select(-name) %>%
unnest_wider(col = c(value), names_repair = "unique") %>%
mutate(timestamp = as.POSIXct(timestamp))
# Access daily device counts with the `deviceCounts` field.
device_counts <-
content(exposure)$deviceCounts$value %>%
enframe() %>%
dplyr::select(-name) %>%
unnest_wider(col = c(value), names_repair = "unique") %>%
mutate(timestamp = as.POSIXct(timestamp))
location_exposures
device_counts
```
Plot the LEX data to see the proportion of devices in California on each date that pinged in Nevada over the previous 14 days.
```{r}
location_exposures %>%
ggplot(aes(timestamp, value)) +
geom_line() +
scale_y_continuous(labels = scales::percent) +
labs(
x = NULL,
y = "Location exposure index (LEX)",
title = "Location exposure for target location California and visited location Nevada"
)
```
## Access LineListRecord data
LineListRecord stores individual-level crowdsourced information from laboratory-confirmed COVID-19 patients. Information includes gender, age, symptoms, travel history, location, reported onset, confirmation dates, and discharge status. See the [API documentation](https://c3.ai/covid-19-api-documentation/#tag/LineListRecord) for more details.
```{r}
# Fetch the line list records tracked by MOBS Lab
records <- fetch(
"linelistrecord",
list(
spec = list(
filter = "lineListSource == 'DXY'"
)
),
get_all = TRUE
)
records %>%
unnest_wider(location, names_sep = ".")
```
What were the most common symptoms in the dataset?
```{r}
records %>%
# Get all the symptoms, which are initially comma-separated
mutate(symptoms = map(symptoms, ~str_split(., ", "))) %>%
unnest_longer(symptoms) %>%
unnest_longer(symptoms) %>%
filter(!is.na(symptoms)) %>%
# Plot the data, removing uncommon symptoms
mutate(symptoms = fct_infreq(symptoms)) %>%
mutate(symptoms = fct_lump_prop(symptoms, prop = 0.01)) %>%
mutate(symptoms = fct_rev(symptoms)) %>%
ggplot(aes(symptoms)) +
geom_histogram(stat = "count") +
coord_flip() +
labs(
x = "Symptom",
y = "Number of patients",
title = "Common COVID-19 symptoms"
)
```
If a patient is symptomatic and later hospitalized, how long does it take for them to become hospitalized after developing symptoms?
```{r}
# Get the number of days from development of symptoms to hospitalization for each patient
hospitalization_times <-
records %>%
filter(symptomStartDate <= hospitalAdmissionDate) %>%
mutate(hospitalization_time = as.integer(hospitalAdmissionDate - symptomStartDate) / (24 * 60 * 60)) %>%
# Hospitalization time of 0 days is replaced with 0.1 to indicate near-immediate hospitalization
mutate(hospitalization_time = if_else(hospitalization_time == 0, 0.1, hospitalization_time)) %>%
pull(hospitalization_time)
# Fit a gamma distribution
parameters <- fitdist(hospitalization_times, "gamma")$estimate
gamma_dist <-
tibble(x = seq(0, max(hospitalization_times), 0.01)) %>%
mutate(y = dgamma(x, shape = parameters[["shape"]], rate = parameters[["rate"]]))
# Plot the results
ggplot() +
geom_histogram(
aes(hospitalization_times, y = ..density..),
bins = max(hospitalization_times) + 1
) +
geom_line(
aes(x, y),
data = gamma_dist,
color = "red",
alpha = 0.6,
size = 2
) +
scale_y_continuous(limits = c(0, 0.5)) +
labs(
x = "Days from development of symptoms to hospitalization",
y = "Proportion of patients",
title = "Distribution of time to hospitalization"
)
```
## Join BiologicalAsset and Sequence data
BiologicalAsset stores the metadata of the genome sequences collected from SARS-CoV-2 samples in the National Center for Biotechnology Information Virus Database. Sequence stores the genome sequences collected from SARS-CoV-2 samples in the National Center for Biotechnology Information Virus Database. See the API documentation for [BiologicalAsset](https://c3.ai/covid-19-api-documentation/#tag/BiologicalAsset) and [Sequence](https://c3.ai/covid-19-api-documentation/#tag/Sequence) for more details.
```{r}
# Join data from BiologicalAsset & Sequence C3.ai Types
sequences <- fetch(
"biologicalasset",
list(
spec = list(
include = "this, sequence.sequence",
filter = "exists(sequence.sequence)"
)
)
)
sequences
```
## Access BiblioEntry data
BiblioEntry stores the metadata about the journal articles in the CORD-19 Dataset. See the [API documentation](https://c3.ai/covid-19-api-documentation/#tag/BiblioEntry) for more details.
```{r}
# Fetch metadata for the first two thousand (2000) BiblioEntry journal articles approved for commercial use
# Note that 2000 records are returned; the full dataset can be accessed using the get_all = TRUE argument in fetch
bibs <- fetch(
"biblioentry",
body = list(
spec = list(
filter = "hasFullText == true"
)
),
)
# Sort them to get the most recent articles first
bibs_sorted <-
bibs %>%
mutate(publishTime = parse_datetime(publishTime)) %>%
arrange(desc(publishTime))
bibs_sorted
```
Use GetArticleMetadata to access the full-text of these articles, or in this case, the first page text.
```{r}
# Get the ID of the most recent published article in March
bib_id <-
bibs_sorted %>%
filter(publishTime < "2020-04-01") %>%
head(1) %>%
pull(id)
bib_id
# Get the full-text, in JSON-format, of the journal article from the CORD-19 dataset
article_data <- read_data_json(
"biblioentry",
"getarticlemetadata",
list(
ids = list(bib_id)
)
)
content(article_data)$value$value[[1]]$body_text[[1]]$text
```
## Join TherapeuticAsset and ExternalLink data
TherapeuticAsset stores details about the research and development (R&D) of coronavirus therapies, for example, vaccines, diagnostics, and antibodies. ExternalLink stores website URLs cited in the data sources containing the therapies stored in the TherapeuticAssets C3.ai Type. See the API documentation for [TherapeuticAsset](https://c3.ai/covid-19-api-documentation/#tag/TherapeuticAsset) and [ExternalLink](https://c3.ai/covid-19-api-documentation/#tag/ExternalLink) for more details.
```{r}
# Join data from TherapeuticAsset & ExeternalLink C3.ai Types (productType, description, origin, and URL links)
assets <- fetch(
"therapeuticasset",
list(
spec = list(
include = "productType, description, origin, links.url",
filter = "origin == 'Milken'"
)
)
)
assets %>%
unnest(links, keep_empty = TRUE) %>%
mutate(url = map_chr(links, ~if_else(is.null(.), NA_character_, .$url))) %>%
dplyr::select(-links)
```
## Join Diagnosis and DiagnosisDetail data
Diagnosis stores basic clinical data (e.g. clinical notes, demographics, test results, x-ray or CT scan images) about individual patients tested for COVID-19, from research papers and healthcare institutions. DiagnosisDetail stores detailed clinical data (e.g. lab tests, pre-existing conditions, symptoms) about individual patients in key-value format. See the API documentation for [Diagnosis](https://c3.ai/covid-19-api-documentation/#tag/Diagnosis) and [DiagnosisDetail](https://c3.ai/covid-19-api-documentation/#tag/DiagnosisDetail) for more details.
```{r}
# Join data from Diagnosis & DiagnosisDetail C3.ai Types
diagnoses <- fetch(
"diagnosis",
list(
spec = list(
filter = "contains(testResults, 'COVID-19')",
include = "this, diagnostics.source, diagnostics.key, diagnostics.value"
)
)
)
# Convert the data into long format
diagnoses_long <-
diagnoses %>%
filter(source != 'UCSD') %>%
unnest_longer(diagnostics) %>%
mutate(
diagnostic_key = map_chr(diagnostics, ~.$key),
diagnostic_value = map_chr(diagnostics, function(x) {x[["value"]] %>% ifelse(is.null(.), NA_character_, .)}),
)
diagnoses_long
# Convert the data into a wide, sparse table
diagnoses_wide <-
diagnoses_long %>%
dplyr::select(-diagnostics) %>%
pivot_wider(names_from = diagnostic_key, values_from = diagnostic_value) %>%
type_convert()
diagnoses_wide
```
Use the GetImageURLs API to view the image associated with a diagnosis.
```{r}
diagnosis_id <- diagnoses_wide$id[[1]]
image_urls <- read_data_json(
"diagnosis",
"getimageurls",
list(
ids = list(diagnosis_id)
)
)
# Open the browser to the image URL
browseURL(content(image_urls)$value[[diagnosis_id]]$value)
```
## Access VaccineCoverage data
VaccineCoverage stores historical vaccination rates for various demographic groups in US counties and states, based on data from the US Centers for Disease Control (CDC). See the [API documentation](https://c3.ai/covid-19-api-documentation/#tag/VaccineCoverage) for more details.
```{r}
vaccine_coverage <- fetch(
"vaccinecoverage",
list(
spec = list(
filter = "vaxView == 'Influenza' && contains(vaccineDetails, 'General Population') && (location == 'California_UnitedStates' || location == 'Texas_UnitedStates') && contains(demographicClass, 'Race/ethnicity') && year == 2018"
)
)
)
vaccine_coverage %>%
unnest_wider(location, names_sep = ".")
```
How does vaccine coverage vary by race/ethnicity in these locations?
```{r}
vaccine_coverage %>%
unnest_wider(location, names_sep = ".") %>%
ggplot(aes(str_wrap(demographicClassDetails, 20), value/100, ymin = lowerLimit/100, ymax = upperLimit/100)) +
geom_col() +
geom_errorbar(width = 0.2) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
facet_grid(cols = vars(location.id)) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(
x = NULL,
y = "Vaccination rate",
title = "Influenza vaccination rate by race/ethnicity",
subtitle = "Error bars indicate 95% confidence interval"
)
```
## Access Policy data
LocationPolicySummary stores COVID-19 social distancing and health policies and regulations enacted by US states. See the [API documentation](https://c3.ai/covid-19-api-documentation/#tag/LocationPolicySummary) for more details.
PolicyDetail stores country-level policy responses to COVID-19 including:
* Financial sector policies (from The World Bank: Finance Related Policy Responses to COVID-19),
* Containment and closure, economic, and health system policies (from University of Oxford: Coronavirus Government Response Tracker, OxCGRT), and
* Policies in South Korea (from Data Science for COVID-19: South Korea).
See the [API documentation](https://c3.ai/covid-19-api-documentation/#tag/PolicyDetail) for more details.
```{r}
policies <- fetch(
"locationpolicysummary",
list(
spec = list(
filter = "contains(location.id, 'UnitedStates')",
limit = -1
)
)
)
policies %>%
unnest_wider(location, names_sep = ".")
```
Use the AllVersionsForPolicy API to access historical and current versions of a policy.
```{r}
versions <- read_data_json(
"locationpolicysummary",
"allversionsforpolicy",
list(
this = list(
id = "Wisconsin_UnitedStates_Policy"
)
)
)
content(versions) %>%
enframe() %>%
dplyr::select(-name) %>%
unnest_wider(col = c(value), names_repair = "unique") %>%
unnest_wider(location, names_sep = ".")
```
Fetch all school closing policies that restrict gatherings between 11-100 people from OxCGRT dataset in PolicyDetail.
```{r}
school_policies <- fetch(
"policydetail",
list(
spec = list(
filter = "contains(lowerCase(name), 'school') && value == 3 && origin == 'University of Oxford'"
)
),
get_all = TRUE
)
school_policies %>%
unnest_wider(location, names_sep = ".")
```
## Access LaborDetail data
LaborDetail stores historical monthly labor force and employment data for US counties and states from US Bureau of Labor Statistics. See the [API documentation](https://c3.ai/covid-19-api-documentation/#tag/LaborDetail) for more details.
```{r}
# Fetch the unemployment rates of counties in California in March, 2020
labordetail <- fetch(
"labordetail",
list(
spec = list(
filter = "year == 2020 && month == 3 && contains(parent, 'California_UnitedStates')"
)
),
get_all = TRUE
)
labordetail %>%
unnest_wider(parent, names_sep = ".")
```
## Access Survey data
SurveyData stores COVID-19-related public opinion, demographic, and symptom prevalence data collected from COVID-19 survey responses. See the [API documentation](https://c3.ai/covid-19-api-documentation/#tag/SurveyData) for more details.
```{r}
# Fetch participants who are located in California and who have a relatively strong intent to wear a mask in public because of COVID-19
survey <- fetch(
"surveydata",
list(
spec = list(
filter = "location == 'California_UnitedStates' && coronavirusIntent_Mask >= 75"
)
),
get_all = TRUE
)
survey %>%
unnest_wider(location, names_sep = ".")
```
Plot the results.
```{r}
survey %>%
mutate(coronavirusEmployment = str_split(coronavirusEmployment, ", ")) %>%
unnest_longer(coronavirusEmployment) %>%
count(coronavirusEmployment) %>%
mutate(coronavirusEmployment = fct_reorder(coronavirusEmployment, n, .desc = TRUE)) %>%
ggplot(aes(coronavirusEmployment, n/nrow(survey))) +
geom_col() +
scale_y_continuous(labels = scales::percent) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(
title = "Employment status of CA participants with strong intent to wear mask",
x = "Response to employment status question",
y = "Proportion of participants"
)
```