---
title: "Tidycensus will convince you to learn R"
subtitle: "nicar.r-journalism.com/2024/"
author: "Andrew Ba Tran @abtran"
date: March 9, 2024
lightbox: true
format:
  revealjs:
    theme: [default, custom.scss]
    embed-resources: true
    logo: img/badge.png
execute:
  echo: true
---
```{r setup, include = FALSE}
options(tigris_use_cache = TRUE)
```
## Workshop agenda
* nicar.r-journalism.com/2024/ (Follow along here)
* Survey: https://bit.ly/3T6LkQh
* Intro to Tidycensus and RStudio
* Wrangling Census data with Tidyverse functions
* Common Census queries
* Visualizing Census data (if there's time)
# The American Community Survey, R, and tidycensus
## What is the ACS?
* Annual survey of 3.5 million US households
* Covers more specific topics not available in __decennial__ US Census data (e.g. income, education, language, housing characteristics)
* Available as 1-year estimates (for geographies of population 65,000 and greater) and 5-year estimates (for geographies down to the block group)
* Data delivered as _estimates_ characterized by _margins of error_
## How to get ACS data
* [data.census.gov](https://data.census.gov) is the main, revamped interactive data portal for browsing and downloading Census datasets, including the ACS
* [censusreporter.org](https://censusreporter.org) is a great resource (built by news nerds) and probably a lot of inspiration for the official census website revamp
* [The US Census **A**pplication **P**rogramming **I**nterface (API)](https://www.census.gov/data/developers/data-sets.html) allows developers to access Census data resources programmatically
## tidycensus
:::: {.columns}
::: {.column width="70%"}
* R interface to the Decennial Census, American Community Survey, Population Estimates Program, and Public Use Microdata Series APIs
* First released in 2017; nearly 500,000 downloads from the Posit CRAN mirror
* [censusapi](https://www.hrecht.com/censusapi/) by data journalist Hannah Recht
    * Seeks to be an API wrapper for ALL Census products
:::
::: {.column width="30%"}
![](https://walker-data.com/tidycensus/logo.png)
By [Kyle Walker](https://walker-data.com/)
:::
::::
## Census data issues I
* Groups, subgroups, sub-subgroups, etc. are a pain
* Takes forever to tidy up
![](img/badcensus1.png)
## Census data issues II
* Transposing the data helps a bit, but
* it still requires a lot of work to clean up
![](img/badcensus2.png)
## Tidycensus: Features
::: {.incremental}
- Wrangles Census data internally to return tidyverse-ready format (or traditional wide format if requested)
- Automatically downloads and merges Census geometries to data for __mapping__
- Includes tools for handling margins of error in the ACS and working with survey weights in the ACS PUMS
- States and counties can be requested by name (no more looking up FIPS codes!)
- Script out your process for reusability
:::
## R and RStudio
* R: programming language and software environment for data analysis (and scraping and visualization and so much more)
* RStudio: integrated development environment (IDE) for R developed by **Posit**
* Built on top of R
* Lets you view your data, write and save R (or Python) scripts or notebooks, and view graphical static and interactive outputs
## RStudio tour
![](https://sahirbhatnagar.com/EPIB607/inst/figures/RStudio_overview.PNG)
## Running code in R
* `<-` assigns a value to a name and saves it in the environment (memory)
* `#` (hash) marks a comment; R ignores everything after it on that line
* To run commented-out code, copy and paste it into the console without the hash
    * Run code in the console at the bottom, or
    * in a script, highlight the code and click the 'Run' button at the top right,
    * or put your cursor on the line of code in the script and hit Ctrl+Enter (or Cmd+Enter)
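
---

A tiny sketch of the points above. The names `workshop_year` and `next_year` are made up for illustration.

```{r}
# Lines starting with # are comments; R ignores them
workshop_year <- 2024           # `<-` stores the value under a name
next_year <- workshop_year + 1  # use the stored value in a calculation
next_year
```

Both objects should now show up in the Environment pane in RStudio.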
## Getting started with tidycensus
* To get started, install the packages and files for this class
* If you are using an IRE laptop, these packages are already installed for you
```{r install-packages, eval = FALSE}
install.packages(c("tidycensus", "tidyverse", "mapview", "usethis"))
usethis::use_course("https://github.com/r-journalism/nicar-2024-tidycensus/archive/master.zip")
```
## Optional: your Census API key
* tidycensus (and the Census API) can be used without an API key, but you will be limited to 500 queries per day
* Power users: visit https://api.census.gov/data/key_signup.html to request a key, then activate the key from the link in your email.
* Once activated, use the `census_api_key()` function to set your key as an environment variable
```{r api-key, eval = FALSE}
library(tidycensus)
census_api_key("YOUR KEY GOES HERE", install = TRUE)
```
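
---

One way to check whether a key is installed, sketched with base R only. It assumes the key was stored with `install = TRUE`, which saves it as the `CENSUS_API_KEY` environment variable, and that R was restarted afterward. `key_is_set` is a made-up name.

```{r}
# Read the environment variable and test whether it is non-empty
key_is_set <- nzchar(Sys.getenv("CENSUS_API_KEY"))
key_is_set # TRUE if a key is stored, FALSE otherwise
```

If this returns `FALSE`, tidycensus should still work for light use, as noted above.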
# Getting started with ACS data in tidycensus
Open 01_tidycensus.R in RStudio
## Using the `get_acs()` function
* The `get_acs()` function is your portal to access ACS data using tidycensus
* The two required arguments are `geography` and `variables`. The function defaults to the latest 5-year ACS (currently 2018-2022)
```{r acs}
library(tidycensus)
median_income <- get_acs(
geography = "county",
variables = "B19013_001", # median household income
year = 2022
)
```
---
* ACS data are returned with five columns: `GEOID`, `NAME`, `variable`, `estimate`, and `moe`
```{r view-acs}
median_income
```
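
---

A rough sketch of how `estimate` and `moe` fit together. tidycensus reports ACS margins of error at the 90 percent confidence level by default, so `estimate - moe` to `estimate + moe` gives an approximate 90 percent interval. The table `toy_acs` is a hand-typed stand-in with invented county names and numbers, not real output.

```{r}
library(dplyr)

# Hand-typed stand-in shaped like get_acs() output; values are invented
toy_acs <- tibble(
  GEOID = c("01001", "01003"),
  NAME = c("County A", "County B"),
  variable = "B19013_001",
  estimate = c(60000, 72000),
  moe = c(3000, 1800)
)

toy_acs_ci <- toy_acs |>
  mutate(
    lower = estimate - moe,         # bottom of the interval
    upper = estimate + moe,         # top of the interval
    moe_pct = moe / estimate * 100  # margin of error as a share of the estimate
  )
toy_acs_ci
```

A larger `moe_pct` means a shakier estimate, which is worth checking before ranking places against each other.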
## Exploring your data with RStudio
![](img/exploredata.png)
```{r explore-acs, eval=F}
View(median_income)
```
## Exporting your data
* You saved the output of the `get_acs()` function to the object **median_income**
* Export that dataframe object to your computer so you can use it wherever you want
```{r exporting}
library(readr)
write_csv(median_income, "whatever_filename_you_want.csv", na="")
```
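
---

A hedged sketch of a round trip through CSV. `GEOID` values are identifiers with leading zeros, and some tools may treat them as numbers, so it can help to declare the column type when reading a file back in. The data, the temporary file, and the names `toy_export`, `toy_path`, and `toy_import` are invented for illustration.

```{r}
library(readr)

# Toy data: GEOIDs are text identifiers, not numbers
toy_export <- data.frame(
  GEOID = c("01001", "06073"),
  estimate = c(100, 200)
)

toy_path <- tempfile(fileext = ".csv") # throwaway file for this example
write_csv(toy_export, toy_path, na = "")

# Declare GEOID as character so the leading zeros are kept
toy_import <- read_csv(
  toy_path,
  col_types = cols(GEOID = col_character(), estimate = col_double())
)
toy_import$GEOID
```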
## Take your data to Excel if you want
![](img/exported.png)
## 1-year ACS data
* 1-year ACS data are more current, but are only available for geographies of population 65,000 and greater
* Access 1-year ACS data with the argument `survey = "acs1"`; defaults to `"acs5"`
```{r acs-1-year}
#| code-line-numbers: "|5"
median_value_1yr <- get_acs(
geography = "place",
variables = "B25077_001", # median value of homes
year = 2022,
survey = "acs1"
)
```
---
```{r view-acs-1yr}
median_value_1yr
```
## Requesting tables of variables
* The `table` parameter can be used to obtain all related variables in a "table" at once
```{r census-table}
#| code-line-numbers: "|3"
income_table <- get_acs(
geography = "county",
table = "B19001",
year = 2022
)
```
---
```{r view-table}
income_table
```
# Understanding geography and variables in tidycensus
---
## US Census Geography
![](img/census_small_area_geography.jpg)
---
## Geography in tidycensus
* Information on available geographies, and how to specify them, can be found [in the tidycensus documentation](https://walker-data.com/tidycensus/articles/basic-usage.html#geography-in-tidycensus-1)
| Geography | Definition | Available by | Available in |
|----------------------------------------|-------------------------------------------------------------|-----------------|----------------------------|
| `"us"` | United States | | `get_acs()`, `get_decennial()` |
| `"region"` | Census region | | `get_acs()`, `get_decennial()` |
| `"division"` | Census division | | `get_acs()`, `get_decennial()` |
| `"state"` | State or equivalent | state | `get_acs()`, `get_decennial()` |
| `"county"` | County or equivalent | state, county | `get_acs()`, `get_decennial()` |
| `"county subdivision"` | County subdivision | state, county | `get_acs()`, `get_decennial()` |
| `"tract"` | Census tract | state, county | `get_acs()`, `get_decennial()` |
| `"block group"` OR `"cbg"` | Census block group | state, county | `get_acs()`, `get_decennial()` |
## Querying by state
* For geographies available below the state level, the `state` parameter allows you to query data for a specific state
* For smaller geographies (Census tracts, block groups), a `county` argument may also need to be included
* __tidycensus__ translates state names and postal abbreviations internally, so you don't need to remember the FIPS codes!
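
---

If you are curious what the translation looks like, tidycensus ships a lookup table called `fips_codes`. The sketch below filters it for San Diego County; `sd_fips` is a made-up object name.

```{r}
library(tidycensus)
library(dplyr)

# fips_codes is a data frame bundled with tidycensus
sd_fips <- fips_codes |>
  filter(state == "CA", county == "San Diego County")
sd_fips
```

You rarely need to do this by hand, since `state = "CA"` and `county = "San Diego"` are accepted directly by the query functions.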
---
## Querying tract data by state and county
* Example: data on median home value in San Diego County, California by Census tract
```{r query-by-state}
#| code-line-numbers: "|4|5"
sd_value <- get_acs(
geography = "tract",
variables = "B25077_001",
state = "CA",
county = "San Diego",
year = 2022
)
```
---
```{r show-query-by-state}
sd_value
```
## Searching for variables
* To search for variables, use the `load_variables()` function along with a year and dataset
* The `View()` function in RStudio allows for interactive browsing and filtering
```{r search-variables, eval = FALSE}
vars <- load_variables(2022, "acs5")
```
---
```{r eval=F}
View(vars)
```
![](img/vars.png)
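
---

Besides clicking through `View()`, you can search the variable table with code. This is a minimal sketch on a hand-typed stand-in, `toy_vars`, that mimics the `name`, `label`, and `concept` columns of `load_variables()` output. The labels are abbreviated, so treat them as illustrative.

```{r}
library(dplyr)

# Hand-typed stand-in with the same column names as load_variables() output
toy_vars <- tibble(
  name = c("B19013_001", "B25077_001", "B01001_001"),
  label = c("Estimate!!Median household income in the past 12 months",
            "Estimate!!Median value (dollars)",
            "Estimate!!Total:"),
  concept = c("Median Household Income", "Median Value (Dollars)", "Sex by Age")
)

# Keep rows whose label mentions "median", ignoring upper/lower case
toy_median_vars <- toy_vars |>
  filter(grepl("median", label, ignore.case = TRUE))
toy_median_vars
```

The same `filter()` call should work on the real `vars` object once it is loaded.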
## Available ACS datasets in tidycensus
* Detailed Tables
* Data Profile (add `"/profile"` for variable lookup)
* Subject Tables (add `"/subject"`)
* Comparison Profile (add `"/cprofile"`)
* Supplemental Estimates (use `"acsse"`)
* Migration Flows (access with `get_flows()`)
# Data structure in tidycensus
## "Tidy" or long-form data
:::: {.columns}
::: {.column width="40%"}
* The default data structure returned by __tidycensus__ is "tidy" or long-form data, with variables by geography stacked by row
:::
::: {.column width="60%"}
```{r tidy-data}
age_sex_table <- get_acs(
geography = "state",
table = "B01001",
year = 2022,
survey = "acs1",
)
```
:::
::::
---
```{r show-tidy-data}
age_sex_table
```
## "Wide" data
:::: {.columns}
::: {.column width="40%"}
* The argument `output = "wide"` spreads Census variables across the columns, returning one row per geographic unit and one column per variable
:::
::: {.column width="60%"}
```{r wide-data}
#| code-line-numbers: "|6"
age_sex_table_wide <- get_acs(
geography = "state",
table = "B01001",
year = 2022,
survey = "acs1",
output = "wide"
)
```
:::
::::
---
```{r show-wide-data}
age_sex_table_wide
```
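
---

To see the difference between the two shapes without calling the API, here is a sketch that reshapes invented long-form rows with `pivot_wider()` from tidyr. This is not how tidycensus builds its wide output internally (its wide tables add `E` and `M` suffixes for estimates and margins of error); it only illustrates the idea. `toy_long` and `toy_wide` are made-up names.

```{r}
library(dplyr)
library(tidyr)

# Invented long-form rows: two states, two variables each
toy_long <- tibble(
  GEOID = c("01", "01", "02", "02"),
  NAME = c("Alabama", "Alabama", "Alaska", "Alaska"),
  variable = c("B01001_001", "B01001_002", "B01001_001", "B01001_002"),
  estimate = c(500, 240, 70, 37)
)

# One row per state, one column per variable
toy_wide <- toy_long |>
  pivot_wider(names_from = variable, values_from = estimate)
toy_wide
```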
## Using named vectors of variables
* Census variables can be hard to remember; using a named vector to request variables will replace the Census IDs with a custom input
* In long form, these custom inputs will populate the `variable` column; in wide form, they will replace the column names
## Renaming variables easily
```{r named-variables}
#| code-line-numbers: "|4|5|6"
ca_education <- get_acs(
geography = "county",
state = "CA",
variables = c(percent_high_school = "DP02_0062P",
percent_bachelors = "DP02_0065P",
percent_graduate = "DP02_0066P"),
year = 2021
)
```
---
```{r show-named-variables}
ca_education
```
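
---

A named vector is plain R, so you can build it once and reuse it. The sketch below stores the same three variables in an object; the name `education_vars` is an assumption for illustration.

```{r}
# The same named vector used in the get_acs() call above
education_vars <- c(
  percent_high_school = "DP02_0062P",
  percent_bachelors = "DP02_0065P",
  percent_graduate = "DP02_0066P"
)

names(education_vars)  # the custom labels
unname(education_vars) # the Census variable IDs
```

Passing `variables = education_vars` to `get_acs()` should then behave like the inline version.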
# ACS data warnings
## Understanding limitations of the 1-year ACS
* The 1-year American Community Survey is only available for geographies with population 65,000 and greater. This means:
::: {.incremental}
- Only 848 of 3,221 counties are available
- Only 646 of 31,908 cities / Census-designated places are available
- No data for Census tracts, block groups, ZCTAs, or any other geographies that typically have populations below 65,000
:::
## Data sparsity and margins of error
* You may encounter data issues in the 1-year ACS data that are less pronounced in the 5-year ACS. For example:
::: {.incremental}
* Values available in the 5-year ACS may not be available in the corresponding 1-year ACS tables
* If available, they will likely have larger margins of error
* Your job as a data journalist: balance need for _certainty_ vs. need for _recency_ in estimates
:::
## Tagalog speakers by state (1-year ACS)
```{r}
get_acs(
geography = "state",
variables = "B16001_099",
year = 2022,
survey = "acs1"
)
```
## Tagalog speakers by state (5-year ACS)
```{r}
get_acs(
geography = "state",
variables = "B16001_099",
year = 2022,
survey = "acs5"
)
```
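
---

One rough way to compare how noisy the two queries above are is to express the margin of error as a percentage of the estimate. The numbers below are invented for a single hypothetical state and are not real ACS results; `toy_compare` is a made-up name.

```{r}
library(dplyr)

# Invented numbers for one state; not real ACS results
toy_compare <- tibble(
  survey = c("acs1", "acs5"),
  estimate = c(4000, 4200),
  moe = c(1200, 420)
) |>
  mutate(moe_pct = moe / estimate * 100) # margin of error as % of estimate
toy_compare
```

In this made-up case the 1-year figure is more recent but three times as uncertain, which is the trade-off described earlier.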
## Other warnings
* Variables in the Data Profile and Subject Tables can change names over time
* The 2022 ACS is the first to include the new Connecticut Planning Regions in the "county" geography
* The 2020 1-year ACS was not released (and is not in tidycensus), so your time-series can break if you are using iteration to pull data
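
---

If you iterate over years of 1-year ACS data, one simple guard is to drop 2020 from the list of years before looping. A minimal sketch; `acs1_years` and `yr` are made-up names, and the `get_acs()` call is left as a comment so the example runs without the API.

```{r}
# 1-year ACS years to request, leaving out the unreleased 2020 vintage
acs1_years <- setdiff(2018:2022, 2020)
acs1_years

for (yr in acs1_years) {
  # a get_acs(..., year = yr, survey = "acs1") call would go here
  print(yr)
}
```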
# The 2020 Decennial US Census data and R
## What is the decennial US Census?
* Complete count of the US population mandated by Article 1, Sections 2 and 9 in the US Constitution
* Directed by the US Census Bureau (US Department of Commerce); conducted every 10 years since 1790
* Used for proportional representation / congressional redistricting
* Limited set of questions asked about race, ethnicity, age, sex, and housing tenure
## 2020 US Census datasets
* The PL 94-171 Redistricting Data
* The Demographic and Housing Characteristics (DHC) file
* The Demographic Profile (for pre-tabulated variables)
* Tabulations for the 118th Congress & for Island Areas
* The Detailed DHC-A file (with very detailed racial & ethnic categories)
## 2020 US Census in Tidycensus
* The `get_decennial()` function is used to acquire data from the decennial US Census
* The two required arguments are `geography` and `variables`; for 2020 Census data, use `year = 2020`
```{r}
pop20 <- get_decennial(
geography = "state",
variables = "P1_001N",
year = 2020
)
```
---
* Decennial Census data are returned with four columns: `GEOID`, `NAME`, `variable`, and `value`
```{r}
pop20
```
## Differential privacy
* When we run `get_decennial()` for the 2020 Census for the first time, we see the following messages:
```
Getting data from the 2020 decennial Census
Using the PL 94-171 Redistricting Data summary file
Note: 2020 decennial Census data use differential privacy, a technique that
introduces errors into data to preserve respondent confidentiality.
ℹ Small counts should be interpreted with caution.
ℹ See https://www.census.gov/library/fact-sheets/2021/protecting-the-confidentiality-of-the-2020-census-redistricting-data.html for additional guidance.
This message is displayed once per session.
```
## What is differential privacy?
* The Census Bureau is using _differential privacy_ in an attempt to preserve respondent confidentiality in the 2020 Census data, which is required under US Code Title 13
* Intentional errors are introduced into data, impacting the accuracy of small area counts (e.g. some blocks with children, but no adults)
* Advocates argue that differential privacy is necessary to satisfy Title 13 requirements given modern database reconstruction technologies; critics contend that the method makes data less useful with no tangible privacy benefit
## Scavenger hunt
Can you look through the `vars` table you loaded earlier and import the tables that can answer these questions?
* How many 18- to 24-year-old Korean people are there in the US (2021)?
* What percent of females in 2017 were below poverty level in the US (5-year)?
```{r, eval=F}
vars <- load_variables(2022, "acs5")
get_acs(replace_this_with_the_right_arguments)
```
_How do you find the "right" variables or Census table ID? I do a couple of things: I use [CensusReporter.org](https://censusreporter.org/topics/table-codes/) or I ask the oldest data reporter in the newsroom._
# Wrangling and analyzing Census data
Open 02_wrangling_census_data.R in RStudio
---
## Tidyverse functions
The basics to wrangle data
* `filter()` keeps the rows that match a condition (and gets rid of the rest)
* `mutate()` adds columns to the data frame
* `group_by()` and `summarize()` will aggregate the data by groups
* `arrange()` will sort the data
* `select()` will help narrow down columns
* Daisy chain all these functions together with `|>`
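
---

Here is a small sketch that chains all of these verbs on invented data, so you can see each step before applying them to Census tables. The table `toy_counties` and its column names are made up for illustration.

```{r}
library(dplyr)

# Invented data: three counties in two made-up states
toy_counties <- tibble(
  state = c("A", "A", "B"),
  county = c("One", "Two", "Three"),
  population = c(1000, 3000, 5000),
  renters = c(400, 900, 2500)
)

toy_summary <- toy_counties |>
  filter(population >= 1000) |>                        # keep rows that pass the test
  mutate(pct_renters = renters / population * 100) |>  # add a column
  group_by(state) |>                                   # one group per state
  summarize(total_pop = sum(population),
            max_pct_renters = max(pct_renters)) |>     # one row per group
  arrange(desc(total_pop)) |>                          # biggest state first
  select(state, total_pop, max_pct_renters)            # pick columns to keep
toy_summary
```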
## Case study: Racial plurality by county
![](img/race_table.png)
```{r view2, eval=F}
View(vars) # and search for Hispanic or Latino Origin by Race
```
---
## Download race Census data
```{r}
county_diversity <- get_acs(geography = "county",
variables = c("B03002_001", # total
"B03002_003", # white alone
"B03002_004", # black alone
"B03002_005", # native american
"B03002_006", # asian alone
"B03002_007", # pi alone
"B03002_012" # hispanic or latino
),
survey="acs5",
year=2022)
```
---
```{r}
county_diversity
```
## Add a total population column
* With an argument `summary_var`
```{r}
#| code-line-numbers: "|9"
county_diversity <- get_acs(geography = "county",
variables = c("B03002_003", # white alone
"B03002_004", # black alone
"B03002_005", # native american
"B03002_006", # asian alone
"B03002_007", # pi alone
"B03002_012" # hispanic or latino
),
summary_var = "B03002_001", # total population
survey="acs5",
year=2022)
```
---
```{r}
county_diversity
```
## Add a percent column
* Using the __dplyr__ library of data wrangling functions
* `mutate()` to add a new column to the data frame
```{r}
library(dplyr)
county_diversity <- county_diversity |>
mutate(percent=estimate/summary_est*100)
```
---
```{r, eval=F}
county_diversity
```
```{r, echo=F}
county_diversity |> ungroup() |>
select(-summary_moe)
```
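
---

A percent built from two estimates has its own margin of error. tidycensus includes `moe_prop()` for this; the sketch below writes out the Census Bureau's approximation in base R so the arithmetic is visible. `moe_of_share()` is a hypothetical helper, the numbers are invented, and the sketch does not handle the case where the value under the square root is negative.

```{r}
# Hypothetical helper: approximate margin of error of a proportion,
# where the numerator is a subset of the denominator
moe_of_share <- function(num, denom, moe_num, moe_denom) {
  p <- num / denom
  sqrt(moe_num^2 - p^2 * moe_denom^2) / denom
}

# Invented numbers: 2,000 of 10,000 residents, with MOEs of 300 and 500
toy_share <- 2000 / 10000 * 100
toy_share_moe <- moe_of_share(2000, 10000, 300, 500) * 100
round(c(percent = toy_share, moe = toy_share_moe), 2)
```

With these made-up inputs the share is 20 percent, give or take roughly 2.8 percentage points.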
## Add better variable names
* `case_when()` to recode values (within `mutate()`)
* `.default` is the __else__: the value used if none of the conditions match
* `|>` is the native pipe (new in R 4.1), aka "and then"
```{r}
#| code-line-numbers: "|2|9"
county_diversity_race <- county_diversity |>
mutate(race=case_when(
variable=="B03002_003" ~"White",
variable=="B03002_004" ~"Black",
variable=="B03002_005" ~"Native American",
variable=="B03002_006" ~"Asian",
variable=="B03002_007" ~"Pacific Islander",
variable=="B03002_012" ~"Hispanic",
.default = "Other"
))
```
---
```{r, eval=F}
county_diversity_race
```
```{r, echo=F}
county_diversity_race |> ungroup() |>
select(-summary_moe, -moe, -variable)
```
## Group up some smaller groups
* Use `group_by()` to group up things
* Use `summarize()` to do something (usually math) on these groups
* Let's combine the population for Asian and Pacific Islander
## Group up some smaller groups code
```{r}
#| code-line-numbers: "|6|7|11|12|13"
county_diversity_percent <- county_diversity |>
mutate(race=case_when(
variable=="B03002_003" ~"White",
variable=="B03002_004" ~"Black",
variable=="B03002_005" ~"Native American",
variable=="B03002_006" ~"Asian Pacific Islander",
variable=="B03002_007" ~"Asian Pacific Islander",
variable=="B03002_012" ~"Hispanic",
.default = "Other"
)) |>
group_by(GEOID, NAME, race) |>
summarize(estimate=sum(estimate, na.rm=T),
summary_est=mean(summary_est, na.rm=T)) |>
mutate(percent=estimate/summary_est*100)
```
---
```{r, eval=F}
county_diversity_percent
```
```{r, echo=F}
county_diversity_percent |> ungroup() |>
select(-GEOID)
```
## Sort the data frame low to high
* Use the `arrange()` function
```{r, eval=F}
#| code-line-numbers: "|3"
county_diversity_percent |>
group_by(NAME) |>
arrange(NAME, percent)
```
```{r, echo=F}
county_diversity_percent |>
group_by(NAME) |>
arrange(NAME, percent) |>
ungroup() |>
select(-GEOID, -summary_est)
```
## Sort the data frame high to low
* Use the `arrange()` function
* Use the `desc()` function
```{r, eval=F}
#| code-line-numbers: "|3"
county_diversity_percent_sorted <- county_diversity_percent |>
group_by(NAME) |>
arrange(NAME, desc(percent))
```
```{r, echo=F}
county_diversity_percent_sorted <- county_diversity_percent |>
group_by(NAME) |>
arrange(NAME, desc(percent)) |>
ungroup() |>
select(-GEOID)
```
---
```{r}
county_diversity_percent_sorted
```
Notice there are 16,110 rows...
## Narrow down the rows
* We want one row for every county
* Use the `filter()` function
```{r}
#| code-line-numbers: "|5"
county_diversity_percent_plurality <-
county_diversity_percent |>
group_by(NAME) |>
arrange(NAME, desc(percent)) |>
filter(row_number()==1)
```
---
```{r, eval=F}
county_diversity_percent_plurality
```
```{r, echo=F}
county_diversity_percent_plurality |> ungroup() |>
select(-GEOID)
```
Now there are 3,222 rows, which lines up with the county count in the U.S.
## Narrow down the rows II
* Use the `slice()` function
```{r}
#| code-line-numbers: "|5"
county_diversity_percent_plurality <-
county_diversity_percent |>
group_by(NAME) |>
arrange(NAME, desc(percent)) |>
slice(1)
```
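
---

Another option, sketched here on invented data, is `slice_max()` from dplyr, which picks the top row per group without a separate `arrange()` step. Setting `with_ties = FALSE` keeps exactly one row per group even when two values tie. `toy_shares` and `toy_plurality` are made-up names.

```{r}
library(dplyr)

# Invented shares for two made-up counties
toy_shares <- tibble(
  NAME = c("County A", "County A", "County B", "County B"),
  race = c("White", "Hispanic", "White", "Black"),
  percent = c(55, 30, 35, 48)
)

toy_plurality <- toy_shares |>
  group_by(NAME) |>
  slice_max(percent, n = 1, with_ties = FALSE) |> # top row per county
  ungroup()
toy_plurality
```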
## Case study: Evictions in San Diego
```{r}
sd_evictions <- read_csv("san_diego_evictions.csv")
sd_evictions
```
## Go back and modify your code
Copy and paste the code you worked so hard on, then change the geography and add the state and county.
```{r}
#| code-line-numbers: "|1|2|3"
sd_tract_diversity <- get_acs(geography = "tract",
state = "California",
county = "San Diego",
variables = c("B03002_003", # white alone
"B03002_004", # black alone
"B03002_005", # native american
"B03002_006", # asian alone
"B03002_007", # pi alone
"B03002_012" # hispanic or latino
),
summary_var = "B03002_001", # total population
survey="acs5",
year=2022)
```
## Wrangle the census tract data
Nothing changes except the names of the data frames; the grouping, sorting, and `slice(1)` steps are chained onto the end
```{r}
sd_tract_diversity_plurality <- sd_tract_diversity |>
mutate(race=case_when(
variable=="B03002_003" ~"White",
variable=="B03002_004" ~"Black",
variable=="B03002_005" ~"Native American",
variable=="B03002_006" ~"Asian Pacific Islander",
variable=="B03002_007" ~"Asian Pacific Islander",
variable=="B03002_012" ~"Hispanic",
.default = "Other"
)) |>
group_by(GEOID, NAME, race) |>
summarize(estimate=sum(estimate, na.rm=T),
summary_est=mean(summary_est, na.rm=T)) |>
mutate(percent=estimate/summary_est*100) |>
group_by(GEOID, NAME) |>
arrange(GEOID, NAME, desc(percent)) |>
slice(1)
```
---
```{r, eval=F}
sd_tract_diversity_plurality
```
```{r, echo=F}
sd_tract_diversity_plurality |> ungroup() |> select(-NAME)
```
## Join data
* Using `inner_join()` from __dplyr__
```{r, eval=F}
sd_joined <- inner_join(sd_tract_diversity_plurality, sd_evictions)
sd_joined
```
```{r, echo=F}
sd_joined <- inner_join(sd_tract_diversity_plurality, sd_evictions) |>
ungroup() |>
select(-NAME)
sd_joined
```
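
---

When `by` is left out, `inner_join()` matches on every column name the two tables share, and it drops rows without a match. The sketch below uses invented tables to show an explicit key plus `anti_join()` as a quick check for what got dropped. The `evictions` column and all `toy_` names are hypothetical.

```{r}
library(dplyr)

# Invented tables; `evictions` is a hypothetical column name
toy_tracts <- tibble(
  GEOID = c("06073000100", "06073000201", "06073000202"),
  race = c("White", "Hispanic", "White")
)
toy_evictions <- tibble(
  GEOID = c("06073000100", "06073000201"),
  evictions = c(12, 30)
)

# Name the key explicitly so the join does not depend on shared column names
toy_joined <- inner_join(toy_tracts, toy_evictions, by = "GEOID")

# Tracts that found no match in the evictions table
toy_unmatched <- anti_join(toy_tracts, toy_evictions, by = "GEOID")
nrow(toy_joined)
toy_unmatched
```

Both `GEOID` columns need to be the same type (text here) for the keys to match.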
## Summarize the evictions data
Now you can compare eviction rates (per 1,000 residents) across San Diego neighborhoods, grouped by each tract's plurality racial group.
```{r}
sd_joined |>
group_by(race) |>
summarize(population=sum(summary_est, na.rm=T),
total_evictions=sum(total_evictions, na.rm=T)) |>
mutate(rate_of_evictions=total_evictions/population*1000) |>
arrange(desc(rate_of_evictions))
```
# Common Census queries
Open 03_common_census_queries.R in RStudio
---
## Example of iterating with loops
Here's a basic "for loop" that counts from 1 to 10.
```{r loop1, exercise=TRUE}
for (i in 1:10) {
print(i)
}
```
## Explaining loops in R
* Manually, this would have looked like `print(1)`, then `print(2)`, then `print(3)`, and so on, one by one
* Each pass of the loop moves `i` to the next value within the established limits (__1:10__)
* A loop is a way to run code many times with slight variations to a value or values in the code; the repeated code goes between the `{` and `}`
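
---

Loops become more useful when each pass adds to a result. This sketch, which needs no Census data, starts with an empty data frame and appends one row per pass with `bind_rows()`. `toy_results` and `toy_step` are made-up names.

```{r}
library(dplyr)

toy_results <- tibble() # start with an empty data frame

for (i in 1:3) {
  toy_step <- tibble(run = i, squared = i^2)      # result of this pass
  toy_results <- bind_rows(toy_results, toy_step) # append it to the rest
}
toy_results
```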
## Multiple years of Census data
* Set up a way to append new data to the original data
```{r}
big_census_data <- tibble() # creates a blank data frame
for (i in 2020:2022) {
median_df <- get_acs( # temporary dataframe
geography = "county",
variables = "B25077_001", # median home values
year = i
) |>
mutate(year = i) # so we can identify which year
big_census_data <- bind_rows(big_census_data, median_df) |>
arrange(GEOID, year)
# appends the temporary dataframe to the permanent one
}