---
title: "Tidycensus will convince you to learn R"
subtitle: "nicar.r-journalism.com/2024/"
author: "Andrew Ba Tran @abtran"
date: March 9, 2024
lightbox: true
format: 
  revealjs: 
    theme: [default, custom.scss]
    embed-resources: true
    logo: img/badge.png

execute: 
  echo: true
---


```{r setup, include = FALSE}
options(tigris_use_cache = TRUE)
```


## Workshop agenda

* nicar.r-journalism.com/2024/ (Follow along here)

* Survey: https://bit.ly/3T6LkQh

* Intro to Tidycensus and RStudio

* Wrangling Census data with Tidyverse functions

* Common Census queries

* Visualizing Census data (if there's time)

# The American Community Survey, R, and tidycensus

## What is the ACS?

* Annual survey of 3.5 million US households 

* Covers more specific topics not available in __decennial__ US Census data (e.g. income, education, language, housing characteristics)

* Available as 1-year estimates (for geographies of population 65,000 and greater) and 5-year estimates (for geographies down to the block group)

* Data delivered as _estimates_ characterized by _margins of error_

## How to get ACS data

* [data.census.gov](https://data.census.gov) is the main, revamped interactive data portal for browsing and downloading Census datasets, including the ACS

* [censusreporter.org](https://censusreporter.org) is a great resource (built by news nerds) and likely inspired much of the official Census website revamp

* [The US Census **A**pplication **P**rogramming **I**nterface (API)](https://www.census.gov/data/developers/data-sets.html) allows developers to access Census data resources programmatically

## tidycensus

:::: {.columns}

::: {.column width="70%"}

* R interface to the Decennial Census, American Community Survey, Population Estimates Program, and Public Use Microdata Sample (PUMS) APIs

* First released in 2017; nearly 500,000 downloads from the Posit CRAN mirror

* Also see [censusapi](https://www.hrecht.com/censusapi/) by data journalist Hannah Recht
  * Aims to be a wrapper for ALL Census Bureau API endpoints

:::

::: {.column width="30%"}

![](https://walker-data.com/tidycensus/logo.png)

By [Kyle Walker](https://walker-data.com/)

:::

::::

## Census data issues I

* Groups, subgroups, sub-subgroups, etc., are a pain
* Takes forever to tidy up

![](img/badcensus1.png)

## Census data issues II

* Transposing the data helps a bit
* But it still requires a lot of work to clean up

![](img/badcensus2.png)

## Tidycensus: Features

::: {.incremental}

- Wrangles Census data internally to return tidyverse-ready format (or traditional wide format if requested)

- Automatically downloads and merges Census geometries to data for __mapping__

- Includes tools for handling margins of error in the ACS and working with survey weights in the ACS PUMS

- States and counties can be requested by name (no more looking up FIPS codes!)

- Script out your process for reusability
  
:::
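As a small taste of those margin-of-error tools: when you add estimates together, `moe_sum()` combines their MOEs as the square root of the sum of the squared MOEs. A minimal sketch with made-up numbers (not real ACS values):

```{r}
library(tidycensus)

# Two hypothetical estimates and their margins of error
estimates <- c(1200, 800)
moes <- c(30, 40)

combined_estimate <- sum(estimates)                        # 2000
combined_moe <- moe_sum(moe = moes, estimate = estimates)  # sqrt(30^2 + 40^2) = 50

c(combined_estimate, combined_moe)
```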

## R and RStudio

* R: programming language and software environment for data analysis (and scraping and visualization and so much more)

* RStudio: integrated development environment (IDE) for R developed by **Posit**

  * Built on top of R
  * Lets you view your data, write and save R (or Python) scripts or notebooks, and view static and interactive graphics

## RStudio tour

![](https://sahirbhatnagar.com/EPIB607/inst/figures/RStudio_overview.PNG)
  
## Running code in R

* `<-` assigns a value to an object saved in your environment (memory)
* `#` starts a comment; R ignores everything after it on that line
  * To run commented-out code, copy and paste it into the console without the hash
* Run code by typing it into the console at the bottom, or
* In a script, highlight the code and click the 'Run' button at the top right, or
* Put your cursor on a line of code in the script and press Ctrl+Enter (Cmd+Enter on a Mac)
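A tiny sketch of assignment and comments in action; the object names and numbers are made up for illustration:

```{r}
# Everything after a hash is a comment, so R skips it
households <- 120            # `<-` saves the value 120 to an object named households
people <- households * 2.5   # reuse the saved object in a new calculation
people                       # typing an object's name prints it
```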

## Getting started with tidycensus

* To get started, install the packages and files for this class

* If you are using an IRE laptop, these packages are already installed for you

```{r install-packages, eval = FALSE}
install.packages(c("tidycensus", "tidyverse", "mapview", "usethis"))
usethis::use_course("https://github.com/r-journalism/nicar-2024-tidycensus/archive/master.zip")
```


## Optional: your Census API key

* tidycensus (and the Census API) can be used without an API key, but you will be limited to 500 queries per day

* Power users: visit https://api.census.gov/data/key_signup.html to request a key, then activate the key from the link in your email. 

* Once activated, use the `census_api_key()` function to set your key as an environment variable; `install = TRUE` saves it to your `.Renviron` file so it persists across sessions

```{r api-key, eval = FALSE}
library(tidycensus)

census_api_key("YOUR KEY GOES HERE", install = TRUE)
```

# Getting started with ACS data in tidycensus

open 01_tidycensus.R in RStudio

## Using the `get_acs()` function

* The `get_acs()` function is your portal to access ACS data using tidycensus

* The two required arguments are `geography` and `variables`. The function defaults to the latest 5-year ACS (currently 2018-2022)

```{r acs}
library(tidycensus)

median_income <- get_acs(
  geography = "county",
  variables = "B19013_001", # median household income
  year = 2022
)
```

---

* ACS data are returned with five columns: `GEOID`, `NAME`, `variable`, `estimate`, and `moe`

```{r view-acs}
median_income
```
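Because `estimate` and `moe` sit side by side, you can turn them into a range. A minimal sketch on hypothetical rows shaped like `get_acs()` output; tidycensus returns 90% margins of error by default (its `moe_level` argument), so `estimate ± moe` is a 90% confidence interval:

```{r}
library(dplyr)

# Hypothetical rows shaped like get_acs() output (not real ACS values)
toy_acs <- tibble(
  GEOID = c("00001", "00002"),
  NAME = c("County A", "County B"),
  variable = "B19013_001",
  estimate = c(65000, 48000),
  moe = c(2000, 5500)
)

toy_ci <- toy_acs |>
  mutate(
    low = estimate - moe,   # lower bound of the 90% interval
    high = estimate + moe   # upper bound of the 90% interval
  )

toy_ci
```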

## Exploring your data with RStudio

![](img/exploredata.png)
```{r explore-acs, eval=F}
View(median_income)
```

## Exporting your data

* You saved the output of the `get_acs()` function to the object **median_income**
* Export that dataframe object to your computer so you can use it wherever you want

```{r exporting}
library(readr)

write_csv(median_income, "whatever_filename_you_want.csv", na="")
```
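Spreadsheet programs and `read_csv()` may both treat `GEOID` as a number and drop its leading zero (`01001` becomes `1001`). A minimal sketch of a round trip that keeps `GEOID` as text, using a hypothetical stand-in for `get_acs()` output and a temporary file:

```{r}
library(readr)
library(dplyr)

# Hypothetical stand-in for get_acs() output (made-up values)
toy_income <- tibble(
  GEOID = c("01001", "01003"),
  NAME = c("County A", "County B"),
  estimate = c(60000, NA),  # NA shows how missing values are written
  moe = c(2500, NA)
)

out_file <- tempfile(fileext = ".csv")
write_csv(toy_income, out_file, na = "")  # missing values become empty cells

# Read it back, keeping GEOID as text so the leading zero survives
round_trip <- read_csv(out_file, col_types = cols(GEOID = col_character()))
round_trip
```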

## Take your data to Excel if you want

![](img/exported.png)

## 1-year ACS data

* 1-year ACS data are more current, but are only available for geographies of population 65,000 and greater

* Access 1-year ACS data with the argument `survey = "acs1"`; defaults to `"acs5"`

```{r acs-1-year}
#| code-line-numbers: "|5"

median_value_1yr <- get_acs(
  geography = "place",
  variables = "B25077_001", # median value of homes
  year = 2022,
  survey = "acs1"
)
```

---


```{r view-acs-1yr}
median_value_1yr
```


## Requesting tables of variables

* The `table` parameter can be used to obtain all related variables in a "table" at once

```{r census-table}
#| code-line-numbers: "|3"

income_table <- get_acs(
  geography = "county", 
  table = "B19001", 
  year = 2022
)
```

---

```{r view-table}
income_table
```


# Understanding geography and variables in tidycensus

---

## US Census Geography

![](img/census_small_area_geography.jpg)


---

## Geography in tidycensus

* Information on available geographies, and how to specify them, can be found [in the tidycensus documentation](https://walker-data.com/tidycensus/articles/basic-usage.html#geography-in-tidycensus-1)

| Geography                              | Definition                                                  | Available by    | Available in               |
|----------------------------------------|-------------------------------------------------------------|-----------------|----------------------------|
| `"us"`                                   | United States                                             |                 | `get_acs()`, `get_decennial()` |
| `"region"`                               | Census region                                             |                 | `get_acs()`, `get_decennial()` |
| `"division"`                             | Census division                                           |                 | `get_acs()`, `get_decennial()` |
| `"state"`                                | State or equivalent                                         | state           | `get_acs()`, `get_decennial()` |
| `"county"`                               | County or equivalent                                        | state, county   | `get_acs()`, `get_decennial()` |
| `"county subdivision"`                 | County subdivision                                          | state, county | `get_acs()`, `get_decennial()` |
| `"tract"`                                | Census tract                                                | state, county | `get_acs()`, `get_decennial()` |
| `"block group"` OR `"cbg"`               | Census block group                                          | state, county | `get_acs()`, `get_decennial()` |

## Querying by state

* For geographies available below the state level, the `state` parameter allows you to query data for a specific state

* For smaller geographies (Census tracts, block groups), a `county` argument may also need to be included

* __tidycensus__ translates state names and postal abbreviations internally, so you don't need to remember the FIPS codes!
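If you ever need the codes themselves (say, to check a `GEOID`), tidycensus ships a lookup table called `fips_codes`. A minimal sketch that finds San Diego County's codes:

```{r}
library(tidycensus)
library(dplyr)

# fips_codes comes with tidycensus
# (columns: state, state_code, state_name, county_code, county)
sd_fips <- fips_codes |>
  filter(state == "CA", county == "San Diego County")

sd_fips
```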

---

## Querying tract data requires county and state

* Example: data on median home value in San Diego County, California by Census tract

```{r query-by-state}
#| code-line-numbers: "|4|5"

sd_value <- get_acs(
  geography = "tract", 
  variables = "B25077_001", 
  state = "CA", 
  county = "San Diego",
  year = 2022
)
```

---

```{r show-query-by-state}
sd_value
```


## Searching for variables

* To search for variables, use the `load_variables()` function along with a year and dataset

* The `View()` function in RStudio allows for interactive browsing and filtering

```{r search-variables, eval = FALSE}
vars <- load_variables(2022, "acs5")
```

---


```{r eval=F}
View(vars)
```
![](img/vars.png)
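You can also search `vars` with code instead of scrolling. A sketch on a few abbreviated, illustrative rows shaped like `load_variables()` output (columns `name`, `label`, `concept`); with the real `vars`, the same `filter()` call applies. `ignore.case = TRUE` guards against capitalization differences in the concept text:

```{r}
library(dplyr)

# Abbreviated, illustrative rows shaped like load_variables() output
toy_vars <- tibble(
  name = c("B19013_001", "B25077_001", "B01001_001"),
  label = c("Estimate!!Median household income in the past 12 months",
            "Estimate!!Median value (dollars)",
            "Estimate!!Total:"),
  concept = c("Median Household Income in the Past 12 Months",
              "Median Value (Dollars)",
              "Sex by Age")
)

# Keep only rows whose concept mentions "median", in any capitalization
median_vars <- toy_vars |>
  filter(grepl("median", concept, ignore.case = TRUE))

median_vars
```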


## Available ACS datasets in tidycensus

* Detailed Tables 

* Data Profile (add `"/profile"` to the dataset name for variable lookup, e.g. `"acs5/profile"`)

* Subject Tables (add `"/subject"`)

* Comparison Profile (add `"/cprofile"`)

* Supplemental Estimates (use `"acsse"`)

* Migration Flows (access with `get_flows()`)

# Data structure in tidycensus

---

## "Tidy" or long-form data

:::: {.columns}

::: {.column width="40%"}

* The default data structure returned by __tidycensus__ is "tidy" or long-form data, with variables by geography stacked by row

:::

::: {.column width="60%"}


```{r tidy-data}
age_sex_table <- get_acs(
  geography = "state", 
  table = "B01001", 
  year = 2022,
  survey = "acs1"
)
```

:::

::::

---

```{r show-tidy-data}
age_sex_table
```


## "Wide" data 

:::: {.columns}

::: {.column width="40%"}

* The argument `output = "wide"` spreads Census variables across the columns, returning one row per geographic unit and one column per variable

:::

::: {.column width="60%"}

```{r wide-data}
#| code-line-numbers: "|6"

age_sex_table_wide <- get_acs(
  geography = "state", 
  table = "B01001", 
  year = 2022,
  survey = "acs1",
  output = "wide" 
)
```

:::

::::


---

```{r show-wide-data}
age_sex_table_wide
```
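If you already have tidy data, `tidyr::pivot_wider()` does a similar reshape. A sketch on made-up rows; note the column names differ from `output = "wide"`, which appends `E` (estimate) and `M` (margin of error) to each variable ID instead:

```{r}
library(dplyr)
library(tidyr)

# Made-up tidy rows in the same layout as get_acs() output
toy_tidy <- tibble(
  GEOID = c("01", "01", "02", "02"),
  NAME = c("State A", "State A", "State B", "State B"),
  variable = c("B01001_001", "B01001_002", "B01001_001", "B01001_002"),
  estimate = c(1000, 480, 2000, 1010),
  moe = c(NA, 30, NA, 40)
)

# One row per geography, one column per value/variable combination
toy_wide <- toy_tidy |>
  pivot_wider(names_from = variable, values_from = c(estimate, moe))

toy_wide
```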


## Using named vectors of variables


* Census variables can be hard to remember; requesting variables with a named vector replaces the Census IDs with your own names

* In long form, these custom inputs will populate the `variable` column; in wide form, they will replace the column names

## Renaming variables easily

```{r named-variables}
#| code-line-numbers: "|4|5|6"

ca_education <- get_acs(
  geography = "county",
  state = "CA",
  variables = c(percent_high_school = "DP02_0062P", 
                percent_bachelors = "DP02_0065P",
                percent_graduate = "DP02_0066P"), 
  year = 2021
)
```


---

```{r show-named-variables}
ca_education
```


# ACS data warnings

## Understanding limitations of the 1-year ACS

* The 1-year American Community Survey is only available for geographies with population 65,000 and greater.  This means: 

::: {.incremental}

- Only 848 of 3,221 counties are available
- Only 646 of 31,908 cities / Census-designated places are available
- No data for Census tracts, block groups, ZCTAs, or any other geographies that typically have populations below 65,000

:::


## Data sparsity and margins of error

* You may encounter data issues in the 1-year ACS data that are less pronounced in the 5-year ACS.  For example: 

::: {.incremental}
* Values available in the 5-year ACS may not be available in the corresponding 1-year ACS tables

* If available, they will likely have larger margins of error

* Your job as a data journalist: balance the need for _certainty_ against the need for _recency_ in estimates

:::
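One common way to gauge reliability is the coefficient of variation (CV): divide a 90% MOE by 1.645 to get a standard error, then divide by the estimate. Higher CVs mean less reliable estimates; where to draw the line is a judgment call. A sketch with made-up numbers:

```{r}
library(dplyr)

# Hypothetical estimates with 90% margins of error
toy_reliability <- tibble(
  NAME = c("Big County", "Small County"),
  estimate = c(1000, 500),
  moe = c(329, 493.5)
)

toy_reliability <- toy_reliability |>
  mutate(
    se = moe / 1.645,          # convert a 90% MOE to a standard error
    cv = se / estimate * 100   # coefficient of variation, in percent
  )

toy_reliability
```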

## Tagalog speakers by state (1-year ACS)

```{r}
get_acs(
  geography = "state",
  variables = "B16001_099",
  year = 2022,
  survey = "acs1"
)
```

## Tagalog speakers by state (5-year ACS)

```{r}
get_acs(
  geography = "state",
  variables = "B16001_099",
  year = 2022,
  survey = "acs5"
)
```

## Other warnings

* Variables in the Data Profile and Subject Tables can change names over time

* The 2022 ACS is the first to include the new Connecticut Planning Regions in the "county" geography

* The 2020 1-year ACS was not released (and is not in tidycensus), so your time series can break if you use iteration to pull data
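One way to keep a loop from tripping over the missing year is to leave 2020 out of the sequence. A minimal sketch; the year range is only an example:

```{r}
# 1-year ACS years to request, leaving out 2020 (not released)
acs1_years <- setdiff(2015:2022, 2020)
acs1_years

# Each year could then be passed to get_acs() inside a loop, for example:
# for (yr in acs1_years) {
#   get_acs(geography = "state", variables = "B25077_001",
#           year = yr, survey = "acs1")
# }
```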

# The 2020 Decennial US Census data and R

## What is the decennial US Census?

* Complete count of the US population mandated by Article 1, Sections 2 and 9 in the US Constitution

* Directed by the US Census Bureau (US Department of Commerce); conducted every 10 years since 1790

* Used for apportionment of congressional seats and for redistricting

* Limited set of questions asked about race, ethnicity, age, sex, and housing tenure

## 2020 US Census datasets

* The PL 94-171 Redistricting Data
* The Demographic and Housing Characteristics (DHC) file
* The Demographic Profile (for pre-tabulated variables)
* Tabulations for the 118th Congress & for Island Areas
* The Detailed DHC-A file (with very detailed racial & ethnic categories)

## 2020 US Census in Tidycensus

* The `get_decennial()` function is used to acquire data from the decennial US Census

* The two required arguments are `geography` and `variables`; for 2020 Census data, use `year = 2020`.

```{r}
pop20 <- get_decennial(
  geography = "state",
  variables = "P1_001N",
  year = 2020
)
```

---

* Decennial Census data are returned with four columns: `GEOID`, `NAME`, `variable`, and `value`

```{r}
pop20
```


## Differential privacy

* The first time you run `get_decennial()` for the 2020 Census in a session, you will see the following messages:

```
Getting data from the 2020 decennial Census
Using the PL 94-171 Redistricting Data summary file
Note: 2020 decennial Census data use differential privacy, a technique that
introduces errors into data to preserve respondent confidentiality.
ℹ Small counts should be interpreted with caution.
ℹ See https://www.census.gov/library/fact-sheets/2021/protecting-the-confidentiality-of-the-2020-census-redistricting-data.html for additional guidance.
This message is displayed once per session.
```

## What is differential privacy?

* The Census Bureau is using _differential privacy_ in an attempt to preserve respondent confidentiality in the 2020 Census data, which is required under US Code Title 13

* Intentional errors are introduced into data, impacting the accuracy of small area counts (e.g. some blocks with children, but no adults)

* Advocates argue that differential privacy is necessary to satisfy Title 13 requirements given modern database reconstruction technologies; critics contend that the method makes data less useful with no tangible privacy benefit

## Scavenger hunt

Can you look through the `vars` table you loaded earlier and find the tables that answer these questions?

* How many 18- to 24-year-old Korean people are there in the US (2021)?
* What percent of females were below the poverty level in the US in 2017 (5-year ACS)?

```{r, eval=F}
vars <- load_variables(2022, "acs5")

get_acs(replace_this_with_the_right_arguments)
```

_How do you find the "right" variables or Census table ID? I do a couple of things: I use [CensusReporter.org](https://censusreporter.org/topics/table-codes/) or I ask the oldest data reporter in the newsroom._


# Wrangling and analyzing Census data

open 02_wrangling_census_data.R in RStudio

---

## Tidyverse functions

The basics of wrangling data with __dplyr__

* `filter()` keeps rows that match a condition (and drops the rest)
* `mutate()` adds (or changes) columns in the data frame
* `group_by()` and `summarize()` aggregate the data by groups
* `arrange()` sorts the data
* `select()` keeps (or drops) columns
* Daisy chain all these functions together with `|>`
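Here's how those verbs chain together, sketched on a tiny made-up data frame (the column names are hypothetical, not Census variables):

```{r}
library(dplyr)

# Made-up data: one row per county and group
toy_counts <- tibble(
  county = c("A", "A", "B", "B", "C"),
  group = c("x", "y", "x", "y", "x"),
  estimate = c(10, 30, 5, 15, 8)
)

toy_summary <- toy_counts |>
  filter(estimate > 6) |>                      # keep rows with estimates above 6
  group_by(county) |>                          # one group per county
  summarize(total = sum(estimate)) |>          # collapse each group to one row
  mutate(share = total / sum(total) * 100) |>  # add a column
  arrange(desc(total)) |>                      # sort high to low
  select(county, share)                        # keep only these columns

toy_summary
```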


## Case study: Racial plurality by county

![](img/race_table.png)

```{r view2, eval=F}
View(vars) # and search for Hispanic or Latino Origin by Race
```

---

## Download race Census data

```{r}
county_diversity <- get_acs(geography = "county",
                            variables = c("B03002_001", # total
                                          "B03002_003", # white alone
                                          "B03002_004", # black alone
                                          "B03002_005", # native american
                                          "B03002_006", # asian alone
                                          "B03002_007", # pi alone
                                          "B03002_012" # hispanic or latino
                            ),
                            survey="acs5",
                            year=2022)
```
---

```{r}
county_diversity
```

## Add a total population column

* Use the `summary_var` argument; the total comes back in the `summary_est` and `summary_moe` columns

```{r}
#| code-line-numbers: "|9"
county_diversity <- get_acs(geography = "county",
                            variables = c("B03002_003", # white alone
                                          "B03002_004", # black alone
                                          "B03002_005", # native american
                                          "B03002_006", # asian alone
                                          "B03002_007", # pi alone
                                          "B03002_012" # hispanic or latino
                            ),
                            summary_var = "B03002_001", # total population
                            survey="acs5",
                            year=2022)
```

---

```{r}
county_diversity
```

## Add a percent column

* Using the __dplyr__ library of data wrangling functions
* `mutate()` to add a new column to the data frame

```{r}
library(dplyr)

county_diversity <- county_diversity |>
  mutate(percent=estimate/summary_est*100)
```

---

```{r, eval=F}
county_diversity
```

```{r, echo=F}
county_diversity |>   ungroup() |>
select(-summary_moe)
```
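The `percent` column has no margin of error of its own. tidycensus's `moe_prop()` derives one from the numerator and denominator MOEs. A sketch on made-up rows that use the same column names `get_acs()` returns with `summary_var`:

```{r}
library(tidycensus)
library(dplyr)

# Hypothetical rows with the columns get_acs() returns when summary_var is set
toy_share <- tibble(
  NAME = c("County A", "County B"),
  estimate = c(200, 450),
  moe = c(30, 60),
  summary_est = c(1000, 1500),
  summary_moe = c(50, 40)
)

toy_share <- toy_share |>
  mutate(
    percent = estimate / summary_est * 100,
    # moe_prop() returns the MOE of the proportion; multiply by 100 for points
    percent_moe = moe_prop(estimate, summary_est, moe, summary_moe) * 100
  )

toy_share
```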

## Add better variable names

* `case_when()` recodes values (inside `mutate()`)
* `.default` works like __else__: the value used when none of the conditions match
* `|>` is R's native pipe (R 4.1 and later); read it as "and then"

```{r}
#| code-line-numbers: "|2|9"

county_diversity_race <- county_diversity |>
  mutate(race=case_when(
    variable=="B03002_003" ~"White",
    variable=="B03002_004" ~"Black",
    variable=="B03002_005" ~"Native American",
    variable=="B03002_006" ~"Asian",
    variable=="B03002_007" ~"Pacific Islander",
    variable=="B03002_012" ~"Hispanic",
    .default = "Other"
  ))
```

---


```{r, eval=F}
county_diversity_race
```

```{r, echo=F}
county_diversity_race |>   ungroup() |>
 select(-summary_moe, -moe, -variable)
```
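If you'd rather not write one `case_when()` line per variable, a named vector can act as a lookup table. A sketch of that alternative on made-up rows; `B03002_999` is a hypothetical ID included only to show the fallback:

```{r}
library(dplyr)

# Names are Census variable IDs; values are the labels we want
race_labels <- c(
  B03002_003 = "White",
  B03002_004 = "Black",
  B03002_005 = "Native American",
  B03002_006 = "Asian",
  B03002_007 = "Pacific Islander",
  B03002_012 = "Hispanic"
)

toy_vars_race <- tibble(variable = c("B03002_003", "B03002_012", "B03002_999"))

toy_vars_race <- toy_vars_race |>
  mutate(
    race = unname(race_labels[variable]),  # look up each ID by name (NA if absent)
    race = coalesce(race, "Other")         # fallback, like .default in case_when()
  )

toy_vars_race
```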

## Group up some smaller groups

* Use `group_by()` to group rows together
* Use `summarize()` to calculate something (usually math) for each group
* Let's combine the Asian and Pacific Islander populations

## Group up some smaller groups code

```{r}
#| code-line-numbers: "|6|7|11|12|13"

county_diversity_percent <- county_diversity |>
  mutate(race=case_when(
    variable=="B03002_003" ~"White",
    variable=="B03002_004" ~"Black",
    variable=="B03002_005" ~"Native American",
    variable=="B03002_006" ~"Asian Pacific Islander",
    variable=="B03002_007" ~"Asian Pacific Islander",
    variable=="B03002_012" ~"Hispanic",
    .default = "Other"
  )) |>
  group_by(GEOID, NAME, race) |>
  summarize(estimate=sum(estimate, na.rm=T),
            summary_est=mean(summary_est, na.rm=T)) |>
  mutate(percent=estimate/summary_est*100)
```

---

```{r, eval=F}
county_diversity_percent
```

```{r, echo=F}
county_diversity_percent |>   ungroup() |>
 select(-GEOID)
```


## Sort the data frame low to high

* Use the `arrange()` function

```{r, eval=F}
#| code-line-numbers: "|3"

county_diversity_percent |>
  group_by(NAME) |>
  arrange(NAME, percent)
```

```{r, echo=F}
county_diversity_percent |>
  group_by(NAME) |>
  arrange(NAME, percent) |>
    ungroup() |>
  select(-GEOID, -summary_est)
```

## Sort the data frame high to low

* Use the `arrange()` function
* Use the `desc()` function

```{r, eval=F}
#| code-line-numbers: "|3"

county_diversity_percent_sorted <- county_diversity_percent |>
  group_by(NAME) |>
  arrange(NAME, desc(percent))
```
```{r, echo=F}
county_diversity_percent_sorted <- county_diversity_percent |>
  group_by(NAME) |>
  arrange(NAME, desc(percent)) |>
    ungroup() |>
  select(-GEOID)
```


---

```{r}
county_diversity_percent_sorted
```

Notice there are 16,110 rows: 3,222 counties, each with 5 race and ethnicity groups.


## Narrow down the rows

* We want one row for every county
* Use the `filter()` function with `row_number() == 1` to keep the top row in each group

```{r}
#| code-line-numbers: "|5"

county_diversity_percent_plurality <-
  county_diversity_percent |>
  group_by(NAME) |>
  arrange(NAME, desc(percent)) |>
  filter(row_number()==1)
```

---

```{r, eval=F}
county_diversity_percent_plurality
```
```{r, echo=F}
county_diversity_percent_plurality |>   ungroup() |>
select(-GEOID)
```

Now there are 3,222 rows.

That lines up with the number of counties and county equivalents in the U.S., including Puerto Rico.

## Narrow down the rows II

* Or use the `slice()` function: `slice(1)` keeps the first row in each group

```{r}
#| code-line-numbers: "|5"

county_diversity_percent_plurality <-
  county_diversity_percent |>
  group_by(NAME) |>
  arrange(NAME, desc(percent)) |>
  slice(1)
```
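`slice_max()` from __dplyr__ folds the sort and the keep-the-top-row steps into one call. With `with_ties = FALSE`, a county whose top two groups tie keeps only one row, much like `slice(1)`; the default, `with_ties = TRUE`, would keep both. A sketch on made-up rows:

```{r}
library(dplyr)

# Made-up percentages for two counties
toy_percent <- tibble(
  NAME = c("County A", "County A", "County B", "County B"),
  race = c("White", "Hispanic", "White", "Hispanic"),
  percent = c(60, 30, 20, 70)
)

toy_plurality <- toy_percent |>
  group_by(NAME) |>
  slice_max(percent, n = 1, with_ties = FALSE) |>  # top row per county, no ties
  ungroup()

toy_plurality
```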


## Case study: Evictions in San Diego

```{r}
sd_evictions <- read_csv("san_diego_evictions.csv")

sd_evictions
```


## Go back and modify your code

Copy and paste the code you worked so hard on, then change `geography` to `"tract"` and add the `state` and `county` arguments.

```{r}
#| code-line-numbers: "|1|2|3"

sd_tract_diversity <- get_acs(geography = "tract",
                              state = "California",
                              county = "San Diego",
                              variables = c("B03002_003", # white alone
                                            "B03002_004", # black alone
                                            "B03002_005", # native american
                                            "B03002_006", # asian alone
                                            "B03002_007", # pi alone
                                            "B03002_012" # hispanic or latino
                              ),
                              summary_var = "B03002_001", # total population
                              survey="acs5",
                              year=2022)
```

## Wrangle the census tract data

Aside from the data frame names, nothing changes; the steps are just chained into one pipeline

```{r}
sd_tract_diversity_plurality <- sd_tract_diversity |> 
  mutate(race=case_when(
      variable=="B03002_003" ~"White",
      variable=="B03002_004" ~"Black",
      variable=="B03002_005" ~"Native American",
      variable=="B03002_006" ~"Asian Pacific Islander",
      variable=="B03002_007" ~"Asian Pacific Islander",
      variable=="B03002_012" ~"Hispanic",
      .default = "Other"
    )) |>
  group_by(GEOID, NAME, race) |>
  summarize(estimate=sum(estimate, na.rm=T),
            summary_est=mean(summary_est, na.rm=T)) |>
  mutate(percent=estimate/summary_est*100) |>
  group_by(GEOID, NAME) |>
  arrange(GEOID, NAME, desc(percent)) |>
  slice(1)
```

---

```{r, eval=F}
sd_tract_diversity_plurality
```

```{r, echo=F}
sd_tract_diversity_plurality |> ungroup() |> select(-NAME)
```

## Join data

* Using `inner_join()` from __dplyr__

```{r, eval=F}
sd_joined <- inner_join(sd_tract_diversity_plurality, sd_evictions)

sd_joined
```

```{r, echo=F}
sd_joined <- inner_join(sd_tract_diversity_plurality, sd_evictions) |> 
  ungroup() |>
  select(-NAME)

sd_joined
```
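Without `by`, `inner_join()` matches on every column name the two tables share and prints a message saying which ones it used; naming the key makes the intent explicit. A related snag: a `GEOID` column read from a CSV as a number loses its leading zero and won't match the character `GEOID` from tidycensus. A sketch of both points with hypothetical tables (not the real eviction file):

```{r}
library(dplyr)

# Hypothetical tract data (tidycensus tract GEOIDs are 11-character strings)
toy_tracts <- tibble(
  GEOID = c("06073000100", "06073000201"),
  race = c("White", "Hispanic")
)

# Hypothetical evictions table where GEOID was read in as a number
toy_evictions <- tibble(
  GEOID = c(6073000100, 6073000201),
  total_evictions = c(12, 30)
)

# Restore the leading zero by zero-padding to 11 characters
toy_evictions <- toy_evictions |>
  mutate(GEOID = sprintf("%011.0f", GEOID))

# Join on an explicit key
toy_joined <- inner_join(toy_tracts, toy_evictions, by = "GEOID")
toy_joined
```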

## Summarize the evictions data

Now you can compare eviction rates across San Diego tracts grouped by their plurality race or ethnicity.

```{r}
sd_joined |>
  group_by(race) |>
  summarize(population=sum(summary_est, na.rm=T),
            total_evictions=sum(total_evictions, na.rm=T)) |>
  mutate(rate_of_evictions=total_evictions/population*1000) |>
  arrange(desc(rate_of_evictions))
```

# Common Census queries

open 03_common_census_queries.R in RStudio

---

## Example of iterating with loops

Here's a basic "for loop" that counts from 1 to 10.

```{r loop1}
for (i in 1:10) {
  print(i)
}
```

## Explaining loops in R

* Without a loop, you would type `print(1)`, then `print(2)`, then `print(3)`, and so on, one at a time

* On each pass, `i` takes the next value in the established range (__1:10__)

* A loop is a way to run code many times with slight variations to a value or values in the code. The code to repeat goes between the `{` and `}`.


## Multiple years of Census data

* Start with an empty data frame, then append each year's data to it

```{r}
big_census_data <- tibble() # creates a blank data frame

for (i in 2020:2022) {
  median_df <- get_acs( # temporary dataframe
    geography = "county",
    variables = "B25077_001", # median home values
    year = i
    ) |>
    mutate(year = i) # so we can identify which year
  
  # append the temporary data frame to the permanent one
  big_census_data <- bind_rows(big_census_data, median_df) |>
    arrange(GEOID, year)
}

```

---

```{r}
big_census_data
```
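Growing a data frame inside a loop works fine for a few years. Another common pattern is to build a list of data frames and bind them once at the end. A sketch, where `fake_median_values()` is a hypothetical stand-in for the `get_acs()` call so the example runs offline:

```{r}
library(dplyr)

# Hypothetical stand-in for get_acs(); returns made-up values for one year
fake_median_values <- function(yr) {
  tibble(
    GEOID = c("00001", "00002"),
    estimate = c(200000, 150000) + (yr - 2020) * 10000
  )
}

# One data frame per year, each tagged with its year
yearly_list <- lapply(2020:2022, function(yr) {
  fake_median_values(yr) |>
    mutate(year = yr)
})

big_data_alt <- bind_rows(yearly_list) |>  # stack all years at once
  arrange(GEOID, year)

big_data_alt
```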

## Quickly calculate percent change

```{r}
library(tidyr)

home_value_change <- big_census_data |>
  ungroup() |>
  filter(year!=2021) |>
  select(NAME, estimate, year) |>
  pivot_wider(names_from="year", values_from="estimate") |>
  mutate(change=round((`2022`-`2020`)/`2020`*100,2)) |>
  arrange(desc(change))
```
---

```{r}
home_value_change
```
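The `change` column is the standard percent-change formula. With made-up values, a median home value that rises from \$200,000 in 2020 to \$250,000 in 2022 works out to a 25% increase:

$$
\text{change} = \frac{V_{2022} - V_{2020}}{V_{2020}} \times 100 = \frac{250{,}000 - 200{,}000}{200{,}000} \times 100 = 25
$$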

## Looping through states to get tracts

```{r}
state_names <- c("DC", "MD", "VA") # Get a list of state names or abbreviations

tract_data <- tibble()
for (i in seq_along(state_names)) {
  tract_df <- get_acs(
    geography = "tract",
    variables = "B25077_001",
    year = 2022,
    state = state_names[i] # Swap in each state from the vector
  )
  
  tract_data <- bind_rows(tract_data, tract_df)
}
```

---

```{r}
tract_data
```


## Get a list of state names and/or abbreviations

* Pull a list of state names from the depths of R with `state.name`
* Pull a list of state abbreviations from the depths of R with `state.abb`
* Combine them into a data frame, and don't forget to add DC, to make a name/abbreviation lookup table

---

```{r}
state_names <- c(state.name, "District of Columbia")
state_abbs <- c(state.abb, "DC")

state_df <- data.frame(state_names, state_abbs)
state_df
```
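With the lookup table in hand, `match()` converts full names to abbreviations (handy for loops like the one above). A minimal sketch:

```{r}
state_df <- data.frame(
  state_names = c(state.name, "District of Columbia"),
  state_abbs = c(state.abb, "DC")
)

# Look up abbreviations for a few full names
wanted <- c("Maryland", "Virginia", "District of Columbia")
wanted_abbs <- state_df$state_abbs[match(wanted, state_df$state_names)]
wanted_abbs
```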


## Common census queries

* Diversity scores for counties
* Poverty quantiles for counties
* People 65 and older living alone, for counties
* Check it out in `03_common_census_queries.R`

# Visualizing Census data with maps

open 04_tidycensusviz.R in RStudio

## "Spatial" ACS data

* One of the best features of tidycensus is the argument `geometry = TRUE`, which gets you the correct Census geometries with no hassle

* `get_acs()` with `geometry = TRUE` returns a _spatial_ Census dataset containing _simple feature geometries_


## Downloading "Spatial" ACS data

* `geometry = TRUE` does the hard work for you of acquiring and pre-joining spatial Census data


```{r}
#| code-line-numbers: "|7"
median_value_map <- get_acs(
  geography = "tract",
  state = "MD",
  county = "Baltimore City",
  variables = "B25077_001", # median values of home
  year = 2022,
  geometry = TRUE
)
```

---

* We get back a _simple features_ data frame

```{r, eval=F}
median_value_map
```


```{r, echo=F}
median_value_map |> select(-NAME, -variable)
```

## Exploring Census data interactively

```{r}
library(mapview)

mapview(median_value_map)
```

## Creating a shaded map with `zcol`

```{r}
mapview(median_value_map, zcol = "estimate")
```


## Try all the code again in a different county

```{r, eval=F}
median_value_map <- get_acs(
  geography = "tract",
  state = "MD", # Change me
  county = "Baltimore County", # Change me
  variables = "B25077_001", # median values of home
  year = 2022,
  geometry = TRUE
)

mapview(median_value_map, zcol = "estimate")
```

## Migration data

```{r}

county_migration <- get_flows(
  geography = "county",
  county = "Baltimore City",
  state = "MD"
)

county_migration
```

## Downloading map data

```{r}

county_map <- get_acs(
  geography = "county",
  variables = c("Population"="B03002_001"),
  geometry = TRUE
)

county_map
```

## Prep the migration data

```{r}
county_migration_moved <- county_migration |>
  filter(variable=="MOVEDIN") |>
  filter(!is.na(GEOID2)) |>
  select(GEOID=GEOID2, migration=estimate) 
```

## Join the migration data with the shapefile

```{r}
county_map_migration <- county_map |>
  inner_join(county_migration_moved, by = "GEOID")

mapview(county_map_migration, zcol = "migration")
```

# Can you map migration out?

---

```{r}
county_migration_moved <- county_migration |>
  filter(variable=="MOVEDOUT") |>
  filter(!is.na(GEOID2)) |>
  select(GEOID=GEOID2, migration=estimate) 

county_map_migration <- county_map |>
  inner_join(county_migration_moved, by = "GEOID")

mapview(county_map_migration, zcol = "migration")
```

# Thank you!

* nicar.r-journalism.com/2024/