Iterative indexed ‘mapping’ in R

r
gis
data science
census
Author

Kyle Walker

Published

January 12, 2024

My book Analyzing US Census Data: Methods, Maps, and Models in R, published last year, covers a lot of the data science tips and tricks I’ve learned over the years. In my academic and consulting work, I apply a lot of additional workflows that the book doesn’t cover. This year on the blog, I’d like to share some brief examples of workflows I’ve found useful with a focus on applications to Census and demographic data.

In my consulting work, I’m commonly asked to build out maps, charts, or reports for a large number of cities or regions at once. The goal here is often to allow for rapid exploration / iteration, so a basic map template might be fine. Doing this for a few cities one-by-one isn’t a problem, but it quickly gets tedious when you have dozens, if not hundreds, of visuals to produce – and keeping all the results organized can be a pain.

I’m a huge fan of R’s list data structure to help with these tasks. I struggled with lists in R when I was first learning the language, but I now find lists essential. Lists are flexible data structures that can basically store whatever objects you want. I’m partial to the named list, in which those objects are accessible by name.

Getting demographic data with tidycensus

Let’s tackle an example. We’re tasked with creating interactive, exploratory maps of the percent of the workforce who works from home by Census tract for the 100 largest metro areas in the United States. First, we’ll need to identify the 100 largest metro areas in the US - a job for the tidycensus R package.

library(tidycensus)
library(tidyverse)

top100metros <- get_acs(
  geography = "cbsa",
  variables = "B01003_001",
  year = 2022,
  survey = "acs1",
  geometry = TRUE
) %>%
  slice_max(estimate, n = 100)

We’ve pulled data here from the 2022 1-year American Community Survey for core-based statistical areas (CBSAs), which includes metropolitan statistical areas. slice_max() gets us the 100 largest values of the estimate column, which in this case stores values on total population in 2022.

Next, we’ll need to grab data on the work-from-home share by Census tract. Census tracts aren’t directly identifiable by metro, so we’ll get data for the entire US. With tidycensus, grabbing demographic data for all 50 US states + DC and Puerto Rico is straightforward; pass an appropriate vector to the state parameter as an argument.

us_wfh_tract <- get_acs(
  geography = "tract",
  variables = "DP03_0024P",
  state = c(state.abb, "DC", "PR"),
  year = 2022,
  geometry = TRUE
)

Iterative spatial overlay

library(sf)
sf_use_s2(FALSE)

us_wfh_metro <- top100metros %>%
  split(~NAME) %>%
  map(~{
    
    tract_ids <- us_wfh_tract %>%
      st_point_on_surface() %>%
      st_filter(.x) %>%
      pull(GEOID)
    
    us_wfh_tract %>%
      filter(GEOID %in% tract_ids)
  })

Let’s examine our data for any metro:

us_wfh_metro$`Baltimore-Columbia-Towson, MD Metro Area`
Simple feature collection with 713 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -77.31136 ymin: 38.71274 xmax: -75.74767 ymax: 39.72131
Geodetic CRS:  NAD83
# A tibble: 713 × 6
   GEOID       NAME            variable estimate   moe                  geometry
 * <chr>       <chr>           <chr>       <dbl> <dbl>        <MULTIPOLYGON [°]>
 1 24013505101 Census Tract 5… DP03_00…     17.8   5.4 (((-76.9689 39.41866, -7…
 2 24013505208 Census Tract 5… DP03_00…     10.3   5.6 (((-76.99383 39.36777, -…
 3 24025301602 Census Tract 3… DP03_00…      9.4   4.6 (((-76.30582 39.4331, -7…
 4 24025306100 Census Tract 3… DP03_00…      4.3   3.4 (((-76.12338 39.53333, -…
 5 24027606804 Census Tract 6… DP03_00…     20.2   6.2 (((-76.85616 39.17558, -…
 6 24027606604 Census Tract 6… DP03_00…     12.3   4.7 (((-76.83466 39.20062, -…
 7 24510151200 Census Tract 1… DP03_00…      8.1   8.9 (((-76.66506 39.33338, -…
 8 24510270702 Census Tract 2… DP03_00…      6     5.3 (((-76.57356 39.36854, -…
 9 24510271802 Census Tract 2… DP03_00…      2.6   3.5 (((-76.68352 39.34294, -…
10 24510272004 Census Tract 2… DP03_00…     20    13.8 (((-76.69473 39.36801, -…
# ℹ 703 more rows

Making maps with an indexed map()

library(mapview)

wfh_maps <- imap(us_wfh_metro, ~{
  mapview(
    .x, 
    zcol = "estimate",
    layer.name = glue::glue("% working from home<br>{.y}")
  )
})

What does the map look like?

wfh_maps$`Dallas-Fort Worth-Arlington, TX Metro Area`