What’s new in tidycensus

2022 1-year ACS and 2020 Detailed DHC-A data

Kyle Walker

Getting started

  • To use these new features, make sure that tidycensus 1.5 and tigris 2.0.4 are installed.
install.packages(c("tidycensus", "tigris", "mapview"))

The 2022 1-year American Community Survey data

What is the ACS?

  • Annual survey of 3.5 million US households

  • Covers topics not available in decennial US Census data (e.g. income, education, language, housing characteristics)

  • Available as 1-year estimates (for geographies of population 65,000 and greater) and 5-year estimates (for geographies down to the block group)

  • Data delivered as estimates characterized by margins of error

Working with ACS data in tidycensus

  • The get_acs() function is your portal to access ACS data using tidycensus

  • The two required arguments are geography and variables. The function defaults to the 2017-2021 5-year ACS

library(tidycensus)

median_income <- get_acs(
  geography = "tract",
  variables = "B19013_001",
  state = "TX",
  year = 2021
)
  • ACS data are returned with five columns: GEOID, NAME, variable, estimate, and moe
median_income
# A tibble: 6,896 × 5
   GEOID       NAME                                      variable estimate   moe
   <chr>       <chr>                                     <chr>       <dbl> <dbl>
 1 48001950100 Census Tract 9501, Anderson County, Texas B19013_…    61325  9171
 2 48001950401 Census Tract 9504.01, Anderson County, T… B19013_…    92813 45136
 3 48001950402 Census Tract 9504.02, Anderson County, T… B19013_…       NA    NA
 4 48001950500 Census Tract 9505, Anderson County, Texas B19013_…    41713  6650
 5 48001950600 Census Tract 9506, Anderson County, Texas B19013_…    32552 12274
 6 48001950700 Census Tract 9507, Anderson County, Texas B19013_…    35811  5573
 7 48001950800 Census Tract 9508, Anderson County, Texas B19013_…    52612 12426
 8 48001950901 Census Tract 9509.01, Anderson County, T… B19013_…    47336  3806
 9 48001950902 Census Tract 9509.02, Anderson County, T… B19013_…    47068 10004
10 48001951001 Census Tract 9510.01, Anderson County, T… B19013_…    55063  7833
# ℹ 6,886 more rows
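
ACS margins of error are published at the 90 percent confidence level, so estimate ± moe can be read as a 90 percent confidence interval. As a rough sketch, using three rows copied from the output above (the column names lower, upper, and cv are illustrative additions, not tidycensus output):

```r
# Three rows copied from the median_income output above
income <- data.frame(
  GEOID    = c("48001950100", "48001950401", "48001950500"),
  estimate = c(61325, 92813, 41713),
  moe      = c(9171, 45136, 6650)
)

# Bounds of the 90% confidence interval
income$lower <- income$estimate - income$moe
income$upper <- income$estimate + income$moe

# Coefficient of variation: standard error (moe / 1.645) as a share of the estimate
income$cv <- (income$moe / 1.645) / income$estimate

income
```

A larger cv means a less reliable estimate; the second tract here is far noisier than the first.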

1-year ACS data

  • 1-year ACS data are more current, but are only available for geographies of population 65,000 and greater

  • Access 1-year ACS data with the argument survey = "acs1"; the default is "acs5"

median_income_1yr <- get_acs(
  geography = "place",
  state = "TX",
  variables = "B19013_001",
  year = 2022,
  survey = "acs1" 
)

The 2022 1-year ACS: best practices

Understanding limitations of the 1-year ACS

  • The 1-year American Community Survey is only available for geographies with population 65,000 and greater. This means:
      • Only 848 of 3,221 counties are available
      • Only 646 of 31,908 cities / Census-designated places are available
      • No data for Census tracts, block groups, ZCTAs, or any other geographies that typically have populations below 65,000
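
To put those counts in perspective, a quick calculation of the share of geographies covered (the variable names are illustrative):

```r
# Counts taken from the bullets above
counties_available <- 848
counties_total     <- 3221
places_available   <- 646
places_total       <- 31908

# Percent of each geography type with 1-year ACS data
county_share <- round(100 * counties_available / counties_total, 1)
place_share  <- round(100 * places_available / places_total, 1)

county_share
place_share
```

That works out to about a quarter of counties and roughly 2 percent of places.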

Finding available variables

  • Use load_variables(2022, "acs1") to view available variable codes in the 2022 1-year ACS

  • "acs1/profile" and "acs1/subject" are also available for the Data Profile and Subject Tables respectively

load_variables(2022, "acs1")
# A tibble: 36,607 × 3
   name        label                                    concept                 
   <chr>       <chr>                                    <chr>                   
 1 B01001A_001 Estimate!!Total:                         Sex by Age (White Alone)
 2 B01001A_002 Estimate!!Total:!!Male:                  Sex by Age (White Alone)
 3 B01001A_003 Estimate!!Total:!!Male:!!Under 5 years   Sex by Age (White Alone)
 4 B01001A_004 Estimate!!Total:!!Male:!!5 to 9 years    Sex by Age (White Alone)
 5 B01001A_005 Estimate!!Total:!!Male:!!10 to 14 years  Sex by Age (White Alone)
 6 B01001A_006 Estimate!!Total:!!Male:!!15 to 17 years  Sex by Age (White Alone)
 7 B01001A_007 Estimate!!Total:!!Male:!!18 and 19 years Sex by Age (White Alone)
 8 B01001A_008 Estimate!!Total:!!Male:!!20 to 24 years  Sex by Age (White Alone)
 9 B01001A_009 Estimate!!Total:!!Male:!!25 to 29 years  Sex by Age (White Alone)
10 B01001A_010 Estimate!!Total:!!Male:!!30 to 34 years  Sex by Age (White Alone)
# ℹ 36,597 more rows
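
With more than 36,000 rows, filtering the variable table is usually easier than scrolling through it. A minimal sketch, assuming the name, label, and concept columns shown above; search_variables() is a hypothetical helper, not part of tidycensus, and the demo uses a few rows copied from the output so it runs offline:

```r
# Hypothetical helper (not part of tidycensus): keep rows whose label or
# concept matches a case-insensitive pattern
search_variables <- function(vars, pattern) {
  hit <- grepl(pattern, vars$label, ignore.case = TRUE) |
    grepl(pattern, vars$concept, ignore.case = TRUE)
  vars[hit, , drop = FALSE]
}

# A few rows copied from the load_variables() output above
vars <- data.frame(
  name    = c("B01001A_001", "B01001A_002", "B01001A_003", "B01001A_004"),
  label   = c("Estimate!!Total:", "Estimate!!Total:!!Male:",
              "Estimate!!Total:!!Male:!!Under 5 years",
              "Estimate!!Total:!!Male:!!5 to 9 years"),
  concept = "Sex by Age (White Alone)"
)

search_variables(vars, "under 5")
```

In practice you would pass the full table, for example search_variables(load_variables(2022, "acs1"), "household income").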

Data sparsity and margins of error

  • You may encounter data issues in the 1-year ACS data that are less pronounced in the 5-year ACS. For example:
      • Values available in the 5-year ACS may not be available in the corresponding 1-year ACS tables
      • Values that are available will likely have larger margins of error

  • Your job as an analyst: balance need for certainty vs. need for recency in estimates

Example: Punjabi speakers by state (1-year ACS)

get_acs(
  geography = "state",
  variables = "B16001_054",
  year = 2022,
  survey = "acs1"
)
# A tibble: 52 × 5
   GEOID NAME                 variable   estimate   moe
   <chr> <chr>                <chr>         <dbl> <dbl>
 1 01    Alabama              B16001_054      666   556
 2 02    Alaska               B16001_054       NA    NA
 3 04    Arizona              B16001_054     1906  1342
 4 05    Arkansas             B16001_054       NA    NA
 5 06    California           B16001_054   154917 14153
 6 08    Colorado             B16001_054     1643  1968
 7 09    Connecticut          B16001_054     4039  2965
 8 10    Delaware             B16001_054        0   203
 9 11    District of Columbia B16001_054       NA    NA
10 12    Florida              B16001_054     3311  1969
# ℹ 42 more rows

Punjabi speakers by state (5-year ACS)

get_acs(
  geography = "state",
  variables = "B16001_054",
  year = 2021,
  survey = "acs5"
)
# A tibble: 52 × 5
   GEOID NAME                 variable   estimate   moe
   <chr> <chr>                <chr>         <dbl> <dbl>
 1 01    Alabama              B16001_054      507   248
 2 02    Alaska               B16001_054       33    55
 3 04    Arizona              B16001_054     3833   758
 4 05    Arkansas             B16001_054      478   279
 5 06    California           B16001_054   142450  6035
 6 08    Colorado             B16001_054     1323   389
 7 09    Connecticut          B16001_054     1455   462
 8 10    Delaware             B16001_054      214   180
 9 11    District of Columbia B16001_054      189    90
10 12    Florida              B16001_054     2544   631
# ℹ 42 more rows
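
One way to make the trade-off concrete is to express each margin of error as a share of its estimate. The sketch below uses rows copied from the two outputs above, limited to states that have a 1-year estimate (Alaska, Arkansas, and the District of Columbia are NA in the 1-year data). The rel_moe() helper and the column names are illustrative, not tidycensus functions:

```r
# Rows copied from the 1-year and 5-year outputs above
punjabi <- data.frame(
  NAME    = c("Alabama", "Arizona", "California", "Colorado",
              "Connecticut", "Delaware", "Florida"),
  est_1yr = c(666, 1906, 154917, 1643, 4039, 0, 3311),
  moe_1yr = c(556, 1342, 14153, 1968, 2965, 203, 1969),
  est_5yr = c(507, 3833, 142450, 1323, 1455, 214, 2544),
  moe_5yr = c(248, 758, 6035, 389, 462, 180, 631)
)

# MOE as a share of the estimate; left as NA when the estimate is 0
rel_moe <- function(moe, estimate) {
  ifelse(estimate > 0, moe / estimate, NA_real_)
}

punjabi$rel_moe_1yr <- round(rel_moe(punjabi$moe_1yr, punjabi$est_1yr), 2)
punjabi$rel_moe_5yr <- round(rel_moe(punjabi$moe_5yr, punjabi$est_5yr), 2)

punjabi[, c("NAME", "rel_moe_1yr", "rel_moe_5yr")]
```

For every state in this sample the 5-year ratio is smaller, and for Colorado the 1-year margin of error exceeds the estimate itself.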

What about mapping 1-year ACS data?

  • One of the best features of tidycensus is the argument geometry = TRUE, which gets you the correct Census geometries with no hassle

  • Typically it is difficult to map 1-year ACS data below the state level as your data will have gaps due to the population restrictions

Example: “mapping” 1-year ACS data

tx_education <- get_acs(
  geography = "county",
  variables = "DP02_0068P",
  state = "TX",
  year = 2022,
  survey = "acs1",
  geometry = TRUE
)

library(mapview)

mapview(tx_education, zcol = "estimate")

Mapping small(er) areas with PUMAs

  • Consider using Public Use Microdata Areas (PUMAs) for geographically-consistent substate mapping

  • PUMAs are typically used for microdata geography; however, I find them quite useful to approximate real estate submarkets, planning areas, etc.

wa_wfh <- get_acs(
  geography = "puma",
  variables = "DP03_0024P",
  state = "WA",
  survey = "acs1",
  year = 2022,
  geometry = TRUE
)
library(mapview)

mapview(wa_wfh, zcol = "estimate")

Bonus: new Connecticut county-equivalents

  • The 2022 ACS is the first to include the new Connecticut Planning Regions in the “county” geography
ct_income <- get_acs(
  geography = "county",
  variables = "B19013_001",
  state = "CT",
  year = 2022,
  survey = "acs1",
  geometry = TRUE
)
mapview(ct_income, zcol = "estimate")

Time-series analysis with the 1-year ACS: some notes

  • Variables in the Data Profile and Subject Tables can change names over time

  • You’ll need to watch out for the Connecticut issue and changing geographies

  • The standard 2020 1-year ACS was not released (only experimental estimates were published) and is not in tidycensus, so your time series can break if you are using iteration to pull data
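
A minimal sketch of iterating over years while skipping 2020. The helper get_acs1_series(), the 2018 to 2022 range, and the added year column are illustrative assumptions; the get_acs() arguments are the ones used earlier in this deck. The helper is defined but not called here, since calling it needs network access and ideally a Census API key:

```r
# Skip 2020: the standard 1-year ACS is not available for that year
years <- setdiff(2018:2022, 2020)

# Hypothetical helper: pull one variable for each year and stack the results
get_acs1_series <- function(years, variable = "B19013_001") {
  results <- lapply(years, function(yr) {
    out <- tidycensus::get_acs(
      geography = "state",
      variables = variable,
      year = yr,
      survey = "acs1"
    )
    out$year <- yr  # record which year each row came from
    out
  })
  dplyr::bind_rows(results)
}

years
# income_series <- get_acs1_series(years)
```

B19013_001 comes from the Detailed Tables; Data Profile and Subject Table variables would need a name check for each year before being pulled this way.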

The 2020 Decennial Census Detailed DHC-A File

The Detailed DHC-A File

  • Tabulation of 2020 Decennial Census results for population by sex and age

  • Key feature: break-outs for thousands of racial and ethnic groups

Limitations of the DDHC-A File

  • An “adaptive design” is used, meaning that data for different groups / geographies may be found in different tables

  • There is considerable sparsity in the data, especially when going down to the Census tract level

Getting Decennial Census data in tidycensus

library(tidycensus)

bexar_population <- get_decennial(
  geography = "tract",
  variables = "P1_001N",
  state = "TX",
  county = "Bexar",
  sumfile = "dhc",
  year = 2020
)

Using the DDHC-A File in tidycensus

  • You’ll query the DDHC-A file with the argument sumfile = "ddhca" in get_decennial()

  • A new argument, pop_group, is required to use the DDHC-A; it takes a population group code.

  • Use pop_group = "all" to query for all groups; set pop_group_label = TRUE to return the label for the population group

  • Look up variables with load_variables(2020, "ddhca")

Example usage of the DDHC-A File

mn_population_groups <- get_decennial(
  geography = "state",
  variables = "T01001_001N",
  state = "MN",
  year = 2020,
  sumfile = "ddhca",
  pop_group = "all",
  pop_group_label = TRUE
)
mn_population_groups
# A tibble: 2,996 × 6
   GEOID NAME      pop_group pop_group_label   variable      value
   <chr> <chr>     <chr>     <chr>             <chr>         <dbl>
 1 27    Minnesota 1002      European alone    T01001_001N 3162905
 2 27    Minnesota 1003      Albanian alone    T01001_001N     512
 3 27    Minnesota 1004      Alsatian alone    T01001_001N      27
 4 27    Minnesota 1005      Andorran alone    T01001_001N      NA
 5 27    Minnesota 1006      Armenian alone    T01001_001N     605
 6 27    Minnesota 1007      Austrian alone    T01001_001N    2552
 7 27    Minnesota 1008      Azerbaijani alone T01001_001N     103
 8 27    Minnesota 1009      Basque alone      T01001_001N      52
 9 27    Minnesota 1010      Belarusian alone  T01001_001N    1579
10 27    Minnesota 1011      Belgian alone     T01001_001N    3864
# ℹ 2,986 more rows
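
Because many groups come back as NA, a common first step is to drop those rows and rank what remains. A small sketch on rows copied from the output above (mn_sample, published, and ranked are illustrative names); the same two lines would apply to the full mn_population_groups object:

```r
# Rows copied from the mn_population_groups output above
mn_sample <- data.frame(
  pop_group       = c("1002", "1003", "1004", "1005", "1006"),
  pop_group_label = c("European alone", "Albanian alone", "Alsatian alone",
                      "Andorran alone", "Armenian alone"),
  value           = c(3162905, 512, 27, NA, 605)
)

# Drop groups with no published value, then sort largest first
published <- mn_sample[!is.na(mn_sample$value), ]
ranked <- published[order(published$value, decreasing = TRUE), ]

ranked
```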

Looking up group codes

  • A new function, get_pop_groups(), helps you look up population group codes

  • It works for SF2/SF4 in 2000 and SF2 in 2010 as well!

available_groups <- get_pop_groups(2020, "ddhca")

Understanding sparsity in the DDHC-A File

  • The DDHC-A File uses an “adaptive design” that makes certain tables available for specific geographies

You may see this error…

get_decennial(
  geography = "county",
  variables = "T02001_001N",
  state = "MN",
  county = "Hennepin",
  pop_group = "1325",
  year = 2020,
  sumfile = "ddhca"
)
Error in `get_decennial()`:
! Error in load_data_decennial(geography, variables, key, year, sumfile,  : 
  Your DDHC-A request returned No Content from the API.
ℹ The DDHC-A file uses an 'adaptive design' where data availability varies by geography and by population group.
ℹ Read Section 3-1 at https://www2.census.gov/programs-surveys/decennial/2020/technical-documentation/complete-tech-docs/detailed-demographic-and-housing-characteristics-file-a/2020census-detailed-dhc-a-techdoc.pdf for more information.
ℹ In tidycensus, use the function `check_ddhca_groups()` to see if your data is available.

How to check for data availability

  • A new function, check_ddhca_groups(), can be used to see which tables to use for the data you want
check_ddhca_groups(
  geography = "county", 
  pop_group = "1325", 
  state = "MN", 
  county = "Hennepin"
)
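
When looping over many groups or geographies, a single unavailable combination would otherwise stop the whole loop. One generic base R pattern is to wrap each request in tryCatch(). The helper try_query() below is hypothetical, and the failing function is a stand-in for a get_decennial() call so the example runs offline:

```r
# Hypothetical helper: run a query function, returning NULL instead of
# stopping when the request raises an error
try_query <- function(query_fn) {
  tryCatch(
    query_fn(),
    error = function(e) {
      message("Skipping request: ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in for a get_decennial() call that fails with the error shown above
failing_query <- function() {
  stop("Your DDHC-A request returned No Content from the API.")
}

result <- try_query(failing_query)
is.null(result)
```

In real use the stand-in would be replaced by something like function() get_decennial(...) with the arguments shown earlier, and NULL results would be dropped before combining.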

Mapping DDHC-A data

  • Given data sparsity in the DDHC-A data, should you make maps with it?

  • I’m not personally a fan of mapping data that are geographically sparse. But…

  • I think it is OK to map DDHC-A data if you think through the data limitations in your map design

Example: Somali populations by Census tract in Minneapolis

library(tidycensus)

hennepin_somali <- get_decennial(
  geography = "tract",
  variables = "T01001_001N",
  state = "MN",
  county = "Hennepin",
  year = 2020,
  sumfile = "ddhca",
  pop_group = "1325",
  pop_group_label = TRUE,
  geometry = TRUE
)
mapview(hennepin_somali, zcol = "value")

Alternative approach: dot-density mapping

  • I don’t think choropleth maps are advisable with geographically incomplete data in most cases

  • Other map types - like graduated symbols or dot-density maps - may be more appropriate

  • The tidycensus function as_dot_density() allows you to specify the number of people represented in each dot, which means you can represent data-suppressed areas as 0 more confidently

somali_dots <- as_dot_density(
  hennepin_somali,
  value = "value",
  values_per_dot = 25
)

mapview::mapview(somali_dots, cex = 0.01, layer.name = "Somali population<br>1 dot = 25 people",
                 col.regions = "navy", color = "navy")
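
To get a feel for what values_per_dot = 25 implies, the arithmetic can be sketched by hand. This assumes the number of dots per tract is roughly value / values_per_dot rounded down and that suppressed (NA) tracts get no dots; the exact rounding inside as_dot_density() may differ, and the tract counts here are hypothetical, not real data:

```r
# Hypothetical tract counts for illustration only (NA = suppressed)
tract_values <- c(1200, 40, 18, NA)

# Approximate dots per tract, assuming round-down and NA treated as 0
approx_dots <- function(value, values_per_dot) {
  value[is.na(value)] <- 0
  floor(value / values_per_dot)
}

approx_dots(tract_values, values_per_dot = 25)
```

Under these assumptions a tract with fewer than 25 people draws no dots, just like a suppressed tract, which is why treating suppressed areas as 0 costs little visually.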

Notes on differential privacy

  • The use of differential privacy in the 2020 DDHC-A File makes some traditional Census analyses impossible

  • For example, groups within a group hierarchy may not sum to the parent group

  • Small counts (e.g. at the tract level) will be suppressed; the threshold is 22 for detailed groups

  • See the DDHC-A technical documentation for more information

Thank you!