What’s new in tidycensus

2022 1-year ACS and 2020 Detailed DHC-A data

Kyle Walker

Don’t miss my GIS workshop series in October!

Use the discount code DDHCA for 25% off the following:

October 4 (11am CT): Getting Started with Geographic Information Systems in R
October 11 (11am CT): Interactive Mapping with R
Past workshops on 2020 Census Data are available at https://walker-data.com/workshops.html

About me

Associate Professor of Geography at TCU
Spatial data science researcher and consultant
Package developer: tidycensus, tigris, mapboxapi, crsuggest, idbr (R), pygris (Python)
Book: Analyzing US Census Data: Methods, Maps and Models in R

Getting started

To use these new features, make sure that tidycensus 1.5 and tigris 2.0.4 are installed.

# Package names must be passed together as one character vector
install.packages(c("tidycensus", "tigris", "mapview"))
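
To confirm that the installed versions meet the minimums above, a quick check along these lines may help. This is a minimal sketch: the helper name has_min_version() is hypothetical and not part of tidycensus or tigris.

```r
# Hypothetical helper (not from any package): TRUE if `pkg` is installed
# at `min_version` or newer, FALSE if it is older or not installed at all
has_min_version <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
}

# Minimum versions named on this slide
needed <- c(tidycensus = "1.5", tigris = "2.0.4")

# Named logical vector, one entry per package
ok <- mapply(has_min_version, names(needed), needed)
print(ok)
```

Any FALSE entry means that package should be installed or updated before running the examples that follow.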

The 2022 1-year American Community Survey data

What is the ACS?

Annual survey of 3.5 million US households
Covers topics not available in decennial US Census data (e.g. income, education, language, housing characteristics)
Available as 1-year estimates (for geographies of population 65,000 and greater) and 5-year estimates (for geographies down to the block group)
Data delivered as estimates characterized by margins of error

Working with ACS data in tidycensus

The get_acs() function is your portal to access ACS data using tidycensus
The two required arguments are geography and variables. The function defaults to the 2017-2021 5-year ACS

library(tidycensus)

median_income <- get_acs(
  geography = "tract",
  variables = "B19013_001",
  state = "TX",
  year = 2021
)

ACS data are returned with five columns: GEOID, NAME, variable, estimate, and moe

median_income

# A tibble: 6,896 × 5
   GEOID       NAME                                      variable estimate   moe
   <chr>       <chr>                                     <chr>       <dbl> <dbl>
 1 48001950100 Census Tract 9501, Anderson County, Texas B19013_…    61325  9171
 2 48001950401 Census Tract 9504.01, Anderson County, T… B19013_…    92813 45136
 3 48001950402 Census Tract 9504.02, Anderson County, T… B19013_…       NA    NA
 4 48001950500 Census Tract 9505, Anderson County, Texas B19013_…    41713  6650
 5 48001950600 Census Tract 9506, Anderson County, Texas B19013_…    32552 12274
 6 48001950700 Census Tract 9507, Anderson County, Texas B19013_…    35811  5573
 7 48001950800 Census Tract 9508, Anderson County, Texas B19013_…    52612 12426
 8 48001950901 Census Tract 9509.01, Anderson County, T… B19013_…    47336  3806
 9 48001950902 Census Tract 9509.02, Anderson County, T… B19013_…    47068 10004
10 48001951001 Census Tract 9510.01, Anderson County, T… B19013_…    55063  7833
# ℹ 6,886 more rows
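
The moe column is easiest to read once it is turned into a range. ACS margins of error are published at the 90 percent confidence level, so estimate ± moe gives a 90 percent interval, and dividing the MOE by 1.645 gives an approximate standard error. The sketch below re-types four rows of the output above into a base R data frame so it runs without an API call; the column names lower, upper, and cv_pct are my own choices for illustration.

```r
# Four rows copied from the printed output above (no API call needed)
tracts <- data.frame(
  GEOID    = c("48001950100", "48001950401", "48001950402", "48001950500"),
  estimate = c(61325, 92813, NA, 41713),
  moe      = c(9171, 45136, NA, 6650)
)

# 90% confidence interval around each estimate
tracts$lower <- tracts$estimate - tracts$moe
tracts$upper <- tracts$estimate + tracts$moe

# Coefficient of variation (%): standard error relative to the estimate
# 1.645 is the z-score for the 90% confidence level used by the ACS
tracts$cv_pct <- round(100 * (tracts$moe / 1.645) / tracts$estimate, 1)

print(tracts)
```

Tract 9504.01 has a much wider interval relative to its estimate than tract 9501, and the NA row stays NA throughout rather than being treated as zero.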

1-year ACS data

1-year ACS data are more current, but are only available for geographies of population 65,000 and greater
Access 1-year ACS data with the argument survey = "acs1"; the default is "acs5"

median_income_1yr <- get_acs(
  geography = "place",
  state = "TX",
  variables = "B19013_001",
  year = 2022,
  survey = "acs1" 
)

The 2022 1-year ACS: best practices

Understanding limitations of the 1-year ACS

The 1-year American Community Survey is only available for geographies with population 65,000 and greater. This means:

Only 848 of 3,221 counties are available
Only 646 of 31,908 cities / Census-designated places are available
No data for Census tracts, block groups, ZCTAs, or any other geographies that typically have populations below 65,000
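
Expressed as shares, the counts above make the coverage gap easier to see. A small sketch using only the numbers on this slide:

```r
# Counts taken from the bullets above
coverage <- data.frame(
  geography = c("county", "place"),
  available = c(848, 646),
  total     = c(3221, 31908)
)

# Percent of geographies with 1-year ACS data
coverage$pct_available <- round(100 * coverage$available / coverage$total, 1)

print(coverage)
```

Roughly a quarter of counties and about 2 percent of places are covered.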

Finding available variables

Use load_variables(2022, "acs1") to view available variable codes in the 2022 1-year ACS
"acs1/profile" and "acs1/subject" are also available for the Data Profile and Subject Tables respectively

load_variables(2022, "acs1")

# A tibble: 36,607 × 3
   name        label                                    concept                 
   <chr>       <chr>                                    <chr>                   
 1 B01001A_001 Estimate!!Total:                         Sex by Age (White Alone)
 2 B01001A_002 Estimate!!Total:!!Male:                  Sex by Age (White Alone)
 3 B01001A_003 Estimate!!Total:!!Male:!!Under 5 years   Sex by Age (White Alone)
 4 B01001A_004 Estimate!!Total:!!Male:!!5 to 9 years    Sex by Age (White Alone)
 5 B01001A_005 Estimate!!Total:!!Male:!!10 to 14 years  Sex by Age (White Alone)
 6 B01001A_006 Estimate!!Total:!!Male:!!15 to 17 years  Sex by Age (White Alone)
 7 B01001A_007 Estimate!!Total:!!Male:!!18 and 19 years Sex by Age (White Alone)
 8 B01001A_008 Estimate!!Total:!!Male:!!20 to 24 years  Sex by Age (White Alone)
 9 B01001A_009 Estimate!!Total:!!Male:!!25 to 29 years  Sex by Age (White Alone)
10 B01001A_010 Estimate!!Total:!!Male:!!30 to 34 years  Sex by Age (White Alone)
# ℹ 36,597 more rows

Data sparsity and margins of error

You may encounter data issues in the 1-year ACS data that are less pronounced in the 5-year ACS. For example:

Values available in the 5-year ACS may not be available in the corresponding 1-year ACS tables
If available, they will likely have larger margins of error
Your job as an analyst: balance need for certainty vs. need for recency in estimates

Example: Punjabi speakers by state (1-year ACS)

get_acs(
  geography = "state",
  variables = "B16001_054",
  year = 2022,
  survey = "acs1"
)

# A tibble: 52 × 5
   GEOID NAME                 variable   estimate   moe
   <chr> <chr>                <chr>         <dbl> <dbl>
 1 01    Alabama              B16001_054      666   556
 2 02    Alaska               B16001_054       NA    NA
 3 04    Arizona              B16001_054     1906  1342
 4 05    Arkansas             B16001_054       NA    NA
 5 06    California           B16001_054   154917 14153
 6 08    Colorado             B16001_054     1643  1968
 7 09    Connecticut          B16001_054     4039  2965
 8 10    Delaware             B16001_054        0   203
 9 11    District of Columbia B16001_054       NA    NA
10 12    Florida              B16001_054     3311  1969
# ℹ 42 more rows

Punjabi speakers by state (5-year ACS)

get_acs(
  geography = "state",
  variables = "B16001_054",
  year = 2021,
  survey = "acs5"
)

# A tibble: 52 × 5
   GEOID NAME                 variable   estimate   moe
   <chr> <chr>                <chr>         <dbl> <dbl>
 1 01    Alabama              B16001_054      507   248
 2 02    Alaska               B16001_054       33    55
 3 04    Arizona              B16001_054     3833   758
 4 05    Arkansas             B16001_054      478   279
 5 06    California           B16001_054   142450  6035
 6 08    Colorado             B16001_054     1323   389
 7 09    Connecticut          B16001_054     1455   462
 8 10    Delaware             B16001_054      214   180
 9 11    District of Columbia B16001_054      189    90
10 12    Florida              B16001_054     2544   631
# ℹ 42 more rows
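
Putting the two outputs side by side shows the trade-off directly. The sketch below re-types the first six states from both tables above into one data frame (so it runs offline) and compares the margin of error relative to the estimate; the column names are my own for illustration.

```r
# First six states from the 1-year (2022) and 5-year (2017-2021) outputs above
punjabi <- data.frame(
  state   = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  est_1yr = c(666, NA, 1906, NA, 154917, 1643),
  moe_1yr = c(556, NA, 1342, NA, 14153, 1968),
  est_5yr = c(507, 33, 3833, 478, 142450, 1323),
  moe_5yr = c(248, 55, 758, 279, 6035, 389)
)

# Margin of error as a share of the estimate (larger = less certain)
punjabi$rel_moe_1yr <- round(punjabi$moe_1yr / punjabi$est_1yr, 2)
punjabi$rel_moe_5yr <- round(punjabi$moe_5yr / punjabi$est_5yr, 2)

# States with a 5-year value but no 1-year value
missing_in_1yr <- punjabi$state[is.na(punjabi$est_1yr) & !is.na(punjabi$est_5yr)]

print(punjabi)
print(missing_in_1yr)
```

In these rows, every state with a 1-year value has a larger relative margin of error in the 1-year data than in the 5-year data, and Alaska and Arkansas are missing from the 1-year table entirely.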

What about mapping 1-year ACS data?

One of the best features of tidycensus is the argument geometry = TRUE, which gets you the correct Census geometries with no hassle
Typically it is difficult to map 1-year ACS data below the state level as your data will have gaps due to the population restrictions

Example: “mapping” 1-year ACS data

tx_education <- get_acs(
  geography = "county",
  variables = "DP02_0068P",
  state = "TX",
  year = 2022,
  survey = "acs1",
  geometry = TRUE
)

Example: “mapping” 1-year ACS data

library(mapview)

mapview(tx_education, zcol = "estimate")

Mapping small(er) areas with PUMAs

Consider using Public Use Microdata Areas (PUMAs) for geographically-consistent substate mapping
PUMAs are typically used for microdata geography; however, I find them quite useful to approximate real estate submarkets, planning areas, etc.

wa_wfh <- get_acs(
  geography = "puma",
  variables = "DP03_0024P",
  state = "WA",
  survey = "acs1",
  year = 2022,
  geometry = TRUE
)

library(mapview)

mapview(wa_wfh, zcol = "estimate")

Bonus: new Connecticut county-equivalents

The 2022 ACS is the first to include the new Connecticut Planning Regions in the “county” geography

ct_income <- get_acs(
  geography = "county",
  variables = "B19013_001",
  state = "CT",
  year = 2022,
  survey = "acs1",
  geometry = TRUE
)

mapview(ct_income, zcol = "estimate")

Time-series analysis with the 1-year ACS: some notes

Variables in the Data Profile and Subject Tables can change names over time
You’ll need to watch out for the Connecticut issue and changing geographies
The 2020 1-year ACS was not released (and is not in tidycensus), so your time-series can break if you are using iteration to pull data
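
One way to keep an iterated pull from breaking is to drop 2020 from the vector of years before looping. The following is a sketch under a few assumptions: the function name get_income_by_year() and the 2017 to 2022 range are mine, and it uses a Detailed Table variable (B19013_001) to sidestep the Data Profile renaming issue. The function is only defined here, not called, since calling it requires a Census API key and a network connection.

```r
# Years to request: 2020 is removed because the 1-year ACS was not released
years <- setdiff(2017:2022, 2020)

# Hypothetical helper: pull county median household income for each year
# and stack the results, tagging each row with its year
get_income_by_year <- function(years, state = "TX") {
  yearly <- lapply(years, function(yr) {
    out <- tidycensus::get_acs(
      geography = "county",
      variables = "B19013_001",
      state = state,
      year = yr,
      survey = "acs1"
    )
    out$year <- yr
    out
  })
  do.call(rbind, yearly)
}

print(years)
```

Usage would be income_ts <- get_income_by_year(years). Counties that cross the 65,000 threshold in some years but not others will appear in only part of the series, so check for gaps before computing changes over time.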

The 2020 Decennial Census Detailed DHC-A File

The Detailed DHC-A File

Tabulation of 2020 Decennial Census results for population by sex and age
Key feature: break-outs for thousands of racial and ethnic groups

Limitations of the DDHC-A File

An “adaptive design” is used, meaning that data for different groups / geographies may be found in different tables
There is considerable sparsity in the data, especially when going down to the Census tract level

Getting Decennial Census data in tidycensus

library(tidycensus)

bexar_population <- get_decennial(
  geography = "tract",
  variables = "P1_001N",
  state = "TX",
  county = "Bexar",
  sumfile = "dhc",
  year = 2020
)

Using the DDHC-A File in tidycensus

You’ll query the DDHC-A file with the argument sumfile = "ddhca" in get_decennial()
A new argument, pop_group, is required to use the DDHC-A; it takes a population group code.
Use pop_group = "all" to query for all groups; set pop_group_label = TRUE to return the label for the population group
Look up variables with load_variables(2020, "ddhca")

Example usage of the DDHC-A File

mn_population_groups <- get_decennial(
  geography = "state",
  variables = "T01001_001N",
  state = "MN",
  year = 2020,
  sumfile = "ddhca",
  pop_group = "all",
  pop_group_label = TRUE
)

mn_population_groups

# A tibble: 2,996 × 6
   GEOID NAME      pop_group pop_group_label   variable      value
   <chr> <chr>     <chr>     <chr>             <chr>         <dbl>
 1 27    Minnesota 1002      European alone    T01001_001N 3162905
 2 27    Minnesota 1003      Albanian alone    T01001_001N     512
 3 27    Minnesota 1004      Alsatian alone    T01001_001N      27
 4 27    Minnesota 1005      Andorran alone    T01001_001N      NA
 5 27    Minnesota 1006      Armenian alone    T01001_001N     605
 6 27    Minnesota 1007      Austrian alone    T01001_001N    2552
 7 27    Minnesota 1008      Azerbaijani alone T01001_001N     103
 8 27    Minnesota 1009      Basque alone      T01001_001N      52
 9 27    Minnesota 1010      Belarusian alone  T01001_001N    1579
10 27    Minnesota 1011      Belgian alone     T01001_001N    3864
# ℹ 2,986 more rows
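
With thousands of groups returned, a typical next step is to drop the groups with no published value and rank the rest, or to search the labels for a group of interest. The sketch below re-types the first six rows of the output above so it runs offline; with the real result you would apply the same steps to mn_population_groups.

```r
# First six rows copied from the printed output above
mn_sample <- data.frame(
  pop_group = c("1002", "1003", "1004", "1005", "1006", "1007"),
  pop_group_label = c("European alone", "Albanian alone", "Alsatian alone",
                      "Andorran alone", "Armenian alone", "Austrian alone"),
  value = c(3162905, 512, 27, NA, 605, 2552)
)

# Keep only groups with a published value, largest first
reported <- mn_sample[!is.na(mn_sample$value), ]
reported <- reported[order(-reported$value), ]
print(reported)

# Find a group code by searching the labels
armenian <- mn_sample[grepl("Armenian", mn_sample$pop_group_label), ]
print(armenian)
```

Note that NA here means no value was returned for that group at this geography; it should not be read as a count of zero.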

Looking up group codes

A new function, get_pop_groups(), helps you look up population group codes
It works for SF2/SF4 in 2000 and SF2 in 2010 as well!

available_groups <- get_pop_groups(2020, "ddhca")

Understanding sparsity in the DDHC-A File

The DDHC-A File uses an “adaptive design” that makes certain tables available for specific geographies

You may see this error…

get_decennial(
  geography = "county",
  variables = "T02001_001N",
  state = "MN",
  county = "Hennepin",
  pop_group = "1325",
  year = 2020,
  sumfile = "ddhca"
)

Error in `get_decennial()`:
! Error in load_data_decennial(geography, variables, key, year, sumfile,  : 
  Your DDHC-A request returned No Content from the API.
ℹ The DDHC-A file uses an 'adaptive design' where data availability varies by geography and by population group.
ℹ Read Section 3-1 at https://www2.census.gov/programs-surveys/decennial/2020/technical-documentation/complete-tech-docs/detailed-demographic-and-housing-characteristics-file-a/2020census-detailed-dhc-a-techdoc.pdf for more information.
ℹ In tidycensus, use the function `check_ddhca_groups()` to see if your data is available.
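
If you are requesting many groups or geographies in a loop, one unavailable combination will stop the whole run with the error above. A possible pattern, sketched here with a hypothetical helper named try_request(), is to catch the error and return NULL so the loop can continue. The failing request is simulated with stop() so that the example runs without an API call.

```r
# Hypothetical helper: run a request function; on error, report it and
# return NULL instead of stopping
try_request <- function(request_fn) {
  tryCatch(
    request_fn(),
    error = function(e) {
      message("Request failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in for a get_decennial() call that hits the "No Content" error
failing_request <- function() {
  stop("Your DDHC-A request returned No Content from the API.")
}

result <- try_request(failing_request)
print(is.null(result))
```

With real data, the stand-in would be replaced by a function that wraps the get_decennial() call shown above, and NULL results would be filtered out before combining.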

How to check for data availability

A new function, check_ddhca_groups(), can be used to see which tables to use for the data you want

check_ddhca_groups(
  geography = "county", 
  pop_group = "1325", 
  state = "MN", 
  county = "Hennepin"
)

Mapping DDHC-A data

Given data sparsity in the DDHC-A data, should you make maps with it?
I’m not personally a fan of mapping data that are geographically sparse. But…

I think it is OK to map DDHC-A data if you think through the data limitations in your map design

Example: Somali populations by Census tract in Minneapolis

library(tidycensus)

hennepin_somali <- get_decennial(
  geography = "tract",
  variables = "T01001_001N",
  state = "MN",
  county = "Hennepin",
  year = 2020,
  sumfile = "ddhca",
  pop_group = "1325",
  pop_group_label = TRUE,
  geometry = TRUE
)

mapview(hennepin_somali, zcol = "value")

Alternative approach: dot-density mapping

I don’t think choropleth maps are advisable with geographically incomplete data in most cases
Other map types - like graduated symbols or dot-density maps - may be more appropriate
The tidycensus function as_dot_density() allows you to specify the number of people represented in each dot, which means you can represent data-suppressed areas as 0 more confidently

somali_dots <- as_dot_density(
  hennepin_somali,
  value = "value",
  values_per_dot = 25
)

mapview::mapview(somali_dots, cex = 0.01, layer.name = "Somali population<br>1 dot = 25 people",
                 col.regions = "navy", color = "navy")

Notes on differential privacy

The use of differential privacy in the 2020 DDHC-A File makes some traditional Census analyses impossible
For example, groups within a group hierarchy may not sum to the parent group
Small counts (e.g. at the tract level) will be suppressed; the threshold is 22 for detailed groups
See the DDHC-A technical documentation for more information

Thank you!