2022 1-year ACS and 2020 Detailed DHC-A data
Use the discount code DDHCA for 25% off the following:
October 4 (11am CT): Getting Started with Geographic Information Systems in R
October 11 (11am CT): Interactive Mapping with R
Past workshops on 2020 Census Data are available at https://walker-data.com/workshops.html
Associate Professor of Geography at TCU
Spatial data science researcher and consultant
Package developer: tidycensus, tigris, mapboxapi, crsuggest, idbr (R), pygris (Python)
Book: Analyzing US Census Data: Methods, Maps and Models in R
Annual survey of 3.5 million US households
Covers topics not available in decennial US Census data (e.g. income, education, language, housing characteristics)
Available as 1-year estimates (for geographies of population 65,000 and greater) and 5-year estimates (for geographies down to the block group)
Data delivered as estimates characterized by margins of error
The get_acs() function is your portal to access ACS data using tidycensus
The two required arguments are geography and variables. The function defaults to the 2017-2021 5-year ACS
By default, results are returned with five columns: GEOID, NAME, variable, estimate, and moe
# A tibble: 6,896 × 5
   GEOID       NAME                                      variable estimate   moe
   <chr>       <chr>                                     <chr>       <dbl> <dbl>
 1 48001950100 Census Tract 9501, Anderson County, Texas B19013_…    61325  9171
 2 48001950401 Census Tract 9504.01, Anderson County, T… B19013_…    92813 45136
 3 48001950402 Census Tract 9504.02, Anderson County, T… B19013_…       NA    NA
 4 48001950500 Census Tract 9505, Anderson County, Texas B19013_…    41713  6650
 5 48001950600 Census Tract 9506, Anderson County, Texas B19013_…    32552 12274
 6 48001950700 Census Tract 9507, Anderson County, Texas B19013_…    35811  5573
 7 48001950800 Census Tract 9508, Anderson County, Texas B19013_…    52612 12426
 8 48001950901 Census Tract 9509.01, Anderson County, T… B19013_…    47336  3806
 9 48001950902 Census Tract 9509.02, Anderson County, T… B19013_…    47068 10004
10 48001951001 Census Tract 9510.01, Anderson County, T… B19013_…    55063  7833
# ℹ 6,886 more rows
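A request of roughly this shape returns tract-level output like the tibble above. This is a sketch rather than the exact call behind the slide: the helper name get_tract_income() is hypothetical, B19013_001 (median household income) is chosen for illustration, and a Census API key is assumed to be set.

```r
# Hypothetical helper: tract-level estimates from the 5-year ACS for one state.
# Nothing is requested from the Census API until the function is called.
get_tract_income <- function(state, year = 2021) {
  tidycensus::get_acs(
    geography = "tract",       # required: the geographic level
    variables = "B19013_001",  # required: median household income (illustrative choice)
    state = state,
    year = year                # end year of the 5-year period (2021 = 2017-2021)
  )
}

# Usage (needs a Census API key, e.g. set once with tidycensus::census_api_key()):
# tx_income <- get_tract_income("TX")
# names(tx_income)
```

Wrapping the request in a function keeps the example side-effect free until you decide to call it.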
1-year ACS data are more current, but are only available for geographies of population 65,000 and greater
Access 1-year ACS data with the argument survey = "acs1"; defaults to "acs5"
Use load_variables(2022, "acs1") to view available variable codes in the 2022 1-year ACS
"acs1/profile" and "acs1/subject" are also available for the Data Profile and Subject Tables respectively
# A tibble: 36,607 × 3
   name        label                                    concept
   <chr>       <chr>                                    <chr>
 1 B01001A_001 Estimate!!Total:                         Sex by Age (White Alone)
 2 B01001A_002 Estimate!!Total:!!Male:                  Sex by Age (White Alone)
 3 B01001A_003 Estimate!!Total:!!Male:!!Under 5 years   Sex by Age (White Alone)
 4 B01001A_004 Estimate!!Total:!!Male:!!5 to 9 years    Sex by Age (White Alone)
 5 B01001A_005 Estimate!!Total:!!Male:!!10 to 14 years  Sex by Age (White Alone)
 6 B01001A_006 Estimate!!Total:!!Male:!!15 to 17 years  Sex by Age (White Alone)
 7 B01001A_007 Estimate!!Total:!!Male:!!18 and 19 years Sex by Age (White Alone)
 8 B01001A_008 Estimate!!Total:!!Male:!!20 to 24 years  Sex by Age (White Alone)
 9 B01001A_009 Estimate!!Total:!!Male:!!25 to 29 years  Sex by Age (White Alone)
10 B01001A_010 Estimate!!Total:!!Male:!!30 to 34 years  Sex by Age (White Alone)
# ℹ 36,597 more rows
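With tens of thousands of rows, the variable table is easier to use with a search step. The sketch below assumes a hypothetical helper, find_variables(), and uses a three-row stand-in copied from the output above so it runs without downloading anything.

```r
# Hypothetical helper: search a variable table (as returned by load_variables())
# for labels or concepts matching a pattern, ignoring case.
find_variables <- function(vars, pattern) {
  hit <- grepl(pattern, vars$label, ignore.case = TRUE) |
    grepl(pattern, vars$concept, ignore.case = TRUE)
  vars[hit, , drop = FALSE]
}

# Stand-in rows copied from the printed output above.
sample_vars <- data.frame(
  name = c("B01001A_001", "B01001A_002", "B01001A_003"),
  label = c(
    "Estimate!!Total:",
    "Estimate!!Total:!!Male:",
    "Estimate!!Total:!!Male:!!Under 5 years"
  ),
  concept = "Sex by Age (White Alone)"
)

print(find_variables(sample_vars, "under 5"))

# With the real table (downloads the variable list from the Census Bureau):
# vars <- tidycensus::load_variables(2022, "acs1")
# find_variables(vars, "median household income")
```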
Values available in the 5-year ACS may not be available in the corresponding 1-year ACS tables
If available, they will likely have larger margins of error
Your job as an analyst: balance need for certainty vs. need for recency in estimates
# A tibble: 52 × 5
   GEOID NAME                 variable   estimate   moe
   <chr> <chr>                <chr>         <dbl> <dbl>
 1 01    Alabama              B16001_054      666   556
 2 02    Alaska               B16001_054       NA    NA
 3 04    Arizona              B16001_054     1906  1342
 4 05    Arkansas             B16001_054       NA    NA
 5 06    California           B16001_054   154917 14153
 6 08    Colorado             B16001_054     1643  1968
 7 09    Connecticut          B16001_054     4039  2965
 8 10    Delaware             B16001_054        0   203
 9 11    District of Columbia B16001_054       NA    NA
10 12    Florida              B16001_054     3311  1969
# ℹ 42 more rows
# A tibble: 52 × 5
   GEOID NAME                 variable   estimate   moe
   <chr> <chr>                <chr>         <dbl> <dbl>
 1 01    Alabama              B16001_054      507   248
 2 02    Alaska               B16001_054       33    55
 3 04    Arizona              B16001_054     3833   758
 4 05    Arkansas             B16001_054      478   279
 5 06    California           B16001_054   142450  6035
 6 08    Colorado             B16001_054     1323   389
 7 09    Connecticut          B16001_054     1455   462
 8 10    Delaware             B16001_054      214   180
 9 11    District of Columbia B16001_054      189    90
10 12    Florida              B16001_054     2544   631
# ℹ 42 more rows
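One way to weigh certainty against recency is to compare the coefficient of variation (standard error divided by estimate) across the two outputs. The sketch below copies a few rows from the tables above and assumes the first table is the 1-year ACS and the second the 5-year ACS; the helper cv() is hypothetical. ACS margins of error are published at the 90 percent confidence level, so the standard error is the MOE divided by 1.645.

```r
# Estimates and MOEs copied from the two printed tables above.
# Assumption: the first table is the 1-year ACS, the second the 5-year ACS.
compare <- data.frame(
  state   = c("Alabama", "Arizona", "California", "Florida"),
  est_1yr = c(666, 1906, 154917, 3311),
  moe_1yr = c(556, 1342, 14153, 1969),
  est_5yr = c(507, 3833, 142450, 2544),
  moe_5yr = c(248, 758, 6035, 631)
)

# Coefficient of variation: standard error (moe / 1.645) relative to the estimate.
cv <- function(estimate, moe) (moe / 1.645) / estimate

compare$cv_1yr <- cv(compare$est_1yr, compare$moe_1yr)
compare$cv_5yr <- cv(compare$est_5yr, compare$moe_5yr)

# Lower values indicate a more reliable estimate.
print(data.frame(
  state  = compare$state,
  cv_1yr = round(compare$cv_1yr, 2),
  cv_5yr = round(compare$cv_5yr, 2)
))

# The two tables could be requested with calls such as these
# (years are assumptions; a Census API key is needed):
# tidycensus::get_acs(geography = "state", variables = "B16001_054",
#                     year = 2022, survey = "acs1")
# tidycensus::get_acs(geography = "state", variables = "B16001_054",
#                     year = 2021, survey = "acs5")
```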
One of the best features of tidycensus is the argument geometry = TRUE, which gets you the correct Census geometries with no hassle
Typically it is difficult to map 1-year ACS data below the state level as your data will have gaps due to the population restrictions
Consider using Public Use Microdata Areas (PUMAs) for geographically-consistent substate mapping
PUMAs are typically used for microdata geography; however, I find them quite useful to approximate real estate submarkets, planning areas, etc.
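A PUMA-level request with geometry might look like the sketch below. The helper name get_puma_income() and the variable B19013_001 (median household income) are illustrative assumptions, not taken from the slides.

```r
# Hypothetical helper: 1-year ACS estimates for every PUMA in a state, with geometry.
# Nothing is requested from the Census API until the function is called.
get_puma_income <- function(state, year = 2022) {
  tidycensus::get_acs(
    geography = "puma",        # Public Use Microdata Areas
    variables = "B19013_001",  # median household income (illustrative choice)
    state = state,
    survey = "acs1",
    year = year,
    geometry = TRUE            # attach Census geometries; the result is an sf object
  )
}

# Usage (needs a Census API key and the sf package for plotting):
# tx_pumas <- get_puma_income("TX")
# plot(tx_pumas["estimate"])
```

Because PUMAs cover the whole state, a map built this way should not show the gaps that county-level 1-year data would.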
Variables in the Data Profile and Subject Tables can change names over time
You’ll need to watch out for the Connecticut issue (as of the 2022 ACS, Connecticut’s counties are replaced by planning regions) and changing geographies
The 2020 1-year ACS was not released (and is not in tidycensus), so your time-series can break if you are using iteration to pull data
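When iterating over years, one simple guard is to drop 2020 from the vector of years before requesting anything. A minimal sketch, assuming a hypothetical helper get_acs1_series(); dplyr is used only to stack the yearly results.

```r
# Years to request; 2020 is dropped because there is no standard 1-year release.
years <- setdiff(2015:2022, 2020)

# Hypothetical helper: request one variable for each year and stack the results,
# adding a `year` column. Nothing is requested until the function is called.
get_acs1_series <- function(variable, years, geography = "state") {
  pulls <- lapply(years, function(yr) {
    out <- tidycensus::get_acs(
      geography = geography,
      variables = variable,
      year = yr,
      survey = "acs1"
    )
    out$year <- yr
    out
  })
  dplyr::bind_rows(pulls)
}

# Usage (needs a Census API key; the variable is an illustrative choice):
# income_series <- get_acs1_series("B19013_001", years)
```

This sketch does not handle variable IDs that change between years or geography changes such as Connecticut’s; those still need to be checked by hand.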
Tabulation of 2020 Decennial Census results for population by sex and age
Key feature: break-outs for thousands of racial and ethnic groups
An “adaptive design” is used, meaning that data for different groups / geographies may be found in different tables
There is considerable sparsity in the data, especially when going down to the Census tract level
You’ll query the DDHC-A file with the argument sumfile = "ddhca" in get_decennial()
A new argument, pop_group, is required to use the DDHC-A; it takes a population group code.
Use pop_group = "all" to query for all groups; set pop_group_label = TRUE to return the label for the population group
Look up variables with load_variables(2020, "ddhca")
# A tibble: 2,996 × 6
   GEOID NAME      pop_group pop_group_label   variable      value
   <chr> <chr>     <chr>     <chr>             <chr>         <dbl>
 1 27    Minnesota 1002      European alone    T01001_001N 3162905
 2 27    Minnesota 1003      Albanian alone    T01001_001N     512
 3 27    Minnesota 1004      Alsatian alone    T01001_001N      27
 4 27    Minnesota 1005      Andorran alone    T01001_001N      NA
 5 27    Minnesota 1006      Armenian alone    T01001_001N     605
 6 27    Minnesota 1007      Austrian alone    T01001_001N    2552
 7 27    Minnesota 1008      Azerbaijani alone T01001_001N     103
 8 27    Minnesota 1009      Basque alone      T01001_001N      52
 9 27    Minnesota 1010      Belarusian alone  T01001_001N    1579
10 27    Minnesota 1011      Belgian alone     T01001_001N    3864
# ℹ 2,986 more rows
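A state-level request combining the arguments described above might look like the following sketch. The helper name get_state_detailed_groups() is hypothetical; the variable, summary file, and pop_group settings follow the output shown.

```r
# Hypothetical helper: one DDHC-A variable for all detailed groups in a state.
# Nothing is requested from the Census API until the function is called.
get_state_detailed_groups <- function(state, variable = "T01001_001N") {
  tidycensus::get_decennial(
    geography = "state",
    variables = variable,
    state = state,
    year = 2020,
    sumfile = "ddhca",       # the Detailed DHC-A file
    pop_group = "all",       # every population group
    pop_group_label = TRUE   # add a readable label next to each group code
  )
}

# Usage (needs a Census API key):
# mn_groups <- get_state_detailed_groups("MN")
```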
A new function, get_pop_groups(), helps you look up population group codes
It works for SF2/SF4 in 2000 and SF2 in 2010 as well!
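Once the list of groups is in hand, a small search step finds the code for a given label. The sketch below uses a hypothetical helper, find_pop_group(), and stand-in rows copied from the Minnesota output above; it assumes the table returned by get_pop_groups() has pop_group and pop_group_label columns.

```r
# Hypothetical helper: find population group codes whose label matches a pattern.
find_pop_group <- function(groups, pattern) {
  hit <- grepl(pattern, groups$pop_group_label, ignore.case = TRUE)
  groups[hit, , drop = FALSE]
}

# Stand-in rows copied from the Minnesota output above.
sample_groups <- data.frame(
  pop_group = c("1003", "1006", "1009"),
  pop_group_label = c("Albanian alone", "Armenian alone", "Basque alone")
)

print(find_pop_group(sample_groups, "basque"))

# With the full list (column names are an assumption; check the returned table):
# groups <- tidycensus::get_pop_groups(2020, "ddhca")
# find_pop_group(groups, "basque")
```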
get_decennial(
  geography = "county",
  variables = "T02001_001N",
  state = "MN",
  county = "Hennepin",
  pop_group = "1325",
  year = 2020,
  sumfile = "ddhca"
)
Error in `get_decennial()`:
! Error in load_data_decennial(geography, variables, key, year, sumfile, :
Your DDHC-A request returned No Content from the API.
ℹ The DDHC-A file uses an 'adaptive design' where data availability varies by geography and by population group.
ℹ Read Section 3-1 at https://www2.census.gov/programs-surveys/decennial/2020/technical-documentation/complete-tech-docs/detailed-demographic-and-housing-characteristics-file-a/2020census-detailed-dhc-a-techdoc.pdf for more information.
ℹ In tidycensus, use the function `check_ddhca_groups()` to see if your data is available.
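Following the hint in the error message, the check for the failing request might look like the sketch below. The wrapper name check_hennepin_group() is hypothetical, and the argument names passed to check_ddhca_groups() reflect my reading of the tidycensus documentation, so confirm them with ?check_ddhca_groups.

```r
# Hypothetical wrapper around the availability check for one county and group.
# Nothing is requested from the Census API until the function is called.
check_hennepin_group <- function(pop_group = "1325") {
  tidycensus::check_ddhca_groups(
    geography = "county",
    pop_group = pop_group,   # same group code as the failing request
    state = "MN",
    county = "Hennepin"
  )
}

# Usage (needs a Census API key):
# check_hennepin_group()
```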
check_ddhca_groups() can be used to see which tables to use for the data you want
Given data sparsity in the DDHC-A data, should you make maps with it?
I’m not personally a fan of mapping data that are geographically sparse. But…
I don’t think choropleth maps are advisable with geographically incomplete data in most cases
Other map types - like graduated symbols or dot-density maps - may be more appropriate
The tidycensus function as_dot_density() allows you to specify the number of people represented in each dot, which means you can represent data-suppressed areas as 0 more confidently
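A dot-density workflow along these lines is sketched below. The helper name make_group_dots(), the choice of 25 people per dot, and the tract-level request are illustrative assumptions; the arguments to as_dot_density() follow my reading of the tidycensus documentation, and I believe the function also needs the terra package installed.

```r
# Hypothetical helper: tract-level DDHC-A counts for one group, converted to dots.
# Nothing is requested from the Census API until the function is called.
make_group_dots <- function(pop_group = "1325", people_per_dot = 25) {
  tracts <- tidycensus::get_decennial(
    geography = "tract",
    variables = "T01001_001N",
    state = "MN",
    county = "Hennepin",
    year = 2020,
    sumfile = "ddhca",
    pop_group = pop_group,
    geometry = TRUE          # sf object, needed to place the dots
  )

  # Treat suppressed (NA) tracts as 0 so they receive no dots.
  tracts$value[is.na(tracts$value)] <- 0

  tidycensus::as_dot_density(
    tracts,
    value = "value",                 # column holding the counts
    values_per_dot = people_per_dot  # people represented by each dot
  )
}

# Usage (needs a Census API key and the sf package for plotting):
# dots <- make_group_dots()
# plot(sf::st_geometry(dots), pch = 19, cex = 0.1)
```

A larger people_per_dot value means small tracts contribute no dots at all, which is one reason treating suppressed tracts as 0 is less risky here than on a choropleth.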
The use of differential privacy in the 2020 DDHC-A File makes some traditional Census analyses impossible
For example, groups within a group hierarchy may not sum to the parent group
Small counts (e.g. at the tract level) will be suppressed; the threshold is 22 for detailed groups