2025 SSDAN Webinar Series
2025-02-26
Professor of Geography at TCU
Spatial data science researcher and consultant
Package developer: tidycensus, tigris, mapgl, mapboxapi, crsuggest, idbr (R), pygris (Python)
Book: Analyzing US Census Data: Methods, Maps and Models in R
Wednesday, February 5: Analyzing Data from the 2023 American Community Survey in R
Wednesday, February 12: Working with Decennial Census Data in R
Today: Mapping and Spatial Analysis with US Census Data in R
Hour 1: How to get and explore spatial US Census data using R
Hour 2: A tour of map types with R, mapgl, and US Census data
Hour 3: Advanced workflows: national and time-series mapping
R: programming language and software environment for data analysis (and wherever else your imagination can take you!)
RStudio: integrated development environment (IDE) for R developed by Posit
Posit Cloud: run RStudio with today’s workshop pre-configured at https://posit.cloud/content/9689451
Complete count of the US population mandated by Article 1, Sections 2 and 9 in the US Constitution
Directed by the US Census Bureau (US Department of Commerce); conducted every 10 years since 1790
Used for proportional representation / congressional redistricting
Limited set of questions asked about race, ethnicity, age, sex, and housing tenure
Annual survey of 3.5 million US households
Covers topics not available in decennial US Census data (e.g. income, education, language, housing characteristics)
Available as 1-year estimates (for geographies of population 65,000 and greater) and 5-year estimates (for geographies down to the block group)
Data delivered as estimates characterized by margins of error
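In tidycensus, the `survey` argument of `get_acs()` chooses between the 5-year ACS (`"acs5"`, the default) and the 1-year ACS (`"acs1"`). The sketch below assumes a Census API key is set and uses 2023 as an illustrative vintage. Because 1-year estimates are only published for geographies of 65,000 people or more, the 1-year request should return far fewer Texas counties than the 5-year request.

```r
library(tidycensus)

# 5-year ACS (default survey): published for all Texas counties
tx_acs5 <- get_acs(
  geography = "county",
  variables = "B19013_001",  # median household income
  state = "TX",
  year = 2023,
  survey = "acs5"
)

# 1-year ACS: only counties with 65,000+ residents are published
tx_acs1 <- get_acs(
  geography = "county",
  variables = "B19013_001",
  state = "TX",
  year = 2023,
  survey = "acs1"
)

nrow(tx_acs5)
nrow(tx_acs1)
```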
data.census.gov is the main, revamped interactive data portal for browsing and downloading Census datasets
The US Census Application Programming Interface (API) allows developers to access Census data resources programmatically
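To show what tidycensus automates, here is a sketch that builds a raw Census API query URL for Texas county median household income. The endpoint pattern (`/data/{year}/acs/acs5` with `get`, `for`, and `in` parameters) follows the Census API's documented format; the helper name `build_acs_url()` is hypothetical and the year is an illustrative assumption.

```r
# Hypothetical helper: build a Census API query URL for ACS 5-year data
build_acs_url <- function(year, variables, geo_for, geo_in = NULL, key = NULL) {
  url <- paste0(
    "https://api.census.gov/data/", year, "/acs/acs5",
    "?get=", paste(c("NAME", variables), collapse = ","),
    "&for=", geo_for
  )
  # Optional parent geography (e.g., the state containing the counties)
  if (!is.null(geo_in)) url <- paste0(url, "&in=", geo_in)
  # Optional API key
  if (!is.null(key)) url <- paste0(url, "&key=", key)
  url
}

# All counties in Texas (state FIPS 48); the "E" suffix requests the estimate
tx_url <- build_acs_url(2023, "B19013_001E", geo_for = "county:*", geo_in = "state:48")
tx_url

# Fetching the URL requires network access, e.g. jsonlite::fromJSON(tx_url),
# which returns a character matrix whose first row holds the column names.
```

Note that the raw response still needs header handling, type conversion, and FIPS lookups, which is the wrangling tidycensus handles for you.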
tidycensus: an R package that connects to the Census API and:
Wrangles Census data internally to return a tidyverse-ready format (or a traditional wide format if requested);
Automatically downloads and merges Census geometries with data for mapping;
Includes a variety of analytic tools to support common Census workflows;
Lets you request states and counties by name (no more looking up FIPS codes!)
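A sketch of two of these features, assuming an API key is set and using 2023 and Tarrant County as illustrative choices: `output = "wide"` spreads estimates and margins of error into columns suffixed `E` and `M`, and states and counties are given by name rather than FIPS code. Naming the variable (`median_income = ...`) replaces the raw table ID in the output.

```r
library(tidycensus)

# Census tracts in Tarrant County, Texas, requested by name
tarrant_wide <- get_acs(
  geography = "tract",
  variables = c(median_income = "B19013_001"),  # custom name for the variable
  state = "Texas",
  county = "Tarrant",
  year = 2023,
  output = "wide"  # columns median_incomeE (estimate) and median_incomeM (MOE)
)

head(tarrant_wide)
```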
To get started, install the packages you’ll need for today’s workshop
If you are using the Posit Cloud environment, these packages are already installed for you
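A minimal install sketch. The package list is an assumption based on the tools named in this session (tidycensus, mapview, mapgl) plus the tidyverse, which tidycensus output is designed to work with; adjust it to match the workshop materials.

```r
# Install the packages (not needed on Posit Cloud, where they are preinstalled)
install.packages(c("tidycensus", "tidyverse", "mapview", "mapgl"))

# Load them for the current session
library(tidycensus)
library(tidyverse)
library(mapview)
library(mapgl)
```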
tidycensus (and the Census API) can be used without an API key, but you will be limited to 500 queries per day
Power users: visit https://api.census.gov/data/key_signup.html to request a key, then activate the key from the link in your email.
Once activated, use the census_api_key() function to set your key as an environment variable
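A sketch of setting the key; `"YOUR_KEY_HERE"` is a placeholder for the key from your activation email. Called without `install = TRUE`, `census_api_key()` sets the `CENSUS_API_KEY` environment variable for the current session only; with `install = TRUE` it also writes the key to your `.Renviron` file so it persists across sessions.

```r
library(tidycensus)

# Placeholder key: replace with your own. Sets CENSUS_API_KEY for this session.
census_api_key("YOUR_KEY_HERE")

# To persist the key across sessions, write it to .Renviron and reload:
# census_api_key("YOUR_KEY_HERE", install = TRUE)
# readRenviron("~/.Renviron")

# tidycensus reads the key from this environment variable
Sys.getenv("CENSUS_API_KEY")
```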
Traditionally, getting “spatial” Census data required:
Fetching shapefiles from the Census website;
Downloading a CSV of data, then cleaning and formatting it;
Loading geometries and data into your GIS of choice;
Aligning key fields in your GIS and joining your data
Your core functions in tidycensus are get_decennial() for decennial Census data and get_acs() for ACS data
Required arguments are geography and variables
By default, get_acs() returns a tidy data frame with the columns GEOID, NAME, variable, estimate, and moe
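A call along these lines returns median household income (variable `B19013_001`) for all 254 Texas counties in the tidy format printed below. The `year = 2023` argument is an assumption, so exact values may differ from the printout; an API key is assumed to be set. The `get_decennial()` call shows the same pattern for 2020 total population (`P1_001N`).

```r
library(tidycensus)

# ACS: median household income for every Texas county
tx_income <- get_acs(
  geography = "county",
  variables = "B19013_001",
  state = "TX",
  year = 2023
)

tx_income

# Decennial Census: 2020 total population by Texas county
tx_pop <- get_decennial(
  geography = "county",
  variables = "P1_001N",
  state = "TX",
  year = 2020
)
```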
# A tibble: 254 × 5
GEOID NAME variable estimate moe
<chr> <chr> <chr> <dbl> <dbl>
1 48001 Anderson County, Texas B19013_001 58846 4186
2 48003 Andrews County, Texas B19013_001 76902 8775
3 48005 Angelina County, Texas B19013_001 58847 2883
4 48007 Aransas County, Texas B19013_001 61754 5334
5 48009 Archer County, Texas B19013_001 71958 8215
6 48011 Armstrong County, Texas B19013_001 68462 13055
7 48013 Atascosa County, Texas B19013_001 69413 3545
8 48015 Austin County, Texas B19013_001 75994 6012
9 48017 Bailey County, Texas B19013_001 70625 21709
10 48019 Bandera County, Texas B19013_001 69703 5270
# ℹ 244 more rows
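The estimate and moe columns together describe uncertainty. ACS margins of error are published at the 90% confidence level, so estimate ± moe gives a 90% confidence interval, and moe / 1.645 recovers the standard error. The sketch below uses three rows copied from the output above; the coefficient of variation (CV) is a common reliability measure added here for illustration.

```r
# Three rows copied from the Texas county output above
tx_sample <- data.frame(
  NAME = c("Anderson County, Texas", "Andrews County, Texas", "Bailey County, Texas"),
  estimate = c(58846, 76902, 70625),
  moe = c(4186, 8775, 21709)
)

# 90% confidence interval bounds
tx_sample$lower <- tx_sample$estimate - tx_sample$moe
tx_sample$upper <- tx_sample$estimate + tx_sample$moe

# Standard error and coefficient of variation (as a percentage)
tx_sample$se <- tx_sample$moe / 1.645
tx_sample$cv <- 100 * tx_sample$se / tx_sample$estimate

tx_sample
```

Bailey County's large MOE relative to its estimate gives it the highest CV of the three, a reminder to check reliability for small-population geographies.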
Use geometry = TRUE to get pre-joined geometry along with your data, then make a quick map with plot()
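A sketch of the geometry workflow, assuming an API key is set and 2023 as an illustrative year. Selecting a single column with `["estimate"]` keeps the geometry attached, so `plot()` draws one choropleth panel instead of one panel per attribute.

```r
library(tidycensus)
library(sf)

# geometry = TRUE returns an sf object with county shapes already joined
tx_income_geo <- get_acs(
  geography = "county",
  variables = "B19013_001",
  state = "TX",
  year = 2023,
  geometry = TRUE
)

# Quick static map of median household income
plot(tx_income_geo["estimate"])
```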
The sf package implements a simple features data model for vector spatial data in R
Vector geometries: points, lines, and polygons stored in a list-column of a data frame
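A self-contained sketch of this data model, built without any API call. The GEOIDs and estimates come from the Texas county output in this section, but the unit-square polygons are hypothetical stand-ins for real county boundaries. The CRS is set to NAD83 (EPSG:4269), the CRS reported for the tidycensus geometries in this section.

```r
library(sf)

# Two hypothetical unit squares standing in for county boundaries
sq1 <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
sq2 <- st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))

# Attributes live in ordinary columns; shapes live in the geometry list-column
toy <- st_sf(
  GEOID = c("48001", "48003"),
  estimate = c(58846, 76902),
  geometry = st_sfc(sq1, sq2, crs = 4269)  # EPSG:4269 = NAD83
)

class(toy)               # "sf" "data.frame"
class(st_geometry(toy))  # "sfc_POLYGON" "sfc"
st_geometry_type(toy)
```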
The result is an sf object with the columns GEOID, NAME, variable, estimate, and moe, along with a geometry column representing the shapes of locations
Simple feature collection with 254 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -106.6456 ymin: 25.83738 xmax: -93.50829 ymax: 36.5007
Geodetic CRS: NAD83
First 10 features:
GEOID NAME variable estimate moe
1 48355 Nueces County, Texas B19013_001 66021 1570
2 48215 Hidalgo County, Texas B19013_001 52281 1265
3 48061 Cameron County, Texas B19013_001 51334 1640
4 48479 Webb County, Texas B19013_001 62506 2316
5 48057 Calhoun County, Texas B19013_001 71870 10785
6 48323 Maverick County, Texas B19013_001 51270 5723
7 48465 Val Verde County, Texas B19013_001 59673 4312
8 48039 Brazoria County, Texas B19013_001 95155 2480
9 48229 Hudspeth County, Texas B19013_001 39336 17125
10 48203 Harrison County, Texas B19013_001 66040 2877
geometry
1 MULTIPOLYGON (((-97.11172 2...
2 MULTIPOLYGON (((-98.58634 2...
3 MULTIPOLYGON (((-97.24047 2...
4 MULTIPOLYGON (((-100.2122 2...
5 MULTIPOLYGON (((-96.80935 2...
6 MULTIPOLYGON (((-100.6675 2...
7 MULTIPOLYGON (((-101.7603 2...
8 MULTIPOLYGON (((-95.87403 2...
9 MULTIPOLYGON (((-105.998 32...
10 MULTIPOLYGON (((-94.70215 3...
Explore the data interactively with mapview()
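A sketch of interactive exploration, assuming an API key is set and 2023 as an illustrative year. `zcol` selects the column used to color the polygons; the resulting web map supports panning, zooming, and clicking counties to inspect their attributes.

```r
library(tidycensus)
library(mapview)

tx_income_geo <- get_acs(
  geography = "county",
  variables = "B19013_001",
  state = "TX",
  year = 2023,
  geometry = TRUE
)

# Interactive map colored by median household income
tx_map <- mapview(tx_income_geo, zcol = "estimate")
tx_map
```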