2024 SSDAN Webinar Series
2024-03-07
Associate Professor of Geography at TCU
Spatial data science researcher and consultant
Package developer: tidycensus, tigris, mapboxapi, crsuggest, idbr (R), pygris (Python)
Book: Analyzing US Census Data: Methods, Maps and Models in R
Thursday, February 8: Working with the 2022 American Community Survey with R and tidycensus
Thursday, February 22: Analyzing 2020 Decennial US Census Data in R
Today: Doing “GIS” and making maps with US Census Data in R
Hour 1: How to get and explore spatial US Census data using R
Hour 2: A tour of map types with R and US Census data
Hour 3: Advanced workflows: automated mapping and spatial analysis
R: programming language and software environment for data analysis (and wherever else your imagination can take you!)
RStudio: integrated development environment (IDE) for R developed by Posit
Posit Cloud: run RStudio with today’s workshop pre-configured at https://posit.cloud/content/7549022
Complete count of the US population mandated by Article 1, Sections 2 and 9 in the US Constitution
Directed by the US Census Bureau (US Department of Commerce); conducted every 10 years since 1790
Used for proportional representation / congressional redistricting
Limited set of questions asked about race, ethnicity, age, sex, and housing tenure
Annual survey of 3.5 million US households
Covers topics not available in decennial US Census data (e.g. income, education, language, housing characteristics)
Available as 1-year estimates (for geographies of population 65,000 and greater) and 5-year estimates (for geographies down to the block group)
Data delivered as estimates characterized by margins of error
data.census.gov is the main, revamped interactive data portal for browsing and downloading Census datasets
The US Census Application Programming Interface (API) allows developers to access Census data resources programmatically
Wrangles Census data internally to return tidyverse-ready format (or traditional wide format if requested);
Automatically downloads and merges Census geometries to data for mapping;
Includes a variety of analytic tools to support common Census workflows;
States and counties can be requested by name (no more looking up FIPS codes!)
To get started, install the packages you’ll need for today’s workshop
If you are using the Posit Cloud environment, these packages are already installed for you
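For a local setup, the install step might look like the sketch below. The package list is an assumption based on the tools named in this section (tidycensus, sf, mapview, plus the tidyverse); the workshop's own list may include more.

```r
# Packages assumed for this section (the workshop's full list may differ)
pkgs <- c("tidycensus", "tidyverse", "sf", "mapview")

# Install only the packages that are not already available
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```

Checking with requireNamespace() first avoids reinstalling packages that are already present.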
tidycensus (and the Census API) can be used without an API key, but you will be limited to 500 queries per day
Power users: visit https://api.census.gov/data/key_signup.html to request a key, then activate the key from the link in your email.
Once activated, use the census_api_key() function to set your key as an environment variable
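A minimal sketch of setting the key for the current session. The key string is a placeholder, not a working key.

```r
library(tidycensus)

# Placeholder only: substitute the key emailed to you by the Census Bureau
census_api_key("YOUR KEY GOES HERE")

# The key is now available to tidycensus through this environment variable
Sys.getenv("CENSUS_API_KEY")
```

Adding install = TRUE to the call also writes the key to your .Renviron file, so it is available in future sessions.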
Traditionally, getting “spatial” Census data required:
Fetching shapefiles from the Census website;
Downloading a CSV of data, then cleaning and formatting it;
Loading geometries and data into your GIS of choice;
Aligning key fields in your GIS and joining your data
Your core functions in tidycensus are get_decennial() for decennial Census data, and get_acs() for ACS data
Required arguments are geography and variables
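A call with the two required arguments might look like the sketch below. The variable B19013_001 (median household income) and the Texas county geography come from the output shown in this section; the object name texas_income, state = "TX", and year = 2022 are assumptions chosen to match that output.

```r
library(tidycensus)

# County-level median household income for Texas
# (year = 2022 with the default 5-year ACS is an assumption)
texas_income <- get_acs(
  geography = "county",
  variables = "B19013_001",
  state = "TX",
  year = 2022
)

texas_income
```

The state is given by its postal abbreviation rather than a FIPS code.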
ACS results come back with five columns: GEOID, NAME, variable, estimate, and moe
# A tibble: 254 × 5
   GEOID NAME                    variable   estimate   moe
   <chr> <chr>                   <chr>         <dbl> <dbl>
 1 48001 Anderson County, Texas  B19013_001    57445  4562
 2 48003 Andrews County, Texas   B19013_001    86458 16116
 3 48005 Angelina County, Texas  B19013_001    57055  2484
 4 48007 Aransas County, Texas   B19013_001    58168  6458
 5 48009 Archer County, Texas    B19013_001    69954  8482
 6 48011 Armstrong County, Texas B19013_001    70417 14574
 7 48013 Atascosa County, Texas  B19013_001    67442  4309
 8 48015 Austin County, Texas    B19013_001    73556  4757
 9 48017 Bailey County, Texas    B19013_001    69830 13120
10 48019 Bandera County, Texas   B19013_001    70965  5710
# ℹ 244 more rows
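Because ACS estimates carry margins of error, it can help to read each row as a range rather than a single number. The sketch below copies three rows from the output above into a small data frame (the name texas_income_sample is hypothetical) and computes the interval implied by each margin of error. ACS margins of error are published at the 90 percent confidence level, which is also the tidycensus default.

```r
# Three rows copied from the printed output above
texas_income_sample <- data.frame(
  NAME = c("Anderson County, Texas", "Andrews County, Texas", "Angelina County, Texas"),
  estimate = c(57445, 86458, 57055),
  moe = c(4562, 16116, 2484)
)

# Lower and upper bounds implied by each margin of error
texas_income_sample$lower <- texas_income_sample$estimate - texas_income_sample$moe
texas_income_sample$upper <- texas_income_sample$estimate + texas_income_sample$moe

# Margin of error as a percentage of the estimate
texas_income_sample$moe_pct <- round(
  100 * texas_income_sample$moe / texas_income_sample$estimate, 1
)

texas_income_sample
```

A large moe_pct, as for Andrews County here, signals an estimate that should be interpreted with caution.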
Use geometry = TRUE to get pre-joined geometry along with your data!
Draw a quick map of the result with plot()
The sf package implements a simple features data model for vector spatial data in R
Vector geometries: points, lines, and polygons stored in a list-column of a data frame
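Requesting geometry and drawing a quick plot can be sketched as follows. The object name texas_income_sf, state = "TX", and year = 2022 are assumptions chosen to match the output shown in this section.

```r
library(tidycensus)
library(sf)

# Same request as a plain get_acs() call, plus geometry = TRUE
# (state and year are assumptions)
texas_income_sf <- get_acs(
  geography = "county",
  variables = "B19013_001",
  state = "TX",
  year = 2022,
  geometry = TRUE
)

# Subset to the estimate column so plot() draws a single map
plot(texas_income_sf["estimate"])
```

Subsetting an sf object by column name keeps the geometry attached, so the plot shows one panel shaded by estimate.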
Spatial results carry the same columns (GEOID, NAME, variable, estimate, and moe), along with a geometry column representing the shapes of locations
Simple feature collection with 254 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -106.6456 ymin: 25.83738 xmax: -93.50829 ymax: 36.5007
Geodetic CRS: NAD83
First 10 features:
   GEOID NAME                     variable   estimate   moe
 1 48273 Kleberg County, Texas    B19013_001    52487  4987
 2 48391 Refugio County, Texas    B19013_001    54304  2650
 3 48201 Harris County, Texas     B19013_001    70789   482
 4 48443 Terrell County, Texas    B19013_001    52813 11107
 5 48229 Hudspeth County, Texas   B19013_001    35163  8159
 6 48205 Hartley County, Texas    B19013_001    78065 21076
 7 48351 Newton County, Texas     B19013_001    38871  6573
 8 48373 Polk County, Texas       B19013_001    57315  2976
 9 48139 Ellis County, Texas      B19013_001    93248  2485
10 48491 Williamson County, Texas B19013_001   102851  1462
   geometry
 1 MULTIPOLYGON (((-97.3178 27...
 2 MULTIPOLYGON (((-97.54085 2...
 3 MULTIPOLYGON (((-94.97839 2...
 4 MULTIPOLYGON (((-102.5669 3...
 5 MULTIPOLYGON (((-105.998 32...
 6 MULTIPOLYGON (((-103.0422 3...
 7 MULTIPOLYGON (((-93.91113 3...
 8 MULTIPOLYGON (((-95.20018 3...
 9 MULTIPOLYGON (((-97.08703 3...
10 MULTIPOLYGON (((-98.04989 3...
mapview()
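For interactive viewing, a call to mapview() might look like the sketch below. The request arguments (state = "TX", year = 2022) and the object names are assumptions matching the output shown earlier. The zcol argument tells mapview which column to use for shading.

```r
library(tidycensus)
library(mapview)

# Texas county income with geometry (state and year are assumptions)
texas_income_sf <- get_acs(
  geography = "county",
  variables = "B19013_001",
  state = "TX",
  year = 2022,
  geometry = TRUE
)

# Interactive map shaded by the estimate column
income_map <- mapview(texas_income_sf, zcol = "estimate")
income_map
```

In RStudio the map opens in the Viewer pane, where you can zoom, pan, and click counties to inspect their values.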