Building custom regions from 2020 Census blocks in Python

data science

Kyle Walker


June 26, 2023

Earlier this month, I gave a two-part workshop series on analyzing the newly released 2020 Decennial US Census data with R. If you missed the series, you can buy the videos and materials on my Workshops page. One topic I addressed was how to handle the impact of differential privacy on block-level accuracy in the new Census data.

Differential privacy refers to a method used by the Census Bureau to infuse “noise” into data products to preserve respondent confidentiality. Counts for larger areas and larger groups will still be accurate, but differential privacy makes smaller counts less reliable. In fact, the Census Bureau makes the following recommendation about block-level data:

DON’T use data for individual blocks. Instead, aggregate data into larger areas, or use statistical models that combine data from many blocks. Block data are published to permit the analysis of user-constructed geographic areas composed of multiple blocks, for example, new voting districts that consist of collections of blocks within a politically defined geography.

This isn’t likely to be satisfactory advice for analysts, for a couple of key reasons. First, analysts working with Census data in rural areas often need block-level data to understand demographic trends, as block groups (the next level up in the Census hierarchy) may be too large in sparsely populated areas. Second, “aggregating data” is not as simple as the quote makes it sound. Creating data aggregations requires an understanding of techniques in GIS and data science that may be beyond the knowledge of the average Census data user.
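To see why aggregation helps in the first place, here is a toy simulation. It is a rough sketch, not the Census Bureau’s actual disclosure avoidance system: the block populations are made up, and the add_noise() helper with its Gaussian noise (standard deviation of 3) is an assumption chosen purely for illustration. The point is only that independent errors partly cancel when blocks are summed, so a region’s count is relatively more reliable than the counts of the blocks inside it.

```python
import random

random.seed(42)

# Hypothetical "true" populations for 200 small blocks (not real Census data)
true_counts = [random.randint(0, 30) for _ in range(200)]


def add_noise(count, sd=3.0):
    """Toy noise model (an assumption, not the Census Bureau's algorithm):
    zero-mean Gaussian noise, rounded to an integer and floored at zero."""
    return max(0, round(count + random.gauss(0, sd)))


noisy_counts = [add_noise(c) for c in true_counts]


def chunk_sums(values, size):
    """Sum consecutive groups of `size` values, standing in for custom regions."""
    return [sum(values[i:i + size]) for i in range(0, len(values), size)]


# Aggregate every 20 blocks into one region
region_true = chunk_sums(true_counts, 20)
region_noisy = chunk_sums(noisy_counts, 20)

total_pop = sum(true_counts)

# Total absolute error as a share of total population, at each level
block_rel_error = sum(abs(n - t) for n, t in zip(noisy_counts, true_counts)) / total_pop
region_rel_error = sum(abs(n - t) for n, t in zip(region_noisy, region_true)) / total_pop

print(f"Block-level relative error:  {block_rel_error:.3f}")
print(f"Region-level relative error: {region_rel_error:.3f}")
```

In this sketch the regions are just consecutive runs of 20 blocks. Real regions should be spatially contiguous and internally similar, which is where a regionalization tool comes in.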

In this post, I’ll illustrate a technique for creating custom regions from Census block data. We’ll be using the pygeoda package for this task, a Python wrapper of the C++ library that powers GeoDa, a GUI tool for exploratory spatial data analysis and spatial modeling. Working with GeoDa in this way is particularly fun for me. I was a qualitative geographer in graduate school before encountering GeoDa. GeoDa was the tool that sparked my interest in spatial data science and in many ways motivated my eventual career path.

Let’s grab some block data using pygris for Delta County, Texas, a rural county of about 5,000 residents northeast of the Dallas-Fort Worth metro area. If you haven’t previously cached the Texas block shapefile, this will take a few minutes to download.

import geopandas as gp
import pygeoda
from pygris import blocks, block_groups
from pygris.data import get_census

# Get the block data for a county in Texas
delta_blocks = blocks(state = "TX", county = "Delta", year = 2020, cache = True)
Using FIPS code '48' for input 'TX'
Using FIPS code '119' for input 'Delta'

Given that Delta County is fairly small, we can use .explore() to make a performant interactive map of the 571 Census blocks.

delta_blocks.explore(tooltip = False, popup = True)