Mapping jobs and commutes with 2020 LODES data and deck.gl
Last month, version 8 of the LEHD Origin-Destination Employment Statistics (LODES) dataset was released. This long-awaited release includes data on workplaces, residences, and origin-destination flows for workers in 2020, along with a time series of these statistics back to 2002 enumerated at 2020 Census blocks.
The latest release of the pygris package for Python enables programmatic access to these new data resources with its get_lodes() function. This new release also allows you to request Census geometry or longitude / latitude coordinates along with your LODES data, making data visualization and mapping straightforward. Let’s try it out!
Mapping job locations by Census block
To get started, let’s take care of some imports. We’ll be using the following:
- The get_lodes() function in the pygris package gives us access to the brand-new LODES data. There is a lot more you can do with get_lodes(); review the package documentation for more examples.
- pydeck is a Python interface to deck.gl, one of the most stunning data visualization libraries around. As you’ll see, deck.gl can help you create performant three-dimensional visualizations with large datasets.
- We’ll also use matplotlib to do some custom color work for our maps.
from pygris.data import get_lodes
import pydeck
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
The first example will visualize the distribution of accommodation and food service workers by Census block in Kentucky. We can get this information from the LODES Worker Area Characteristics (WAC) dataset, which helps us understand the geography of jobs for small areas.
The latest version of pygris (0.1.5) includes some mapping helpers in get_lodes(). The new return_geometry parameter identifies the appropriate TIGER/Line shapefile to merge to the requested LODES data and returns a GeoPandas GeoDataFrame with geometry. An alternative approach, which we will be using here, uses the new return_lonlat parameter. This gives us a Pandas DataFrame with columns representing the centroid of the location. This representation of geography works quite well with deck.gl.
Let’s get WAC data for the state of Kentucky in 2020.
ky_lodes_wac = get_lodes(
    state = "KY",
    year = 2020,
    lodes_type = "wac",
    cache = True,
    return_lonlat = True
)
The returned data have a host of columns representing jobs by category within that block, along with two additional columns, w_lon and w_lat, which represent the longitude and latitude of each block centroid.
Our next step is to write a color-generating function to add some context to our visualization. For cartographers coming to deck.gl from other mapping libraries, color formatting can be tricky. deck.gl expects RGBA colors with values ranging from 0 to 255; while many mapping libraries translate column values to colors for you, we’ll need to do this manually.
The function, column_to_rgba(), normalizes an input column and converts it to a column where every element is a list of format [R, G, B, A] for a given color map cmap. We’ll use this function to add a column to our dataset, 'color', that is based on values in the CNS18 column (representing accommodation and food service jobs) and uses the viridis color palette.
def column_to_rgba(column, cmap, alpha):
    # Rescale the column to the 0-1 range expected by matplotlib color maps
    normalized = (column - column.min()) / (column.max() - column.min())
    my_cmap = plt.get_cmap(cmap)
    # Convert each value to an [R, G, B, A] list with 0-255 components
    colors = normalized.apply(lambda x: [int(i * 255) for i in mcolors.to_rgba(my_cmap(x, alpha = alpha))])
    return colors

ky_lodes_wac['color'] = column_to_rgba(ky_lodes_wac['CNS18'], "viridis", 0.6)
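To make the two steps inside that function concrete (min-max normalization, then scaling to 0-255 RGBA lists), here is a minimal pure-Python sketch on a handful of made-up job counts. It is not the function used in this post: the helper names are hypothetical, the two-color ramp is a stand-in for the real viridis palette, and the guard for a column where every value is equal is an addition for illustration.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; return all zeros when every value is equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Avoid dividing by zero for a constant column
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def ramp_to_rgba(t, start, end, alpha):
    """Interpolate between two RGB colors (0-255) and append a 0-255 alpha."""
    rgb = [round(s + (e - s) * t) for s, e in zip(start, end)]
    return rgb + [round(alpha * 255)]


# Hypothetical job counts for five blocks
jobs = [0, 5, 10, 15, 20]

# Hypothetical two-stop ramp (dark purple to yellow), not the real viridis palette
DARK, LIGHT = (68, 1, 84), (253, 231, 37)

colors = [ramp_to_rgba(t, DARK, LIGHT, 0.6) for t in min_max_normalize(jobs)]
```

Each element of colors is a four-item list of integers, which is the shape deck.gl expects for a fill color accessor.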
The longitude / latitude data will work well for a deck.gl ColumnLayer. A column layer is a three-dimensional visualization that renders each location as a column, with height and color optionally scaled to a given characteristic in the dataset. This is a nice alternative to a choropleth map of jobs by block, as block polygons can be very irregular.
layer = pydeck.Layer(
    "ColumnLayer",
    ky_lodes_wac,
    get_position=["w_lon", "w_lat"],
    auto_highlight=True,
    elevation_scale=20,
    pickable=True,
    get_elevation = "CNS18",
    get_fill_color = "color",
    elevation_range=[0, 1000],
    extruded=True,
    coverage=1
)

# Set the viewport location
view_state = pydeck.ViewState(
    longitude=-85.4095567,
    latitude=37.2086276,
    zoom=6,
    min_zoom=5,
    max_zoom=15,
    pitch=40.5,
    bearing=-27.36
)

tooltip = {"html": "Number of accommodation / food service jobs: {CNS18}"}

# Render
r = pydeck.Deck(
    layers=[layer],
    initial_view_state=view_state,
    map_style = "light",
    tooltip = tooltip
)

r.to_html("ky_service.html")
Browse the map and look for interesting patterns. Note how seamlessly deck.gl visualizes all 30,000 block locations in the dataset!
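If the {CNS18} placeholder in the tooltip looks mysterious, the idea is that the name in curly braces is replaced with the value of that column for whichever column you hover over. The sketch below mimics that substitution in plain Python. It is only an illustration: the actual substitution happens in the browser, render_tooltip is a hypothetical helper, and the row values are made up.

```python
def render_tooltip(template, row):
    """Fill {column} placeholders in a tooltip template from one row of data."""
    return template.format_map(row)


tooltip = {"html": "Number of accommodation / food service jobs: {CNS18}"}

# Hypothetical record for one hovered block; the values are made up
row = {"CNS18": 42, "w_lon": -85.0, "w_lat": 38.0}

text = render_tooltip(tooltip["html"], row)
print(text)
```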
Mapping origin-destination flows
The return_lonlat feature in get_lodes() also works great for representing origin-destination flows. The origin-destination dataset in LODES, acquired with lodes_type = "od", returns block-to-block flows for all home-to-work combinations in a given state.
Given that block-to-block flows could quickly get visually overwhelming, we may want to aggregate our data to a parent geography. Let’s acquire origin-destination flows for the state of Texas, and aggregate to the Census tract level with the argument agg_level = "tract".
tx_od = get_lodes(
    state = "TX",
    year = 2020,
    lodes_type="od",
    agg_level = "tract",
    cache = True,
    return_lonlat = True
)
The data we get back includes h_lon and h_lat columns representing the centroid of the home Census tract, and w_lon and w_lat columns for the centroid of the work Census tract.
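To see what tract-level aggregation amounts to, recall that a 15-digit Census block GEOID nests its parent geographies: state (2 digits), county (3), tract (6), then block (4). Its first 11 characters therefore identify the tract. The sketch below sums hypothetical block-to-block flows by tract pair in plain Python. It illustrates the idea only and is not necessarily how pygris performs the aggregation internally; the function name and the block GEOIDs are made up.

```python
from collections import defaultdict


def aggregate_to_tracts(block_flows):
    """Sum block-to-block job counts by (home tract, work tract).

    The first 11 characters of a 15-digit block GEOID identify its tract.
    """
    totals = defaultdict(int)
    for home_block, work_block, jobs in block_flows:
        totals[(home_block[:11], work_block[:11])] += jobs
    return dict(totals)


# Hypothetical flows: (home block GEOID, work block GEOID, total jobs)
block_flows = [
    ("484391139321000", "484391139322001", 3),
    ("484391139321001", "484391139322005", 2),
    ("484391042011000", "484391139322001", 4),
]

tract_flows = aggregate_to_tracts(block_flows)
```

The first two records share both a home tract and a work tract, so they collapse into a single flow, leaving two tract-to-tract flows from three block-to-block records.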
We’ll visualize these flows with a deck.gl ArcLayer; incidentally, the PyDeck documentation uses LODES data to show how ArcLayers work.
Let’s refine the data first to answer a specific question. I live in Fort Worth, Texas, and a major growth area for the city is AllianceTexas, a fast-developing industrial and commercial corridor. We’ll generate a new object, top_commutes, that identifies those Census tracts sending at least 25 commuters to the Census tract containing the southern part of the Alliance airport.
top_commutes = tx_od.query('w_geocode == "48439113932" & S000 >= 25')
From here, we can basically replicate the example from the PyDeck documentation, but apply it to commute flows to Alliance in Fort Worth.
import pydeck

GREEN_RGB = [0, 255, 0, 200]
RED_RGB = [240, 100, 0, 200]

arc_layer = pydeck.Layer(
    "ArcLayer",
    data=top_commutes,
    get_width="S000 / 5",
    get_source_position=["h_lon", "h_lat"],
    get_target_position=["w_lon", "w_lat"],
    get_tilt=15,
    get_source_color=RED_RGB,
    get_target_color=GREEN_RGB,
    pickable=True,
    auto_highlight=True
)

view_state = pydeck.ViewState(
    latitude=32.708664,
    longitude=-97.360546,
    bearing=45,
    pitch=50,
    zoom=8
)

tooltip = {"html": "{S000} jobs <br /> Home of commuter in red; work location in green"}

r = pydeck.Deck(
    arc_layer,
    initial_view_state=view_state,
    tooltip=tooltip,
    map_style = "road"
)

r.to_html("alliance_commuters.html")
We get a compelling origin-destination flow map showing the locations that sent the most commuters to AllianceTexas in 2020.
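The viewport coordinates in both maps were typed in by hand. If you would rather derive a starting point from the data, one simple option is the midpoint of the bounding box around your centroids. The sketch below is a plain-Python illustration with made-up coordinates; bbox_center is a hypothetical helper, not part of pygris or pydeck.

```python
def bbox_center(points):
    """Return the (longitude, latitude) midpoint of the points' bounding box."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return ((min(lons) + max(lons)) / 2, (min(lats) + max(lats)) / 2)


# Hypothetical tract centroids as (longitude, latitude) pairs
centroids = [(-97.5, 32.5), (-97.0, 33.0), (-97.25, 32.75)]

center_lon, center_lat = bbox_center(centroids)
```

The resulting pair could be passed as the longitude and latitude arguments of pydeck.ViewState, leaving zoom, pitch, and bearing to be tuned by eye.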
Working with LODES data can have massive benefits for your projects and your business. If you’d like to discuss how to integrate these insights into your work, please don’t hesitate to reach out!