Mapping jobs and commutes with 2020 LODES data and deck.gl

python
gis
data science
Author

Kyle Walker

Published

May 17, 2023

Last month, version 8 of the LEHD Origin-Destination Employment Statistics (LODES) dataset was released. This long-awaited release includes data on workplaces, residences, and origin-destination flows for workers in 2020, along with a time series of these statistics back to 2002 enumerated at 2020 Census blocks.

The latest release of the pygris package for Python enables programmatic access to these new data resources with its get_lodes() function. This new release also allows you to request Census geometry or longitude / latitude coordinates along with your LODES data, making data visualization and mapping straightforward. Let’s try it out!

Mapping job locations by Census block

To get started, let’s take care of some imports. We’ll be using the following:

from pygris.data import get_lodes
import pydeck
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

The first example will visualize the distribution of accommodation and food service workers by Census block in Kentucky. We can get this information from the LODES Worker Area Characteristics (WAC) dataset, which helps us understand the geography of jobs for small areas.

The latest version of pygris (0.1.5) includes some mapping helpers in get_lodes(). The new return_geometry parameter identifies the appropriate TIGER/Line shapefile to merge to the requested LODES data and returns a GeoPandas GeoDataFrame with geometry. An alternative approach, which we will be using here, uses the new return_lonlat parameter. This gives us a Pandas DataFrame with columns representing the centroid of the location. This representation of geography works quite well with deck.gl.

Let’s get WAC data for the state of Kentucky in 2020.

ky_lodes_wac = get_lodes(
  state = "KY", 
  year = 2020, 
  lodes_type = "wac",
  cache = True,
  return_lonlat = True
)

The returned data have a host of columns representing jobs by category within that block, along with two additional columns, w_lon and w_lat, which represent the longitude and latitude of each block centroid.

Our next step is to write a color-generating function to add some context to our visualization. For cartographers coming to deck.gl from other mapping libraries, color formatting can be tricky. deck.gl expects RGBA colors with values ranging from 0 to 255; while many mapping libraries translate column values to colors for you, we’ll need to do this manually.

The function, column_to_rgba(), normalizes an input column and converts it to a column where every element is a list of format [R, G, B, A] for a given color map cmap. We’ll use this function to add a column to our dataset, 'color', that is based on values in the CNS18 column (representing accommodation and food service jobs) and uses the viridis color palette.

def column_to_rgba(column, cmap, alpha):
    normalized = (column - column.min()) / (column.max() - column.min())
    my_cmap = plt.get_cmap(cmap)
    colors = normalized.apply(lambda x: [int(i * 255) for i in mcolors.to_rgba(my_cmap(x, alpha = alpha))])

    return colors
  
  
ky_lodes_wac['color'] = column_to_rgba(ky_lodes_wac['CNS18'], "viridis", 0.6)

The longitude / latitude data will work well for a deck.gl ColumnLayer. A column layer is a three-dimensional visualization that renders each location as a column, with height and color optionally scaled to a given characteristic in the dataset. This is a nice alternative to a choropleth map of jobs by block, as block polygons can be very irregular.

layer = pydeck.Layer(
  "ColumnLayer",
  ky_lodes_wac,
  get_position=["w_lon", "w_lat"],
  auto_highlight=True,
  elevation_scale=20,
  pickable=True,
  get_elevation = "CNS18",
  get_fill_color = "color",
  elevation_range=[0, 1000],
  extruded=True,
  coverage=1
)

# Set the viewport location
view_state = pydeck.ViewState(
  longitude=-85.4095567,
  latitude=37.2086276,
  zoom=6,
  min_zoom=5,
  max_zoom=15,
  pitch=40.5,
  bearing=-27.36
)

tooltip = {"html": "Number of accommodation / food service jobs: {CNS18}"}

# Render
r = pydeck.Deck(
  layers=[layer], 
  initial_view_state=view_state, 
  map_style = "light", 
  tooltip = tooltip
)

r.to_html("ky_service.html")

Browse the map and look for interesting patterns. Note how seamlessly deck.gl visualizes all 30,000 block locations in the dataset!

Mapping origin-destination flows

The return_lonlat feature in get_lodes() also works great for representing origin-destination flows. The origin-destination dataset in LODES, acquired with lodes_type = "od", returns block-to-block flows for all home-to-work combinations in a given state.

Given that block-to-block flows could quickly get visually overwhelming, we may want to aggregate our data to a parent geography. Let’s acquire origin-destination flows for the state of Texas, and aggregate to the Census tract level with the argument agg_level = "tract".

tx_od = get_lodes(
  state = "TX", 
  year = 2020, 
  lodes_type="od",
  agg_level = "tract",
  cache = True, 
  return_lonlat = True
)

The data we get back includes h_lon and h_lat columns representing the centroid of the home Census tract, and w_lon and w_lat columns for the centroid of the work Census tract.

We’ll visualize these flows with a deck.gl ArcLayer; incidentally, the PyDeck documentation uses LODES data to show how ArcLayers work.

Let’s refine the data first to answer a specific question. I live in Fort Worth, Texas, and a major growth area for the city is AllianceTexas, a fast-developing industrial and commercial corridor. We’ll generate a new object, top_commutes, that identifies those Census tracts sending at least 25 commuters to the Census tract containing the southern part of the Alliance airport.

top_commutes = tx_od.query('w_geocode == "48439113932" & S000 >= 25')

From here, we can basically replicate the example from the PyDeck documentation, but apply it to commute flows to Alliance in Fort Worth.

import pydeck

GREEN_RGB = [0, 255, 0, 200]
RED_RGB = [240, 100, 0, 200]

arc_layer = pydeck.Layer(
  "ArcLayer",
  data=top_commutes,
  get_width="S000 / 5",
  get_source_position=["h_lon", "h_lat"],
  get_target_position=["w_lon", "w_lat"],
  get_tilt=15,
  get_source_color=RED_RGB,
  get_target_color=GREEN_RGB,
  pickable=True,
  auto_highlight=True
)

view_state = pydeck.ViewState(
  latitude=32.708664, 
  longitude=-97.360546, 
  bearing=45, 
  pitch=50, 
  zoom=8
)

tooltip = {"html": "{S000} jobs <br /> Home of commuter in red; work location in green"}
r = pydeck.Deck(
  arc_layer, 
  initial_view_state=view_state, 
  tooltip=tooltip, 
  map_style = "road"
)

r.to_html("alliance_commuters.html")

We get a compelling origin-destination flow map showing the locations that sent the most commuters to AllianceTexas in 2020.

Working with LODES data can have massive benefits for your projects and your business. If you’d like to discuss how to integrate these insights into your work, please don’t hesitate to reach out!