A common use-case when working with time-series small-area Census data is to transfer data from one set of shapes (e.g. 2010 Census tracts) to another set of shapes (e.g. 2020 Census tracts). Population-weighted interpolation is one such solution to this problem that takes into account the distribution of the population within a Census unit to intelligently transfer data between incongruent units.

interpolate_pw(
  from,
  to,
  to_id = NULL,
  extensive,
  weights,
  weight_column = NULL,
  weight_placement = c("surface", "centroid"),
  crs = NULL
)

Arguments

from

The spatial dataset from which numeric attributes will be interpolated to target zones. By default, all numeric columns in this dataset will be interpolated.

to

The target geometries (zones) to which numeric attributes will be interpolated.

to_id

(optional) An ID column in the target dataset to be retained in the output. For data obtained with tidycensus, this will be "GEOID" by convention. If NULL, the output dataset will include a column id that uniquely identifies each row.

extensive

if TRUE, return weighted sums; if FALSE, return weighted means.

weights

An input spatial dataset to be used as weights. If the dataset is not of geometry type POINT, it will be converted to points by the function with sf::st_point_on_surface(). For US-based applications, this will commonly be a Census block dataset obtained with the tigris or tidycensus packages.

weight_column

(optional) a column in weights used for weighting in the interpolation process. Typically this will be a column representing the population (or other weighting metric, like housing units) of the input weights dataset. If NULL (the default), each feature in weights is given an equal weight of 1.

weight_placement

(optional) One of "surface", where weight polygons are converted to points on polygon surfaces with sf::st_point_on_surface(), or "centroid", where polygon centroids are used instead with sf::st_centroid(). Defaults to "surface". This argument is not necessary if weights are already of geometry type POINT.

crs

(optional) The EPSG code of the output projected coordinate reference system (CRS). Useful as all input layers (from, to, and weights) must share the same CRS for the function to run correctly.

Value

A dataset of class sf with the geometries and an ID column from to (the target shapes) but with numeric attributes of from interpolated to those shapes.

Details

The approach implemented here is based on Esri's data apportionment algorithm, in which an "apportionment layer" of points (referred to here as the weights) is used to determine how to weight areas of overlap between origin and target zones. Users must supply a "from" dataset as an sf object (the dataset from which numeric columns will be interpolated) and a "to" dataset, also of class sf, that contains the target zones. A third sf object, the "weights", may be an object of geometry type POINT or polygons from which points will be derived using sf::st_point_on_surface().

An intersection is computed between from and to, and a spatial join is computed between the intersection layer and the weights layer, represented as points. A specified weight_column in weights will be used to determine the relative influence of each point on the allocation of values between from and to; if no weight column is specified, all points will be weighted equally.

The extensive parameter (logical) should reflect the values being interpolated correctly. If TRUE, the function returns a weighted sum for each zone. If FALSE, a weighted mean will be returned. For Census data, extensive = TRUE should be used for transferring counts / estimated counts between zones. Derived metrics (e.g. population density, percentages, etc.) should use extensive = FALSE. Margins of error in the ACS will not be transferred correctly with this function, so please use with caution.

Examples

if (FALSE) {
# Example: interpolating work-from-home from 2011-2015 ACS
# to 2020 shapes
library(tidycensus)
library(tidyverse)
library(tigris)
options(tigris_use_cache = TRUE)

wfh_15 <- get_acs(
  geography = "tract",
  variables = "B08006_017",
  year = 2015,
  state = "AZ",
  county = "Maricopa",
  geometry = TRUE
) %>%
select(estimate)

wfh_20 <- get_acs(
  geography = "tract",
  variables = "B08006_017",
  year = 2020,
  state = "AZ",
  county = "Maricopa",
  geometry = TRUE
 )

maricopa_blocks <- blocks(
  "AZ",
  "Maricopa",
  year = 2020
)

wfh_15_to_20 <- interpolate_pw(
  from = wfh_15,
  to = wfh_20,
  to_id = "GEOID",
  weights = maricopa_blocks,
  weight_column = "POP20",
  crs = 26949,
  extensive = TRUE
)

}