Convert polygon geometry to dots for dot-density mapping

Dot-density maps are a compelling alternative to choropleth maps for visualizing demographic data because they can represent the internal heterogeneity of geographic units. This function generates dots from an input polygon dataset for use in dot-density mapping. Dots are placed randomly within polygons according to a given data:dots ratio; for example, a ratio of 100:1 for an input population value column will place approximately 1 dot in the polygon for every 100 people in the geographic unit. Users can then map the dots using tools like ggplot2::geom_sf() or tmap::tm_dots().

as_dot_density(
  input_data,
  value,
  values_per_dot,
  group = NULL,
  erase_water = FALSE,
  area_threshold = NULL,
  water_year = 2020
)

Arguments

input_data: An input sf object of geometry type POLYGON or MULTIPOLYGON that includes some information that can be converted to dots. While the function is designed for use with data acquired with the tidycensus package, it will work for arbitrary polygon datasets.
value: The value column to be used to determine the number of dots to generate. For tidycensus users, this will typically be the "value" column for decennial Census data or the "estimate" column for American Community Survey estimates.
values_per_dot: The number of values per dot; used to determine the output data:dots ratio. A value of 100 means that each dot will represent approximately 100 units of the value column (for example, 100 people when the value column holds population counts).
group: A column in the dataset that identifies salient groups within which dots should be generated. For a long-form tidycensus dataset, this will typically be the "variable" column or some derivative of it. The output dataset will be randomly shuffled to prevent "stacking" of groups in downstream dot-density maps.
erase_water: If TRUE, calls tigris::erase_water() to remove water areas from the polygons prior to generating dots, allowing for dasymetric dot placement. This option is recommended if your location includes significant water area. If using this option, it is recommended that you first transform your data to a projected coordinate reference system using sf::st_transform() to improve performance. This argument only works for data in the United States.
area_threshold: The area percentile threshold to be used when erasing water; ranges from 0 (all water area included) to 1 (no water area included).
water_year: The year of the TIGER/Line water area shapefiles to use if erasing water. Defaults to 2020; ignore if not using the erase_water feature.
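
The relationship between value and values_per_dot can be illustrated with a little arithmetic. The sketch below uses made-up counts (not real Census data) and assumes that the dot count is obtained by rounding value / values_per_dot; the function's exact rounding rule is not stated here and may differ, which is why the documentation describes the ratio as approximate.

```r
# Hypothetical counts for three geographic units; not real Census data
tract_values <- c(tract_a = 4210, tract_b = 95, tract_c = 1530)

# Each dot stands for roughly this many units of the value column
values_per_dot <- 100

# Assumption: dots are allocated by rounding the ratio to a whole number.
# as_dot_density() may round differently internally.
expected_dots <- round(tract_values / values_per_dot)

print(expected_dots)
```

Units with values well below values_per_dot may end up with very few dots or none, so a smaller values_per_dot can be worth trying for sparsely populated areas.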

Value

The original dataset converted to geometry type POINT, with the number of point features in each group determined by the given data:dots ratio.

Details

as_dot_density() uses terra::dots() internally for fast creation of dots. As terra is not a hard dependency of the tidycensus package, users must first install terra before using this function.
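
Because terra is an optional dependency, it can help to check for it before calling the function. The following is a minimal sketch using base R's requireNamespace(); the variable name has_terra is chosen here for illustration only.

```r
# Check whether terra is available without attaching it to the search path
has_terra <- requireNamespace("terra", quietly = TRUE)

if (!has_terra) {
  # Installation is left to the user rather than triggered automatically
  message("terra is not installed; run install.packages(\"terra\") ",
          "before calling as_dot_density().")
}
```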

The erase_water parameter will internally call tigris::erase_water() to fetch water area for a given location in the United States and remove that water area from the polygons before placing dots in polygons. This will slow down performance of the function, but can improve cartographic accuracy in locations with significant water area. It is recommended that users transform their data into a projected coordinate reference system with sf::st_transform() prior to using this option in order to improve performance.
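
The recommended workflow of projecting first and then erasing water might look like the sketch below. The helper dots_without_water() is hypothetical and not part of tidycensus; it assumes an sf polygon object for a US location in the long format returned by tidycensus (with "value" and "variable" columns). The default CRS, EPSG:26918 (NAD83 / UTM zone 18N), is an assumption that suits the Baltimore area and should be replaced with a projection appropriate to your location. The area_threshold of 0.9 is purely illustrative.

```r
# Hypothetical helper; `polygons` is assumed to be an sf polygon object for a
# location in the United States, with "value" and "variable" columns
dots_without_water <- function(polygons, crs = 26918) {
  # Project first, as recommended, to speed up the water-erasing step
  projected <- sf::st_transform(polygons, crs)

  tidycensus::as_dot_density(
    projected,
    value = "value",
    values_per_dot = 100,
    group = "variable",
    erase_water = TRUE,
    area_threshold = 0.9,  # illustrative value, not a recommendation
    water_year = 2020
  )
}
```

Calling dots_without_water() on a dataset requires the sf, tigris, and terra packages plus an internet connection to fetch the water shapefiles, and it will run more slowly than a call with erase_water = FALSE.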

Examples

if (FALSE) {

library(tidycensus)
library(ggplot2)

# Identify variables for mapping
race_vars <- c(
  Hispanic = "P2_002N",
  White = "P2_005N",
  Black = "P2_006N",
  Asian = "P2_008N"
)

# Get data from tidycensus
baltimore_race <- get_decennial(
  geography = "tract",
  variables = race_vars,
  state = "MD",
  county = "Baltimore city",
  geometry = TRUE,
  year = 2020
)

# Convert data to dots
baltimore_dots <- as_dot_density(
  baltimore_race,
  value = "value",
  values_per_dot = 100,
  group = "variable"
)

# Use one set of polygon geometries as a base layer
baltimore_base <- baltimore_race[baltimore_race$variable == "Hispanic", ]

# Map with ggplot2
ggplot() +
  geom_sf(data = baltimore_base,
          fill = "white",
          color = "grey") +
  geom_sf(data = baltimore_dots,
          aes(color = variable),
          size = 0.01) +
  theme_void()

}