Circle clusters and heatmaps for dense point data in R

r
gis
data science
spatial analysis
Author

Kyle Walker

Published

October 7, 2024

One of my favorite examples to use in data science / cartography teaching is Clickhole’s article “We Put 700 Red Dots On A Map.”

Map of randomly-placed red dots from satirical publication Clickhole

Source: https://clickhole.com/we-put-700-red-dots-on-a-map-1825122391/

The article explains:

Seven hundred of them. Seven hundred dots. That’s more than 500 dots—well on the way to 1,000. That could represent 700 people, or crime scenes, or cities. Or something that happens in this country every 20 seconds. These dots could potentially be anything—they’re red dots, so they could definitely mean something bad.

The article is of course satirical: it pokes fun at the “amazing maps” shared on social media, from which readers infer a great deal but which in reality say very little.

I was reminded of this article when I read Brian Timoney’s recent blog post, “When we sell ‘Mapping’, What Precisely Is The Product?” He points out that while technical innovations in geospatial data science sell solutions like “mapping a billion points in your browser,” the real value is in the ability to solve a customer’s problem, not necessarily the level of technical achievement.

This motivated me to put together a tutorial on some features in my new R package, mapgl, for visualizing clusters of dense point data without showing a bunch of “dots on a map.” Let’s walk through some examples.

Data setup: public intoxication violations in Fort Worth, Texas

Let’s get started with a dataset I’ve used for the past few years in my data science teaching: public intoxication violations in the city of Fort Worth, Texas from the crime dataset in the city’s open data catalog. The data cover 2019 through March of 2020.

We can plot the data as “red dots” using standard R plotting tools (in this case, ggplot2) over a backdrop of the boundary of the oddly-shaped city of Fort Worth. st_jitter() adds a small random offset to each point so that dots recorded at the same address don’t sit exactly on top of one another.

library(mapgl)
library(tidyverse)
library(sf)
library(tigris)
options(tigris_use_cache = TRUE)

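# Read the violations, drop rows with missing values, convert to sf points,
# and jitter slightly so records at the same address don't overlap exactly
intox <- read_csv("https://raw.githubusercontent.com/walkerke/geog30323/refs/heads/master/intoxication.csv") %>%
  na.omit() %>%
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326) %>%
  st_jitter(factor = 0.0001)

# Pull the Fort Worth boundary from the cartographic boundary places file
ft_worth <- places(cb = TRUE, year = 2023) |> 
  filter(NAME == "Fort Worth")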

ggplot() + 
  geom_sf(data = ft_worth, fill = "navy", alpha = 0.2) + 
  geom_sf(data = intox, color = "red") + 
  theme_void()

The visualization, in its basic form, doesn’t tell us much. We do see a few possible “clusters” of data, but our visual doesn’t do much more at this point than the satirical Clickhole map does.
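As an aside on the st_jitter() step in the setup code, here is a minimal sketch of its effect using made-up coordinates rather than the Fort Worth data. As I understand the sf documentation, when amount is not supplied the random offset is scaled by factor times the diagonal of the data’s bounding box, so the example includes one point at a different location to keep that box from collapsing to zero size. The object names (pts, jittered, n_before, n_after) are just for illustration.

```r
library(sf)

# Hypothetical data: three records at the same location plus one elsewhere
pts <- st_as_sf(
  data.frame(
    id = 1:4,
    longitude = c(-97.33, -97.33, -97.33, -97.30),
    latitude  = c(32.75, 32.75, 32.75, 32.78)
  ),
  coords = c("longitude", "latitude"),
  crs = 4326
)

set.seed(1)  # jittering is random, so fix the seed for reproducibility
jittered <- st_jitter(pts, factor = 0.0001)

# Count distinct coordinate pairs before and after jittering
n_before <- nrow(unique(st_coordinates(pts)))
n_after  <- nrow(unique(st_coordinates(jittered)))

n_before
n_after
```

Before jittering the three stacked records count as a single location; afterwards each record has its own coordinate pair, which is what lets all of them show up as separate dots.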

An alternative approach involves visualizing the data on an interactive map so users can at least zoom and pan around to explore the clusters themselves. We’ll use the MapLibre engine in R’s mapgl package to accomplish this with OpenStreetMap tiles beneath the city boundary and the red dots.

ftw_map <- maplibre(
  style = maptiler_style("openstreetmap"), 
  bounds = ft_worth
) |>
  add_fill_layer(
    id = "city",
    source = ft_worth,
    fill_color = "navy",
    fill_opacity = 0.2
  ) 

ftw_map |> 
  add_circle_layer(
    id = "circles",
    source = intox,
    circle_color = "red",
    circle_stroke_color = "white",
    circle_stroke_width = 1
  ) 
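A practical note on the basemap used above: to my knowledge, maptiler_style() needs a MapTiler API key, which mapgl looks up in the MAPTILER_API_KEY environment variable when no key is passed directly. The sketch below is one way to fall back to a key-free basemap if that variable isn’t set. It assumes mapgl’s carto_style() helper and its "positron" style; the has_key and basemap_style names are my own.

```r
library(mapgl)

# Assumption: maptiler_style() reads its key from MAPTILER_API_KEY by default
has_key <- nzchar(Sys.getenv("MAPTILER_API_KEY"))

# Use the OpenStreetMap style from MapTiler when a key is available,
# otherwise fall back to a CARTO basemap that does not require one
basemap_style <- if (has_key) {
  maptiler_style("openstreetmap")
} else {
  carto_style("positron")
}
```

The result can then be passed as style = basemap_style in the maplibre() call, leaving the rest of the map code unchanged.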

We can explore the data distribution better when zoomed in, but we still don’t get much clarity about patterns when zoomed out. Fortunately, the mapgl package includes some solutions. Let’s take a look at a couple: circle clustering and heatmaps.

Circle clustering in mapgl

A big challenge when mapping dense point data, as we see in this example, is that points overlap each other when the map is zoomed out, making it difficult to judge how many points sit in dense areas. One solution is circle clustering: points within a given radius of one another are grouped into clusters, and those clusters are drawn instead of the individual circles. Clusters change dynamically with the user’s zoom level, and individual points are revealed once a maximum zoom level is reached.
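To make the idea concrete, here is a rough conceptual sketch in base R. It is not the algorithm the JavaScript libraries use; it swaps in ordinary complete-linkage hierarchical clustering (hclust() and cutree()) on synthetic points, and varies the grouping radius to mimic zooming, on the assumption that a fixed on-screen radius covers more ground when the map is zoomed out. The cluster_points() helper and all data are hypothetical.

```r
set.seed(42)

# Synthetic points in arbitrary planar units: two tight groups and one outlier
pts <- rbind(
  cbind(x = rnorm(20, mean = 0, sd = 0.1), y = rnorm(20, mean = 0, sd = 0.1)),
  cbind(x = rnorm(10, mean = 5, sd = 0.1), y = rnorm(10, mean = 5, sd = 0.1)),
  cbind(x = 10, y = 0)
)

# Hypothetical helper: group points so that no two points in the same group
# are more than `radius` apart, then summarize each group by its center and size
cluster_points <- function(pts, radius) {
  groups <- cutree(hclust(dist(pts), method = "complete"), h = radius)
  data.frame(
    x = as.vector(tapply(pts[, "x"], groups, mean)),
    y = as.vector(tapply(pts[, "y"], groups, mean)),
    count = as.vector(table(groups))
  )
}

# A large radius stands in for a zoomed-out view, a small one for zoomed-in
zoomed_out <- cluster_points(pts, radius = 2)
zoomed_in  <- cluster_points(pts, radius = 0.05)

zoomed_out
nrow(zoomed_in)
```

With the large radius the 31 points collapse into three labeled clusters, while the small radius breaks them into many more groups. That is the same trade-off a clustered web map makes as the user zooms.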

Circle clustering is implemented in both Mapbox GL JS and MapLibre GL JS, the JavaScript mapping libraries included in the mapgl R package. I’ve built out an interface to the circle clustering functionality in these libraries to try to make it as simple as possible for R users. To cluster circles with default options set, just add cluster_options = cluster_options() to a call to add_circle_layer().

ftw_map |> 
  add_circle_layer(
    id = "circles",
    source = intox,
    circle_color = "red",
    circle_stroke_color = "white",
    circle_stroke_width = 1,
    cluster_options = cluster_options()
  )
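The defaults can also be overridden. The sketch below is a self-contained variation on synthetic points that adjusts the two settings described earlier: the radius within which points are grouped and the zoom level at which clustering stops. The argument names max_zoom and cluster_radius are my reading of the cluster_options() documentation and should be checked against your installed version of mapgl; the values, the demo_pts data, and the use of carto_style() (chosen so that no MapTiler key is needed) are illustrative assumptions.

```r
library(mapgl)
library(sf)

set.seed(123)

# Hypothetical data: 500 random points scattered around central Fort Worth
demo_pts <- st_as_sf(
  data.frame(
    lon = rnorm(500, mean = -97.33, sd = 0.05),
    lat = rnorm(500, mean = 32.75, sd = 0.05)
  ),
  coords = c("lon", "lat"),
  crs = 4326
)

# Assumed arguments: a smaller cluster_radius groups points less aggressively,
# and max_zoom sets the zoom level up to which points are still clustered
demo_map <- maplibre(
  style = carto_style("positron"),
  bounds = demo_pts
) |>
  add_circle_layer(
    id = "demo-circles",
    source = demo_pts,
    circle_color = "red",
    circle_stroke_color = "white",
    circle_stroke_width = 1,
    cluster_options = cluster_options(
      max_zoom = 16,
      cluster_radius = 30
    )
  )

demo_map
```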