Presenter: Kyle Walker (@kyle_e_walker)
This tutorial was presented on October 9, 2020 for the Master of Urban Spatial Analytics program at the University of Pennsylvania’s Weitzman School of Design.
To run the examples from the workshop yourself, open a terminal then clone the repository to your computer:
git clone https://github.com/walkerke/MUSAmasterclass.git
Open the project in RStudio and navigate to the tutorial
folder, then open the index.Rmd
document. The examples in that document will run correctly if code chunks are set to be evaluated in that directory.
Alternatively, if you are unfamiliar with git, click the “Code” drop-down button in the upper right corner of this tutorial, and choose “Download Rmd.” This will download this .Rmd file to your computer. Put the .Rmd file in a directory of your choice. Next, download the data for this workshop from https://walker-data.com/MUSAmasterclass/tutorial/data.zip. Unzip the folder in the same directory as your downloaded .Rmd file.
To get started with mapboxapi, you’ll need to first install some packages. mapboxapi was just released to CRAN this week, so we can install with install.packages()
:
install.packages("mapboxapi", dependencies = TRUE)
If you’ve been working with R Spatial packages before, installation should go smoothly. If you are new to R/R Spatial, you may need to do some configuration prior to successful installation of the package. mapboxapi depends heavily on the sf package for spatial data processing in R. On Ubuntu, use the following commands in a terminal to install required dependencies:
sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable
sudo apt-get update
sudo apt-get install -y libudunits2-dev libgdal-dev libgeos-dev libproj-dev
mapboxapi also uses the protolite package for interacting with Mapbox vector tiles and the magick package for image processing and display. On Ubuntu, install dependencies with:
sudo apt-get install -y libprotobuf-dev protobuf-compiler libmagick++-dev
Instructions for other Linux distributions can be found on the package websites linked above.
To run all of the examples in this workshop, you’ll also need to install the following packages that don’t get picked up as mapboxapi dependencies:
install.packages(c("shiny", "fasterize", "tidycensus", "tidyverse"))
Before we get started using Mapbox services in R, you’ll need a valid Mapbox account with an access token. Fortunately, Mapbox has generously provided a coupon code for you to use as workshop participants. To set up your account, visit https://account.mapbox.com/auth/signup/ to establish an account - all you need to provide is an email address to sign up! Fill out the form and verify your account through the email Mapbox sends you; you’ll be taken directly to your Mapbox account dashboard page.
Note the “default public token” that appears on your screen - you’ll come back to this page in a moment. First, look to the right side of your screen and click “View billing.” This is where Mapbox will handle your billing information. Nothing you’ll do today will be intensive enough to incur charges - but your next three months of work will be covered by the coupons Mapbox has provided to this workshop. Scroll down and enter the coupon code you’ve received in the appropriate box, then click Add. Once you’ve entered your coupon code, return to your Mapbox dashboard. Copy the access token that appears on your screen to your clipboard, then return to R.
All features in mapboxapi require a valid Mapbox access token to work. Now that you have yours in hand, you can set yours up! Load the mapboxapi package and install your token as follows:
my_token <- "YOUR TOKEN GOES HERE"
library(mapboxapi)
mb_access_token(my_token, install = TRUE)
The optional argument install = TRUE
saves the token to your .Renviron, allowing you to use mapboxapi functions in the future without having to worry about setting your token. To use this feature, restart your R session.
The most well-known feature of Mapbox services is its ability to create stunning web maps which are used on applications all around the world. While mapboxapi is not an interface to Mapbox GL JS, Mapbox’s JavaScript library for building web maps, it does include some tools to help you use Mapbox maps in your R projects. This is important as the Mapbox Terms of Service require that Mapbox API outputs be visualized on Mapbox maps.
Mapbox maps are accessed through styles, which are custom design configurations applied to OpenStreetMap or even user-generated vector map tilesets. You’ll learn how to create and use your own map style with Mapbox later in this workshop. However, Mapbox provides a number of their styles to all users with a Mapbox access token. The most recent versions of these styles (as of the workshop date) are as follows:
streets-v11
: The core Mapbox Streets basemapoutdoors-v11
: A basemap designed for outdoor recreation useslight-v10
: A light, greyscale background suitable for thematic overlaydark-v10
: A dark basemap suitable for thematic overlaysatellite-v9
: A global satellite basemap derived from MODIS, Landsat, & proprietary imagery sourcessatellite-streets-v11
: The satellite basemap with a streets overlayOne of the most popular R packages for interactive data visualization in R is the Leaflet package maintained by RStudio, which wraps the Leaflet JavaScript library for web mapping. Years ago, I wrote a tutorial on how to use Mapbox maps in R Leaflet projects. Now, mapboxapi provides a convenience function, addMapboxTiles()
, to help you do this in a more straightforward way. This function queries the Mapbox Static Tiles API and converts a Mapbox style into static tiles for web mapping.
Let’s load the leaflet and mapboxapi libraries and set up an interactive map:
library(leaflet)
library(mapboxapi)
mapbox_map <- leaflet() %>%
addMapboxTiles(style_id = "streets-v11",
username = "mapbox")
mapbox_map
We get a browseable Leaflet map using Mapbox tiles as a basemap.
Once we’ve set up our Leaflet map with a Mapbox basemap, we’ll likely want to focus it on a specific location. mapboxapi includes functionality for R users to interact with the Mapbox Search API. Implemented functions include mb_geocode()
for forward geocoding, which refers to the conversion of a description of a place (like an address) into longitude/latitude coordinates; and mb_reverse_geocode()
, which converts coordinates into a place description.
Both functions default to using the mapbox.places
API endpoint, which is to be used for temporary geocoding. This means that the endpoint cannot be used to store geocoded information nor can it be used for batch geocoding (e.g., a spreadsheet of addresses). These tasks are permissible with the mapbox.places-permanent
endpoint, which is not included with free accounts. In turn, R users looking for free batch geocoding solutions should use other packages like the tidygeocoder package. Mapbox geocoding with the mapbox.places
endpoint can be used to focus web maps and guide navigation services, which will be illustrated in the following sections.
Let’s use mb_geocode()
to identify the coordinates representing the University of Pennsylvania (specifically here, the university bookstore).
penn <- mb_geocode("3601 Walnut St, Philadelphia, PA 19104")
penn
## [1] -75.19600 39.95368
By default, mb_geocode()
returns a length-2 vector representing the longitude and latitude coordinates of the geocoded location. The function can also return an sf POINT object or an R list representing the full API response, if requested. Using the returned coordinates, we can focus our Leaflet Mapbox map with the setView()
function:
mapbox_map %>%
setView(lng = penn[1],
lat = penn[2],
zoom = 14)
Try it out! Make a Leaflet map in R using a Mapbox basemap of your choice, focused on a location of your choice. For locations in non-English-speaking countries: mb_geocode()
has a language
argument that can be used to improve the accuracy of queries in languages other than English. Supported languages (and how to specify them) are found in the Mapbox documentation here.
At the time of this workshop (October 9, 2020), the November 3rd election is less than a month away. This election is accompanied by massive questions around voter safety during the COVID-19 pandemic and voter suppression with unfounded concerns about voter fraud and mail-in ballots. In my home state of Texas, the governor has limited absentee ballot drop-off sites to one per county, creating significant accessibility issues for residents of large Texas counties.
Election accessibility can be analyzed using Mapbox services and the mapboxapi package. While the above examples are useful for quick queries and web mapping, my primary motivation for writing mapboxapi was to use Mapbox services for spatial data science tasks in R. As I already used Mapbox services heavily for my visualization projects, it made sense to write mapboxapi to connect these services with my existing sf-based data science workflows.
In this section of the workshop, we’ll explore three more advanced applications of mapboxapi within practical spatial data science workflows. We’ll examine how to visualize accessibility to a ballot drop-off location in Houston; identify areas where populations may have difficulty reaching early voting locations in Fort Worth; and build a routing app with Shiny that identifies the closest polling place to a user’s address. This section may include some new concepts or techniques - but it is designed to illustrate where you can go with mapboxapi in your work!
The tools we’ve learned how to use with mapboxapi can be used to analyze relative accessibility - or inaccessibility - to polling or ballot drop-off locations. Limiting ballot drop-off locations in Texas counties creates significant accessibility issues for Texas voters. For example, Harris County (Houston) will have one drop-off location for its 4.6 million residents, whereas many other counties in Texas have the same number of drop-off locations for populations smaller than 1,000.
We can visualize this situation in Harris County with layered isochrones. We already used this technique to show multiple drive times around the University of Pennsylvania earlier in this tutorial. In this case, we will use mb_isochrone()
to generate dozens of isochrones, then visualize them simultaneously to illustrate an accessibility gradient in the region.
We’ll first generate the isochrones using a vector of times, 1 through 45 at 1-minute intervals, around NRG Arena (the ballot drop-off site).
library(mapboxapi)
isos <- mb_isochrone(
location = "1 NRG Pkwy, Houston, TX 77054",
profile = "driving",
time = 1:45
)
Next, we can visualize our overlapping isochrones. We’ll use the viridis color palette as we did previously in the tutorial, and generate a color palette derived from the time
column in our dataset. Once specified, we can add these polygons to our Mapbox basemap with a mostly-transparent fill opacity.
pal <- colorNumeric("viridis", isos$time, na.color = "transparent")
mapbox_map %>%
addPolygons(data = isos,
fillColor = ~pal(time),
stroke = FALSE,
fillOpacity = 0.1) %>%
addLegend(values = isos$time,
pal = pal,
title = "Drive-time to NRG Arena")
The result illustrates some of the wide differences in accessibility between various parts of the region. One notable issue with this visualization approach, however, is that the layering of isochrones in the interior of Houston makes it difficult to view the basemap beneath them. This can be resolved by converting to a raster dataset and generating an “accessibility surface” for improved visualization.
Accessibility surfaces are commonly used in geographic information systems applications to identify the distance from any particular location to a geographic feature of interest. We can apply this concept to network-based accessibility by using mapboxapi tools. To create the accessibility surface, we will convert our isochrones to a raster dataset using the fasterize package. Raster datasets represent geographic information as grid cells defined by a cell size. Higher-resolution raster datasets are represented with smaller cell sizes.
To generate the accessibility surface raster, we will need to apply a coordinate system transformation to “project” our data to two-dimensional coordinates. This will allow us to specify the raster’s resolution in meters. We generate a 100m resolution raster, and use the fasterize()
function to allocate the minimum overlapping value from our isochrones to each grid cell. The result can then be mapped with Leaflet’s addRasterImage()
function.
library(fasterize)
library(sf)
isos_proj <- st_transform(isos, 32615)
template <- raster(isos_proj, resolution = 100)
iso_surface <- fasterize(isos_proj, template, field = "time", fun = "min")
mapbox_map %>%
addRasterImage(iso_surface, colors = pal, opacity = 0.5) %>%
addLegend(values = isos$time, pal = pal,
title = "Drive-time to NRG Arena")
Accessibility is now represented in a similar way, but with a clearer view of the basemap around NRG Arena.
The previous example illustrated how to model and visualize accessibility in Houston; however, it does not speak directly to who may have difficulties dropping off their ballots. Households with access to cars will have a much easier time reaching NRG Arena to drop off their ballots, for example, than those who need to rely on other methods of transportation. It also does not integrate other spatial data showing the boundaries of Harris County. In turn, a clearer analysis would cross-reference accessibility data with other data sources using spatial analysis. Fortunately, all of this can be completed within R!
Our task in this section is to find neighborhoods with limited access to early voting locations in Fort Worth, Texas, and cross-reference this with demographic data from the most recent American Community Survey, the US Census Bureau’s annual social and economic survey of US households. To get started, let’s load in some core packages for spatial data analysis. We’ll be using the following R packages:
To get started, we’ll load the required packages for analysis. We’ll also set the option tigris_use_cache = TRUE
to cache downloaded shapefiles (spatial data) from the Census website; this will store them for future use and guard against occasional website downtime.
library(tidyverse)
library(tidycensus)
options(tigris_use_cache = TRUE)
For this analysis, we’ll be using a dataset of early voting locations for Tarrant County, Texas, which represents the areas around Fort Worth and Arlington. There are 50 such locations around the county, allowing voters to cast their ballots between October 13 and October 30. This is a helpful alternative for voters who might not want to (or cannot) vot on Election Day on November 3rd.
We’ll read in a dataset of these early voting sites that I’ve already geocoded and converted to an sf POINT object. This dataset can be used to analyze which areas are immediately covered by accessible early voting options, and which are not. We’ll measure accessibility using isochrones as above, and consider a 20 minute walk-time around each polling location. mb_isochrone()
can accept sf objects as input, and will retain an ID from the input sf object if the column name is specified.
ev_sites <- read_rds("data/tarrant_EV_sites.rds")
walking_isos <- mb_isochrone(
ev_sites,
profile = "walking",
time = 20,
id = "name"
)
These results can be visualized on our Mapbox map:
mapbox_map %>%
addPolygons(data = walking_isos,
popup = ~id)
The map represents the reachable area within a 20-minute walk, modeled at an average walking speed for an able-bodied adult (about 5.1 km/hour). For individuals with disabilities, the elderly, or households without access to a car, getting to these polling sites may prove difficult in areas outside these isochrones. However, accessibility may be less of an issue in areas where car ownership is widespread. We can analyze this additional variable with demographic data, also obtained within R.
We’ll be using tidycensus to request data from the US Census Bureau API about the percentage of households who do not have access to an automobile. A full discussion of how to use tidycensus is beyond the scope of today’s tutorial, but you’ll learn a few things here. To use tidycensus, you must first obtain a Census API key, available at this link. The key will be emailed to you; once you activate it, you can pass it to the census_api_key()
function to set it (or install it) in your environment.
We can then request data from the American Community Survey’s 2014-2018 5-year dataset with the get_acs()
function. The variable we want is the percentage of households without access to a car, designated with the variable code DP04_0058P
and available in the ACS Data Profile. Please see the tidycensus documentation for more information about identifying appropriate variable IDs. We’ll request this data for Tarrant County, TX at the census tract level, which is the smallest available geography available for this information. The argument geometry = TRUE
uses the tigris package to download spatial data from the Census website and joins it internally to the ACS data you’ve acquired.
If you don’t already have a key (or cannot get one at this time), un-comment the appropriate line below and read in a saved version of the dataset.
# census_api_key("your key goes here", install = TRUE)
# no_cars <- read_rds("data/no_cars.rds")
no_cars <- get_acs(
geography = "tract",
variables = "DP04_0058P",
state = "TX",
county = "Tarrant",
geometry = TRUE
)
Let’s visualize this information on our Mapbox map:
driving_pal <- colorNumeric("viridis", no_cars$estimate)
mapbox_map %>%
addPolygons(data = no_cars,
fillColor = ~driving_pal(estimate),
fillOpacity = 0.5,
stroke = FALSE,
smoothFactor = 0.1,
label = ~round(estimate, 1)) %>%
addLegend(values = no_cars$estimate,
pal = driving_pal,
title = "% without access<br/>to automobile")
As shown visually in the map, a majority of households in all Tarrant County Census tracts have access to an automobile. However, there are some Census tracts where the percentage without access exceeds 15 or even 20 percent. That said, if those tracts are within a reasonable walk of a polling location, accessibility may not be as large of an issue. We can analyze this topic using spatial overlay.
Spatial overlay is a very common operation when working with spatial data. It can be used to determine which features in one spatial layer overlap with another spatial layer, or extract data from a layer based on geographic information. In R, spatial overlay can be integrated directly into tidyverse-style data analysis pipelines using functions in sf. In our example, we want to determine the areas in Tarrant County with the greatest proportion of households without access to a car that also are beyond a 20 minute walk from an early voting polling site.
To do this, we use this following steps:
no_cars
dataset to 4326, the same CRS used by the isochrones;st_difference()
function to “cut out” areas from those Census tracts that overlap the 20-minute walking isochrones.Once we complete this operation, we can visualize the result on our Mapbox map.
target_areas <- no_cars %>%
st_transform(4326) %>%
filter(estimate >= 15) %>%
st_difference(
st_union(walking_isos)
)
mapbox_map %>%
addPolygons(data = target_areas)
As the map illustrates, there are several areas within Tarrant County that are located beyond a 20-minute walk from an early voting location and have proportionally lower access to automobiles. Notable clusters of neighborhoods that meet this criteria are located in Fort Worth to the south of downtown and on the city’s East Side. Granted, this analysis is not definitive, but gives us some insights into potential issues with voting accessibility and how we might resolve them.
All of this information can be put together to build informative dashboards for the public using mapboxapi tools. The application shown below and available at this link uses mb_geocode()
, mb_matrix()
, and mb_directions()
to identify the closest early voting location to a user-specified address in Tarrant County, calculates the driving directions to that location, then visualizes the route along with driving instructions on the map. The code used to build this app is available in the Masterclass repository; app_local.R
is the minimal code to get the app working on your computer (assuming a Mapbox access token has been installed), and app.R
includes additional details necessary to deploy the app on the ShinyApps.io hosting service.
We’ll take a look right now at the Shiny app code; you can also view a live version of the app embedded below!
We’ve just scratched the surface of what you can do with Mapbox tools in R. While mapboxapi does not do map generation directly, there are options available for you. For more advanced (and fast!) visualization using Mapbox, I strongly recommend checking out the mapdeck package. This package is an interface to Uber’s deck.gl library, which is built on Mapbox tools.
Another option for visualization is to build your own custom maps using Mapbox Studio, Mapbox’s interactive web-based tool for cartographic design. Studio allows you to customize every aspect of their vector tiles for web mapping, making basemaps that are exactly to your specification. For comprehensive tutorials on how to work with Mapbox Studio, check out their tutorials. Here, I’ll just show you how to make a custom basemap very quickly and use it in your R projects.
Mapbox has created a fun tool called Cartogram that allows you to upload an image of your choice, which will be used to create a custom map style based on that image. Visit https://apps.mapbox.com/cartogram and upload an image of your choice! I’m using Penn’s athletics logo, though you can use whatever you’d like. If you are signed into your Mapbox account (which you should be from earlier in this tutorial), the style will save automatically to your account.
Click the “Saved style!” button at the top of the screen, and you’ll be transported to the Mapbox Studio editor with your custom Cartogram style loaded. There is much you can do here - but for now, click the “Share” button in the upper right of your screen to display the “Share and Develop” options.
Copy the “Style URL” and paste it in your R Markdown so you can see it; mine here is mapbox://styles/upenn-masterclass-demo1/ckfzordv11ha519nz3qw1v7nx
. After mapbox://styles/
, you’ll see your username and style ID. You may recall the beginning of the workshop when we used the Mapbox Streets style as a template for our R Leaflet maps. You can use this custom style in much the same way with addMapboxTiles()
:
leaflet() %>%
addMapboxTiles(style_id = "ckfzordv11ha519nz3qw1v7nx",
username = "upenn-masterclass-demo1") %>%
setView(lng = penn[1],
lat = penn[2],
zoom = 14)
Thanks for participating today! If you have more questions about mapboxapi or any of my other packages, feel free to get in touch. Also, be sure to share anything you’ve created based on what you’ve learned today on Twitter with the #rstats and #MusaMasterClass hashtags.