Geocoding with Mapbox

A core skill for analysts practicing location intelligence is geocoding. Geocoding refers to the conversion of a description of a place to geographic coordinates - typically longitude (X) and latitude (Y). The most common place description that we use is an address, which will vary in form around the world.

As of package version 0.6, mapboxapi offers an interface to Mapbox’s v6 Geocoder. Let’s explore some of its features.

Basic address geocoding, or forward geocoding, is implemented with the mb_geocode() function. mb_geocode() accepts a description of a place, then returns a length-2 vector of XY coordinates representing the location of that place. Let’s try it out.

library(mapboxapi)
## Usage of the Mapbox APIs is governed by the Mapbox Terms of Service.
## Please visit https://www.mapbox.com/legal/tos/ for more information.
mb_geocode("445 5th Ave, New York NY 10016")
## [1] -73.98188  40.75162

mb_geocode() also accepts structured input as an R list, which is useful when you want to clearly specify components of an address. Here’s an example of how that works:

mb_geocode(
  structured_input = list(
    address_line1 = "445 5th Ave",
    place = "New York",
    region = "NY",
    postcode = "10016"
  )
)
## [1] -73.98188  40.75162

mb_geocode() can also return a simple features object with the option output = "sf". Let’s assign the result of mb_geocode() to a variable, get an sf object back, then map it interactively with Mapbox GL JS via the mapgl R package.

library(mapgl)

office <- mb_geocode("445 5th Ave, New York NY 10016", output = "sf")

mapboxgl(center = c(office$longitude, office$latitude), zoom = 15) |> 
  add_markers(office, popup = "full_address")

Mapbox also offers reverse geocoding, which takes XY coordinates and attempts to convert those coordinates into a description of a place (like an address) at that location. Reverse geocoding is available in mapboxapi with the mb_reverse_geocode() function.

mb_reverse_geocode(c(-73.98188, 40.75162))
## [1] "445 5th Avenue, New York, New York 10016, United States"

Workflow: batch geocoding

One-off geocoding as illustrated above is very useful in targeted analyses and when building web mapping apps (more on this later in the vignette). For larger analyses, however, you’ll want to geocode addresses in bulk. This process is called batch geocoding.

Batch geocoding typically involves sending a table of addresses to a geocoding service and getting back XY coordinates for all of those addresses. With v6 of its geocoder, Mapbox opened up batch geocoding to all users, and the latest release of mapboxapi implements it.

Let’s try it out with a real-world dataset. We’ll be working with a dataset of Adult Residential Care facilities in the state of California, obtained from the State of California Open Data portal. You can find this dataset in mapboxapi’s GitHub repository in vignettes/data. We’ll read in the dataset with readr::read_csv().

library(tidyverse)

ca_care <- read_csv("data/community-care-licensing-adult-residential-facility-locations.csv")

ca_care
## # A tibble: 25,402 × 16
##    `Facility Type`                    `Facility Number` `Facility Name` Licensee
##    <chr>                                          <dbl> <chr>           <chr>   
##  1 Adult Res Facility for Persons wi…         435201914 LIFE SERVICES … LIFE SE…
##  2 Adult Res Facility for Persons wi…         415201928 SAINT FRANCIS … ALBACRU…
##  3 Adult Res Facility for Persons wi…         415201932 ATENAR HOME, I… ATENAR …
##  4 Adult Res Facility for Persons wi…          15201936 CHABLIS HOME    NATIONA…
##  5 Adult Res Facility for Persons wi…          15201935 REGENT HOME     NATIONA…
##  6 Adult Res Facility for Persons wi…         435201962 VAST HORIZONS,… VAST HO…
##  7 Adult Res Facility for Persons wi…         435201963 VAST HORIZONS,… VAST HO…
##  8 Adult Res Facility for Persons wi…         435202005 CA NATIONAL ME… NATIONA…
##  9 Adult Res Facility for Persons wi…         435202006 CA NATIONAL ME… NATIONA…
## 10 Adult Res Facility for Persons wi…         435202007 FLORA HOME      NATIONA…
## # ℹ 25,392 more rows
## # ℹ 12 more variables: `Facility Administrator` <chr>,
## #   `Facility Telephone Number` <chr>, `Facility Address` <chr>,
## #   `Facility City` <chr>, `Facility State` <chr>, `Facility Zip` <chr>,
## #   `Regional Office` <dbl>, `County Name` <chr>, FAC_CAPACITY <dbl>,
## #   `Facility Status` <chr>, `Closed Date` <chr>, `License First Date` <chr>

We notice that the dataset has over 25,000 rows. It is a perfect candidate for geocoding, as it describes the locations of each adult care facility but doesn’t include longitude and latitude, so it can’t currently be mapped. While we could geocode all of these facilities - Mapbox’s free tier offers 100,000 free geocodes per month - this isn’t necessary. Let’s instead clean up the dataset and filter down to a specific county - Ventura County to the west of Los Angeles.

library(janitor)

ventura_care <- read_csv("data/community-care-licensing-adult-residential-facility-locations.csv") |> 
  clean_names() |> 
  filter(facility_status == "Licensed", county_name == "VENTURA")

ventura_care
## # A tibble: 118 × 16
##    facility_type   facility_number facility_name licensee facility_administrator
##    <chr>                     <dbl> <chr>         <chr>    <chr>                 
##  1 Adult Resident…       561701197 IBARRA ADULT… IBARRA,… MARIA S. CASTILLO     
##  2 Adult Resident…       561702356 COTTONWOOD, … CORTES,… FLORO CORTES          
##  3 Adult Resident…       561703268 MOUNTAIN VIE… VENIS C… CACCAM, VENIS 98      
##  4 Adult Resident…       561703272 CACCAM'S RES… VENIS C… VENIS CACCAM          
##  5 Adult Resident…       561703412 CUDAL BOARD … PERFECT… PERFECTO P. CUDAL     
##  6 Adult Resident…       565800397 RMC RESIDENT… CARINO … RICHARD T. CARINO II  
##  7 Adult Resident…       561703776 BARNARD FAMI… IBARRA,… KARLA IBARRA          
##  8 Adult Resident…       561703832 JOSEPHINE'S … CARINO … RICHARD T. CARINO II  
##  9 Adult Resident…       565800005 GEMILAN HOME… CAFUIR,… MELANIE MARIN         
## 10 Adult Resident…       565800010 MOUNTAIN VIE… CACCAM,… ADELINA ANDERSON      
## # ℹ 108 more rows
## # ℹ 11 more variables: facility_telephone_number <chr>, facility_address <chr>,
## #   facility_city <chr>, facility_state <chr>, facility_zip <chr>,
## #   regional_office <dbl>, county_name <chr>, fac_capacity <dbl>,
## #   facility_status <chr>, closed_date <chr>, license_first_date <chr>

Our dataset now contains the 118 currently licensed adult care facilities in Ventura County.
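
Before geocoding, it can help to confirm that the address columns are complete, since rows with missing components are likely to geocode poorly. The sketch below is one possible check, not part of mapboxapi. It uses a small invented tibble (toy_care, whose rows are made up for illustration) with the same address column names as ventura_care.

```r
library(dplyr)

# Invented rows that mimic the address columns of ventura_care
toy_care <- tibble(
  facility_name    = c("HOME A", "HOME B", "HOME C"),
  facility_address = c("100 MAIN ST", NA, "25 OAK AVE"),
  facility_city    = c("VENTURA", "OXNARD", "OJAI"),
  facility_state   = c("CA", "CA", "CA"),
  facility_zip     = c("93001", "93030", NA)
)

address_cols <- c("facility_address", "facility_city",
                  "facility_state", "facility_zip")

# Count missing values in each address column
missing_counts <- toy_care |>
  summarise(across(all_of(address_cols), ~ sum(is.na(.x))))

# Keep only rows where every address component is present
complete_care <- toy_care |>
  filter(if_all(all_of(address_cols), ~ !is.na(.x)))
```

The same two steps could be applied to ventura_care by swapping it in for toy_care.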

The data are ready to be passed to mb_batch_geocode(). mb_batch_geocode() can take a single column, search_column, which contains full addresses. In this case, the address is split across multiple columns, which we can map to their corresponding arguments in the function.

ventura_care_sf <- ventura_care |> 
  mb_batch_geocode(
    address_line1 = "facility_address",
    place = "facility_city",
    region = "facility_state",
    postcode = "facility_zip"
  )

ventura_care_sf
## Simple feature collection with 118 features and 19 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -119.2763 ymin: 34.14292 xmax: -118.6592 ymax: 34.44916
## Geodetic CRS:  WGS 84
## # A tibble: 118 × 20
##    facility_type   facility_number facility_name licensee facility_administrator
##  * <chr>                     <dbl> <chr>         <chr>    <chr>                 
##  1 Adult Resident…       561701197 IBARRA ADULT… IBARRA,… MARIA S. CASTILLO     
##  2 Adult Resident…       561702356 COTTONWOOD, … CORTES,… FLORO CORTES          
##  3 Adult Resident…       561703268 MOUNTAIN VIE… VENIS C… CACCAM, VENIS 98      
##  4 Adult Resident…       561703272 CACCAM'S RES… VENIS C… VENIS CACCAM          
##  5 Adult Resident…       561703412 CUDAL BOARD … PERFECT… PERFECTO P. CUDAL     
##  6 Adult Resident…       565800397 RMC RESIDENT… CARINO … RICHARD T. CARINO II  
##  7 Adult Resident…       561703776 BARNARD FAMI… IBARRA,… KARLA IBARRA          
##  8 Adult Resident…       561703832 JOSEPHINE'S … CARINO … RICHARD T. CARINO II  
##  9 Adult Resident…       565800005 GEMILAN HOME… CAFUIR,… MELANIE MARIN         
## 10 Adult Resident…       565800010 MOUNTAIN VIE… CACCAM,… ADELINA ANDERSON      
## # ℹ 108 more rows
## # ℹ 15 more variables: facility_telephone_number <chr>, facility_address <chr>,
## #   facility_city <chr>, facility_state <chr>, facility_zip <chr>,
## #   regional_office <dbl>, county_name <chr>, fac_capacity <dbl>,
## #   facility_status <chr>, closed_date <chr>, license_first_date <chr>,
## #   matched_address <chr>, accuracy <chr>, confidence <chr>,
## #   geometry <POINT [°]>

A simple features object of geometry type POINT is returned. mb_batch_geocode() tries to make geocoding as simple as possible for you: table of addresses in, sf object out ready for mapping and analysis. We note that new accuracy and confidence columns are returned in the output object. accuracy gives you information about the type of geocode (see the Mapbox documentation for explanations) and confidence gives you the level of confidence Mapbox has in the geocoding result, ranging from “exact” to “low”.
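
One way to use these columns is to tabulate the confidence levels and set aside weaker matches for manual review. The sketch below runs on a small invented sf object standing in for ventura_care_sf. The coordinates and the confidence values assigned to each row are made up, and the review rule (flagging anything other than "exact" or "high") is an assumption for illustration rather than a Mapbox recommendation.

```r
library(sf)
library(dplyr)

# Invented stand-in for the geocoded output; values are illustrative only
toy_sf <- tibble(
  facility_name = c("HOME A", "HOME B", "HOME C", "HOME D"),
  confidence    = c("exact", "exact", "medium", "low"),
  lon           = c(-119.20, -119.05, -118.90, -118.75),
  lat           = c(34.20, 34.25, 34.30, 34.35)
) |>
  st_as_sf(coords = c("lon", "lat"), crs = 4326)

# Tabulate results by confidence level (geometry dropped for a plain table)
confidence_counts <- toy_sf |>
  st_drop_geometry() |>
  count(confidence)

# Assumed rule: review anything that is not an "exact" or "high" match
needs_review <- toy_sf |>
  filter(!confidence %in% c("exact", "high"))
```

Because needs_review is still an sf object, the flagged points can be mapped on their own to see where the weaker matches fall.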

Let’s map our geocoded results with clustered circles using the mapgl package.

mapboxgl(bounds = ventura_care_sf) |> 
  add_circle_layer(
    id = "care",
    source = ventura_care_sf,
    circle_color = "blue",
    circle_stroke_color = "white",
    circle_stroke_width = 2,
    cluster_options = cluster_options(
      count_stops = c(0, 25, 50)
    ),
    tooltip = "facility_name"
  )
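
The batch call earlier mapped separate address columns to function arguments. When a single full-address column is preferred, it can be assembled first and then named in search_column. This is a minimal sketch using invented rows. The full_address column name is a hypothetical choice, and the geocoding call is left commented out because it needs a Mapbox access token and network access.

```r
library(dplyr)

# Invented rows with the same address column names as ventura_care
toy_care <- tibble(
  facility_address = c("100 MAIN ST", "25 OAK AVE"),
  facility_city    = c("VENTURA", "OJAI"),
  facility_state   = c("CA", "CA"),
  facility_zip     = c("93001", "93023")
)

# Combine the components into one address string per row
toy_care <- toy_care |>
  mutate(
    full_address = paste0(
      facility_address, ", ", facility_city, ", ",
      facility_state, " ", facility_zip
    )
  )

# With a token configured, the single column could then be geocoded:
# toy_sf <- toy_care |>
#   mb_batch_geocode(search_column = "full_address")
```

Mapping individual columns, as in the workflow above, keeps the address components explicit. The single-column form is mainly useful when the source data already stores addresses as one string.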

Using Mapbox’s geocoder in Shiny

mapboxapi also helps you build Mapbox’s geocoder into your Shiny apps. The mapboxGeocoderInput() function allows you to use the Mapbox geocoder as a Shiny input. The geocoding result is captured as the value of the named input (e.g. input$geocode), which can be passed downstream to your analyses or maps in your Shiny app. The package also includes two functions to help you convert the geocoder’s result into a usable output: geocoder_as_xy(), which converts to a length-2 vector of longitude and latitude coordinates; and geocoder_as_sf(), which converts the result to an sf POINT object.

Here’s a minimal example of how to use mapboxGeocoderInput() in a Shiny app with the Leaflet package. The code follows below the image - try it out!

library(shiny)
library(bslib)
library(leaflet)
library(mapboxapi)

ui <- page_sidebar(
  title = "Address finder",
  sidebar = sidebar(
    p("Use the geocoder to find an address!"),
    mapboxGeocoderInput("geocoder",
                        placeholder = "Search for an address"),
    width = 300
  ), 
  card(
    leafletOutput("map")
  )
)

server <- function(input, output) {
  output$map <- renderLeaflet({
    leaflet() |> 
      addProviderTiles(provider = providers$OpenStreetMap) |> 
      setView(lng = -96.805,
              lat = 32.793,
              zoom = 12)
  })

  observe({
    xy <- geocoder_as_xy(input$geocoder)
    
    leafletProxy("map") |> 
      clearMarkers() |> 
      addMarkers(
        lng = xy[1],
        lat = xy[2]
      ) |> 
      flyTo(lng = xy[1],
            lat = xy[2],
            zoom = 14)
  }) |> 
    bindEvent(input$geocoder, ignoreNULL = TRUE)
  
}

shinyApp(ui, server)
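
The app above uses geocoder_as_xy(). As a variation, the observer could call geocoder_as_sf() instead and hand the resulting sf POINT object to Leaflet directly. This is a sketch rather than a tested drop-in replacement: the name server_sf is hypothetical, and it assumes the ui object defined above.

```r
library(shiny)
library(leaflet)
library(mapboxapi)

server_sf <- function(input, output, session) {
  # Same base map as in the example above
  output$map <- renderLeaflet({
    leaflet() |>
      addProviderTiles(provider = providers$OpenStreetMap) |>
      setView(lng = -96.805, lat = 32.793, zoom = 12)
  })

  observe({
    # Convert the geocoder result to an sf POINT object
    pt <- geocoder_as_sf(input$geocoder)

    # Leaflet reads point coordinates from the sf geometry column
    leafletProxy("map") |>
      clearMarkers() |>
      addMarkers(data = pt)
  }) |>
    bindEvent(input$geocoder, ignoreNULL = TRUE)
}

# To try it, pair it with the ui defined above:
# shinyApp(ui, server_sf)
```

Working with an sf object is convenient when the geocoded point feeds into spatial operations downstream, while the XY vector is enough for simply moving the map.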