If requested, tidycensus can return simple feature geometry for geographic units along with variables from the decennial US Census or American Community Survey. By setting `geometry = TRUE` in a tidycensus function call, tidycensus will use the tigris package to retrieve the corresponding geographic dataset from the US Census Bureau and pre-merge it with the tabular data obtained from the Census API. As of tidycensus version 0.9.9.2, `geometry = TRUE` is supported for all geographies currently available in the package.
The following example shows median household income from the 2014-2018 ACS for Census tracts in Orange County, California:
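Below is a minimal sketch of that request. It assumes a Census API key has already been installed with `census_api_key()`. It also pins `year = 2018`, which selects the 2014-2018 five-year ACS, so the result does not depend on the default year of whichever tidycensus version is installed. Setting `options(tigris_use_cache = TRUE)` caches the downloaded shapefiles between sessions.

```r
library(tidycensus)

# Cache tigris shapefile downloads so repeated requests are faster
options(tigris_use_cache = TRUE)

# Assumes a Census API key was installed earlier with census_api_key()
orange <- get_acs(
  geography = "tract",
  variables = "B19013_001",  # median household income
  state = "CA",
  county = "Orange",
  year = 2018,               # 2014-2018 five-year ACS
  geometry = TRUE            # attach tract polygons via tigris
)

head(orange)
```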
```
## Simple feature collection with 6 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -117.9766 ymin: 33.91732 xmax: -117.9374 ymax: 33.94607
## geographic CRS: NAD83
##         GEOID                                          NAME   variable
## 1 06059001101 Census Tract 11.01, Orange County, California B19013_001
## 2 06059001102 Census Tract 11.02, Orange County, California B19013_001
## 3 06059001103 Census Tract 11.03, Orange County, California B19013_001
## 4 06059001201 Census Tract 12.01, Orange County, California B19013_001
## 5 06059001202 Census Tract 12.02, Orange County, California B19013_001
## 6 06059001301 Census Tract 13.01, Orange County, California B19013_001
##   estimate   moe                       geometry
## 1    95764 10990 MULTIPOLYGON (((-117.9765 3...
## 2    85903  9041 MULTIPOLYGON (((-117.9765 3...
## 3    67000 30125 MULTIPOLYGON (((-117.9681 3...
## 4    61401  9519 MULTIPOLYGON (((-117.9592 3...
## 5    65132 10761 MULTIPOLYGON (((-117.9505 3...
## 6    71034 10916 MULTIPOLYGON (((-117.9765 3...
```
`orange` looks much like the basic tidycensus output, but with a `geometry` list-column describing the geometry of each feature. The geometry uses the geographic coordinate system NAD 1983 (EPSG: 4269), which is the default for Census shapefiles. tidycensus uses the Census cartographic boundary shapefiles for faster processing; if you prefer the TIGER/Line shapefiles, set `cb = FALSE` in the function call.
As the dataset is in a tidy format, it can be quickly visualized with the `geom_sf()` function in ggplot2, which has been part of the CRAN release since ggplot2 3.0.0:
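One way to draw that map is sketched below. `map_estimate()` is a hypothetical helper and is not part of tidycensus or ggplot2. It accepts any sf object with an `estimate` column, such as `orange`, and reprojects it for display with `coord_sf()`. Tract borders are dropped with `color = NA` so that small tracts are not drowned out by their outlines.

```r
library(ggplot2)
library(sf)

# Hypothetical helper: choropleth of the `estimate` column of an sf object
# returned by get_acs(..., geometry = TRUE).
# 26911 is NAD83 / UTM zone 11N.
map_estimate <- function(acs_sf, crs = 26911) {
  ggplot(acs_sf, aes(fill = estimate)) +
    geom_sf(color = NA) +                  # no tract borders
    coord_sf(crs = crs) +                  # reproject for display only
    scale_fill_viridis_c(option = "magma")
}

# Usage with the Orange County data:
# map_estimate(orange)
```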
Please note that the UTM Zone 11N coordinate system (EPSG: 26911) is appropriate for Southern California but may not be for your area of interest.
One of the most powerful features of ggplot2 is its support for small multiples, which works very well with the tidy data format returned by tidycensus. Many Census and ACS variables return counts, however, and counts are generally inappropriate for choropleth mapping. To address this, `get_decennial()` and `get_acs()` have an optional argument, `summary_var`, that can work as a multi-group denominator when appropriate. Let’s use the following example of the racial geography of Harris County, Texas. First, we’ll request data for non-Hispanic whites, non-Hispanic blacks, non-Hispanic Asians, and Hispanics by Census tract for the 2010 Census, and specify total population as the summary variable.
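Setting `year` was not strictly necessary when this was written, because the default for `get_decennial()` was 2010. Later tidycensus releases default to the 2020 Census, so passing `year = 2010` explicitly keeps the request reproducible.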
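Below is a sketch of the Harris County request, again assuming an installed Census API key. It uses the 2010 SF1 table IDs for non-Hispanic White, Black, and Asian alone, Hispanic or Latino, and total population (`P001001`). Naming the elements of the `variables` vector recodes the table IDs into readable labels at request time. The object names `racevars` and `harris` are illustrative.

```r
library(tidycensus)
options(tigris_use_cache = TRUE)

# Element names become the values of the `variable` column
racevars <- c(White = "P005003",
              Black = "P005004",
              Asian = "P005006",
              Hispanic = "P004003")

harris <- get_decennial(
  geography = "tract",
  variables = racevars,
  state = "TX",
  county = "Harris County",
  year = 2010,               # pinned explicitly; later defaults differ
  summary_var = "P001001",   # total population, the shared denominator
  geometry = TRUE
)

head(harris)
```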
```
## Simple feature collection with 6 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -95.37528 ymin: 29.74486 xmax: -95.34125 ymax: 29.81385
## geographic CRS: NAD83
## # A tibble: 6 x 6
##   GEOID  NAME     variable value summary_value                     geometry
##   <chr>  <chr>    <chr>    <dbl>         <dbl>           <MULTIPOLYGON [°]>
## 1 48201… Census … White     2082          4690 (((-95.37348 29.751, -95.37…
## 2 48201… Census … White     2893          9652 (((-95.34125 29.76967, -95.…
## 3 48201… Census … White      332          5328 (((-95.36043 29.78975, -95.…
## 4 48201… Census … White      225          4882 (((-95.35039 29.80006, -95.…
## 5 48201… Census … White      935          5497 (((-95.35754 29.81019, -95.…
## 6 48201… Census … White       85          2485 (((-95.35028 29.81243, -95.…
```
Notice that there are four entries for each Census tract, with each entry representing one of our requested variables. The `summary_value` column represents the value of the summary variable, which is total population in this instance. When a summary variable is specified in `get_acs()`, both `summary_est` and `summary_moe` columns will be returned.
With this information, we can set up an analysis pipeline in which we calculate a new percent-of-total column; recode the Census variable names into more intuitive labels; and visualize the result for each group in a faceted plot.
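The pipeline can be sketched as a hypothetical helper, `plot_group_shares()`. It takes the long-format sf object returned by `get_decennial(..., summary_var = ..., geometry = TRUE)`, such as `harris`. The sketch assumes recoding already happened at request time by passing a named vector to `variables`, so the facet labels read White, Black, Asian, and Hispanic. The default projection, EPSG 26915 (NAD83 / UTM zone 15N), is an assumption chosen for the Houston area, following the same reasoning as the Orange County projection note.

```r
library(tidyverse)
library(sf)

# Hypothetical helper: percent of total population for each group,
# drawn as one small-multiple panel per group
plot_group_shares <- function(df, crs = 26915) {
  df %>%
    mutate(pct = 100 * (value / summary_value)) %>%  # group share of total
    ggplot(aes(fill = pct)) +
    facet_wrap(~variable) +                          # one panel per group
    geom_sf(color = NA) +
    coord_sf(crs = crs) +                            # UTM zone 15N by default
    scale_fill_viridis_c()
}

# Usage with the Harris County data:
# plot_group_shares(harris)
```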
Geometries in tidycensus default to the Census Bureau’s cartographic boundary shapefiles. tidycensus prefers cartographic boundary shapefiles to the core TIGER/Line shapefiles because their smaller size speeds up processing and because they are pre-clipped to the US coastline.
However, there may be circumstances in which your mapping requires more detail. A good example would be maps of New York City, in which even the cartographic boundary shapefiles include water area. Take, for example, median household income by Census tract in Manhattan (New York County), NY:
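A sketch of the Manhattan request and map follows, assuming an installed Census API key. Because no `year` is given, the ACS year is whatever default the installed tidycensus version uses. The legend is formatted as currency with `scales::label_dollar()`. The object name `ny` is illustrative.

```r
library(tidycensus)
library(ggplot2)
options(tigris_use_cache = TRUE)

# Default cartographic boundary tracts (cb = TRUE) for New York County
ny <- get_acs(
  geography = "tract",
  variables = "B19013_001",  # median household income
  state = "NY",
  county = "New York",
  geometry = TRUE
)

ggplot(ny, aes(fill = estimate)) +
  geom_sf() +
  theme_void() +
  scale_fill_viridis_c(labels = scales::label_dollar())
```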
As illustrated in the graphic, the boundaries of Manhattan include water area, stretching into the Hudson and East Rivers. A more accurate representation of Manhattan’s land area might therefore be desired. To accomplish this, a tidycensus user can use the core TIGER/Line shapefiles instead and then erase water area from Manhattan’s geometry.
tidycensus allows users to get TIGER/Line instead of cartographic boundary shapefiles with the keyword argument `cb = FALSE`. This argument will be familiar to users of the tigris package, which uses it to distinguish between cartographic boundary and TIGER/Line shapefiles.
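Below is a sketch of the same Manhattan request with TIGER/Line geometry, assuming an installed Census API key. The only change from the cartographic boundary version is `cb = FALSE`. The object name `ny2` is illustrative.

```r
library(tidycensus)
options(tigris_use_cache = TRUE)

# cb = FALSE swaps the cartographic boundary file for TIGER/Line tracts
ny2 <- get_acs(
  geography = "tract",
  variables = "B19013_001",
  state = "NY",
  county = "New York",
  geometry = TRUE,
  cb = FALSE
)
```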
Next, tools in the tigris and sf packages can be used to remove the water area from Manhattan’s Census tracts. sf allows users to “erase” one geometry from another, akin to tools available in desktop GIS software. The `st_erase()` function defined below is not exported by the package, but is defined in the documentation for
The geometry used to “erase” water area from the tract polygons is obtained by the `area_water()` function in tigris, making sure to choose the option `class = "sf"`.
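The sketch below defines `st_erase()` to subtract the dissolved union of the erasing layer from every tract using `st_difference()`. `erase_water()` is a hypothetical wrapper that is not part of tigris or sf. It differs from the text in two places, and both are assumptions:

- It omits `class = "sf"`, because tigris releases since 1.0 return sf objects by default. It wraps the result in `st_as_sf()` to cover older versions.
- It projects both layers to EPSG 2263 (NAD83 / New York Long Island, US feet) before the overlay, so that the difference is computed on planar coordinates.

```r
library(sf)
library(tigris)

# Erase helper: remove the dissolved union of y from every feature of x
st_erase <- function(x, y) {
  st_difference(x, st_union(st_combine(y)))
}

# Hypothetical wrapper: erase one county's TIGER/Line water area from tracts
erase_water <- function(tracts, state, county, crs = 2263) {
  # area_water() downloads water polygons; st_as_sf() leaves sf input
  # unchanged and converts sp output from older tigris versions
  water <- st_as_sf(area_water(state, county))
  # Assumption: overlay in a planar CRS (NAD83 / New York Long Island ftUS)
  tracts <- st_transform(tracts, crs)
  water <- st_transform(water, crs)
  st_erase(tracts, water)
}

# Usage, given TIGER/Line tracts requested with cb = FALSE:
# ny_erase <- erase_water(ny2, "NY", "New York")
```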
After performing this operation, we can visualize the result:
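A sketch of that map follows. `map_income()` is a hypothetical helper that uses the same styling as the cartographic boundary map of Manhattan. It accepts any sf object with an `estimate` column, such as the water-erased tracts (called `ny_erase` here for illustration).

```r
library(ggplot2)
library(sf)

# Hypothetical helper: map median household income from an sf object
map_income <- function(tracts) {
  ggplot(tracts, aes(fill = estimate)) +
    geom_sf() +
    theme_void() +                                         # no axes or grid
    scale_fill_viridis_c(labels = scales::label_dollar())  # currency legend
}

# Usage, given the erased tracts:
# map_income(ny_erase)
```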
The map appears as before, but the polygons now hug the shoreline of Manhattan.
Beyond this, you might be interested in writing your dataset to a shapefile or GeoJSON for use in external GIS or visualization applications. You can accomplish this with the `st_write()` function in the sf package:
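Below is a sketch of exporting with `st_write()`, which picks the GDAL driver from the file extension. `write_census_layers()` is a hypothetical helper. It writes one sf object, such as `orange`, to both formats in a fresh directory and returns the file paths. Shapefile field names are limited to 10 characters, so GeoJSON can be the safer choice when column names are longer.

```r
library(sf)

# Hypothetical helper: write an sf object as a shapefile and as GeoJSON
# inside `dir`, returning the two file paths
write_census_layers <- function(x, name, dir = tempfile("census_")) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  paths <- file.path(dir, paste0(name, c(".shp", ".geojson")))
  for (p in paths) {
    # The driver (ESRI Shapefile or GeoJSON) is guessed from the extension
    st_write(x, p, quiet = TRUE)
  }
  paths
}

# Usage with the Orange County data:
# write_census_layers(orange, "orange", dir = "output")
```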
Your tidycensus-obtained dataset can now be used in ArcGIS, QGIS, Tableau, or any other application that reads shapefiles.
There is a lot more you can do with the spatial functionality in tidycensus, including more sophisticated visualization and spatial analysis; look for updates on my blog and in this space.