Get data from the US Census Bureau Population Estimates Program

The get_estimates() function requests data from the US Census Bureau's Population Estimates Program (PEP) datasets. The PEP datasets are defined by the US Census Bureau as follows: "The Census Bureau's Population Estimates Program (PEP) produces estimates of the population for the United States, its states, counties, cities, and towns, as well as for the Commonwealth of Puerto Rico and its municipios. Demographic components of population change (births, deaths, and migration) are produced at the national, state, and county levels of geography. Additionally, housing unit estimates are produced for the nation, states, and counties. PEP annually utilizes current data on births, deaths, and migration to calculate population change since the most recent decennial census and produce a time series of estimates of population, demographic components of change, and housing units. The annual time series of estimates begins with the most recent decennial census data and extends to the vintage year. As each vintage of estimates includes all years since the most recent decennial census, the latest vintage of data available supersedes all previously-produced estimates for those dates."

get_estimates(
  geography = c("us", "region", "division", "state", "county", "county subdivision",
    "place/balance (or part)", "place", "consolidated city", "place (or part)",
    "metropolitan statistical area/micropolitan statistical area", "cbsa",
    "metropolitan division", "combined statistical area"),
  product = NULL,
  variables = NULL,
  breakdown = NULL,
  breakdown_labels = FALSE,
  vintage = 2022,
  year = vintage,
  state = NULL,
  county = NULL,
  time_series = FALSE,
  output = "tidy",
  geometry = FALSE,
  keep_geo_vars = FALSE,
  shift_geo = FALSE,
  key = NULL,
  show_call = FALSE,
  ...
)

Arguments

geography

The geography of your data. Available geographies for the most recent data vintage are listed here. "cbsa" may be used as an alias for "metropolitan statistical area/micropolitan statistical area".

product

The data product (optional). "population", "components", "housing", and "characteristics" are supported.

For 2020 and later, the only supported product is "characteristics".

variables

A character string or vector of character strings of requested variables. For years 2020 and later, use variables = "all" to request all available variables.

breakdown

The population breakdown used when product = "characteristics". Acceptable values are "AGEGROUP", "RACE", "SEX", and "HISP", for Hispanic/Not Hispanic. These values can be combined in a vector, returning population estimates in the value column for all combinations of these breakdowns. For years 2020 and later, "AGE" is also available for single-year age when using geography = "state".

breakdown_labels

Whether or not to label breakdown elements returned when product = "characteristics". Defaults to FALSE.
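As a minimal sketch, assuming tidycensus is installed and a Census API key is already stored in `.Renviron`, the call below combines `product = "characteristics"` with a two-way `breakdown` and labeled output. The object name `state_sex_hisp` is illustrative, and the exact columns returned may differ between the API-based (2019 and earlier) and flat-file (2020 and later) datasets.

```r
library(tidycensus)

# State-level population estimates from the 2022 vintage, split by
# every combination of sex and Hispanic origin. With
# breakdown_labels = TRUE the breakdown codes are replaced by
# descriptive labels; the estimates themselves are in `value`.
state_sex_hisp <- get_estimates(
  geography = "state",
  product = "characteristics",
  breakdown = c("SEX", "HISP"),
  breakdown_labels = TRUE,
  vintage = 2022
)

head(state_sex_hisp)
```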

vintage

The PEP vintage (defaults to 2022). It is recommended to use the most recent vintage available for a given decennial series (so, vintage = 2019 for the 2010s and vintage = 2023 for the 2020s). The default will remain 2022 until the full 2023 PEP release is available.

year

The data year (defaults to the vintage requested). Use time_series = TRUE to access time-series estimates.
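For example, a single vintage can be queried for an earlier year in its series. This sketch assumes the 2022 vintage includes 2021 estimates, which follows from the PEP description above (each vintage covers every year since the most recent decennial census). It also assumes a configured API key, and `state_pop_2021` is a hypothetical name.

```r
library(tidycensus)

# Total population by state for 2021, taken from the 2022 vintage,
# which supersedes the estimates published in earlier vintages.
state_pop_2021 <- get_estimates(
  geography = "state",
  variables = "POPESTIMATE",
  vintage = 2022,
  year = 2021
)

head(state_pop_2021)
```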

state

The state for which you are requesting data. State names, postal codes, and FIPS codes are accepted. Defaults to NULL.

county

The county for which you are requesting data. County names and FIPS codes are accepted. Must be combined with a value supplied to `state`. Defaults to NULL.
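A short sketch of filtering by `state` and `county`, assuming a configured API key and that county names are matched as they are elsewhere in tidycensus. The object names are illustrative.

```r
library(tidycensus)

# Every county in Texas. The state could equally be given as
# "Texas" (name) or "48" (FIPS code).
tx_county_pop <- get_estimates(
  geography = "county",
  variables = "POPESTIMATE",
  state = "TX",
  vintage = 2022
)

# A single county: `county` must be paired with `state`.
harris_pop <- get_estimates(
  geography = "county",
  variables = "POPESTIMATE",
  state = "TX",
  county = "Harris",
  vintage = 2022
)
```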

time_series

If TRUE, the function returns a time series of observations back to the most recent decennial Census (2010 for vintages 2019 and earlier). For those API-based vintages, the returned time column is either "DATE", representing a particular estimate date, or "PERIOD", representing a time period (e.g., births between 2016 and 2017), and contains integer codes for those values. The mapping from integer codes to dates and periods is available at https://www.census.gov/data/developers/data-sets/popest-popproj/popest/popest-vars/2019.html.
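A minimal time-series sketch, assuming a configured API key. Because 2020-and-later data are parsed from flat files, the column carrying the time dimension may not be the "DATE"/"PERIOD" codes described above, so the example prints the column names rather than assuming them.

```r
library(tidycensus)

# Annual state population estimates for every year covered by the
# 2022 vintage, rather than for the vintage year alone.
state_pop_ts <- get_estimates(
  geography = "state",
  variables = "POPESTIMATE",
  vintage = 2022,
  time_series = TRUE
)

# Check which column carries the time dimension for this vintage.
names(state_pop_ts)
```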

output

One of "tidy" (the default) in which each row represents an enumeration unit-variable combination, or "wide" in which each row represents an enumeration unit and the variables are in the columns.
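To compare the two layouts, the sketch below requests the same two variables both ways, assuming a configured API key. The exact wide-format column names (for instance, whether a year suffix is appended) are not assumed here.

```r
library(tidycensus)

vars <- c("BIRTHS", "DEATHS")

# Tidy (default): one row per state-variable combination.
ca_tidy <- get_estimates(geography = "state", state = "CA",
                         variables = vars, vintage = 2022)

# Wide: one row per state, with the variables spread across columns.
ca_wide <- get_estimates(geography = "state", state = "CA",
                         variables = vars, vintage = 2022,
                         output = "wide")
```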

geometry

If FALSE (the default), return a regular tibble of population estimates data. If TRUE, uses the tigris package to return an sf tibble with simple feature geometry in the `geometry` column.

keep_geo_vars

If TRUE, keeps all the variables from the Census shapefile obtained by tigris. Defaults to FALSE.

shift_geo

(deprecated) If TRUE, returns geometry with Alaska and Hawaii shifted for thematic mapping of the entire US. As of May 2021, we recommend using tigris::shift_geometry() instead.
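A sketch of the recommended replacement for `shift_geo = TRUE`, assuming a configured API key and that the tigris and sf packages are installed. `tigris::shift_geometry()` repositions Alaska, Hawaii, and Puerto Rico; the object names are illustrative.

```r
library(tidycensus)
library(tigris)

# geometry = TRUE attaches state boundaries via tigris and returns
# an sf tibble with a `geometry` column.
state_pop_sf <- get_estimates(
  geography = "state",
  variables = "POPESTIMATE",
  vintage = 2022,
  geometry = TRUE
)

# Move Alaska, Hawaii, and Puerto Rico for a compact national map.
state_pop_shifted <- shift_geometry(state_pop_sf)
```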

key

Your Census API key. Obtain one at https://api.census.gov/data/key_signup.html. The key can be stored in your .Renviron with census_api_key("YOUR KEY", install = TRUE).

show_call

If TRUE, display the call made to the Census API. This can be very useful for debugging and for determining whether error messages come from tidycensus or from the Census API: copy the API call into a browser and see what the API returns directly. Defaults to FALSE.
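Because only vintages 2019 and earlier are served by the Census API, the sketch below uses `vintage = 2019` so there is an API request to display. It assumes a configured API key and that the 2019 PEP endpoint remains available; the printed URL is produced by tidycensus and is not reproduced here.

```r
library(tidycensus)

# Print the Census API request URL alongside the result; paste the
# URL into a browser to inspect the API's raw response.
us_pop_2019 <- get_estimates(
  geography = "us",
  product = "population",
  vintage = 2019,
  show_call = TRUE
)
```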

...

other keyword arguments

Value

A tibble, or sf tibble, of population estimates data

Details

get_estimates() requests data from the Population Estimates API for years 2019 and earlier; however, Population Estimates for 2020 and later are not available through the API. For those recent years, get_estimates() instead reads a flat file from the Census website and parses it. As a result, arguments and output for 2020 and later datasets may differ slightly from those for 2019 and earlier.

As of April 2022, variables available for 2020 and later datasets are as follows: ESTIMATESBASE, POPESTIMATE, NPOPCHG, BIRTHS, DEATHS, NATURALCHG, INTERNATIONALMIG, DOMESTICMIG, NETMIG, RESIDUAL, GQESTIMATESBASE, GQESTIMATES, RBIRTH, RDEATH, RNATURALCHG, RINTERNATIONALMIG, RDOMESTICMIG, and RNETMIG.
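As an illustration of these variable names in use, the sketch below requests several components of population change for Michigan counties, then every available variable via `variables = "all"`. It assumes a configured API key; the object names are illustrative.

```r
library(tidycensus)

# Components of population change for Michigan counties, using
# variable names from the 2020-and-later list above.
mi_components <- get_estimates(
  geography = "county",
  state = "MI",
  variables = c("BIRTHS", "DEATHS", "NATURALCHG", "NETMIG"),
  vintage = 2022
)

# Alternatively, request every available variable at once.
mi_all <- get_estimates(
  geography = "county",
  state = "MI",
  variables = "all",
  vintage = 2022
)
```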
