library(spopt)
library(tidycensus)
library(tidyverse)
library(sf)
library(mapgl)
# Get block group data for Travis, Williamson, and Hays counties
austin_counties <- c("Travis", "Williamson", "Hays")
demand <- get_acs(
geography = "block group",
variables = c(pop = "B01003_001", income = "B19013_001"),
state = "TX",
county = austin_counties,
geometry = TRUE,
year = 2023,
output = "wide"
) |>
st_transform(4326) |>
filter(!is.na(incomeE)) |>
mutate(
# Spending potential: population weighted by relative income
spending = popE * (incomeE / median(incomeE, na.rm = TRUE))
)The Huff model (Huff 1963, 1964) is a classic approach to modeling consumer spatial behavior and market share. It predicts the probability that a consumer will visit a particular store based on two factors: the store’s attractiveness (often measured by size) and the distance the consumer must travel.
This probabilistic model is widely used in retail site selection, market analysis, and competitive assessment. In this vignette, we’ll use spopt’s huff() function to analyze grocery store competition in the Austin, Texas metropolitan area.
The Huff model formula
The probability that a consumer at location will visit store is:
Where:
- is the attractiveness of store (e.g., square footage)
- is the distance from consumer to store
- is the attractiveness exponent (typically positive)
- is the distance decay exponent (typically negative)
The model assumes consumers trade off attractiveness and distance - a larger store might attract customers from farther away, but consumers also prefer closer stores, all else being equal.
Setting up the analysis
Let’s analyze grocery store competition in the Austin metro area. We’ll use Census block group data to represent consumer demand locations, and a set of real store locations.
We’ve created a “spending potential” variable that combines population with income - block groups with higher incomes have more spending potential per capita. This will serve as our demand weight.
Now let’s define some store locations. For this example, I’ll create a simulated set of HEB and Whole Foods locations in the Austin area:
stores <- tibble(
id = paste0("Store_", 1:8),
name = c(
"HEB Mueller", "HEB Tech Ridge", "HEB Hancock", "HEB South Congress",
"Whole Foods Downtown", "Whole Foods Domain", "HEB Round Rock", "HEB Cedar Park"
),
chain = c(rep("HEB", 4), rep("Whole Foods", 2), rep("HEB", 2)),
sqft = c(80000, 75000, 55000, 70000, 40000, 35000, 85000, 72000),
lon = c(-97.7025, -97.6920, -97.7215, -97.7830,
-97.7495, -97.7235, -97.6790, -97.8200),
lat = c(30.2950, 30.4420, 30.3030, 30.2280,
30.2690, 30.4020, 30.5080, 30.5100)
) |>
st_as_sf(coords = c("lon", "lat"), crs = 4326)Running the Huff model
The huff() function calculates market probabilities and expected sales for each store. We need to provide:
-
demand: Consumer locations - can be polygons (like block groups) or points. For polygons, distances are computed from centroids automatically. -
stores: Store locations with attractiveness data -
attractiveness_col: The column measuring store attractiveness -
distance_exponent: Controls how quickly distance reduces attractiveness (negative values)
result <- huff(
demand = demand,
stores = stores,
attractiveness_col = "sqft",
attractiveness_exponent = 1.0,
distance_exponent = -1.5,
sales_potential_col = "spending"
)The function returns a list with two sf objects that include all results:
-
$demand: The original demand sf with added columns:-
.primary_store: ID of the highest-probability store for each location -
.entropy: Competition measure (higher = more competition between stores) -
.prob_<store_id>: Probability column for each store
-
-
$stores: The original stores sf with added columns:-
.market_share: Proportion of total market captured -
.expected_sales: Expected sales (sum of probability × sales potential)
-
Visualizing market areas
Let’s map the primary store for each block group - the store with the highest probability of being visited. Since result$demand is already an sf object with .primary_store included, we can map it directly:
maplibre(bounds = result$demand) |>
add_fill_layer(
id = "market_areas",
source = result$demand,
fill_color = match_expr(
column = ".primary_store",
values = 1:8,
stops = c("#e41a1c", "#377eb8", "#4daf4a", "#984ea3",
"#ff7f00", "#a65628", "#f781bf", "#999999")
),
fill_opacity = 0.6
) |>
add_line_layer(
id = "borders",
source = result$demand,
line_color = "white",
line_width = 0.3
) |>
add_circle_layer(
id = "stores",
source = result$stores,
circle_color = "black",
circle_radius = 8,
circle_stroke_color = "white",
circle_stroke_width = 2
)The map shows each block group colored by its primary store. This gives us a quick sense of each store’s trade area - the geographic region from which it draws most customers. Note how assignments are not an exclusive function of distance; these probabilities will also be influenced by store size, which we are using as our attractiveness metric. In a real-world example, other retail metrics (traffic counts, foot traffic, etc.) might be used here.
Understanding market share
The stores result contains market share and expected sales for each location:
result$stores |>
st_drop_geometry() |>
select(name, chain, sqft, .expected_sales, .market_share) |>
arrange(desc(.market_share))# A tibble: 8 × 5
name chain sqft .expected_sales .market_share
<chr> <chr> <dbl> <dbl> <dbl>
1 HEB Round Rock HEB 85000 397953. 0.168
2 HEB Cedar Park HEB 72000 358571. 0.151
3 HEB South Congress HEB 70000 353076. 0.149
4 HEB Tech Ridge HEB 75000 335597. 0.142
5 HEB Mueller HEB 80000 328209. 0.139
6 HEB Hancock HEB 55000 234390. 0.0990
7 Whole Foods Downtown Whole Foods 40000 194714. 0.0822
8 Whole Foods Domain Whole Foods 35000 165036. 0.0697
The market share column (.market_share) shows each store’s share of total expected sales across the study area. Larger stores in less competitive areas tend to capture more market share.
Competition and entropy
The Huff model also calculates entropy for each demand location - a measure of competition intensity. Entropy is high when consumers have multiple similarly-attractive options, and low when one store dominates. This is included directly in result$demand:
maplibre(bounds = result$demand) |>
add_fill_layer(
id = "entropy",
source = result$demand,
fill_color = interpolate(
column = ".entropy",
values = c(0, 1.5, 2),
stops = c("#2166ac", "#f7f7f7", "#b2182b")
),
fill_opacity = 0.7,
tooltip = ".entropy"
) |>
add_circle_layer(
id = "stores",
source = result$stores,
circle_color = "black",
circle_radius = 8,
circle_stroke_color = "white",
circle_stroke_width = 2
)Areas with high entropy (red) have intense competition - consumers there have multiple good options. Low-entropy areas (blue) are dominated by a single store; these will often be areas immediately adjacent to a store location.
Evaluating new store locations
One of the most powerful applications of the Huff model is what-if analysis - evaluating how a new store would affect market share. Let’s test a potential new location in south Austin:
# Add a hypothetical new store
new_store <- tibble(
id = "Store_9",
name = "New HEB South Austin",
chain = "HEB",
sqft = 70000,
lon = -97.78,
lat = 30.20
) |>
st_as_sf(coords = c("lon", "lat"), crs = 4326)
stores_with_new <- bind_rows(stores, new_store)
# Re-run the model
result_new <- huff(
demand = demand,
stores = stores_with_new,
attractiveness_col = "sqft",
attractiveness_exponent = 1.0,
distance_exponent = -1.5,
sales_potential_col = "spending"
)
# Compare market shares
comparison <- result$stores |>
st_drop_geometry() |>
select(name, original_share = .market_share) |>
left_join(
result_new$stores |>
st_drop_geometry() |>
select(name, new_share = .market_share),
by = "name"
) |>
mutate(change = new_share - original_share)
comparison# A tibble: 8 × 4
name original_share new_share change
<chr> <dbl> <dbl> <dbl>
1 HEB Mueller 0.139 0.121 -0.0176
2 HEB Tech Ridge 0.142 0.131 -0.0105
3 HEB Hancock 0.0990 0.0865 -0.0125
4 HEB South Congress 0.149 0.114 -0.0349
5 Whole Foods Downtown 0.0822 0.0697 -0.0125
6 Whole Foods Domain 0.0697 0.0639 -0.00585
7 HEB Round Rock 0.168 0.158 -0.0105
8 HEB Cedar Park 0.151 0.141 -0.01000
This analysis shows how the new store would cannibalize sales from existing locations. Stores near the new location see their market share decline, while distant stores are not as affected.
Let’s visualize how the new store reshapes market areas:
maplibre(bounds = result_new$demand) |>
add_fill_layer(
id = "market_areas",
source = result_new$demand,
fill_color = match_expr(
column = ".primary_store",
values = 1:9,
stops = c("#e41a1c", "#377eb8", "#4daf4a", "#984ea3",
"#ff7f00", "#a65628", "#f781bf", "#999999", "#66c2a5")
),
fill_opacity = 0.6
) |>
add_line_layer(
id = "borders",
source = result_new$demand,
line_color = "white",
line_width = 0.3
) |>
add_circle_layer(
id = "stores",
source = result_new$stores,
circle_color = "black",
circle_radius = 8,
circle_stroke_color = "white",
circle_stroke_width = 2
)Compare this to the original market area map above - the new store (teal) carves out its own trade area in south Austin, primarily at the expense of the nearby HEB South Congress location.
Adjusting model parameters
The distance_exponent and attractiveness_exponent parameters significantly affect results:
Distance exponent (typically -0.5 to -3.0):
- Values closer to 0: Consumers willing to travel far for attractive stores
- More negative values: Strong preference for nearby stores
Attractiveness exponent (typically 0.5 to 2.0):
- Values near 1: Linear relationship between size and attractiveness
- Values > 1: Larger stores disproportionately attractive
- Values < 1: Diminishing returns to size
# Test with stronger distance decay
result_steep <- huff(
demand = demand,
stores = stores,
attractiveness_col = "sqft",
attractiveness_exponent = 1.0,
distance_exponent = -2.5, # Much steeper decay
sales_potential_col = "spending"
)With a steeper distance decay (-2.5 vs -1.5), trade areas become smaller and more localized. This might be more realistic for convenience-oriented shopping where consumers prioritize proximity.
Using composite attractiveness
Store attractiveness isn’t always just about size. You might want to combine multiple factors - size, brand perception, parking availability, product selection. spopt supports multiple attractiveness variables directly:
# Add parking data to stores
stores_extended <- stores |>
mutate(parking = c(300, 250, 150, 200, 120, 100, 350, 275))
# Use both sqft and parking as attractiveness factors
result_multi <- huff(
demand = demand,
stores = stores_extended,
attractiveness_col = c("sqft", "parking"),
attractiveness_exponent = c(1.0, 0.5), # parking has diminishing returns
distance_exponent = -1.5,
sales_potential_col = "spending"
)The composite attractiveness is computed as . You can also pre-compute a single attractiveness column if you prefer:
stores_composite <- stores |>
mutate(
# HEB stores get a 20% brand premium
brand_factor = if_else(chain == "HEB", 1.2, 1.0),
attractiveness = sqft * brand_factor
)
result_composite <- huff(
demand = demand,
stores = stores_composite,
attractiveness_col = "attractiveness",
distance_exponent = -1.5,
sales_potential_col = "spending"
)Next steps
- Facility Location - Optimize facility placement
- Regionalization - Build custom regions from Census data
- Travel-Time Cost Matrices - Use driving times instead of straight-line distances
