class: center, middle, inverse, title-slide

# Spatial Demography with Open Source Tools
### Kyle Walker
### May 5, 2021

---

## Software and spatial demography

<img src=img/commercial.png style="width: 700px">

---

## Problems with traditional workflows

* Point-and-click software can inhibit reproducibility and collaboration

--

* Even if an analysis is scripted:

--

  - The use of commercial software makes workflows inaccessible to analysts who cannot afford the license fees;

--

  - Data access is rarely automated and is difficult to reproduce;

--

  - Data wrangling, modeling, and visualization are "siloed" in different commercial software packages

---

## Open source workflows

* __Data sources__: open data APIs

* __Data models__: GeoJSON, simple features (attributes + geometry in single file/object)

* __Modeling & analytics tools__: sf + spdep (R), GeoPandas + PySAL (Python)

* __Presentation tools__: cartography (ggplot2, tmap) + dashboarding (Shiny) in same software environment as analysis

---

## Open data APIs

.pull-left[

* Open data APIs allow analysts to programmatically access data resources

* Examples include official government APIs (US Census) and third-party APIs (e.g.
Canada's CensusMapper)

* Software tools like __censusapi__ and __tigris__ (US) and __cancensus__ (Canada) allow for programmatic access to data resources

]

.pull-right[

```r
library(cancensus)

vancouver_extract <- get_census(
  dataset = "CA16",
  regions = list(CSD = "5915022"),
  vectors = c("v_CA16_1364", "v_CA16_2397", "v_CA16_244"),
  level = "CT",
  geo_format = "sf",
  labels = "short"
)
```

]

---

## Data models

.pull-left[

* Frameworks like the __tidyverse__ (R) and __pandas__ (Python) are widely used for data representation in open-source workflows

* Geospatial extensions to these frameworks (__sf__ for R, __GeoPandas__ for Python) allow spatial demographers to analyze data in similar ways to a regular dataset without dedicated GIS software

]

.pull-right[

```r
library(tidyverse)
library(sf)

vancouver_data <- vancouver_extract %>%
  transmute(
    tract_id = GeoUID,
    pct_english = 100 * (v_CA16_1364 / Population),
    median_income = v_CA16_2397,
    pct_65_up = 100 * (v_CA16_244 / Population)
  ) %>%
  st_transform(3005)
```

]

---

```r
vancouver_data
```

```
## Simple feature collection with 117 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 1202113 ymin: 468924.5 xmax: 1217107 ymax: 481725.6
## Projected CRS: NAD83 / BC Albers
## First 10 features:
##      tract_id pct_english median_income pct_65_up
## 1  9330017.02    44.10434         63936 15.153800
## 2  9330008.02    74.76190        125184 19.047619
## 3  9330014.01    34.37095         64128 19.455253
## 4  9330014.02    35.86957         53029 19.481605
## 5  9330059.13    69.74542         72082  4.839049
## 6  9330059.14    72.17253         78822  8.207857
## 7  9330005.00    65.11915         44160 10.202684
## 8  9330009.00    61.66409         60544 19.428411
## 9  9330011.00    46.93878         63334 17.762661
## 10 9330012.00    56.65268         71851 15.490002
##                          geometry
## 1  MULTIPOLYGON (((1215538 473...
## 2  MULTIPOLYGON (((1206463 470...
## 3  MULTIPOLYGON (((1214371 471...
## 4  MULTIPOLYGON (((1214432 472...
## 5  MULTIPOLYGON (((1210995 477...
## 6  MULTIPOLYGON (((1210995 477...
## 7  MULTIPOLYGON (((1210212 470...
## 8  MULTIPOLYGON (((1208479 472...
## 9  MULTIPOLYGON (((1212078 471...
## 10 MULTIPOLYGON (((1212063 472...
```

---

```r
plot(vancouver_data$geometry)
```

<!-- -->

---

## Modeling & analytics

.pull-left[

* Models fit to spatial data require modeling _spatial dependence_, often represented with a neighborhood spatial weights matrix

* Example workflow: regionalization with the SKATER algorithm, equivalent to ArcGIS's Spatially Constrained Multivariate Clustering

]

.pull-right[

<!-- -->

]

---

## Modeling and analytics

<iframe src="img/mapview.html" frameborder="0" seamless scrolling="no" height="450" width="800"></iframe>

---

## Presentation tools

<iframe src="img/ggiraph.html" frameborder="0" seamless scrolling="no" height="450" width="800"></iframe>

---

## Sharing and reproducibility

.pull-left[

* This entire workflow is fully reproducible - [just take a look at the source code](https://github.com/walkerke/paa2021/blob/main/code/paa_code.R) and run it for yourselves!

* Adoption of open science practices in spatial demography facilitates _collaboration_, _education_, and _innovation_

]

.pull-right[

<img src=img/github.png style="width: 400px">

]

---

class: middle, center, inverse

## Thank you!
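
---

## Appendix: sketching the SKATER workflow

The regionalization step named on the "Modeling & analytics" slide can be sketched with __spdep__ roughly as follows. This is a minimal illustration under stated assumptions, not the code behind the talk's maps: the grid `tracts` and the variables `x1` and `x2` are hypothetical stand-ins so the example runs without any data download. With the real data, `vancouver_data` and its three tract-level variables would take their place, assuming no missing values and no tracts without neighbors.

```r
library(sf)
library(spdep)

set.seed(1)

# Hypothetical stand-in for census tracts: a 6 x 6 grid of square polygons
study_area <- st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 6, ymax = 6)))
tracts <- st_sf(geometry = st_make_grid(study_area, n = c(6, 6)))

# Hypothetical attributes standing in for the demographic variables
tracts$x1 <- rnorm(nrow(tracts))
tracts$x2 <- rnorm(nrow(tracts))

# Standardize the inputs so no single variable dominates the distances
vars <- scale(st_drop_geometry(tracts)[, c("x1", "x2")])

# Neighborhood structure: polygons sharing a border or corner are neighbors
nb <- poly2nb(tracts, queen = TRUE)

# Edge costs: attribute-space dissimilarity between each pair of neighbors
costs <- nbcosts(nb, vars)

# Spatial weights object that carries the costs as (binary-style) weights
w <- nb2listw(nb, glist = costs, style = "B")

# Minimum spanning tree over the neighbor graph
mst <- mstree(w)

# Prune the tree: 3 cuts split it into 4 contiguous regions
regions <- skater(mst[, 1:2], vars, ncuts = 3)

tracts$region <- factor(regions$groups)
table(tracts$region)
```

`ncuts = 3` removes three edges from the minimum spanning tree, which leaves four spatially contiguous regions. The number of cuts is an analyst's choice rather than something the algorithm picks, so it is worth trying several values and mapping the results.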