Getting Started with spatialAtomizeR

Yunzhe Qian (Bella), Principal Investigators: Dr. Nancy Krieger and Dr. Rachel Nethery

2026-02-08

Overview

spatialAtomizeR implements atom-based Bayesian regression methods (ABRM) for spatial data with misaligned grids. This vignette demonstrates the basic workflow for analyzing misaligned spatial data using both simulated and real-world examples.

Installation

# Install from CRAN
install.packages("spatialAtomizeR")

# Or install development version from GitHub
devtools::install_github("bellayqian/spatialAtomizeR")

Load Required Packages

Important: Always load nimble before using spatialAtomizeR functions that involve model fitting.

library(spatialAtomizeR)
library(nimble)  # Required for ABRM models

Example 1: Simulated Data

This example demonstrates the complete workflow using simulated data.

Step 1: Simulate Misaligned Data

Generate spatial data with misaligned grids. The function creates two non-overlapping grids (X-grid and Y-grid) and their intersection (atoms).

sim_data <- simulate_misaligned_data(
  seed = 42,
  
  # Distribution specifications for covariates
  dist_covariates_x = c('normal', 'poisson', 'binomial'),
  dist_covariates_y = c('normal', 'poisson', 'binomial'),
  dist_y = 'poisson',  # Outcome distribution
  
  # Intercepts for data generation (REQUIRED)
  x_intercepts = c(4, -1, -1),      # One per X covariate
  y_intercepts = c(4, -1, -1),      # One per Y covariate
  beta0_y = -1,                     # Outcome model intercept
  
  # Spatial correlation parameters
  x_correlation = 0.5,  # Correlation between X covariates
  y_correlation = 0.5,  # Correlation between Y covariates
  
  # True effect sizes for outcome model
  beta_x = c(-0.03, 0.1, -0.2),    # Effects of X covariates
  beta_y = c(0.03, -0.1, 0.2)      # Effects of Y covariates
)

The simulated data object contains: - gridx: X-grid spatial features with covariates - gridy: Y-grid spatial features with covariates and outcome - atoms: Atom-level spatial features (intersection of grids) - true_params: True parameter values used in simulation

Step 2: Examine Simulated Data

The simulate_misaligned_data function returns an object of class misaligned_data with useful S3 methods:

# Check the class
class(sim_data)

# Print method shows basic information
print(sim_data)

# Summary method shows more details
summary(sim_data)

Step 3: Get ABRM Model Code

Retrieve the pre-compiled NIMBLE model specification:

model_code <- get_abrm_model()

Step 4: Run ABRM Analysis

Fit the ABRM model. You must specify which covariates follow which distributions by providing their indices:

abrm_results <- run_abrm(
  gridx = sim_data$gridx,
  gridy = sim_data$gridy,
  atoms = sim_data$atoms,
  model_code = model_code,
  true_params = sim_data$true_params,  # Optional: for validation
  
  # Map distribution indices to positions in dist_covariates_x/y
  norm_idx_x = 1,   # 'normal' is 1st in dist_covariates_x
  pois_idx_x = 2,   # 'poisson' is 2nd
  binom_idx_x = 3,  # 'binomial' is 3rd
  norm_idx_y = 1,   # Same for Y covariates
  pois_idx_y = 2,
  binom_idx_y = 3,
  
  # Outcome distribution: 1=normal, 2=poisson, 3=binomial
  dist_y = 2,
  
  # MCMC parameters
  niter = 50000,    # Total iterations per chain
  nburnin = 30000,  # Burn-in iterations
  nchains = 2       # Number of chains
)

Step 5: Examine ABRM Results

The run_abrm function returns an object of class abrm with S3 methods:

# Check the class
class(abrm_results)

# Print method shows summary statistics
print(abrm_results)

# Summary method shows detailed parameter estimates
summary(abrm_results)

# Plot method shows MCMC diagnostics (if available)
plot(abrm_results)

Access specific results:

# Parameter estimates table
abrm_results$parameter_estimates

# MCMC convergence diagnostics
abrm_results$mcmc_results$convergence

# Full MCMC samples
# abrm_results$mcmc_results$samples

Example 2: Real Data - Utah Counties and School Districts

This example demonstrates using real spatial data from Utah, matching the analysis in the manuscript.

Load Required Packages for Spatial Data

library(tigris)    # For US Census shapefiles
library(sf)        # Spatial features
library(sp)        # Spatial objects
library(spdep)     # Spatial dependence
library(raster)    # Spatial data manipulation
library(dplyr)     # Data manipulation
library(ggplot2)   # Plotting

Step 1: Load Utah Shapefiles

set.seed(500)

# Load Utah counties (Y-grid)
cnty <- counties(state = 'UT')
gridy <- cnty %>%
  rename(ID = GEOID) %>%
  mutate(ID = as.numeric(ID))  # ID must be numeric

# Load Utah school districts (X-grid)
scd <- school_districts(state = 'UT')
gridx <- scd %>%
  rename(ID = GEOID) %>%
  mutate(ID = as.numeric(ID))

Step 2: Visualize Spatial Misalignment

ggplot() +
  geom_sf(data = gridx, fill = NA, color = "orange", linewidth = 1.2) +
  geom_sf(data = gridy, fill = NA, color = "blue", 
          linetype = 'dashed', linewidth = 0.5) +
  labs(title = "Spatial Misalignment in Utah",
       subtitle = "Orange: School Districts | Blue: Counties") +
  theme_void()

Step 3: Create Atoms by Intersecting Grids

# Intersect the two grids to create atoms
atoms <- raster::intersect(as(gridy, 'Spatial'), as(gridx, 'Spatial'))
atoms <- sf::st_as_sf(atoms)

# Rename ID variables to match expected names
atoms <- atoms %>%
  rename(ID_y = ID_1,    # Y-grid (county) ID
         ID_x = ID_2)     # X-grid (school district) ID

Step 4: Add Atom-Level Population Counts

ABRM requires population counts at the atom level. Here we use LandScan gridded population data:

# NOTE: You must download LandScan data separately
# Available at: https://landscan.ornl.gov/
# This example assumes the file is in your working directory

pop_raster <- raster("landscan-global-2022.tif")

# Extract population for each atom
pop_atoms <- raster::extract(
  pop_raster, 
  st_transform(atoms, crs(pop_raster)),
  fun = sum, 
  na.rm = TRUE
)

atoms$population <- pop_atoms

Step 5: Generate Semi-Synthetic Outcome and Covariates

For demonstration, we generate synthetic outcome and covariate data at the atom level:

# Create atom-level spatial adjacency matrix
W_a <- nb2mat(
  poly2nb(as(atoms, "Spatial"), queen = TRUE),
  style = "B", 
  zero.policy = TRUE
)

# Generate spatial random effects
atoms <- atoms %>%
  mutate(
    re_x = gen_correlated_spat(W = W_a, n_vars = 1, correlation = 1, rho = 0.8),
    re_z = gen_correlated_spat(W = W_a, n_vars = 1, correlation = 1, rho = 0.8),
    re_y = gen_correlated_spat(W = W_a, n_vars = 1, correlation = 1, rho = 0.8)
  )

# Generate atom-level covariates and outcome
atoms <- atoms %>%
  mutate(
    # X-grid covariate (Poisson)
    covariate_x_1 = rpois(
      n = nrow(atoms), 
      lambda = population * exp(-1 + re_x)
    ),
    # Y-grid covariate (Normal)
    covariate_y_1 = rnorm(
      n = nrow(atoms), 
      mean = population * (3 + re_z), 
      sd = 1 * population
    )
  ) %>%
  mutate(
    # Outcome (Poisson)
    y = rpois(
      n = nrow(atoms),
      lambda = population * exp(
        -5 + 
        1 * (covariate_x_1 / population) - 
        0.3 * (covariate_y_1 / population) + 
        re_y
      )
    )
  )

Step 6: Aggregate to Observed Grids

In reality, we only observe aggregated data at the grid level, not at atoms:

# Aggregate X covariates to X-grid
gridx[["covariate_x_1"]] <- sapply(gridx$ID, function(j) {
  atom_indices <- which(atoms$ID_x == j)
  sum(atoms[["covariate_x_1"]][atom_indices])
})

# Aggregate Y covariates to Y-grid
gridy[["covariate_y_1"]] <- sapply(gridy$ID, function(j) {
  atom_indices <- which(atoms$ID_y == j)
  sum(atoms[["covariate_y_1"]][atom_indices])
})

# Aggregate outcome to Y-grid
gridy$y <- sapply(gridy$ID, function(j) {
  atom_indices <- which(atoms$ID_y == j)
  sum(atoms$y[atom_indices])
})

Step 7: Run ABRM on Real Data

model_code <- get_abrm_model()

mcmc_results <- run_abrm(
  gridx = gridx,
  gridy = gridy,
  atoms = atoms,
  model_code = model_code,
  
  # Specify covariate distributions
  pois_idx_x = 1,  # X covariate is Poisson
  norm_idx_y = 1,  # Y covariate is Normal
  dist_y = 2,      # Outcome is Poisson
  
  # MCMC settings (increase for real analysis)
  niter = 300000,
  nburnin = 200000,
  nchains = 2
)

# View results
summary(mcmc_results)

Understanding Distribution Indices

When you specify covariates, the indices correspond to their positions:

dist_covariates_x = c('normal', 'poisson', 'binomial')
#                       ^1        ^2         ^3

# So you would specify:
norm_idx_x = 1    # First position
pois_idx_x = 2    # Second position  
binom_idx_x = 3   # Third position

Distribution Type Codes

Code Distribution String Use Case
1 Normal ‘normal’ Continuous data
2 Poisson ‘poisson’ Count/rate data
3 Binomial ‘binomial’ Proportion/binary data

Method Comparison

Compare ABRM with traditional dasymetric mapping:

comparison <- run_both_methods(
  sim_data = sim_data,
  sim_metadata = list(sim_number = 1, x_correlation = 0.5, y_correlation = 0.5),
  model_code = model_code,
  nimble_params = list(niter = 1000, nburnin = 500, thin = 2, nchains = 2),
  output_dir = tempdir(),
  norm_idx_x = c(1),
  pois_idx_x = c(2),
  binom_idx_x = c(3),
  norm_idx_y = c(1),
  pois_idx_y = c(2),
  binom_idx_y = c(3),
  dist_y = 2,
  outcome_type = 'poisson'
)

# View comparison results
print(comparison)
summary(comparison)

Troubleshooting

Common Errors

Error: “could not find function ‘getNimbleOption’” - Solution: Load nimble before calling get_abrm_model(): library(nimble)

Error: “unused arguments (res1 = c(5, 5), res2 = c(10, 10))” - Solution: The simulate_misaligned_data() function does not have res1 or res2 parameters. Grid resolutions are fixed internally.

Error: “gridx must contain an area ID variable named ‘ID’” - Solution: Ensure your spatial dataframes have a numeric column named ID

Warnings

Warning: “pop_atoms length doesn’t match D!” - Usually harmless. The model should still run correctly.

Key Parameters Reference

Critical Parameters (Don’t Omit!)

MCMC Parameters

Getting Help

Citation

citation("spatialAtomizeR")

Reference: Qian, Y., Johnson, N., Krieger, N., & Nethery, R.C. (2024). spatialAtomizeR: Atom-Based Regression Models for Spatially Misaligned Data in R. R package version 0.2.4.