Example 1: Simulated Data

This example demonstrates the complete workflow using simulated data.

Step 1: Simulate Misaligned Data

Generate spatial data with misaligned grids. The function creates two non-overlapping grids (X-grid and Y-grid) and their intersection (atoms).

sim_data <- simulate_misaligned_data(
  seed = 42,
  
  # Distribution specifications for covariates
  dist_covariates_x = c('normal', 'poisson', 'binomial'),
  dist_covariates_y = c('normal', 'poisson', 'binomial'),
  dist_y = 'poisson',  # Outcome distribution
  
  # Intercepts for data generation (REQUIRED)
  x_intercepts = c(4, -1, -1),      # One per X covariate
  y_intercepts = c(4, -1, -1),      # One per Y covariate
  beta0_y = -1,                     # Outcome model intercept
  
  # Spatial correlation parameters
  x_correlation = 0.5,  # Correlation between X covariates
  y_correlation = 0.5,  # Correlation between Y covariates
  
  # True effect sizes for outcome model
  beta_x = c(-0.03, 0.1, -0.2),    # Effects of X covariates
  beta_y = c(0.03, -0.1, 0.2)      # Effects of Y covariates
)

The simulated data object contains: - gridx: X-grid spatial features with covariates - gridy: Y-grid spatial features with covariates and outcome - atoms: Atom-level spatial features (intersection of grids) - true_params: True parameter values used in simulation

Step 2: Examine Simulated Data

The simulate_misaligned_data function returns an object of class misaligned_data with useful S3 methods:

# Check the class
class(sim_data)

# Print method shows basic information
print(sim_data)

# Summary method shows more details
summary(sim_data)

Step 3: Get ABRM Model Code

Retrieve the pre-compiled NIMBLE model specification:

model_code <- get_abrm_model()

Step 4: Run ABRM Analysis

Fit the ABRM model. You must specify which covariates follow which distributions by providing their indices:

abrm_results <- run_abrm(
  gridx = sim_data$gridx,
  gridy = sim_data$gridy,
  atoms = sim_data$atoms,
  model_code = model_code,
  true_params = sim_data$true_params,  # Optional: for validation
  
  # Map distribution indices to positions in dist_covariates_x/y
  norm_idx_x = 1,   # 'normal' is 1st in dist_covariates_x
  pois_idx_x = 2,   # 'poisson' is 2nd
  binom_idx_x = 3,  # 'binomial' is 3rd
  norm_idx_y = 1,   # Same for Y covariates
  pois_idx_y = 2,
  binom_idx_y = 3,
  
  # Outcome distribution: 1=normal, 2=poisson, 3=binomial
  dist_y = 2,
  
  # MCMC parameters
  niter = 50000,    # Total iterations per chain
  nburnin = 30000,  # Burn-in iterations
  nchains = 2,      # Number of chains
  seed = 123
)

Step 5: Examine ABRM Results

The run_abrm function returns an object of class abrm with S3 methods:

# Check the class
class(abrm_results)

# Print method shows summary statistics
print(abrm_results)

# Summary method shows detailed parameter estimates
summary(abrm_results)

# Plot method shows MCMC diagnostics (if available)
plot(abrm_results)

Access vairance-covariance matrices from the MCMC posterior sample

# Get variance-covariance matrices
vcov_matrices <- vcov(abrm_results)

# Print summary
print(vcov_matrices)

# Access specific matrices
vcov_matrices$vcov_beta_x   # Variance-covariance for X-grid coefficients
vcov_matrices$vcov_beta_y   # Variance-covariance for Y-grid coefficients
vcov_matrices$vcov_beta_0   # Variance of intercept
vcov_matrices$vcov_all      # Combined matrix for all parameters

# Compute standard errors from diagonal
sqrt(diag(vcov_matrices$vcov_beta_x))

# Compute correlation matrix from covariance matrix
cov2cor(vcov_matrices$vcov_beta_y)

# Check if parameters are correlated
vcov_matrices$vcov_beta_x[1, 2]  # Covariance between first two X coefficients

Access specific results:

# Parameter estimates table
abrm_results$parameter_estimates

# MCMC convergence diagnostics
abrm_results$mcmc_results$convergence

# Full MCMC samples
# abrm_results$mcmc_results$samples

Example 2: Real Data - Utah Counties and School Districts

This example demonstrates using real spatial data from Utah, matching the analysis in the manuscript.

Load Required Packages for Spatial Data

library(tigris)    # For US Census shapefiles
library(sf)        # Spatial features
library(sp)        # Spatial objects
library(spdep)     # Spatial dependence
library(raster)    # Spatial data manipulation
library(dplyr)     # Data manipulation
library(ggplot2)   # Plotting

Step 1: Load Utah Shapefiles

set.seed(500)

# Load Utah counties (Y-grid)
cnty <- counties(state = 'UT')
gridy <- cnty %>%
  rename(ID = GEOID) %>%
  mutate(ID = as.numeric(ID))  # ID must be numeric

# Load Utah school districts (X-grid)
scd <- school_districts(state = 'UT')
gridx <- scd %>%
  rename(ID = GEOID) %>%
  mutate(ID = as.numeric(ID))

Step 2: Visualize Spatial Misalignment

ggplot() +
  geom_sf(data = gridx, fill = NA, color = "orange", linewidth = 1.2) +
  geom_sf(data = gridy, fill = NA, color = "blue", 
          linetype = 'dashed', linewidth = 0.5) +
  labs(title = "Spatial Misalignment in Utah",
       subtitle = "Orange: School Districts | Blue: Counties") +
  theme_void()

Step 3: Create Atoms by Intersecting Grids

# Intersect the two grids to create atoms
atoms <- raster::intersect(as(gridy, 'Spatial'), as(gridx, 'Spatial'))
atoms <- sf::st_as_sf(atoms)

# Rename ID variables to match expected names
atoms <- atoms %>%
  rename(ID_y = ID_1,    # Y-grid (county) ID
         ID_x = ID_2)     # X-grid (school district) ID

Step 4: Add Atom-Level Population Counts

ABRM requires population counts at the atom level. Here we use LandScan gridded population data:

# NOTE: You must download LandScan data separately
# Available at: https://landscan.ornl.gov/
# This example assumes the file is in your working directory

pop_raster <- raster("landscan-global-2022.tif")

# Extract population for each atom
pop_atoms <- raster::extract(
  pop_raster, 
  st_transform(atoms, crs(pop_raster)),
  fun = sum, 
  na.rm = TRUE
)

atoms$population <- pop_atoms

Step 5: Generate Semi-Synthetic Outcome and Covariates

For demonstration, we generate synthetic outcome and covariate data at the atom level:

# Create atom-level spatial adjacency matrix
W_a <- nb2mat(
  poly2nb(as(atoms, "Spatial"), queen = TRUE),
  style = "B", 
  zero.policy = TRUE
)

# Generate spatial random effects
atoms <- atoms %>%
  mutate(
    re_x = gen_correlated_spat(W = W_a, n_vars = 1, correlation = 1, rho = 0.8),
    re_z = gen_correlated_spat(W = W_a, n_vars = 1, correlation = 1, rho = 0.8),
    re_y = gen_correlated_spat(W = W_a, n_vars = 1, correlation = 1, rho = 0.8)
  )

# Generate atom-level covariates and outcome
atoms <- atoms %>%
  mutate(
    # X-grid covariate (Poisson)
    covariate_x_1 = rpois(
      n = nrow(atoms), 
      lambda = population * exp(-1 + re_x)
    ),
    # Y-grid covariate (Normal)
    covariate_y_1 = rnorm(
      n = nrow(atoms), 
      mean = population * (3 + re_z), 
      sd = 1 * population
    )
  ) %>%
  mutate(
    # Outcome (Poisson)
    y = rpois(
      n = nrow(atoms),
      lambda = population * exp(
        -5 + 
        1 * (covariate_x_1 / population) - 
        0.3 * (covariate_y_1 / population) + 
        re_y
      )
    )
  )

Step 6: Aggregate to Observed Grids

In reality, we only observe aggregated data at the grid level, not at atoms:

# Aggregate X covariates to X-grid
gridx[["covariate_x_1"]] <- sapply(gridx$ID, function(j) {
  atom_indices <- which(atoms$ID_x == j)
  sum(atoms[["covariate_x_1"]][atom_indices])
})

# Aggregate Y covariates to Y-grid
gridy[["covariate_y_1"]] <- sapply(gridy$ID, function(j) {
  atom_indices <- which(atoms$ID_y == j)
  sum(atoms[["covariate_y_1"]][atom_indices])
})

# Aggregate outcome to Y-grid
gridy$y <- sapply(gridy$ID, function(j) {
  atom_indices <- which(atoms$ID_y == j)
  sum(atoms$y[atom_indices])
})

Step 7: Run ABRM on Real Data

model_code <- get_abrm_model()

mcmc_results <- run_abrm(
  gridx = gridx,
  gridy = gridy,
  atoms = atoms,
  model_code = model_code,
  
  # Specify covariate distributions
  pois_idx_x = 1,  # X covariate is Poisson
  norm_idx_y = 1,  # Y covariate is Normal
  dist_y = 2,      # Outcome is Poisson
  
  # MCMC settings (increase for real analysis)
  niter = 300000,
  nburnin = 200000,
  nchains = 2,
  seed = 123
)

# View results
summary(mcmc_results)

Understanding Distribution Indices

When you specify covariates, the indices correspond to their positions:

dist_covariates_x = c('normal', 'poisson', 'binomial')
#                       ^1        ^2         ^3

# So you would specify:
norm_idx_x = 1    # First position
pois_idx_x = 2    # Second position  
binom_idx_x = 3   # Third position

Distribution Type Codes

Code	Distribution	String	Use Case
1	Normal	‘normal’	Continuous data
2	Poisson	‘poisson’	Count/rate data
3	Binomial	‘binomial’	Proportion/binary data

Troubleshooting

Common Errors

Error: “could not find function ‘getNimbleOption’” - Solution: Load nimble before calling get_abrm_model(): library(nimble)

Error: “unused arguments (res1 = c(5, 5), res2 = c(10, 10))” - Solution: The simulate_misaligned_data() function does not have res1 or res2 parameters. Grid resolutions are fixed internally.

Error: “gridx must contain an area ID variable named ‘ID’” - Solution: Ensure your spatial dataframes have a numeric column named ID

Key Parameters Reference

Critical Parameters (Don’t Omit!)

x_intercepts: Intercepts for X covariates (must match length of dist_covariates_x)
y_intercepts: Intercepts for Y covariates (must match length of dist_covariates_y)
beta0_y: Outcome model intercept
beta_x: True coefficients for X covariates (if simulating)
beta_y: True coefficients for Y covariates (if simulating)

MCMC Parameters

niter: Total MCMC iterations per chain (default: 50000)
nburnin: Burn-in iterations to discard (default: 30000)
nchains: Number of independent chains (default: 2)
thin: Thinning interval (default: 10)

Getting Help

Function documentation: ?simulate_misaligned_data, ?run_abrm
Report issues: https://github.com/bellayqian/spatialAtomizeR/issues
Package help: ?spatialAtomizeR

Getting Started with spatialAtomizeR

Yunzhe Qian (Bella), Principal Investigators: Dr. Nancy Krieger and Dr. Rachel Nethery

2026-02-13

Overview

Installation

Load Required Packages