spatialAtomizeR implements atom-based Bayesian
regression methods (ABRM) for spatial data with misaligned grids. This
vignette demonstrates the basic workflow for analyzing misaligned
spatial data using both simulated and real-world examples.
Important: Always load nimble before
using spatialAtomizeR functions that involve model fitting.
This example demonstrates the complete workflow using simulated data.
Generate spatial data with misaligned grids. The function creates two non-overlapping grids (X-grid and Y-grid) and their intersection (atoms).
sim_data <- simulate_misaligned_data(
seed = 42,
# Distribution specifications for covariates
dist_covariates_x = c('normal', 'poisson', 'binomial'),
dist_covariates_y = c('normal', 'poisson', 'binomial'),
dist_y = 'poisson', # Outcome distribution
# Intercepts for data generation (REQUIRED)
x_intercepts = c(4, -1, -1), # One per X covariate
y_intercepts = c(4, -1, -1), # One per Y covariate
beta0_y = -1, # Outcome model intercept
# Spatial correlation parameters
x_correlation = 0.5, # Correlation between X covariates
y_correlation = 0.5, # Correlation between Y covariates
# True effect sizes for outcome model
beta_x = c(-0.03, 0.1, -0.2), # Effects of X covariates
beta_y = c(0.03, -0.1, 0.2) # Effects of Y covariates
)The simulated data object contains: - gridx: X-grid
spatial features with covariates - gridy: Y-grid spatial
features with covariates and outcome - atoms: Atom-level
spatial features (intersection of grids) - true_params:
True parameter values used in simulation
The simulate_misaligned_data function returns an object
of class misaligned_data with useful S3 methods:
Retrieve the pre-compiled NIMBLE model specification:
Fit the ABRM model. You must specify which covariates follow which distributions by providing their indices:
abrm_results <- run_abrm(
gridx = sim_data$gridx,
gridy = sim_data$gridy,
atoms = sim_data$atoms,
model_code = model_code,
true_params = sim_data$true_params, # Optional: for validation
# Map distribution indices to positions in dist_covariates_x/y
norm_idx_x = 1, # 'normal' is 1st in dist_covariates_x
pois_idx_x = 2, # 'poisson' is 2nd
binom_idx_x = 3, # 'binomial' is 3rd
norm_idx_y = 1, # Same for Y covariates
pois_idx_y = 2,
binom_idx_y = 3,
# Outcome distribution: 1=normal, 2=poisson, 3=binomial
dist_y = 2,
# MCMC parameters
niter = 50000, # Total iterations per chain
nburnin = 30000, # Burn-in iterations
nchains = 2 # Number of chains
)The run_abrm function returns an object of class
abrm with S3 methods:
# Check the class
class(abrm_results)
# Print method shows summary statistics
print(abrm_results)
# Summary method shows detailed parameter estimates
summary(abrm_results)
# Plot method shows MCMC diagnostics (if available)
plot(abrm_results)Access specific results:
This example demonstrates using real spatial data from Utah, matching the analysis in the manuscript.
set.seed(500)
# Load Utah counties (Y-grid)
cnty <- counties(state = 'UT')
gridy <- cnty %>%
rename(ID = GEOID) %>%
mutate(ID = as.numeric(ID)) # ID must be numeric
# Load Utah school districts (X-grid)
scd <- school_districts(state = 'UT')
gridx <- scd %>%
rename(ID = GEOID) %>%
mutate(ID = as.numeric(ID))ABRM requires population counts at the atom level. Here we use LandScan gridded population data:
# NOTE: You must download LandScan data separately
# Available at: https://landscan.ornl.gov/
# This example assumes the file is in your working directory
pop_raster <- raster("landscan-global-2022.tif")
# Extract population for each atom
pop_atoms <- raster::extract(
pop_raster,
st_transform(atoms, crs(pop_raster)),
fun = sum,
na.rm = TRUE
)
atoms$population <- pop_atomsFor demonstration, we generate synthetic outcome and covariate data at the atom level:
# Create atom-level spatial adjacency matrix
W_a <- nb2mat(
poly2nb(as(atoms, "Spatial"), queen = TRUE),
style = "B",
zero.policy = TRUE
)
# Generate spatial random effects
atoms <- atoms %>%
mutate(
re_x = gen_correlated_spat(W = W_a, n_vars = 1, correlation = 1, rho = 0.8),
re_z = gen_correlated_spat(W = W_a, n_vars = 1, correlation = 1, rho = 0.8),
re_y = gen_correlated_spat(W = W_a, n_vars = 1, correlation = 1, rho = 0.8)
)
# Generate atom-level covariates and outcome
atoms <- atoms %>%
mutate(
# X-grid covariate (Poisson)
covariate_x_1 = rpois(
n = nrow(atoms),
lambda = population * exp(-1 + re_x)
),
# Y-grid covariate (Normal)
covariate_y_1 = rnorm(
n = nrow(atoms),
mean = population * (3 + re_z),
sd = 1 * population
)
) %>%
mutate(
# Outcome (Poisson)
y = rpois(
n = nrow(atoms),
lambda = population * exp(
-5 +
1 * (covariate_x_1 / population) -
0.3 * (covariate_y_1 / population) +
re_y
)
)
)In reality, we only observe aggregated data at the grid level, not at atoms:
# Aggregate X covariates to X-grid
gridx[["covariate_x_1"]] <- sapply(gridx$ID, function(j) {
atom_indices <- which(atoms$ID_x == j)
sum(atoms[["covariate_x_1"]][atom_indices])
})
# Aggregate Y covariates to Y-grid
gridy[["covariate_y_1"]] <- sapply(gridy$ID, function(j) {
atom_indices <- which(atoms$ID_y == j)
sum(atoms[["covariate_y_1"]][atom_indices])
})
# Aggregate outcome to Y-grid
gridy$y <- sapply(gridy$ID, function(j) {
atom_indices <- which(atoms$ID_y == j)
sum(atoms$y[atom_indices])
})model_code <- get_abrm_model()
mcmc_results <- run_abrm(
gridx = gridx,
gridy = gridy,
atoms = atoms,
model_code = model_code,
# Specify covariate distributions
pois_idx_x = 1, # X covariate is Poisson
norm_idx_y = 1, # Y covariate is Normal
dist_y = 2, # Outcome is Poisson
# MCMC settings (increase for real analysis)
niter = 300000,
nburnin = 200000,
nchains = 2
)
# View results
summary(mcmc_results)When you specify covariates, the indices correspond to their positions:
dist_covariates_x = c('normal', 'poisson', 'binomial')
# ^1 ^2 ^3
# So you would specify:
norm_idx_x = 1 # First position
pois_idx_x = 2 # Second position
binom_idx_x = 3 # Third position| Code | Distribution | String | Use Case |
|---|---|---|---|
| 1 | Normal | ‘normal’ | Continuous data |
| 2 | Poisson | ‘poisson’ | Count/rate data |
| 3 | Binomial | ‘binomial’ | Proportion/binary data |
Compare ABRM with traditional dasymetric mapping:
comparison <- run_both_methods(
sim_data = sim_data,
sim_metadata = list(sim_number = 1, x_correlation = 0.5, y_correlation = 0.5),
model_code = model_code,
nimble_params = list(niter = 1000, nburnin = 500, thin = 2, nchains = 2),
output_dir = tempdir(),
norm_idx_x = c(1),
pois_idx_x = c(2),
binom_idx_x = c(3),
norm_idx_y = c(1),
pois_idx_y = c(2),
binom_idx_y = c(3),
dist_y = 2,
outcome_type = 'poisson'
)
# View comparison results
print(comparison)
summary(comparison)Error: “could not find function ‘getNimbleOption’” -
Solution: Load nimble before calling
get_abrm_model(): library(nimble)
Error: “unused arguments (res1 = c(5, 5), res2 = c(10,
10))” - Solution: The
simulate_misaligned_data() function does not have
res1 or res2 parameters. Grid resolutions are
fixed internally.
Error: “gridx must contain an area ID variable named
‘ID’” - Solution: Ensure your spatial
dataframes have a numeric column named ID
Warning: “pop_atoms length doesn’t match D!” - Usually harmless. The model should still run correctly.
x_intercepts: Intercepts for X covariates (must match
length of dist_covariates_x)y_intercepts: Intercepts for Y covariates (must match
length of dist_covariates_y)beta0_y: Outcome model interceptbeta_x: True coefficients for X covariates (if
simulating)beta_y: True coefficients for Y covariates (if
simulating)niter: Total MCMC iterations per chain (default:
50000)nburnin: Burn-in iterations to discard (default:
30000)nchains: Number of independent chains (default: 2)thin: Thinning interval (default: 10)?simulate_misaligned_data,
?run_abrm?spatialAtomizeR