When fitting multi-breakpoint models, a central question is: how
many change-points does the data actually support?
smoothbp_ss() addresses this with a Kuo &
Mallick (1998) spike-and-slab prior on each slope-change
coefficient \(\delta_k\).
Each \(\delta_k\) is paired with a binary inclusion indicator \(\gamma_k\):
The posterior mean of each \(\gamma_k\) is the posterior inclusion probability (PIP) — the Bayesian probability that change-point \(k\) is real. A PIP near 1 means strong evidence for that breakpoint; a PIP near 0 means the data are consistent with no structural break there.
Tip: If you already know where a breakpoint occurred (e.g., a policy change or an intervention), you don’t need to estimate its location. Use the
fixed()helper to test the intervention effect — seevignette("intervention-analysis").
library(smoothbp)
library(posterior)
library(ggplot2)
library(dplyr)
library(tidyr)
set.seed(42)
n <- 200
tau <- seq(0, 10, length.out = n)
# True model: one breakpoint at omega = 5, delta = -1.0
om_true <- 5
rho_true <- 4
b0_true <- 2
b1_true <- 0.5
delta_true <- -1.0
di <- tau - om_true
si <- plogis(di * rho_true)
mu <- b0_true + b1_true * di + delta_true * di * si
y <- mu + rnorm(n, sd = 0.2)
dat <- data.frame(y = y, tau = tau)We let the model have 3 candidate breakpoints but expect only one to receive a high PIP:
fit_ss <- smoothbp_ss(
formula = y ~ tau,
b0 = ~ 1,
b1 = ~ 1,
deltas = list(~ 1, ~ 1, ~ 1),
omega = list(~ 1, ~ 1, ~ 1),
rho = list(~ 1, ~ 1, ~ 1),
data = dat,
spike = prior_spike_slab(pi = 0.1, learn_pi = TRUE),
priors = smoothbp_priors(
omega = space_omega_priors(K = 3, tau_min = 0, tau_max = 10)
),
chains = 2, iter = 2000, warmup = 1000, seed = 42
)pip(fit_ss)
#> gamma_b1_(Intercept) gamma_delta1_(Intercept) gamma_delta2_(Intercept) gamma_delta3_(Intercept)
#> 1.00 0.06 0.98 0.04
# Full posterior summary (just the gammas)
summarise_draws(fit_ss$draws) |>
filter(grepl("^gamma_delta", variable))The second candidate breakpoint has a PIP near 1, matching the data-generating truth (omega = 5 maps to the second candidate position). The other two breakpoints are effectively excluded.
We demonstrate that fitting a 3-BP model to 1-BP data does not overfit when using spike-and-slab priors.
The spike-and-slab model allows us to perform model selection and estimation simultaneously. We can overlay the posterior mean trajectory and its 95% credible interval on the true data-generating function:
# Get posterior predictions (mean and intervals)
pred <- fitted(fit_ss)
plot_df <- dat %>%
mutate(
mu_true = mu, # 'mu' is from the simulation step
mu_fit = pred$fitted_mean,
lo = pred$fitted_Q2.5,
hi = pred$fitted_Q97.5
)
ggplot(plot_df, aes(x = tau)) +
geom_point(aes(y = y), alpha = 0.3, size = 1) +
geom_ribbon(aes(ymin = lo, ymax = hi), fill = "#2c7bb6", alpha = 0.2) +
geom_line(aes(y = mu_true, colour = "Truth"), linewidth = 1) +
geom_line(aes(y = mu_fit, colour = "smoothbp_ss"), linewidth = 0.8, linetype = "dashed") +
scale_colour_manual(values = c("Truth" = "black", "smoothbp_ss" = "#d7191c")) +
labs(
title = "Model fit: Truth vs Posterior Predictions",
subtitle = "The SS model correctly ignores the 2 redundant breakpoints and recovers the trajectory.",
x = "Time (tau)", y = "Response (y)", colour = NULL
) +
theme_minimal()When you have many candidate breakpoints, setting a fixed prior \(\pi\) can be influential. Setting
learn_pi = TRUE places a Beta hyperprior on a shared \(\pi\) and lets the data inform the overall
sparsity level.
fit_lpi <- smoothbp_ss(
formula = y ~ tau,
b0 = ~ 1,
b1 = ~ 1,
deltas = rep(list(~ 1), 5), # 5 candidate breakpoints
omega = rep(list(~ 1), 5),
rho = rep(list(~ 1), 5),
data = dat,
spike = prior_spike_slab(
learn_pi = TRUE,
a = 1, # Beta(1, 4) prior on pi: mean = 0.2
b = 4
),
chains = 2, iter = 3000, warmup = 1500, seed = 42
)
# Posterior for pi (sparsity level)
pi_draws <- as.numeric(as_draws_matrix(fit_lpi$draws)[, "pi"])
quantile(pi_draws, c(0.025, 0.5, 0.975))
#> 2.5% 50% 97.5%
#> 0.04 0.20 0.47
# Contracted toward sparsity: only ~1/5 breakpoints are realThe slab standard deviation controls the range of plausible non-zero slope changes. Match the slab width to your expected effect scale:
| Expected slope change | Suggested slab |
|---|---|
| Small (< 0.5) | prior_normal(0, 0.5) |
| Moderate (0.5–2) | prior_normal(0, 1) |
| Large / uncertain | prior_normal(0, 2) |
| PIP | Interpretation |
|---|---|
| > 0.95 | Strong evidence for that breakpoint |
| 0.75–0.95 | Moderate evidence |
| 0.25–0.75 | Inconclusive |
| < 0.25 | Little evidence — breakpoint not needed |
For formal model selection, the median probability model includes all breakpoints with PIP > 0.5.
The gamma indicators should switch between 0 and 1 frequently during sampling. Check with:
# Gamma trace plots — look for healthy switching
trace_plot(fit_ss, pars = grep("^gamma_delta", posterior::variables(fit_ss$draws), value = TRUE))If a gamma is stuck at 1 across all iterations, consider: - Widening the slab (the prior may be too narrow to allow exclusion). - Checking the prior on omega (a badly-placed omega prior can pin the breakpoint in an implausible region).
If a gamma is stuck at 0, the breakpoint is genuinely not supported by the data — this is the desired behaviour.
smoothbp_ss against smoothbp
with LOOSpike-and-slab and LOO are complementary model-selection tools:
# Fit explicit 0-BP and 1-BP models for context
fit0 <- smoothbp(y ~ tau, deltas = list(), omega = list(), rho = list(),
data = dat, chains = 2, iter = 2000, warmup = 1000)
fit1 <- smoothbp(y ~ tau, deltas = list(~ 1), omega = list(~ 1), rho = list(~ 1),
data = dat, chains = 2, iter = 2000, warmup = 1000)
# LOO comparison: fixed-dimension vs spike-and-slab
loo::loo_compare(loo(fit1), loo(fit_ss))
#> elpd_diff se_diff
#> fit1 0.0 0.0
#> fit_ss -0.2 0.4 # Both models perform identically
# PIP from the SS model agrees:
pip(fit_ss)["gamma_delta2_(Intercept)",]
#> 0.98Both approaches correctly identify that one breakpoint is present.