This vignette describes how to use the tidybayes package
to extract tidy
data frames of draws from residuals of Bayesian models, and also acts as
a demo for the construction of randomized quantile residuals, a generic
form of residual applicable to a wide range of models, including
censored regressions and models with discrete response variables. For a
more general introduction to tidybayes, see
vignette("tidybayes").
The following libraries are required to run this vignette:
library(dplyr)
library(purrr)
library(tidyr)
library(ggdist)
library(tidybayes)
library(ggplot2)
library(cowplot)
library(rstan)
library(brms)
library(gganimate)
theme_set(theme_tidybayes() + panel_border())These options help Stan run faster:
We’re going to use a simple dataset to demonstrate residuals that can be applied to discrete and censored outcomes. We’ll generate data from a Normal distribution:
\[ \begin{align*} y^*_i &\sim \textrm{Normal}(0.5,1) \\ y^\textrm{lower}_i &= \lfloor{y^*_i}\rfloor \\ y^\textrm{upper}_i &= \lceil{y^*_i}\rceil \end{align*} \] Or we can write the vectorized version of the model:
\[ \begin{align*} \mathbf{y}^* &\sim \textrm{Normal}(0.5,1) \\ \mathbf{y}^\textrm{lower} &= \lfloor{\mathbf{y}^*}\rfloor \\ \mathbf{y}^\textrm{upper} &= \lceil{\mathbf{y}^*}\rceil \end{align*} \]
We don’t observe each \(y^*_i\), only \(y^\textrm{lower}_i\) and \(y^\textrm{upper}_i\), i.e., the upper and lower bounds of an interval in which \(y^*_i\) lies. This is interval-censored data.
We can generate it as follows:
set.seed(4118)
n = 100
cens_df =
  tibble(
    y_star = rnorm(n, 0.5, 1),
    y_lower = floor(y_star),
    y_upper = ceiling(y_star),
    censoring = "interval"
  )A snapshot of the data looks like this:
| y_star | y_lower | y_upper | censoring | 
|---|---|---|---|
| 0.1799335 | 0 | 1 | interval | 
| 1.0524210 | 1 | 2 | interval | 
| 2.0469207 | 2 | 3 | interval | 
| -0.5123964 | -1 | 0 | interval | 
| 0.0323215 | 0 | 1 | interval | 
| 1.1841464 | 1 | 2 | interval | 
| -0.7074085 | -1 | 0 | interval | 
| 0.1163080 | 0 | 1 | interval | 
| 0.1826247 | 0 | 1 | interval | 
| 0.3851024 | 0 | 1 | interval | 
This is a typical tidy format data frame: one observation per row.
A graphical depiction of the censoring process might be something like this:
uncensored_plot = cens_df %>%
  ggplot(aes(y = "", x = y_star)) +
  stat_slab() +
  geom_jitter(aes(y = 0.75, color = ordered(y_lower)), position = position_jitter(height = 0.2), show.legend = FALSE) +
  ylab(NULL) +
  scale_x_continuous(breaks = -4:4, limits = c(-4, 4)) +
  background_grid("x")
censored_plot = cens_df %>%
  ggplot(aes(y = "", x = (y_lower + y_upper)/2)) +
  geom_dotplot(
    aes(fill = ordered(y_lower)),
    method = "histodot", origin = -4, binwidth = 1, dotsize = 0.5, stackratio = .8, show.legend = FALSE,
    stackgroups = TRUE, binpositions = "all", color = NA
  ) +
  geom_segment(
    aes(x = y + 0.5, xend = y + 0.5, y = 1.75, yend = 1.5, color = ordered(y)),
    data = data.frame(y = unique(cens_df$y_lower)), show.legend = FALSE,
    arrow = arrow(type = "closed", length = unit(7, "points")), linewidth = 1
  ) +
  ylab(NULL) +
  xlab("interval-censored y") +
  scale_x_continuous(breaks = -4:4, limits = c(-4, 4)) +
  background_grid("x")
plot_grid(align = "v", ncol = 1, rel_heights = c(1, 2.5),
  uncensored_plot,
  censored_plot
)y_starBefore we fit a model to the censored data, let’s see what an ideal
model looks like: we’ll fit a model to y_star. With
censored or discrete data in the real-world, we would not be able to fit
such a model, because we would not have observations of
y_star, only the intervals. But this might help us
understand what we’re looking for in a fit that’s working.
m_ideal = brm(
  y_star ~ 1, 
  data = cens_df, 
  family = student,
  
  file = "models/tidybayes-residuals_m_ideal.rds"  # cache model (can be removed)
)The results look like this:
##  Family: student 
##   Links: mu = identity; sigma = identity; nu = identity 
## Formula: y_star ~ 1 
##    Data: cens_df (Number of observations: 100) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.52      0.11     0.31     0.72 1.00     3247     2968
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.02      0.09     0.84     1.20 1.00     2666     1782
## nu       21.21     13.28     5.31    55.04 1.00     2671     2041
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).add_residual_draws() operates much like
add_epred_draws() and add_predicted_draws():
it gives us draws from the posterior distribution for each residual:
cens_df %>%
  add_residual_draws(m_ideal) %>%
  ggplot(aes(x = .row, y = .residual)) +
  stat_pointinterval()This is a typical residual plot, though with the addition of the uncertainty associated with each residual afforded by generating draws from the distribution associated with each residual.
Let’s check the QQ plot:
cens_df %>%
  add_residual_draws(m_ideal) %>%
  median_qi() %>%
  ggplot(aes(sample = .residual)) +
  geom_qq() +
  geom_qq_line()This is a typical way of looking at residuals. However, this type of residual will not generalize to discrete and censored models, so let’s consider something else…
To derive a more general form of residual, we will use the posterior predictive distribution, \(\mathbf{y}^{*\textrm{rep}}|\mathbf{y}^*\). This gives us the model’s predictive distribution for a replication of \(\mathbf{y}^*\), denoted \(\mathbf{y}^{*\textrm{rep}}\). In particular, for each observation, we will calculate the predicted probability of seeing an observation less than or equal to the observation we actually got:
\[
p^{*\textrm{resid}}_i = P(y^{*\textrm{rep}}_i \le y^*_i|\mathbf{y}^*)
\] This is one type of Bayesian posterior predictive
p-value; I will call it a probability residual
(p_residual) by analogy to quantile residuals. If the
predictive distribution is well-calibrated, these probabilities should
be uniform (in general these will not be quite uniform due to the double
use of \(\mathbf{y}*\); a future
iteration of this article will demonstrate the use of leave-one-out
posterior predictive distributions to address that issue).
If we apply the inverse cumulative distribution function (CDF) of the
standard Normal distribution to these probability residuals, the result
should be approximately standard Normal. These are quantile
residuals (z_residual):
\[ z^{*\textrm{resid}}_i = F_{\textrm{Normal}}^{-1}(p^{*\textrm{resid}}_i) \]
cens_df %>%
  add_predicted_draws(m_ideal) %>%
  summarise(
    p_residual = mean(.prediction < y_star),
    z_residual = qnorm(p_residual),
    .groups = "drop_last"
  ) %>%
  ggplot(aes(sample = z_residual)) +
  geom_qq() +
  geom_abline()This QQ plot shows us the model fits okay.
Assuming a situation where we do not have access to the original
(uncensored) data, we will perform a regression on the interval-censored
data. In brms, we can do this by specifying the lower
(y_lower) and upper (y_upper) bounds on each
observation, and the censoring type (censoring), which in
this case is "interval" for all observations.
m = brm(
  y_lower | cens(censoring, y_upper) ~ 1, 
  data = cens_df,
  
  file = "models/tidybayes-residuals_m.rds"  # cache model (can be removed)
)The results look like this:
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: y_lower | cens(censoring, y_upper) ~ 1 
##    Data: cens_df (Number of observations: 100) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.56      0.11     0.35     0.78 1.00     3419     2451
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.09      0.08     0.94     1.26 1.00     2720     2243
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).Again, add_residual_draws() lets us look at the
residuals:
## Warning: Results may not be meaningful for censored models.But notice the horizontal striations due to censoring. This obscures
patterns we might wish to see in the diagnostic plots. Patterns like
this often show up in censored regression, logistic regression, Poisson
regression, and other discrete regression models. Notice how
brms also gives us a warning message that the residuals may
not be meaningful due to censoring.
Let’s look at the QQ plot:
cens_df %>%
  add_residual_draws(m) %>%
  median_qi(.residual) %>%
  ggplot(aes(sample = .residual)) +
  geom_qq() +
  geom_qq_line()## Warning: Results may not be meaningful for censored models.Again, the striations are making interpretation difficult. But there is hope!
We can extend the ideas of probability residuals to make randomized probability residuals.
Where the probability residual for the uncensored model was something like this:
\[ p^{*\textrm{resid}}_i = P(y^{*\textrm{rep}}_i < y^*_i|\mathbf{y}^*) \] We can’t calculate the residual for a censored observation this way, as we do not know what \(\mathbf{y}^*\) is, only \(\mathbf{y}^\textrm{lower}\) and \(\mathbf{y}^\textrm{upper}\). However, we do know that \(p^{*\textrm{resid}}_i\) should be in the following interval:
\[ \begin{align*} && p^{*\textrm{resid}}_i &\in (p^\textrm{lower}_i, p^\textrm{upper}_i] \\ &\textrm{where}& p^\textrm{lower}_i &= P(y^{*\textrm{rep}}_i < y^\textrm{lower}_i|\mathbf{y}^\textrm{lower},\mathbf{y}^\textrm{upper}) \\ && p^\textrm{upper}_i &= P(y^{*\textrm{rep}}_i \le y^\textrm{upper}_i|\mathbf{y}^\textrm{lower},\mathbf{y}^\textrm{upper}) \end{align*} \]
If we define a randomized probability residual as:
\[ p^{\textrm{resid}}_i \sim \textrm{Uniform}(p^\textrm{lower}_i, p^\textrm{upper}_i) \]
Then these residuals will be uniformly random (see Dunn and Smyth 1996), and we can similarly define a randomized quantile residual by passing the probability residuals through the Normal inverse CDF:
\[ z^{\textrm{resid}}_i = F_{\textrm{Normal}}^{-1}(p^{\textrm{resid}}_i) \]
In code, that looks like this:
cens_df %>%
  add_predicted_draws(m) %>%
  summarise(
    p_lower = mean(.prediction < y_lower),
    p_upper = mean(.prediction < y_upper),
    p_residual = runif(1, p_lower, p_upper),
    z_residual = qnorm(p_residual),
    .groups = "drop_last"
  ) %>%
  ggplot(aes(x = .row, y = z_residual)) +
  geom_point()The striations are gone! And a QQ plot:
cens_df %>%
  add_predicted_draws(m) %>%
  summarise(
    p_lower = mean(.prediction < y_lower),
    p_upper = mean(.prediction < y_upper),
    p_residual = runif(1, p_lower, p_upper),
    z_residual = qnorm(p_residual),
    .groups = "drop_last"
  ) %>%
  ggplot(aes(sample = z_residual)) +
  geom_qq() +
  geom_abline()Looks decent. However, our residuals are now subject to noise—we actually have a distribution of possible QQ plots. So, let’s check on some of them to make sure this nice-looking one is not just a fluke. We’ll employ HOPs to do that:
# NOTE: ordinarily I would use a large number of frames (k), 
# say 50 or 100, but I am keeping it small for the sake of 
# keeping these examples small
k = 10
p = cens_df %>%
  add_predicted_draws(m) %>%
  summarise(
    p_lower = mean(.prediction < y_lower),
    p_upper = mean(.prediction < y_upper),
    p_residual = list(runif(k, p_lower, p_upper)),
    residual_draw = list(1:k),
    .groups = "drop_last"
  ) %>%
  unnest(c(p_residual, residual_draw)) %>%
  mutate(z_residual = qnorm(p_residual)) %>%
  ggplot(aes(sample = z_residual)) +
  geom_qq() +
  geom_abline() +
  transition_manual(residual_draw)
animate(p, nframes = k, width = 384, height = 384, res = 96, dev = "png", type = "cairo")Looks good.
Let’s instead generate the data from a t distribution, which should give us some more extreme values:
set.seed(41181)
n = 100
cens_df_t =
  tibble(
    y = rt(n, 3) + 0.5,
    y_lower = floor(y),
    y_upper = ceiling(y),
    censoring = "interval"
  )Note the presence of outliers:
uncensored_plot = cens_df_t %>%
  ggplot(aes(y = "", x = y)) +
  stat_slab() +
  geom_jitter(aes(y = 0.75, color = ordered(y_lower)), position = position_jitter(height = 0.2), show.legend = FALSE) +
  ylab(NULL) +
  scale_x_continuous(breaks = -10:10, limits = c(-10, 10)) +
  background_grid("x")
censored_plot = cens_df_t %>%
  ggplot(aes(y = "", x = (y_lower + y_upper)/2)) +
  geom_dotplot(
    aes(fill = ordered(y_lower)),
    method = "histodot", origin = -4, binwidth = 1, dotsize = 0.5, stackratio = .8, show.legend = FALSE,
    stackgroups = TRUE, binpositions = "all", color = NA
  ) +
  geom_segment(
    aes(x = y + 0.5, xend = y + 0.5, y = 1.75, yend = 1.5, color = ordered(y)),
    data = data.frame(y = unique(cens_df_t$y_lower)), show.legend = FALSE,
    arrow = arrow(type = "closed", length = unit(7, "points")), linewidth = 1
  ) +
  ylab(NULL) +
  xlab("interval-censored y") +
  scale_x_continuous(breaks = -10:10, limits = c(-10, 10)) +
  background_grid("x")
plot_grid(align = "v", ncol = 1, rel_heights = c(1, 2.25),
  uncensored_plot,
  censored_plot
)Now, fitting the same model as before:
m_t1 = brm(
  y_lower | cens(censoring, y_upper) ~ 1,
  data = cens_df_t,
  file = "models/tidybayes-residuals_m_t1"  # cache model (can be removed)
)And checking the QQ plot:
cens_df_t %>%
  add_residual_draws(m_t1) %>%
  median_qi(.residual) %>%
  ggplot(aes(sample = .residual)) +
  geom_qq() +
  geom_qq_line()## Warning: Results may not be meaningful for censored models.It looks like this could be problematic, but the striations make diagnosing the problem difficult.
Let’s try randomized quantile residuals again:
cens_df_t %>%
  add_predicted_draws(m_t1) %>%
  summarise(
    p_lower = mean(.prediction < y_lower),
    p_upper = mean(.prediction < y_upper),
    p_residual = runif(1, p_lower, p_upper),
    z_residual = qnorm(p_residual),
    .groups = "drop_last"
  ) %>%
  ggplot(aes(sample = z_residual)) +
  geom_qq() +
  geom_abline()## Warning: Removed 1 rows containing non-finite values (`stat_qq()`).Does not look good for the model—this s-shape is characteristic of excess kurtosis (aka fat tails), which is exactly what a t distribution should produce. So let’s try a different model.
A common model to try to “robustify” linear regression is to swap out the Normal response distribution for a Student’s t distribution, which has fatter tails (it also happens that this is the distribution used to generate the data above, so I am clearly cheating, but also: this had better work).
m_t2 = brm(
  y_lower | cens(censoring, y_upper) ~ 1, 
  data = cens_df_t,
  family = student,
  
  file = "models/tidybayes-residuals_m_t2.rds"  # cache model (can be removed)
)Let’s look at the QQ plot of the residuals:
cens_df_t %>%
  add_residual_draws(m_t2) %>%
  median_qi(.residual) %>%
  ggplot(aes(sample = .residual)) +
  geom_qq() +
  geom_qq_line()## Warning: Results may not be meaningful for censored models.Odd, these don’t look great! But again, these residuals are not to be trusted…
Let’s check out the randomized quantile residuals:
cens_df_t %>%
  add_predicted_draws(m_t2) %>%
  summarise(
    p_lower = mean(.prediction < y_lower),
    p_upper = mean(.prediction < y_upper),
    p_residual = runif(1, p_lower, p_upper),
    z_residual = qnorm(p_residual),
    .groups = "drop_last"
  ) %>%
  ggplot(aes(sample = z_residual)) +
  geom_qq() +
  geom_abline()Looking pretty good! As well they should, since this is the same model that generated the data.
But we’re not done yet—another sensible choice of model for this data might have been an ordinal regression. So let’s try that too.
First, we’ll add a column that transforms the data into an ordered factor:
Then we’ll fit an ordinal regression model. We have to adjust the fit parameters a bit to get it to converge:
m_o = brm(
  y_factor ~ 1, 
  data = cens_df_o, 
  family = cumulative, 
  prior = prior(normal(0, 10), class = Intercept), 
  control = list(adapt_delta = 0.99),
  
  file = "models/tidybayes-residuals_m_o.rds"  # cache model (can be removed)
)Now, let’s check out those residuals:
cens_df_o %>%
  add_residual_draws(m_o) %>%
  median_qi(.residual) %>%
  ggplot(aes(sample = .residual)) +
  geom_qq() +
  geom_qq_line()## Error: Predictive errors are not defined for ordinal or categorical models.Gah! brms will not even produce residuals for models
like these. This is very sensible, as standard residuals here are hard
to define. Fortunately, we can still construct randomized quantile
residuals.
For the interval-censored model, we had:
\[ \begin{align*} && p^\textrm{lower}_i &= P(y^{*\textrm{rep}}_i < y^\textrm{lower}_i|\mathbf{y}^\textrm{lower},\mathbf{y}^\textrm{upper}) \\ && p^\textrm{upper}_i &= P(y^{*\textrm{rep}}_i \le y^\textrm{upper}_i|\mathbf{y}^\textrm{lower},\mathbf{y}^\textrm{upper}) \\ && p^{\textrm{resid}}_i &\sim \textrm{Uniform}(p^\textrm{lower}_i, p^\textrm{upper}_i) \end{align*} \]
But that was for a model with observations of the form \((\mathbf{y}^\textrm{lower}, \mathbf{y}^\textrm{upper}]\) and which can produce a posterior predictive distribution for the latent variable \(\mathbf{y}^{*\textrm{rep}}|\mathbf{y}^\textrm{lower}, \mathbf{y}^\textrm{upper}\). Now our observations are discrete values of \(\mathbf{y}^\textrm{factor}\), and our posterior predictive distribution, \(\mathbf{y}^\textrm{rep}|\mathbf{y}^\textrm{factor}\), is also discrete. So we have to tweak the definition a bit:
\[ \begin{align*} && p^\textrm{lower}_i &= P(y^{\textrm{rep}}_i < y^\textrm{factor}_i|\mathbf{y}^\textrm{factor}) \\ && p^\textrm{upper}_i &= P(y^{\textrm{rep}}_i \le y^\textrm{factor}_i|\mathbf{y}^\textrm{factor}) \\ && p^{\textrm{resid}}_i &\sim \textrm{Uniform}(p^\textrm{lower}_i, p^\textrm{upper}_i) \end{align*} \]
Then we proceed as before:
cens_df_o %>%
  add_predicted_draws(m_o) %>%
  mutate(.prediction = ordered(levels(y_factor)[.prediction], levels = levels(y_factor))) %>%
  summarise(
    p_lower = mean(.prediction < y_factor),
    p_upper = mean(.prediction <= y_factor),
    p_residual = runif(1, p_lower, p_upper),
    z_residual = qnorm(p_residual),
    .groups = "drop_last"
  ) %>%
  ggplot(aes(sample = z_residual)) +
  geom_qq() +
  geom_abline()It is not too surprising that the ordinal regression also works here, even though it is not the data generating model. Ordinal regressions are pretty robust.
Here’s a function that can calculate randomized probability residuals
for you. A future version of tidybayes will likely supply
some variant of this function, but for now, since it is experimental, I
have not folded it into the main package yet. Use at your own risk:
library(rlang)
make_probability_residuals = function(data, prediction, y, y_upper = NA, n = 1) {
  .prediction = enquo(prediction)
  .y = enquo(y)
  .y_upper = enquo(y_upper)
  if (eval_tidy(expr(is.factor(!!.prediction) && !is.ordered(!!.prediction)), data)) {
    data = mutate(data, !!.prediction := ordered(!!.prediction, levels = levels(!!.prediction)))
  }
  
  if (is.na(get_expr(.y_upper))) {
    #no y_upper provided, use y as y_upper
    data = summarise(data,
      .p_lower = mean(!!.prediction < !!.y),
      .p_upper = mean(!!.prediction <= !!.y),
      .groups = "drop_last"
    )
  } else {
    #y_upper should be a vector, and if an entry in it is NA, use the entry from y
    data = summarise(data,
      .p_lower = mean(!!.prediction < !!.y),
      .p_upper = mean(!!.prediction <= ifelse(is.na(!!.y_upper), !!.y, !!.y_upper)),
      .groups = "drop_last"
    )
  }
  
  data %>%
    mutate(
      .p_residual = map2(.p_lower, .p_upper, runif, n = !!n),
      .residual_draw = map(.p_residual, seq_along)
    ) %>%
    unnest(c(.p_residual, .residual_draw)) %>%
    mutate(.z_residual = qnorm(.p_residual))
}Consider a simple dataset with a single binary outcome:
A model for this might be the following:
m_bin = brm(
  y ~ 1, 
  data = bin_df, 
  family = bernoulli,
  
  file = "models/tidybayes-residuals_m_bin.rds"  # cache model (can be removed)
)Again, standard residuals are hard to interpret:
bin_df %>%
  add_residual_draws(m_bin) %>%
  median_qi() %>%
  ggplot(aes(sample = .residual)) +
  geom_qq() +
  geom_qq_line()Randomized quantile residuals help us see that the fit is probably reasonable. In this case (just to switch it up), we’ll look at the probability residuals instead, comparing them against a uniform distribution, like so:
# NOTE: ordinarily I would use a large number of frames (k), 
# say 50 or 100, but I am keeping it small for the sake of 
# keeping these examples small
k = 10
p = bin_df %>%
  add_predicted_draws(m_bin) %>%
  make_probability_residuals(.prediction, y, n = k) %>%
  ggplot(aes(sample = .p_residual)) +
  geom_qq(distribution = qunif) +
  geom_abline() +
  transition_manual(.residual_draw)
animate(p, nframes = k, width = 384, height = 384, res = 96, dev = "png", type = "cairo")