---
title: "Sensitivity Analysis of a Simple Hierarchical Model"
output: rmarkdown::html_vignette
bibliography: adjustr.bib
vignette: >
  %\VignetteIndexEntry{Sensitivity Analysis of a Simple Hierarchical Model}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(dplyr)
library(rstan)
library(adjustr)
load("eightschools_model.rda")
```

## Introduction

This vignette walks through the process of performing sensitivity
analysis using the `adjustr` package for the classic introductory
hierarchical model: the "eight schools" meta-analysis from Chapter 5
of @bda3.

We begin by specifying and fitting the model, which should be familiar
to most users of Stan. 
```{r eval=F}
library(dplyr)
library(rstan)
library(adjustr)

model_code = "
data {
    int<lower=0> J;         // number of schools
    real y[J];              // estimated treatment effects
    real<lower=0> sigma[J]; // standard error of effect estimates
}
parameters {
    real mu;                // population treatment effect
    real<lower=0> tau;      // standard deviation in treatment effects
    vector[J] eta;          // unscaled deviation from mu by school
}
transformed parameters {
    vector[J] theta = mu + tau * eta;        // school treatment effects
}
model {
    eta ~ std_normal();
    y ~ normal(theta, sigma);
}"

model_d = list(J = 8,
               y = c(28,  8, -3,  7, -1,  1, 18, 12),
               sigma = c(15, 10, 16, 11,  9, 11, 10, 18))
eightschools_m = stan(model_code=model_code, chains=2, data=model_d,
                      warmup=500, iter=1000)
```

We plot the original estimates for each of the eight schools.
```{r}
plot(eightschools_m, pars="theta")
```

The model partially pools information, pulling the school-level treatment effects
towards the overall mean.

It is natural to wonder how much these estimates depend on certain aspects of
our model.  The individual and school treatment effects are assumed to follow a
normal distribution, and we have used a uniform prior on the population
parameters `mu` and `tau`.

The basic __adjustr__ workflow is as follows:

1. Use `make_spec` to specify the set of alternative model specifications you'd
like to fit.

2. Use `adjust_weights` to calculate importance sampling weights which
approximate the posterior of each alternative specification.

3. Use `summarize` and `spec_plot` to examine posterior quantities of interest
for each alternative specification, in order to assess the sensitivity of the
underlying model.

## Basic Workflow Example

First suppose we want to examine the effect of our choice of uniform prior
on `mu` and `tau`.  We begin by specifying an alternative model in which
these parameters have more informative priors. This just requires
passing the `make_spec` function the new sampling statements we'd like to 
use. These replace any in the original model (`mu` and `tau` have implicit
improper uniform priors, since the original model does not have any sampling
statements for them).
```{r}
spec = make_spec(mu ~ normal(0, 20), tau ~ exponential(5))
print(spec)
```

Then we compute importance sampling weights to approximate the posterior under
this alternative model.
```{r include=F}
adjusted = adjust_weights(spec, eightschools_m, keep_bad=TRUE)
```
```{r eval=F}
adjusted = adjust_weights(spec, eightschools_m)
```

The `adjust_weights` function returns a data frame
containing a summary of the alternative model and a list-column named `.weights`
containing the importance weights. The last row of the table by default 
corresponds to the original model specification. The table also includes the
diagnostic Pareto *k*-value. When this value exceeds 0.7, importance sampling is
unreliable, and by default `adjust_weights` discards weights with a Pareto *k*
above 0.7 (the respective rows in `adjusted` are kept, but the `weights` column 
is set to `NA_real_`).
```{r}
print(adjusted)
```

Finally, we can examine how these alternative priors have changed our posterior
inference. We use `summarize` to calculate these under the alternative model.
```{r}
summarize(adjusted, mean(mu), var(mu))
```
We see that the more informative priors have pulled the posterior distribution
of `mu` towards zero and made it less variable.

## Multiple Alternative Specifications
What if instead we are concerned about our distributional assumption on the
school treatment effects? We could probe this assumption by fitting a series of
models where `eta` had a Student's *t* distribution, with varying degrees of
freedom. 

The `make_spec` function handles this easily.
```{r}
spec = make_spec(eta ~ student_t(df, 0, 1), df=1:10)
print(spec)
```
Notice how we have parameterized the alternative sampling statement with 
a variable `df`, and then provided the values `df` takes in another argument
to `make_spec`.

As before, we compute importance sampling weights to approximate the posterior
under these alternative models. Here, for the purposes of illustration,
we are using `keep_bad=TRUE` to compute weights even when the Pareto _k_ diagnostic
value is above 0.7. In practice, the alternative models should be completely
re-fit in Stan.
```{r}
adjusted = adjust_weights(spec, eightschools_m, keep_bad=TRUE)
```
Now, `adjusted` has ten rows, one for each alternative model.
```{r}
print(adjusted)
```

To examine the impact of these model changes, we can plot the posterior for 
a quantity of interest versus the degrees of freedom for the *t* distribution.
The package provides the `spec_plot` function which takes an x-axis specification
parameter and a y-axis posterior quantity (which must evaluate to a single
number per posterior draw). The dashed line shows the posterior median 
under the original model.
```{r}
spec_plot(adjusted, df, mu)
spec_plot(adjusted, df, theta[3])
```

It appears that changing the distribution of `eta`/`theta` from normal to
*t* has a small effect on posterior inferences (although, as noted above,
these inferences are unreliable as _k_ > 0.7).

By default, the function plots an inner 80\% credible interval and an outer
95\% credible interval, but these can be changed by the user.

We can also measure the distance between the new and original posterior
marginals by using the special `wasserstein()` function available in 
`summarize()`:
```{r}
summarize(adjusted, wasserstein(mu))
```
As we would expect, the 1-Wasserstein distance decreases as the degrees of
freedom increase. In general, we can compute the _p_-Wasserstein distance
by passing an extra `p` parameter to `wasserstein()`.


###