BudgetIV: partial identification of causal effects with mostly invalid instruments

Introduction

budgetIV provides a tuneable and interpretable method for relaxing the instrumental variables (IV) assumptions to infer treatment effects in the presence of unobserved confounding. For a pre-treatment covariate to be a valid IV, it must (a) be unconfounded with the outcome and (b) have a causal effect on the outcome that is exclusively mediated by the exposure. It is impossible to test the validity of these IV assumptions for any particular pre-treatment covariate; however, when different pre-treatment covariates give differing causal effect estimates if treated as IVs, we know that at least one of the covariates violates these assumptions. budgetIV exploits this fact by taking as input a minimum "budget" of pre-treatment covariates assumed to be valid IVs. This can be extended to a set of budgets for varying "degrees" of validity, which are set by the user and defined formally through a parameter that captures violation of either IV assumption. These budget constraints can be chosen using specialist knowledge or varied in a principled sensitivity analysis. budgetIV supports non-linear treatment effects and multi-dimensional treatments; requires only summary statistics rather than raw data; and can be used to construct confidence sets under a standard assumption from the Mendelian randomisation literature. With one-dimensional \(\Phi (X)\), a computationally efficient variant, Budget_IV_Scalar, allows for use with thousands of pre-treatment covariates.

We assume a homogeneous treatment effect, implying the following structural causal model: \[Z := f_z (\epsilon_z),\] \[X := f_x (Z, \epsilon_x),\] \[Y := \theta \Phi (X) + g_y (Z, \epsilon_y).\] There may be association between \(\epsilon_y\) and \(\epsilon_z\), indicating a violation of the unconfoundedness assumption (a); and \(g_y\) may depend on \(Z\), indicating a violation of exclusivity (b). With budgetIV, the user defines degrees of validity \(0 \leq \tau_1 \leq \tau_2 \leq \ldots \leq \tau_K\) that may apply to any candidate instrument \(Z_i\). If \(Z_i\) satisfies the \(j\)’th degree of validity, this means \(\lvert \mathrm{Cov} (g_y (Z, \epsilon_y), Z_i) \rvert \leq \tau_j\). Choosing \(\tau_1 = 0\) would demand that some pre-treatment covariates give valid causal effect estimates, while choosing \(\tau_K = \infty\) would allow some covariates to give arbitrarily biased causal effect estimates if treated as IVs. budgetIV will return the corresponding identified/confidence set over causal effects that agree with the budget constraints and with the user-input summary statistics: beta_y corresponding to \(\mathrm{Cov} (Y, Z)\) and beta_Phi corresponding to \(\mathrm{Cov} (\Phi (X), Z)\). Other regression coefficients such as odds ratios, hazard ratios or multicollinearity-adjusted regression coefficients may be used for beta_y and beta_Phi, but this also changes the interpretation of the \(\tau\)’s.
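To see how the summary statistics constrain \(\theta\), take the covariance of the outcome equation with \(Z_i\): \(\mathrm{Cov} (Y, Z_i) = \theta \, \mathrm{Cov} (\Phi (X), Z_i) + \mathrm{Cov} (g_y (Z, \epsilon_y), Z_i)\). A candidate satisfying the \(j\)’th degree of validity therefore obeys \(\lvert \mathrm{Cov} (Y, Z_i) - \theta \, \mathrm{Cov} (\Phi (X), Z_i) \rvert \leq \tau_j\). The following is a minimal sketch of that check for one-dimensional \(\Phi (X)\), using made-up numbers. The helper is_feasible is hypothetical and not part of the package, and it assumes each budget is read as "at least b_j candidates lie within \(\tau_j\)"; it illustrates the constraint only, not the search that budgetIV performs.

```r
# Toy summary statistics for four candidate instruments (made-up values).
beta_phi_toy <- c(1.0, 0.5, 2.0, 1.5)   # stands in for Cov(Phi(X), Z_i)
beta_y_toy   <- c(2.0, 1.0, 4.0, 4.5)   # stands in for Cov(Y, Z_i)

# Hypothetical helper (not a package function): does theta respect
# every budget constraint?
#   tau_vec: thresholds tau_1 <= ... <= tau_K
#   b_vec:   assumed minimum number of candidates within each threshold
is_feasible <- function(theta, beta_y, beta_phi, tau_vec, b_vec) {
  # |Cov(g_y, Z_i)| implied by this value of theta
  violation <- abs(beta_y - theta * beta_phi)
  # Number of candidates whose implied violation is within each threshold
  n_within <- vapply(tau_vec, function(tau) sum(violation <= tau), integer(1))
  all(n_within >= b_vec)
}

is_feasible(2, beta_y_toy, beta_phi_toy, tau_vec = c(0), b_vec = c(3))   # TRUE
is_feasible(3, beta_y_toy, beta_phi_toy, tau_vec = c(0), b_vec = c(3))   # FALSE
```

With \(\theta = 2\), three of the four toy candidates have zero implied violation, so a budget of three valid instruments is met; with \(\theta = 3\), only one does.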

For further methodological details, theoretical results and advanced use cases, please refer to Penn et al. (2024) doi:10.48550/arXiv.2411.06913.

Installation

To install the development version from GitHub, using devtools, run:

devtools::install_github('jpenn2023/budgetivr')

library(budgetivr)

Examples

First, we load the summary statistics from the example dataset:

data(simulated_data_budgetIV)

beta_y <- simulated_data_budgetIV$beta_y

beta_phi_1 <- simulated_data_budgetIV$beta_phi_1
beta_phi_2 <- simulated_data_budgetIV$beta_phi_2

beta_phi <- matrix(c(beta_phi_1, beta_phi_2), nrow = 2, byrow = TRUE)

delta_beta_y <- simulated_data_budgetIV$delta_beta_y
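The call to matrix() with byrow = TRUE stacks the two coefficient vectors as rows, so that each row corresponds to one basis function and each column to one candidate instrument. A small sketch with made-up vectors (the toy_ names are illustrative and not part of the example dataset) shows the resulting layout:

```r
# Made-up coefficients for three candidate instruments.
toy_phi_1 <- c(0.1, 0.2, 0.3)   # associations with the first basis function
toy_phi_2 <- c(1.0, 2.0, 3.0)   # associations with the second basis function

# Same construction as for beta_phi: fill the matrix row by row.
toy_beta_phi <- matrix(c(toy_phi_1, toy_phi_2), nrow = 2, byrow = TRUE)

dim(toy_beta_phi)    # 2 3: one row per basis function, one column per candidate
toy_beta_phi[2, ]    # 1 2 3: the second row recovers toy_phi_2
```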

Then, we define the basis functions \(\Phi (X)\) and set background budget constraints:

phi_basis <- expression(x, x^2)

tau_vec <- c(0)
b_vec <- c(3)
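Here phi_basis is an ordinary R expression vector in the variable x, so each basis function can be evaluated at a treatment value with eval(). The sketch below uses a hypothetical helper, eval_basis, to illustrate this; how the package evaluates phi_basis internally is not shown here.

```r
phi_basis <- expression(x, x^2)

# Hypothetical helper: evaluate every basis function at one treatment value.
eval_basis <- function(basis, x_value) {
  vapply(seq_along(basis),
         function(k) eval(basis[[k]], list(x = x_value)),
         numeric(1))
}

eval_basis(phi_basis, 2)   # 2 4
```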

Then, we define the baseline treatment \(x_0\) and the treatment values over which to calculate the average treatment effect:

X_baseline <- list("x" = c(0))

x_vals <- seq(from = 0, to = 1, length.out = 500)

ATE_search_domain <- expand.grid("x" = x_vals)
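Under the structural causal model given in the Introduction, the average treatment effect of setting the treatment to \(x\) rather than \(x_0\) is \(\theta \cdot (\Phi (x) - \Phi (x_0))\). As a rough illustration, the sketch below evaluates this curve on the same grid for a made-up coefficient vector theta_toy; the helper phi_matrix is hypothetical, and in practice budgetIV returns bounds on this quantity rather than a single curve.

```r
phi_basis <- expression(x, x^2)
theta_toy <- c(1, -0.5)   # made-up coefficients, not estimated from any data

x_vals <- seq(from = 0, to = 1, length.out = 500)
x_0 <- 0

# Hypothetical helper: one row per treatment value, one column per basis function.
phi_matrix <- function(x) {
  cols <- lapply(seq_along(phi_basis),
                 function(k) eval(phi_basis[[k]], list(x = x)))
  do.call(cbind, cols)
}

phi_x  <- phi_matrix(x_vals)            # 500 x 2 matrix
phi_x0 <- as.vector(phi_matrix(x_0))    # length-2 vector at the baseline

# ATE(x) = theta . (Phi(x) - Phi(x_0)) for every grid point.
ate_curve <- as.vector(phi_x %*% theta_toy) - sum(phi_x0 * theta_toy)

range(ate_curve)
```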

Now we run budgetIV to partially identify the budget assignments and corresponding average causal effect bounds:

partial_identification_ATE <- budgetIV(beta_y = beta_y,
                                       beta_phi = beta_phi,
                                       phi_basis = phi_basis,
                                       tau_vec = tau_vec,
                                       b_vec = b_vec,
                                       ATE_search_domain = ATE_search_domain,
                                       X_baseline = X_baseline,
                                       delta_beta_y = delta_beta_y)