Title: | Highly Adaptive Lasso Conditional Density Estimation |
Version: | 0.2.3 |
Maintainer: | Nima Hejazi <nh@nimahejazi.org> |
Description: | An algorithm for flexible conditional density estimation based on application of pooled hazard regression to an artificial repeated measures dataset constructed by discretizing the support of the outcome variable. To facilitate non/semi-parametric estimation of the conditional density, the highly adaptive lasso, a nonparametric regression function shown to reliably estimate a large class of functions at a fast convergence rate, is utilized. The pooled hazards data augmentation formulation implemented was first described by Díaz and van der Laan (2011) <doi:10.2202/1557-4679.1356>. To complement the conditional density estimation utilities, tools for efficient nonparametric inverse probability weighted (IPW) estimation of the causal effects of stochastic shift interventions (modified treatment policies), directly utilizing the density estimation technique for construction of the generalized propensity score, are provided. These IPW estimators utilize undersmoothing (sieve estimation) of the conditional density estimators in order to achieve the non/semi-parametric efficiency bound. |
Depends: | R (≥ 3.2.0) |
Imports: | stats, utils, dplyr, tibble, ggplot2, data.table, matrixStats, future.apply, assertthat, hal9001 (≥ 0.4.1), origami (≥ 1.0.3), rsample, rlang, scales, Rdpack |
Suggests: | testthat, knitr, rmarkdown, stringr, covr, future |
License: | MIT + file LICENSE |
URL: | https://github.com/nhejazi/haldensify |
BugReports: | https://github.com/nhejazi/haldensify/issues |
Encoding: | UTF-8 |
VignetteBuilder: | knitr |
RoxygenNote: | 7.1.2 |
RdMacros: | Rdpack |
NeedsCompilation: | no |
Packaged: | 2022-02-09 21:48:47 UTC; nsh |
Author: | Nima Hejazi |
Repository: | CRAN |
Date/Publication: | 2022-02-09 22:20:06 UTC |
Confidence Intervals for IPW Estimates of the Causal Effects of Stochastic Shift Interventions
Description
Confidence Intervals for IPW Estimates of the Causal Effects of Stochastic Shift Interventions
Usage
## S3 method for class 'ipw_haldensify'
confint(object, parm = seq_len(object$psi), level = 0.95, ...)
Arguments
object |
An object of class |
parm |
A |
level |
A |
... |
Other arguments. Not currently used. |
Details
Compute confidence intervals for estimates produced by ipw_shift.
Value
A named numeric vector containing the parameter estimate from
an ipw_haldensify object, alongside lower/upper Wald-style confidence
intervals at a specified coverage level.
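As a rough illustration of what a Wald-style interval looks like, the sketch below builds one in base R from a point estimate and a standard error. The numeric values and the element names (lwr_ci, est, upr_ci) are assumptions made for illustration; they are not taken from the package, whose internal variance estimate and naming may differ.

```r
# Hypothetical point estimate and standard error (illustrative values only)
psi_hat <- 0.62
se_hat <- 0.05
level <- 0.95

# Two-sided standard normal quantile for the requested coverage level
z <- stats::qnorm(1 - (1 - level) / 2)

# Named numeric vector: lower bound, point estimate, upper bound
# (element names are hypothetical, chosen for readability)
wald_ci <- c(
  lwr_ci = psi_hat - z * se_hat,
  est = psi_hat,
  upr_ci = psi_hat + z * se_hat
)
wald_ci
```

The interval is symmetric around the estimate, and raising level widens it through the normal quantile.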
Examples
# simulate data
n_obs <- 50
W1 <- rbinom(n_obs, 1, 0.6)
W2 <- rbinom(n_obs, 1, 0.2)
A <- rnorm(n_obs, (2 * W1 - W2 - W1 * W2), 2)
Y <- rbinom(n_obs, 1, plogis(3 * A + W1 + W2 - W1 * W2))
# fit the IPW estimator
est_ipw_shift <- ipw_shift(
W = cbind(W1, W2), A = A, Y = Y,
delta = 0.5, n_bins = 3L, cv_folds = 2L,
lambda_seq = exp(seq(-1, -10, length = 100L)),
# arguments passed to hal9001::fit_hal()
max_degree = 1,
# ...continue arguments for IPW
undersmooth_type = "gcv"
)
confint(est_ipw_shift)
HAL Conditional Density Estimation in a Cross-validation Fold
Description
HAL Conditional Density Estimation in a Cross-validation Fold
Usage
cv_haldensify(
fold,
long_data,
wts = rep(1, nrow(long_data)),
lambda_seq = exp(seq(-1, -13, length = 1000L)),
smoothness_orders = 0L,
...
)
Arguments
fold |
Object specifying cross-validation folds as generated by a call
to |
long_data |
A |
wts |
A |
lambda_seq |
A |
smoothness_orders |
A |
... |
Additional (optional) arguments of |
Details
Estimates the conditional density of A|W for a subset of the full
set of observations based on the inputted structure of the cross-validation
folds. This is a helper function intended to be used to select the optimal
value of the penalization parameter for the highly adaptive lasso estimates
of the conditional hazard (via cross_validate).
Value
A list containing density predictions, observation IDs,
observation-level weights, and cross-validation indices for conditional
density estimation on a single fold of the overall data.
IPW Estimator Selector via Projection of Efficient Influence Function
Description
IPW Estimator Selector via Projection of Efficient Influence Function
Usage
dcar_selector(
W,
A,
Y,
delta = 0,
gn_pred_natural,
gn_pred_shifted,
Qn_pred_natural,
Qn_pred_shifted
)
Arguments
W |
A |
A |
A |
Y |
A |
delta |
A |
gn_pred_natural |
A |
gn_pred_shifted |
A |
Qn_pred_natural |
A |
Qn_pred_shifted |
A |
Fit Conditional Density Estimation for a Sequence of HAL Models
Description
Fit Conditional Density Estimation for a Sequence of HAL Models
Usage
fit_haldensify(
A,
W,
wts = rep(1, length(A)),
grid_type = "equal_range",
n_bins = round(c(0.5, 1, 1.5, 2) * sqrt(length(A))),
cv_folds = 5L,
lambda_seq = exp(seq(-1, -13, length = 1000L)),
smoothness_orders = 0L,
...
)
Arguments
A |
The |
W |
A |
wts |
A |
grid_type |
A |
n_bins |
This |
cv_folds |
A |
lambda_seq |
A |
smoothness_orders |
A |
... |
Additional (optional) arguments of |
Details
Estimation of the conditional density of A|W via a cross-validated highly adaptive lasso, used to estimate the conditional hazard of failure in a given bin over the support of A.
Value
A list containing density predictions for the sequence of
fitted HAL models; the index and value of the L1 regularization parameter
minimizing the density loss; and the sequence of empirical risks for the
sequence of fitted HAL models.
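To make the selection step concrete, the following base R sketch picks the regularization parameter that minimizes an empirical risk over a grid. It assumes a weighted negative log-likelihood loss and uses simulated stand-in predictions; the object names (dens_pred, emp_risks, lambda_idx, lambda_min) are hypothetical, and the package's exact risk computation may differ.

```r
set.seed(1)
n_obs <- 20
lambda_seq <- exp(seq(-1, -10, length.out = 5))

# Hypothetical held-out density predictions: one row per observation,
# one column per value of lambda (strictly positive by construction)
dens_pred <- matrix(
  runif(n_obs * length(lambda_seq), 0.05, 1),
  nrow = n_obs, ncol = length(lambda_seq)
)
wts <- rep(1, n_obs)

# Weighted negative log-likelihood risk, one value per lambda
emp_risks <- apply(dens_pred, 2, function(p) {
  sum(-log(p) * wts) / sum(wts)
})

# Index and value of the risk-minimizing regularization parameter
lambda_idx <- which.min(emp_risks)
lambda_min <- lambda_seq[lambda_idx]
```

Here emp_risks plays the role of the sequence of empirical risks, and lambda_idx and lambda_min correspond to the index and value of the selected parameter.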
Examples
# simulate data: W ~ U[-4, 4] and A|W ~ N(mu = W, sd = 0.5)
n_train <- 50
w <- runif(n_train, -4, 4)
a <- rnorm(n_train, w, 0.5)
# fit cross-validated HAL-based density estimator of A|W
haldensify_cvfit <- fit_haldensify(
A = a, W = w, n_bins = 10L, lambda_seq = exp(seq(-1, -10, length = 100)),
# the following arguments are passed to hal9001::fit_hal()
max_degree = 3, reduce_basis = 1 / sqrt(length(a))
)
Generate Augmented (Long Format) Data for Pooled Hazards Regression
Description
Generate Augmented (Long Format) Data for Pooled Hazards Regression
Usage
format_long_hazards(
A,
W,
wts = rep(1, length(A)),
grid_type = c("equal_range", "equal_mass"),
n_bins = NULL,
breaks = NULL
)
Arguments
A |
The |
W |
A |
wts |
A |
grid_type |
A |
n_bins |
Only used if |
breaks |
A |
Details
Generates an augmented (long format, or repeated measures) dataset that includes multiple records for each observation, a single record for each discretized bin up to and including the bin in which a given observed value of A falls. Such bins are derived from selecting break points over the support of A. This repeated measures dataset is suitable for estimating the hazard of failing in a particular bin over A using a highly adaptive lasso (or other) classification model.
Value
A list containing the break points used in dividing the
support of A into discrete bins, the length of each bin, and the
reformatted data. The reformatted data is a data.table of
repeated measures data, with an indicator for which bin an observation
fails in, the bin ID, observation ID, values of W for each given
observation, and observation-level weights.
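The construction described above can be sketched in base R on a toy example. This is not the package's implementation: it uses a plain data.frame rather than a data.table, omits observation-level weights, and the column names (obs_id, bin_id, in_bin) are assumptions chosen for illustration.

```r
# Toy data: three observations of A with a single covariate W
A <- c(0.2, 1.4, 2.9)
W <- c(-1, 0, 1)
n_bins <- 3L

# Equal-range grid: n_bins intervals of identical width spanning range(A)
breaks <- seq(min(A), max(A), length.out = n_bins + 1L)
bin_widths <- diff(breaks)

# Bin index (1..n_bins) into which each observed A falls; all.inside = TRUE
# places the maximum of A in the last bin rather than beyond it
bin_id <- findInterval(A, breaks, all.inside = TRUE)

# One record per bin up to and including the observed bin; the indicator
# in_bin equals 1 only on the record for the bin containing A
long_data <- do.call(rbind, lapply(seq_along(A), function(i) {
  bins_i <- seq_len(bin_id[i])
  data.frame(
    obs_id = i,
    bin_id = bins_i,
    in_bin = as.integer(bins_i == bin_id[i]),
    W = W[i]
  )
}))
long_data
```

An equal-mass grid could be sketched the same way by taking the break points from quantile(A, probs = seq(0, 1, length.out = n_bins + 1L)) instead of seq().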
IPW Estimator Selector via Global Cross-Validation
Description
IPW Estimator Selector via Global Cross-Validation
Usage
gcv_selector(
W,
A,
Y,
delta = 0,
gn_pred_natural,
gn_pred_shifted,
Qn_pred_natural,
Qn_pred_shifted
)
Arguments
W |
A |
A |
A |
Y |
A |
delta |
A |
gn_pred_natural |
A |
gn_pred_shifted |
A |
Qn_pred_natural |
A |
Qn_pred_shifted |
A |
Cross-validated HAL Conditional Density Estimation
Description
Cross-validated HAL Conditional Density Estimation
Usage
haldensify(
A,
W,
wts = rep(1, length(A)),
grid_type = "equal_range",
n_bins = round(c(0.5, 1, 1.5, 2) * sqrt(length(A))),
cv_folds = 5L,
lambda_seq = exp(seq(-1, -13, length = 1000L)),
smoothness_orders = 0L,
hal_basis_list = NULL,
...
)
Arguments
A |
The |
W |
A |
wts |
A |
grid_type |
A |
n_bins |
This |
cv_folds |
A |
lambda_seq |
A |
smoothness_orders |
A |
hal_basis_list |
A |
... |
Additional (optional) arguments of |
Details
Estimation of the conditional density of A|W by using the highly adaptive lasso to estimate the conditional hazard of failure in a given bin over the support of A. Cross-validation is used to select the optimal value of the penalization parameters, based on minimization of the weighted log-likelihood loss for a density.
Value
Object of class haldensify, containing a fitted hal9001 object;
a vector of break points used in binning A over its support W;
sizes of the bins used in each fit; the tuning
parameters selected by cross-validation; the full sequence (in lambda) of
HAL models for the CV-selected number of bins and binning strategy; and
the range of A.
Note
Parallel evaluation of the cross-validation procedure to select tuning
parameters for density estimation may be invoked via the framework exposed
in the future ecosystem. Specifically, set plan for future_mapply
to be used internally.
Examples
# simulate data: W ~ U[-4, 4] and A|W ~ N(mu = W, sd = 0.5)
set.seed(429153)
n_train <- 50
w <- runif(n_train, -4, 4)
a <- rnorm(n_train, w, 0.5)
# learn relationship A|W using HAL-based density estimation procedure
haldensify_fit <- haldensify(
A = a, W = w, n_bins = 10L, lambda_seq = exp(seq(-1, -10, length = 100)),
# the following arguments are passed to hal9001::fit_hal()
max_degree = 3, reduce_basis = 1 / sqrt(length(a))
)
IPW Estimates of the Causal Effects of Stochastic Shift Interventions
Description
IPW Estimates of the Causal Effects of Stochastic Shift Interventions
Usage
ipw_shift(
W,
A,
Y,
delta,
n_bins = make_bins(A, "hist"),
cv_folds = 10L,
lambda_seq,
...,
bin_type = c("equal_range", "equal_mass"),
trim_density = FALSE,
undersmooth_type = c("dcar", "plateau", "gcv", "all"),
bootstrap = FALSE,
n_boot = 1000L
)
Arguments
W |
A |
A |
A |
Y |
A |
delta |
A |
n_bins |
A |
cv_folds |
A |
lambda_seq |
A |
... |
Additional arguments for model fitting to be passed directly to
|
bin_type |
A |
trim_density |
A |
undersmooth_type |
A |
bootstrap |
A |
n_boot |
A |
Examples
# simulate data
n_obs <- 50
W1 <- rbinom(n_obs, 1, 0.6)
W2 <- rbinom(n_obs, 1, 0.2)
A <- rnorm(n_obs, (2 * W1 - W2 - W1 * W2), 2)
Y <- rbinom(n_obs, 1, plogis(3 * A + W1 + W2 - W1 * W2))
# fit the IPW estimator
est_ipw_shift <- ipw_shift(
W = cbind(W1, W2), A = A, Y = Y,
delta = 0.5, n_bins = 3L, cv_folds = 2L,
lambda_seq = exp(seq(-1, -10, length = 100L)),
# arguments passed to hal9001::fit_hal()
max_degree = 1,
# ...continue arguments for IPW
undersmooth_type = "gcv"
)
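For intuition about the quantity being estimated, the sketch below computes a simple (unstabilized) IPW estimate for an additive shift of delta, using the known Gaussian density of A given W in place of an estimated generalized propensity score. It is a minimal illustration under those assumptions, written in base R; it does not reproduce the undersmoothing or selector logic of ipw_shift, and the names g_natural, g_shifted, ipw_wts, and psi_ipw are hypothetical.

```r
set.seed(11)
n_obs <- 500
delta <- 0.5
W <- rbinom(n_obs, 1, 0.5)
A <- rnorm(n_obs, mean = W, sd = 1)
Y <- rbinom(n_obs, 1, plogis(A + W))

# Known conditional density of A given W, standing in for an estimate:
# evaluated at the observed A and at the shifted value A - delta
g_natural <- dnorm(A, mean = W, sd = 1)
g_shifted <- dnorm(A - delta, mean = W, sd = 1)

# Inverse probability weights (density ratio) and the point estimate of
# the mean outcome under the shift A + delta
ipw_wts <- g_shifted / g_natural
psi_ipw <- mean(ipw_wts * Y)
psi_ipw
```

The weights are a density ratio with mean approximately one, so extreme values of ipw_wts signal regions where the shifted exposure is poorly supported by the observed data.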
Histogram Binning Procedures for Pooled Hazards Regression
Description
Histogram Binning Procedures for Pooled Hazards Regression
Usage
make_bins(grid_var, grid_type = c("hist", "scaled"), max_bins = 30L)
Arguments
grid_var |
The |
grid_type |
A |
max_bins |
A |
Map Predicted Hazard to Predicted Density for a Single Observation
Description
Map Predicted Hazard to Predicted Density for a Single Observation
Usage
map_hazard_to_density(hazard_pred_single_obs)
Arguments
hazard_pred_single_obs |
A |
Details
For a single observation, map a predicted hazard of failure (as an occurrence in a particular bin, under a given partitioning of the support) to a density.
Value
A matrix
composed of a single row and a number of columns
specified by the grid of penalization parameters used in fitting of the
highly adaptive lasso. This is the predicted conditional density for a
given observation, re-mapped from the hazard scale.
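The hazard-to-density mapping can be illustrated for a single observation and a single value of the penalization parameter. The sketch below uses hypothetical hazards and bin widths and is written in base R; the function documented here operates over a grid of penalization parameters, and whether the rescaling by bin width happens inside it or in a later step is not stated in this page, so treat that step as an assumption.

```r
# Hypothetical predicted hazards for one observation across 4 bins
hazard <- c(0.10, 0.25, 0.50, 1.00)
bin_widths <- c(0.5, 0.5, 0.5, 0.5)

# Probability of not failing in any earlier bin: prod_{j < k} (1 - h_j)
surv_prev <- c(1, cumprod(1 - hazard)[-length(hazard)])

# Discrete probability of failing in each bin
bin_prob <- hazard * surv_prev

# Rescale by bin width to move from probability mass to a density
density_est <- bin_prob / bin_widths
density_est
```

Because the hazard in the last bin is one, the bin probabilities sum to one and the resulting step-function density integrates to one over the grid.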
IPW Estimator Selector Using Lepski's Plateau Method for the MSE
Description
IPW Estimator Selector Using Lepski's Plateau Method for the MSE
Usage
plateau_selector(
W,
A,
Y,
delta = 0,
gn_pred_natural,
gn_pred_shifted,
gn_fit_haldensify,
Qn_pred_natural,
Qn_pred_shifted,
cv_folds = 10L,
gcv_mult = 50L,
bootstrap = FALSE,
n_boot = 1000L,
...
)
Arguments
W |
A |
A |
A |
Y |
A |
delta |
A |
gn_pred_natural |
A |
gn_pred_shifted |
A |
gn_fit_haldensify |
An object of class |
Qn_pred_natural |
A |
Qn_pred_shifted |
A |
cv_folds |
A |
gcv_mult |
TODO |
bootstrap |
A |
n_boot |
A |
... |
Additional arguments for model fitting to be passed directly to
|
Plot Method for HAL Conditional Density Estimates
Description
Plot Method for HAL Conditional Density Estimates
Usage
## S3 method for class 'haldensify'
plot(x, ..., type = c("risk", "density"))
Arguments
x |
Object of class |
... |
Additional arguments to be passed |
type |
A |
Value
Object of class ggplot containing a plot of the desired type.
Examples
# simulate data: W ~ U[-4, 4] and A|W ~ N(mu = W, sd = 0.5)
n_train <- 50
w <- runif(n_train, -4, 4)
a <- rnorm(n_train, w, 0.5)
# learn relationship A|W using HAL-based density estimation procedure
haldensify_fit <- haldensify(
A = a, W = w, n_bins = 3,
lambda_seq = exp(seq(-1, -10, length = 50)),
# the following arguments are passed to hal9001::fit_hal()
max_degree = 3, reduce_basis = 0.1
)
plot(haldensify_fit)
Prediction Method for HAL Conditional Density Estimation
Description
Prediction Method for HAL Conditional Density Estimation
Usage
## S3 method for class 'haldensify'
predict(
object,
...,
new_A,
new_W,
trim = TRUE,
trim_min = NULL,
lambda_select = c("cv", "undersmooth", "all")
)
Arguments
object |
An object of class |
... |
Additional arguments passed to |
new_A |
The |
new_W |
A |
trim |
A |
trim_min |
A |
lambda_select |
A |
Details
Method for computing and extracting predictions of the conditional
density estimates based on the highly adaptive lasso estimator, returned as
an S3 object of class haldensify from haldensify.
Value
A numeric vector of predicted conditional density values from
a fitted haldensify object.
Examples
# simulate data: W ~ U[-4, 4] and A|W ~ N(mu = W, sd = 0.5)
n_train <- 50
w <- runif(n_train, -4, 4)
a <- rnorm(n_train, w, 0.5)
# HAL-based density estimator of A|W
haldensify_fit <- haldensify(
A = a, W = w, n_bins = 10L, lambda_seq = exp(seq(-1, -10, length = 100)),
# the following arguments are passed to hal9001::fit_hal()
max_degree = 3, reduce_basis = 1 / sqrt(length(a))
)
# predictions to recover conditional density of A|W
new_a <- seq(-4, 4, by = 0.1)
new_w <- rep(0, length(new_a))
pred_dens <- predict(haldensify_fit, new_A = new_a, new_W = new_w)
Print: Highly Adaptive Lasso Conditional Density Estimates
Description
Print: Highly Adaptive Lasso Conditional Density Estimates
Usage
## S3 method for class 'haldensify'
print(x, ...)
Arguments
x |
An object of class |
... |
Other options (not currently used). |
Details
The print method for objects of class haldensify.
Value
None. Called for the side effect of printing an informative summary
of slots of objects of class haldensify.
Examples
# simulate data: W ~ U[-4, 4] and A|W ~ N(mu = W, sd = 0.5)
set.seed(429153)
n_train <- 50
w <- runif(n_train, -4, 4)
a <- rnorm(n_train, w, 0.5)
# learn relationship A|W using HAL-based density estimation procedure
haldensify_fit <- haldensify(
A = a, W = w, n_bins = c(3, 5),
lambda_seq = exp(seq(-1, -15, length = 50L)),
max_degree = 3, reduce_basis = 0.1
)
print(haldensify_fit)
Print: IPW Estimates of the Causal Effects of Stochastic Shift Interventions
Description
Print: IPW Estimates of the Causal Effects of Stochastic Shift Interventions
Usage
## S3 method for class 'ipw_haldensify'
print(x, ..., ci_level = 0.95)
Arguments
x |
An object of class |
... |
Other options (not currently used). |
ci_level |
A |
Details
The print method for objects of class ipw_haldensify.
Value
None. Called for the side effect of printing an informative summary
of slots of objects of class ipw_haldensify.
Examples
# simulate data
n_obs <- 50
W1 <- rbinom(n_obs, 1, 0.6)
W2 <- rbinom(n_obs, 1, 0.2)
A <- rnorm(n_obs, (2 * W1 - W2 - W1 * W2), 2)
Y <- rbinom(n_obs, 1, plogis(3 * A + W1 + W2 - W1 * W2))
# fit the IPW estimator
est_ipw_shift <- ipw_shift(
W = cbind(W1, W2), A = A, Y = Y,
delta = 0.5, n_bins = 3L, cv_folds = 2L,
lambda_seq = exp(seq(-1, -10, length = 100L)),
# arguments passed to hal9001::fit_hal()
max_degree = 1,
# ...continue arguments for IPW
undersmooth_type = "gcv"
)
print(est_ipw_shift)