Type: | Package |
Title: | Empirical Bayesian Tobit Matrix Estimation |
Version: | 1.0.2 |
Date: | 2024-05-03 |
Description: | Estimation tools for multidimensional Gaussian means using empirical Bayesian g-modeling. Methods are able to handle fully observed data as well as left-, right-, and interval-censored observations (Tobit likelihood); descriptions of these methods can be found in Barbehenn and Zhao (2023) <doi:10.48550/arXiv.2306.07239>. Additional, lower-level functionality based on Kiefer and Wolfowitz (1956) <doi:10.1214/aoms/1177728066> and Jiang and Zhang (2009) <doi:10.1214/08-AOS638> is provided that can be used to accelerate many empirical Bayes and nonparametric maximum likelihood problems. |
License: | GPL-3 |
URL: | https://github.com/barbehenna/ebTobit |
BugReports: | https://github.com/barbehenna/ebTobit/issues |
Imports: | Rcpp (≥ 1.0.10), RcppParallel, stats |
Suggests: | REBayes |
LinkingTo: | Rcpp, RcppArmadillo, RcppParallel |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.2.3 |
Depends: | R (≥ 3.3.0) |
NeedsCompilation: | yes |
Packaged: | 2024-05-03 22:50:23 UTC; altonbarbehenn |
Author: | Alton Barbehenn |
Maintainer: | Alton Barbehenn <altonbarbehenn@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-05-03 23:00:02 UTC |
Bile Acid Data
Description
A bile acid data set taken from Lei et al. (2018)
doi:10.1096/fj.201700055R via Wei et al. (2018)
doi:10.1371/journal.pcbi.1005973
(corresponding GitHub repository: https://github.com/WandeRum/GSimp). The
values in BileAcid
can be assumed to be independent log-normal
measurements.
Usage
BileAcid
Format
A data frame with 198 rows and 34 variables. Each row is a patient id and each column is an bile acid measurement.
Convex Optimization of the Kiefer-Wolfowitz NPMLE
Description
This method only works if there is a working installation of REBayes
available. See the REBayes
package and corresponding papers for more
implementation details.
Usage
ConvexDual(A, ...)
Arguments
A |
numeric matrix likelihoods |
... |
further arguments passed to |
Details
The matrix A
is structured as follows: A_ij = P(X_i | theta = t_j),
where X_i is the i'th observation and t_j is the j'th set of
parameters/grid-point.
Value
a vector containing the fitted prior
Convex Optimization of the Kiefer-Wolfowitz NPMLE
Description
This method only works if there is a working installation of REBayes
available. See the REBayes
package and corresponding papers for more
implementation details.
Usage
ConvexPrimal(A, ...)
Arguments
A |
numeric matrix likelihoods |
... |
further arguments passed to |
Details
The matrix A
is structured as follows: A_ij = P(X_i | theta = t_j),
where X_i is the i'th observation and t_j is the j'th set of
parameters/grid-point.
Value
a vector containing the fitted prior
Nonparametric Maximum Likelihood via Expectation Maximization
Description
Compute the nonparametric maximum likelihood estimate given a likelihood matrix. The matrix A is structured so that A_{ij} = f(X_i | theta_j) for some grid of potential parameter values theta_1, ..., theta_p and observations X_1, ..., X_n. The parameters, theta_j, can be multidimensional because all that is required is the likelihood. Convergence is achieved when the relative improvements of the log-likelihood is below the provided tolerance level.
Usage
EM(A, maxiter = 10000L, rtol = 1e-06)
Arguments
A |
numeric matrix likelihoods |
maxiter |
early stopping condition |
rtol |
convergence tolerance: abs(loss_new - loss_old)/abs(loss_old) |
Value
the estimated prior distribution (a vector of masses corresponding to the columns of A)
Examples
set.seed(1)
t = sample(c(0,5), size = 100, replace = TRUE)
x = t + stats::rnorm(100)
gr = seq(from = min(x), to = max(x), length.out = 50)
A = stats::dnorm(outer(x, gr, "-"))
EM(A)
## Not run:
# compare to solution from rmosek (requires additional library installation):
all.equal(
REBayes::KWPrimal(A = A, d = rep(1, 50), w = rep(1/100, 100))$f,
EM(A, maxiter = 1e+6, rtol = 1e-16), # EM alg converges slowly
tolerance = 0.01
)
## End(Not run)
Empirical Bayes Matrix Estimation under a Tobit Likelihood
Description
Fit and estimate the nonparametric maximum likelihood estimator in R^p
(p >= 1) when the likelihood is Gaussian and possibly interval censored. If
p = 1, then L
, R
, and gr
may be vectors (they are
immediately converted into matrices internally).
Usage
ebTobit(
L,
R = L,
gr = (R + L)/2,
s1 = 1,
algorithm = "EM",
pos_lik = TRUE,
...
)
Arguments
L |
n x p matrix of lower bounds on observations |
R |
n x p matrix of upper bounds on observations |
gr |
m x p matrix of grid points |
s1 |
a single numeric standard deviation or an n x p matrix of standard deviations |
algorithm |
method to fit prior, either a function or function name |
pos_lik |
boolean indicating whether to lower-bound the likelihood
matrix with |
... |
further arguments passed into fitting method such as |
Details
Each observation is stored in a pair of matrices, L
and R
. If
L_ij = R_ij then a direct measurement X_ij ~ N(theta, s1^2) is made;
if L_ij < R_ij then the measurement is censored so that L_ij < X_ij < R_ij.
To use a custom fitting algorithm, define a function MyAlg
that
takes in an n x m likelihood matrix: P_ij = P(L_i, R_i | theta = t_j) and
returns a vector of estimated prior weights for t_j. Once MyAlg
is
defined, fit the prior by using algorithm = "MyAlg"
or use the
function itself algorithm = MyAlg
.
Alternative fitting algorithms "ConvexPrimal"and "ConvexDual" have been
(wrappers of REBayes::KWPrimal
and REBayes::KWDual
,
respectively) included and can be used if MOSEK and REBayes
are
properly installed.
Value
a fitted ebTobit
object containing at least the prior weights,
corresponding grid/support points, and likelihood matrix relating the grid to
the observations
Examples
set.seed(1)
n <- 100
p <- 5
r <- 2
U.true <- matrix(stats::rexp(n*r), n, r)
V.true <- matrix(sample(x = c(1,4,7),
size = p*r,
replace = TRUE,
prob = c(0.7, 0.2, 0.1)),
p, r)
TH <- tcrossprod(U.true, V.true)
X <- TH + matrix(stats::rnorm(n*p), n, p)
# fit uncensored method
fit1 <- ebTobit(X)
# fit left-censored method
ldl <- 1 # lower and upper detection limits
udl <- Inf
L <- ifelse(X < ldl, 0, ifelse(X <= udl, X, udl))
R <- ifelse(X < ldl, ldl, ifelse(X <= udl, X, Inf))
fit2 <- ebTobit(L, R)
Fitted Estimates of an ebTobit object
Description
Compute either the posterior mean (default) or posterior L1 mediod which corresponds to the posterior median in one-dimension.
Usage
## S3 method for class 'ebTobit'
fitted(object, method = "mean", ...)
Arguments
object |
an object inheriting from class |
method |
either "mean", "L1mediod", or "mode" corresponding to the
methods: |
... |
not used |
Value
matrix containing the posterior estimates for measurements in the
fit empirical Bayes model object
Validate ebTobit Object
Description
Validate ebTobit Object
Usage
is.ebTobit(object)
Arguments
object |
any R object |
Value
boolean: TRUE if the object is a valid ebTobit
object
Helper Function - generate likelihood matrix
Description
Compute a matrix L whose entries are L[i,k] = P(L_i, R_i | theta = t_k) for observations (L_i, R_i) and grid of means t_k.
Usage
likMat(L, R, gr, s1)
Arguments
L |
n x p matrix of lower bounds |
R |
n x p matrix of upper bounds |
gr |
m x p matrix of candidate means |
s1 |
n x p matrix of standard deviations |
Value
the n x m likelihood matrix under partial interval censoring
Examples
# set-up
n = 100; m = 50; p = 5
gr = matrix(stats::rnorm(m*p), m, p)
L = R = matrix(stats::rnorm(n*p), n, p)
s1 = matrix(1, n, p)
missing.idx = sample.int(n = n*p, size = p*p)
L[missing.idx] = L[missing.idx] - stats::runif(p, 0, 1)
# R solution
lik = matrix(nrow = n, ncol = m)
for (i in 1:n) {
for(k in 1:m) {
lik[i,k] = prod(ifelse(
L[i,] == R[i,],
stats::dnorm(L[i,]-gr[k,], sd = s1[i,]),
stats::pnorm(R[i,]-gr[k,], sd = s1[i,]) - stats::pnorm(L[i,]-gr[k,], sd = s1[i,])
))
}
}
# Compare R to RcppParallel method
all.equal(lik, likMat(L, R, gr, s1))
Helper Function - generate likelihood for pair (L,R) and mean gr
Description
Compute P(L_i, R_i | theta = t_k) for observations (L_i, R_i) and grid of mean t_k.
Usage
lik_GaussianPIC(L, R, gr, s1)
Arguments
L |
numeric vector of lower bounds |
R |
numeric vector of upper bounds |
gr |
numeric vector of means |
s1 |
numeric vector of standard deviations |
Value
the likelihood under partial interval censoring
Examples
# set-up
p = 15
gr = stats::rnorm(p)
L = R = stats::rnorm(p)
missing.idx = sample.int(n = p, size = p/5)
L[missing.idx] = L[missing.idx] - stats::runif(length(missing.idx), 0, 1)
R[missing.idx] = R[missing.idx] + stats::runif(length(missing.idx), 0, 1)
# R solution
lik = prod(ifelse(
L == R,
stats::dnorm(L-gr),
stats::pnorm(R-gr) - stats::pnorm(L-gr)))
# Compare R to RcppParallel method
all.equal(lik, lik_GaussianPIC(L, R, gr, rep(1,p)))
Marginal Log-likelihood of an ebTobit object
Description
Marginal Log-likelihood of an ebTobit object
Usage
## S3 method for class 'ebTobit'
logLik(object, ...)
Arguments
object |
an object inheriting from class |
... |
not used |
Value
log likelihood for the fitted empirical Bayes model in object
Create a new ebTobit object
Description
Validate the provided elements and populate the object. Current methods
require that gr
is numeric for that calculation of posterior
statistics (mean and mediod).
Usage
new_ebTobit(prior, gr, lik)
Arguments
prior |
numeric vector of non-negative weights (sums to one) |
gr |
numeric matrix of support points |
lik |
numeric matrix of likelihoods |
Value
an EBayesMat
object containing at least the prior weights,
corresponding grid/support points, and likelihood matrix relating the grid to
the observations
Compute the Posterior L1 Mediod of an ebTobit object
Description
The posterior L1 mediod is defined as \arg\min_y E |y - t|_1 where the
expectation is taken over the posterior t|X=x. Here the posterior L1 mediod
is evaluated for each of the observations used to fit object
.
Usage
posterior_L1mediod.ebTobit(object)
Arguments
object |
an object inheriting from class |
Value
numeric matrix of posterior L1 mediods for the fitted empirical
Bayes model in object
Compute Posterior Mean of an ebTobit object
Description
Compute Posterior Mean of an ebTobit object
Usage
posterior_mean.ebTobit(object)
Arguments
object |
an object inheriting from class |
Value
numeric matrix of posterior means for the fitted empirical Bayes
model in object
Compute Posterior Mode of an ebTobit object
Description
Compute Posterior Mode of an ebTobit object
Usage
posterior_mode.ebTobit(object)
Arguments
object |
an object inheriting from class |
Value
numeric matrix of posterior modes for the fitted empirical
Bayes model in object
Fitted Estimates of an ebTobit object
Description
Compute either the posterior mean (default) or posterior L1 mediod which corresponds to the posterior median in one-dimension.
Usage
## S3 method for class 'ebTobit'
predict(object, L, R = L, s1 = 1, method = "mean", ...)
Arguments
object |
an object inheriting from class |
L |
n x p matrix of lower bounds on observations |
R |
n x p matrix of upper bounds on observations |
s1 |
a single numeric standard deviation or an n x p matrix of standard deviations |
method |
either "mean", "L1mediod", or "mode" corresponding to the
methods: |
... |
not used |
Value
matrix of posterior estimates for new observations under the
provided, pre-fit empirical Bayes model object
Fit Tobit Standard Deviation via Maximum Likelihood
Description
Fit the matrix of standard deviations given censored observations current mean estimates. Currently there are four models for S implemented: global, column-specific, row-specific, and rank-1.
Usage
tobit_sd(
L,
R,
mu = matrix(colMeans(L + R)/2, nrow(L), ncol(L), byrow = TRUE),
sd.structure = "global",
interval = c(1e-04, 100),
tol = .Machine$double.eps^0.25,
maxiter = 1000
)
Arguments
L |
matrix of lower bounds on observations (n x p) |
R |
matrix of upper bounds on observations (n x p) |
mu |
matrix of known means (n x p) |
sd.structure |
structure imposed on noise level estimates, must be one of: "global", "column", "row", or "rank1" |
interval |
a vector containing the end-points of the interval defining
the convex search space (default: |
tol |
the desired accuracy |
maxiter |
early stopping condition |
Value
matrix of maximum likelihood estimates for each observation's standard deviation (n x p)
Examples
set.seed(1)
n = 100; p = 5; r = 2
U.true = matrix(stats::rexp(n*r), n, r)
V.true = matrix(sample(x = c(1,4,7),
size = p*r,
replace = TRUE,
prob = c(0.7, 0.2, 0.1)),
p, r)
TH = tcrossprod(U.true, V.true)
X = TH + matrix(stats::rnorm(n*p, sd = 1), n, p)
ldl <- 0.1 # lower detection limit, known to be non-negative
L <- ifelse(X < ldl, 0, X)
R <- ifelse(X < ldl, ldl, X)
tobit_sd(L, R, mu = TH)
tobit_sd(L, R, mu = TH, sd.structure = "column")
Maximum Likelihood Estimator for a Single Standard Deviation Parameter
Description
Use standard numerical optimization methods to maximize the log-likelihood of the given problem. If all of the data is passed in, this method computes the global estimate of standard deviation. By passing in a subset of the data, more specific estimates can be made (ex column-specific standard deviations).
Usage
tobit_sd_mle(
L,
R,
mu = matrix(mean(L + R)/2, nrow(L), ncol(L)),
interval = c(1e-04, 100),
tol = .Machine$double.eps^0.25
)
Arguments
L |
matrix of lower bounds on observations (n x p) |
R |
matrix of upper bounds on observations (n x p) |
mu |
matrix of known means (n x p) |
interval |
a vector containing the end-points of the interval defining
the convex search space (default: |
tol |
the desired accuracy |
Value
a list containing estimate (maximum) and log-likelihood (objective)
Examples
set.seed(1)
n = 100; p = 5; r = 2
U.true = matrix(stats::rexp(n*r), n, r)
V.true = matrix(sample(x = c(1,4,7),
size = p*r,
replace = TRUE,
prob = c(0.7, 0.2, 0.1)),
p, r)
TH = tcrossprod(U.true, V.true)
X = TH + matrix(stats::rnorm(n*p, sd = 1), n, p)
ldl <- 0.1 # lower detection limit, known to be non-negative
L <- ifelse(X < ldl, 0, X)
R <- ifelse(X < ldl, ldl, X)
tobit_sd_mle(L, R, mu = TH)