Type: Package
Title: Generalized Linear Density Ratio Models
Version: 1.6
Description: Fits a generalized linear density ratio model (GLDRM). A GLDRM is a semiparametric generalized linear model. In contrast to a GLM, which assumes a particular exponential family distribution, the GLDRM uses a semiparametric likelihood to estimate the reference distribution. The reference distribution may be any discrete, continuous, or mixed exponential family distribution. The model parameters, which include both the regression coefficients and the cdf of the unspecified reference distribution, are estimated by maximizing a semiparametric likelihood. Regression coefficients are estimated with no loss of efficiency, i.e. the asymptotic variance is the same as if the true exponential family distribution were known. Huang (2014) <doi:10.1080/01621459.2013.824892>. Huang and Rathouz (2012) <doi:10.1093/biomet/asr075>. Rathouz and Gao (2008) <doi:10.1093/biostatistics/kxn030>.
Depends: R (≥ 3.2.2)
Imports: stats (≥ 3.2.2), graphics (≥ 3.2.2)
Suggests: testthat (≥ 1.0.2)
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.1
NeedsCompilation: no
Packaged: 2024-01-24 22:48:55 UTC; mike
Author: Michael Wurm [aut, cre], Paul Rathouz [aut]
Maintainer: Michael Wurm <wurm@uwalumni.com>
Repository: CRAN
Date/Publication: 2024-01-24 23:10:12 UTC

Control arguments for \beta update algorithm

Description

This function returns control arguments for the \beta update algorithm. Each argument has a default value, which will be used unless a different value is provided by the user.

Usage

beta.control(eps = 1e-10, maxiter = 1, maxhalf = 10)

Arguments

eps

Convergence threshold. The update has converged when the relative change in log-likelihood between iterations is less than eps. Only applies if maxiter>1.

maxiter

Maximum number of iterations allowed.

maxhalf

Maximum number of half steps allowed per iteration if log-likelihood does not improve.

Value

Object of S3 class "betaControl", which is a list of control arguments.


Control arguments for f0 update algorithm

Description

This function returns control arguments for the f_0 update algorithm. Each argument has a default value, which will be used unless a different value is provided by the user.

Usage

f0.control(eps = 1e-10, maxiter = 1000, maxhalf = 20, maxlogstep = 2)

Arguments

eps

Convergence threshold. The update has converged when the relative change in log-likelihood between iterations is less than eps. absolute change is less than thesh.

maxiter

Maximum number of iterations allowed.

maxhalf

Maximum number of half steps allowed per iteration if log-likelihood does not improve between iterations.

maxlogstep

Maximum optimization step size allowed on the log(f0) scale.

Value

Object of S3 class "f0Control", which is a list of control arguments.


Beta optimization routing

Description

Beta optimization routing

Usage

getBeta(
  x,
  y,
  spt,
  ySptIndex,
  f0,
  linkinv,
  mu.eta,
  offset,
  sampprobs,
  betaStart,
  thStart,
  thetaControl = theta.control(),
  betaControl = beta.control()
)

Arguments

x

Covariate matrix.

y

Response vector.

spt

Vector of unique observed support points in the response.

ySptIndex

Index of each y value within the spt vector.

f0

Current values of f0.

linkinv

Inverse link function.

mu.eta

Derivative of inverse link function.

offset

Vector of known offset values to be added to the linear combination (x' beta) for each observation. Mostly intended for likelihood ratio and score confidence intervals.

sampprobs

Optional matrix of sampling probabilities.

betaStart

Starting values for beta (typically the estimates from the previous iteration).

thStart

Starting theta values. Needs to be a list of values matching the output of the getTheta function.

thetaControl

A "thetaControl" object returned from the theta.control function.

betaControl

A "betaControl" object returned from the beta.control function.

Value

A list containing the following:


getTheta Updates theta. Vectorized but only updates observations that have not converged.

Description

getTheta Updates theta. Vectorized but only updates observations that have not converged.

Usage

getTheta(
  spt,
  f0,
  mu,
  sampprobs,
  ySptIndex,
  thetaStart = NULL,
  thetaControl = theta.control()
)

Arguments

spt

Support of the observed response variable. (This is the set of unique values observed, not the set of all possible values.)

f0

Values of the baseline distribution corresponding to the values of spt

mu

The fitted mean for each observation. Note these values must lie strictly within the range of the support.

sampprobs

Matrix of sampling probabilities. The number of rows should equal the number of observations, and the number of columns should equal the number of unique observed support points.

ySptIndex

Vector containing index of each obervation's response value within the spt vector. This is only needed to calculate the log-likelihood after each update.

thetaStart

Vector of starting values. One value per observation. If NULL, zero is used as the starting value for each observation.

thetaControl

Object of class thetaControl, which is a list of control arguments returned by the thetaControl function.

Value

List containing the following:


f0 optimization routine

Description

f0 optimization routine

Usage

getf0(
  y,
  spt,
  ySptIndex,
  sptFreq,
  sampprobs,
  mu,
  mu0,
  f0Start,
  thStart,
  thetaControl = theta.control(),
  f0Control = f0.control(),
  trace = FALSE
)

Arguments

y

Vector of response values.

spt

Vector of unique observed support points in the response.

ySptIndex

Index of each y value within spt.

sptFreq

Vector containing frequency of each spt value.

sampprobs

Optional matrix of sampling probabilities.

mu

Fitted mean for each observation. Only used if sampprobs=NULL.

mu0

Mean constraing for f0.

f0Start

Starting f0 values. (Typically the estimate from the previous iteration.)

thStart

Starting theta values. Needs to be a list of values matching the output of the getTheta function.

thetaControl

A "thetaControl" object returned from the theta.control function.

f0Control

An "f0Control" object returned from the f0.control function. trace Logical. If TRUE, then progress is printed to terminal at each iteration.

Value

A list containing the following:


Fits a generalized linear density ratio model (GLDRM)

Description

A GLDRM is a semiparametric generalized linear model. In contrast to a GLM, which assumes a particular exponential family distribution, the GLDRM uses a semiparametric likelihood to estimate the reference distribution. The reference distribution may be any discrete, continuous, or mixed exponential family distribution. The model parameters, which include both the regression coefficients and the cdf of the unspecified reference distribution, are estimated by maximizing a semiparametric likelihood. Regression coefficients are estimated with no loss of efficiency, i.e. the asymptotic variance is the same as if the true exponential family distribution were known.

Usage

gldrm(
  formula,
  data = NULL,
  link = "identity",
  mu0 = NULL,
  offset = NULL,
  gldrmControl = gldrm.control(),
  thetaControl = theta.control(),
  betaControl = beta.control(),
  f0Control = f0.control()
)

Arguments

formula

An object of class "formula".

data

An optional data frame containing the variables in the model.

link

Link function. Can be a character string to be passed to the make.link function in the stats package (e.g. "identity", "logit", or "log"). Alternatively, link can be a list containing three functions named linkfun, linkinv, and mu.eta. The first is the link function. The second is the inverse link function. The third is the derivative of the inverse link function. All three functions must be vectorized.

mu0

Mean of the reference distribution. The reference distribution is not unique unless its mean is restricted to a specific value. This value can be any number within the range of observed values, but values near the boundary may cause numerical instability. This is an optional argument with mean(y) being the default value.

offset

Known component of the linear term. Offset must be passed through this argument - offset terms in the formula will be ignored. value and covariate values. If sampling weights are a function of both the response value and covariates, then sampprobs must be a n \times q matrix, where n is the number of observations and q is the number of unique observed values in the response vector. If sampling weights do not depend on the covariate values, then sampprobs may alternatively be passed as a vector of length n. All values must be nonnegative and are assumed to correspond to the sorted response values in increasing order.

gldrmControl

Optional control arguments. Passed as an object of class "gldrmControl", which is constructed by the gldrm.control function. See gldrm.control documentation for details.

thetaControl

Optional control arguments for the theta update procedure. Passed as an object of class "thetaControl", which is constructed by the theta.control function. See theta.control documentation for details.

betaControl

Optional control arguments for the beta update procedure. Passed as an object of class "betaControl", which is constructed by the beta.control function. See beta.control documentation for details.

f0Control

Optional control arguments for the f0 update procedure. Passed as an object of class "f0Control", which is constructed by the f0.control function. See f0.control documentation for details.

Details

The arguments linkfun, linkinv, and mu.eta mirror the "link-glm" class. Objects of this class can be created with the stats::make.link function.

The "gldrm" class is a list of the following items.

Value

An S3 object of class "gldrm". See details.

Examples

data(iris, package="datasets")

# Fit a gldrm with log link
fit <- gldrm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
             data=iris, link="log")
fit

# Fit a gldrm with custom link function
link <- list()
link$linkfun <- function(mu) log(mu)^3
link$linkinv <- function(eta) exp(eta^(1/3))
link$mu.eta <- function(eta) exp(eta^(1/3)) * 1/3 * eta^(-2/3)
fit2 <- gldrm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
              data=iris, link=link)
fit2


Control arguments for gldrm algorithm

Description

This function returns control arguments for the gldrm algorithm. Each argument has a default value, which will be used unless a different value is provided by the user.

Usage

gldrm.control(
  eps = 1e-10,
  maxiter = 100,
  returnfTiltMatrix = TRUE,
  returnf0ScoreInfo = FALSE,
  print = FALSE,
  betaStart = NULL,
  f0Start = NULL
)

Arguments

eps

Convergence threshold. The fitting algorithm has converged when the relative change in log-likelihood between iterations is less than eps. A single iteration consists of a beta update followed by an f0 update.

maxiter

Maximum number of iterations allowed.

returnfTiltMatrix

Logical. Return nonparametric fitted probabilities for each observation. This is a matrix with nrow equal to the number of observations and ncol equal to the number of unique response values observed.

returnf0ScoreInfo

Logical. If TRUE, the score and information for log(f0) are returned as components of the "gldrm" object.

print

Logical. If TRUE, the relative change in the log-likelihood will be printed after each iteration.

betaStart

Optional vector of starting values for beta. If the call to gldrm contains a formula, the values of betaStart should correspond to the columns of the model matrix.

f0Start

Optional vector of starting values for f0. The length of the vector should be the number of unique values in the response, and the vector should correspond to these values sorted in increasing order. The starting values will be scaled to sum to one and tilted to have mean mu0. All values should be strictly positive.

Value

Object of S3 class "gldrmControl", which is a list of control arguments.


Confidence intervals for gldrm coefficients

Description

Calculates a Wald, likelihood ratio, or score confidence interval for a single gldrm coefficient. Also calculates upper or lower confidence bounds. Wald confidence intervals and bounds are calculated from the standard errors which are available from the gldrm model fit. For likelihood ratio and score intervals and bounds, a bisection search method is used, which takes longer to run.

Usage

gldrmCI(
  gldrmFit,
  term,
  test = c("Wald", "LRT", "Score"),
  level = 0.95,
  type = c("2-sided", "lb", "ub"),
  eps = 1e-10,
  maxiter = 100
)

Arguments

gldrmFit

A gldrm model fit. Must be an S3 object of class "gldrm", returned from the gldrm function.

term

Character string containing the name of the coefficient of interest. The coefficient names are the names of the beta component of the fitted model object. They can also be obtained from the printed model output. Usually the names match the formula syntax, but can be more complicated for categorical variables and interaction terms.

test

Character string for the type confidence interval. Options are "Wald", "LRT" (for likelihood ratio), and "Score".

level

Confidence level of the interval. Should be between zero and one.

type

Character string containing "2-sided" for a two-sided confidence interval, "lb" for a lower bound, or "ub" for an upper bound.

eps

Convergence threshold. Only applies for test = "LRT" and test = "Score". Convergence is reached when likelihood ratio p-value is within eps of the target p-value, based on the level of the test. For example, a two-sided 95% confidence interval has target p-value of 0.025 for both the upper and lower bounds. A 95% confidence bound has target p-value 0.05.

maxiter

The maximum number of bisection method iterations for likelihood ratio intervals or bounds. For two-sided intervals, maxiter iterations are allowed for each bound.

Value

An S3 object of class 'gldrmCI', which is a list of the following items.

Examples

data(iris, package="datasets")

### Fit gldrm with all variables
fit <- gldrm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
             data=iris, link="log")

### Wald 95% confidence interval for Sepal.Width
ci <- gldrmCI(fit, "Sepal.Width", test="Wald", level=.95, type="2-sided")
ci


Main optimization function

Description

This function is called by the main gldrm function.

Usage

gldrmFit(
  x,
  y,
  linkfun,
  linkinv,
  mu.eta,
  mu0 = NULL,
  offset = NULL,
  sampprobs = NULL,
  gldrmControl = gldrm.control(),
  thetaControl = theta.control(),
  betaControl = beta.control(),
  f0Control = f0.control()
)

Likelihood ratio test for nested models

Description

Performs a likelihood ratio F-test between nested gldrm models. The F-statistic is calculated as 2 \times (llik - llik_0) / r, where r is the difference is the number of parameters between the full and null models. The F-statistic has degrees of freedom r and n-p, where n is the number of observations and p is the number of parameters in the full model.

Usage

gldrmLRT(gldrmFit, gldrmNull)

Arguments

gldrmFit

The full model. Must be an object of S3 class 'gldrm' returned from the gldrm function.

gldrmNull

The sub-model being tested under the null hypotheses. Must be an object of S3 class 'gldrm' returned from the gldrm function.

Value

An S3 object of class 'gldrmLRT', containing numerator and denominator degrees of freedom, an F-statistic, and a p-value.

Examples

data(iris, package="datasets")

### Fit gldrm with all variables
fit <- gldrm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
             data=iris, link="log")

### Fit gldrm without the categorical variable "Species"
fit0 <- gldrm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
              data=iris, link="log")

### Likelihood ratio test for the nested models
lrt <- gldrmLRT(fit, fit0)
lrt


Confidence intervals for gldrm coefficients

Description

Plots and returns the randomized probability inverse transform of a fitted gldrm.

Usage

gldrmPIT(
  gldrmFit,
  nbreaks = 7,
  cex.main = NULL,
  cex.lab = NULL,
  cex.axis = NULL
)

Arguments

gldrmFit

A gldrm model fit. Must be an S3 object of class "gldrm", returned from the gldrm function. The matrix of semiparametric tilted probabilities must be returned, which is done by fitting gldrm with gldrmControl = gldrm.control(returnfTiltMatrix = TRUE).

nbreaks

Number of breaks in the histogram.

cex.main

Text size for main titles.

cex.lab

Text size for axis labels.

cex.axis

Text size for axis numbers.

Details

The probability inverse transform is defined generally as \hat{F}(y|x), which is the fitted conditional cdf of each observation evaluated at the observed response value. In the case of gldrm, the fitted cdf is descrete, so we draw a random value from a uniform distribution on the interval (\hat{F}(y|x), \hat{F}(y-|x)), where y- is the next largest observed support less than y (or -Infinity if y is the minimum support value). The output and plots generated by this function will vary slightly each time it is called (unless the random number generator seed is set beforehand).

Value

Randomized robability inverse transform as a vector. Also plots the histogram and uniform QQ plot.

Examples

data(iris, package="datasets")

### Fit gldrm and return fTiltMatrix
fit <- gldrm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
             data=iris, link="log")
             
# Probability inverse transform plot
gldrmPIT(fit)


Predict method for a gldrm object

Description

Obtains predicted probabilities, predicted class, or linear predictors.

Usage

## S3 method for class 'gldrm'
predict(
  object,
  newdata = NULL,
  type = c("link", "response", "terms", "fTilt"),
  se.fit = FALSE,
  offset = NULL,
  ...
)

Arguments

object

S3 object of class "gldrm", returned from the gldrm function.

newdata

Optional data frame. If NULL, fitted values will be obtained for the training data.

type

The type of prediction required. Type "link" returns the linear predictor. Type "response" returns the fitted mean. Type "terms" returns a matrix giving the fitted values of each term in the model formula on the linear predictor scale. Type "fTilt" returns a matrix containing the fitted nonparametric distribution for each observation. Each row of the matrix corresponds to an observation in newdata, and each column corresponds to a unique response value in the training data.

se.fit

Logical. If TRUE, standard errors are also returned. Does not apply for type = "fTilt".

offset

Optional offset vector. Only used if newdata is not NULL.

...

Not used. Additional predict arguments.

Value

The object returned depends on type.


Print summary of gldrm fit

Description

Prints fitted coefficients and standard errors, along with a likelihood ratio test against the null model.

Usage

## S3 method for class 'gldrm'
print(x, digits = 3, ...)

Arguments

x

S3 object of class "gldrm", returned from the gldrm function.

digits

Number of digits for rounding.

...

Unused. Additional arguments for print method.


Print confidence interval

Description

Print method for gldrmCI objects.

Usage

## S3 method for class 'gldrmCI'
print(x, digits = 3, ...)

Arguments

x

An S3 object of class 'gldrmCI'.

digits

Number of digits for rounding.

...

Not used. Additional arguments for print method.


Print likelihood ratio test results

Description

Print method for gldrmLRT objects. Prints results of a likelihood ratio F-test between nested models.

Usage

## S3 method for class 'gldrmLRT'
print(x, digits = 3, ...)

Arguments

x

S3 object of class 'gldrmLRT', returned from the gldrmLRT function.

digits

Number of digits for rounding.

...

Not used. Additional arguments for print method.


Control arguments for \theta update algorithm

Description

This function returns control arguments for the \theta update algorithm. Each argument has a default value, which will be used unless a different value is provided by the user.

Usage

theta.control(
  eps = 1e-10,
  maxiter = 100,
  maxhalf = 20,
  maxtheta = 500,
  logit = TRUE,
  logsumexp = FALSE
)

Arguments

eps

Convergence threshold for theta updates. Convergence is evaluated separately for each observation. An observation has converged when the difference between b'(\theta) and \mu is less than epsTheta.

maxiter

Maximum number of iterations.

maxhalf

Maximum number of half steps allowed per iteration if the convergence criterion does not improve.

maxtheta

Absolute value of theta is not allowed to exceed maxtheta.

logit

Logical for whether logit transformation should be used. Use of this stabilizing transformation appears to be faster in general. Default is TRUE.

logsumexp

Logical argument for whether log-sum-exp trick should be used. This may improve numerical stability at the expense of computational time.

Value

Object of S3 class "thetaControl", which is a list of control arguments.