Help for package refreg

Type:

Package

Title:

Conditional Multivariate Reference Regions

Version:

0.1.1

Maintainer:

Oscar Lado-Baleato <oscarlado.baleato@usc.es>

Description:

An R package for estimating conditional multivariate reference regions. The reference region is non parametrically estimated using a kernel density estimator. Covariates effects on the multivariate response means vector and variance-covariance matrix, thus on the region shape, are estimated by flexible additive predictors. Continuous covariates non linear effects might be estimated using penalized splines smoothers. Confidence intervals for the covariates estimated effects might be derived from bootstrap resampling. Kernel density bandwidth can be estimated with different methods, including a method that optimize the region coverage. Numerical, and graphical, summaries can be obtained by the user in order to evaluate reference region performance with real data. Full mathematical details can be found in <doi:10.1002/sim.9163> and <doi:10.1007/s00477-020-01901-1>.

Imports:

ks,KernSmooth,mgcv,gridExtra,ggplot2,RColorBrewer,sp,stringr,foreach,doParallel,pracma,misc3d,rgl,mbend,matrixcalc,methods

License:

GPL-2

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.1.2

Depends:

R (≥ 2.10)

NeedsCompilation:

Packaged:

2022-04-18 10:00:50 UTC; Usuario

Author:

Oscar Lado-Baleato [cre, aut], Javier Roca-Pardinas [aut, ctb], Carmen Cadarso-Suarez [aut, ths], Gude Francisco [aut, ths]

Repository:

CRAN

Date/Publication:

2022-04-18 14:20:02 UTC

Alternating Conditional Expectation (ACE) algorithm

Description

This function implements Alternating Conditional Expectation algorithm (ACE, Hastie and Tibshirani, 1990). This is a bivRegr's inner function for estimating variance, and correlation models.

Usage

ACE(
  y = "y",
  predictor = "~s(x)",
  restriction = "positive",
  eps = 0.01,
  itmax = 10,
  data = data,
  ...
)

Arguments

y

A character defining the response variable.

predictor

The regression predictor, following the mgcv package structure (see example below).

restriction

Type of restriction to be imposed to the response variable, "positive" for variance, and "correlation" for the correlation.

eps

A number defining the allowed estimation error, default = 0.01.

itmax

Maximum number of iterations of the algorithm, default = 10.

data

A data frame containing the response, and predictor variables.

...

Additional mgcv::gam() parameters to be modified by the user.

Value

This function returns a mgcv::gam() fit for a transformed response.

References

Hastie, T. & Tibshirani, J. (1990) Generalized additive models. CRC press. London.

Examples

n <- 100
x <- runif(n, -1, 1)
y <- x^2 + rnorm(n, sd = 0.1)
df <- data.frame(y, x)
plot(df$x, df$y)
m1 <- ACE(
  y = "y", predictor = "~s(x)", restriction = "positive",
  eps = 0.01, itmax = 10, data = df
)$fit
nw <- data.frame(x = seq(-1, 1, 0.1))
abline(h = 0, col = 2)
lines(exp(predict(m1, newdata = nw)) ~ nw$x, col = 3, lwd = 2)
legend("top", legend = c("ACE fit", "Zero"), lty = 1, lwd = 2, col = c(3, 2), bty = "n")

Kernel bandwidth selection method based on bivariate density contours coverage

Description

This function implements a method for estimating bivariate kernel bandwidth based on data covarage. The method starts with the plug in estimate (which usually overfits the data), and then increase this bandwidth value until the desired coverage is obtained. Region coverage is evaluated in an out sample design, using a k fold cross validation scheme.

Usage

Hcov(Y, shape = seq(1, 10, 0.5), k = 20, tau = 0.9, display_plot = TRUE)

Arguments

Y

A matrix containing bivariate data values.

shape

A sequence of values which controls plug in estimator increasing.

k

A number indicating k fold cross validations to be performed.

tau

The desired region coverage

display_plot

A logical indicating if a plot must be displaying, during the function estimation process, summarizing the results.

Value

This function return a diagonal kernel bandwidth matrix.

A Estrada Glycation and Inflammation Study (AEGIS) Data

Description

The aegis dataset was obtained from the Estrada Glycation and Inflammation (AEGIS) project. A cross sectional, population based study that was performed in the municipality of A Estrada (Galicia, NW Spain). The study objective was to investigate the association between glycation, inflammation status, lifestyle and common diseases, and reasons for glycemic markers discordances (Gude et al., 2017).

Usage

aegis

Format

A data frame containing 1516 observations, and 7 variables:

id: Anonymized patient identificator.
gender: A factor variable describing patient gender, (female, and male).
age: Patients' age.
dm: Diabetes mellitus indicator (no, and yes).
fpg: Fasting plasma glucose levels (mg/dL).
hba1c: Glycated hemoglobin percentage.
fru: Fructosamine concentration (mg/dL).

References

Gude F, Diaz–Vidal P, Rua–Perez C, et al. Glycemic variability and its association with demographics and lifestyles in ageneral adult population. J Diabetes Sci Technol. 2017; 11(4): 780–790

Bivariate reference region estimation

Description

This functions estimate a probabilistic/reference region for bivariate data. It is based on a kernel density estimation. It may be applied to a set of bivariate data points, or to a bivRegr object. In the former case, the function will estimate a bivariate reference region for the model standarized residuals.

Usage

bivRegion(
  Y = fit,
  H_choice = "Hcov",
  tau = 0.95,
  k = 20,
  display_plot = TRUE,
  shape = NULL,
  ...
)

Arguments

Y

A set of bivariate data points, or a bivRegr object.

H_choice

Kernel bandwidth selection method: "plug.in" for plug.in method, "LSCV" for least squate cross valiation, "SCV" for smooth cross validation, and "Hcov" for a bandwidth selection method which optimize the region coverage.

tau

A number or vector defining the desired coverage(s) of the bivariate reference region.

k

In case of using "Hcov" the number of k fold cross validations replicates to be performed.

display_plot

A logical indicating if plot must be displayed during "Hcov" bandwidht estimation procedure. The plot depicts region's coverage, evaluated with k fold cross validation, depending on kernel bandwidth value.

shape

Shape parameter modulating the final shape of the bivariate probabilistic/reference region by hand.

...

Additional parameters to be modified in KernSmooth::bkde2D() function by the user (e.g. gridsize).

Value

This function return a region or a set of regions containing a given percentage of bivariate data points.

References

Duong, T. (2019) ks: Kernel Smoothing. R package version 1.11.6. https://CRAN.R–project.org/package=ks.

Matt Wand (2020). KernSmooth: Functions for Kernel Smoothing Supporting Wand & Jones (1995). R package version 2.23–18. https://CRAN.R–project.org/package=KernSmooth

Examples

Y <- cbind(rnorm(100), rnorm(100))
Y <- as.data.frame(Y)
names(Y) <- c("y1", "y2")
region <- bivRegion(Y, tau = 0.95, shape = 2)
plot(region)

Bivariate regression model

Description

This function estimates the covariates effects on the means vector, and variance covariance matrix from a bivariate variable. Non linear effects might be estimated for continuous covariates using penalized spline smoothers.

Usage

bivRegr(f = f, data = data, ace.eps = 0.01, ace.itmax = 25)

Arguments

f

A list of five formulas defining the covariates effects in both responses means, variances, and in their correlation. The formulas follow the same structure as mgcv::gam() (see example below).

data

A data frame containing the reponses, and predictor variables values.

ace.eps

A number defining the error rate in the ACE algorithm.

ace.itmax

A number defining the maximum number of ACE algorithm iterations.

Value

This function returns the covariates effect on the means, variances, and correlation of a bivariate response variable.

Examples

# Bivariate reference region for fasting plasma glucose (fpg)
# and glycated hemoglobin (hba1c) levels depending on age

dm_no <- subset(aegis, aegis$dm == "no") # select healthy patients
# 1.1) Define predictors
mu1 <- fpg ~ s(age)
mu2 <- hba1c ~ s(age)
var1 <- ~ s(age)
var2 <- ~ s(age)
rho <- ~ s(age)
f <- list(mu1, mu2, var1, var2, rho)

fit <- bivRegr(f, data = dm_no)

# 1.2) Depict the estimated covariates effects
plot(fit, eq = 1)
plot(fit, eq = 2)
plot(fit, eq = 3)
plot(fit, eq = 4)
plot(fit, eq = 5)
# 1.2.1) Depict the estimated covariates effects with CI (Not Run)

s0 <- summary_boot(fit, B = 100) # no parallelization
# s1 = summary_boot(fit,B=100,parallel=TRUE) #parallelization
plot(s0, eq = 1)


# 1.3) Obtain the reference region in the standarized residuals
region <- bivRegion(fit, tau = 0.95, shape = 2)
plot(region)

# 1.4) Identify those patients located outside the reference region
summary(region)

# 1.5) Depict the conditional reference region for two ages
plot(region,
  cond = TRUE, newdata = data.frame(age = c(20, 50)), col = "grey", pch = "*",
  reg.lwd = 2, reg.lty = 2
)

Plot a bivRegion object

Description

This function allow to depict the estimated bivariate reference/probabilistic region, in the estandarized residuals scale (cond=FALSE), or for any covariate value (cond=TRUE).

Usage

## S3 method for class 'bivRegion'
plot(
  x,
  tau = 0.95,
  newdata = NULL,
  reg.col = NULL,
  reg.lwd = 1,
  reg.lty = NULL,
  axes = TRUE,
  axes.col = "black",
  axes.lwd = 2L,
  cond = FALSE,
  add = FALSE,
  legend = TRUE,
  ...
)

Arguments

x

A bivRegion object.

tau

A number, or vector, defining the desired coverage(s) of the bivariate reference region.

newdata

If cond=FALSE, a data.frame with new values to be depicted in the standarized residuals scale. If cond=TRUE, a data frame containing covariate values for which the reference/probabilistic region will be depicted.

reg.col

Region line colour, in case of more than one tau it can be a vector.

reg.lwd

Region line width, in case of more than one tau it can be a vector.

reg.lty

Region line type, in case of more than one tau it can be a vector.

axes

Logical; if TRUE (and cond=FALSE), vertical and horizontal lines are added indicating four quadrants in the model residuals scale.

axes.col

Axes colour.

axes.lwd

Axes line width.

cond

A logical argument, if TRUE a conditional reference region is depicted.

add

A logical argument, if TRUE the conditional reference region is depicted over a pre existing plot.

legend

A logical argument, if TRUE a legend is given along with the reference region.

...

Further plot parameters.

Value

This function return a graphical representation for a bivRegion object.

Examples

Y <- cbind(rnorm(100), rnorm(100))
Y <- as.data.frame(Y)
names(Y) <- c("y1", "y2")
reg <- bivRegion(Y, tau = 0.95, shape = 2)
plot(reg)

Plot method for bivRegr fit

Description

This function takes an bivRegr object and plots the estimated effects for the conditional response means, variances or correlation. summary_boot function must be applied by the user in order to get the estimated effects confidence intervals.

Usage

## S3 method for class 'bivRegr'
plot(x, eq = 1, newdata = NULL, ...)

Arguments

x

A bivRegr fit.

eq

A number indicating the model effects to be depicted; 1 = first response mean, 2 = second response mean, 3 = first response variance, 4 = second response variance, and 5 = correlation model.

newdata

An optional data frame containing covariates values.

...

Additional plot arguments.

Value

This function returns a plot for bivRegr mean, variance and correlation models.

Plot univariate conditional quantile models curves (i.e. reference curves)

Description

This function depict the univariate conditional quantile model based on the non parametric location scale model fitted with the refcurv function.

Usage

## S3 method for class 'refcurve'
plot(
  x,
  newdata = data.frame(x = seq(0, 1, 0.01)),
  tau = seq(0.1, 0.9, 0.2),
  ...
)

Arguments

x

A refcurv object.

newdata

A data frame defining a sequence of the predictor variables values.

tau

A number or vector defining desired quantile.

...

Additional plot options.

Value

This function returns a plot of the refcurve model.

Default summary_boot plotting

Description

This function takes the bivRegr bootstrap replicates obtained with summary_boot function, and plots the parametric, and smooth effects for each model.

Usage

## S3 method for class 'summary_boot'
plot(x, eq = 1, select = NULL, ...)

Arguments

x

A summary_boot object.

eq

A number indicating the model effects to be depicted; 1 = first response mean, 2 = second response mean, 3 = first response variance, 4 = second response variance, and 5 = correlation model.

select

An optional parameter to represent an specific effect for each equation.

...

Additional plot parameters, not yet implemented.

Value

This function returns a ggplot2 plot for the estimated effects along with bootstrap 95% confidence intervals.

Default trivRegion plotting

Description

This function allow to depict in an interactive rgl plot the estimated trivariate reference/probabilistic region. If cond=FALSE it showes trivariate standarized residuals, while with cond=TRUE it represents the region shape for any covariate(s) value.

Usage

## S3 method for class 'trivRegion'
plot(
  x,
  cond = FALSE,
  planes = FALSE,
  newdata = NULL,
  add = FALSE,
  reg.col = NULL,
  incol = "grey",
  legend = FALSE,
  ...
)

Arguments

x

A trivRegion object.

cond

A logical argument, if TRUE a conditional reference region is depicted.

planes

Logical; if TRUE, planes are added indicating (x=0,y=0,z=0).

newdata

If cond==TRUE, a data frame containing covariate values for which the reference/probabilistic region will be depicted.

add

A logical argument, if TRUE the conditional reference region is depicted over a pre existing plot.

reg.col

Region contour colour.

incol

Colours for the points included inside the reference region.

legend

A logical argument, if TRUE a legend is given along with the reference region.

...

Further rgl plot parameters.

Value

This function return an interactive rgl plot of a bivRegion object.

SO2 and Nox concentrations

Description

The pollution dataset contains the SO2 and Nox concentrations in the As Pontes thermal power plant surrondings, during a year. The data was analyzed from a bivariate perspective in Roca–Pardiñas et al. (2021).

Usage

pollution

Format

A data frame with 6179 observations and 13 variables:

Date: The observation day, hour, and minute.
So2: So2 air concentration at the present time.
Nox: Nox air concentration at the present time.
So2_0: So2 air concentration registered 30 minutes before than the So2 one.
Nox_0: Nox air concentration registered 30 minutes before than the Nox one.
So2_1: So2 air concentration registered 45 minutes before than the So2 one.
Nox_1: Nox air concentration registered 45 minutes before than the Nox one.
So2_2: So2 air concentration registered 60 minutes before than the So2 one.
Nox_2: Nox air concentration registered 60 minutes before than the Nox one.
So2_3: So2 air concentration registered 75 minutes before than the So2 one.
Nox_3: Nox air concentration registered 75 minutes before than the Nox one.
So2_4: So2 air concentration registered 90 minutes before than the So2 one.
Nox_4: Nox air concentration registered 90 minutes before than the Nox one.

References

Roca–Pardinas, J., Ordonez, C., & Lado Baleato, O. (2021). Nonparametric location scale model for the joint forecasting of SO2 and NOx pollution episodes. Stochastic Environmental Research and Risk Assessment, 35(2), 231–244.

SO2 and Nox concentrations during a pollution episode

Description

SO2 and Nox concentrations in the As Pontes thermal power plant surrondings, during a pollution episode.

Usage

pollution_episode

Format

A data frame with 288 observations and 13 variables:

Date: The observation day, hour and minute.
So2: So2 air concentration at the present time.
Nox: Nox air concentration at the present time.
So2_0: So2 air concentration registered 30 minutes before than the So2 one.
Nox_0: Nox air concentration registered 30 minutes before than the Nox one.
So2_1: So2 air concentration registered 45 minutes before than the So2 one.
Nox_1: Nox air concentration registered 45 minutes before than the Nox one.
So2_2: So2 air concentration registered 60 minutes before than the So2 one.
Nox_2: Nox air concentration registered 60 minutes before than the Nox one.
So2_3: So2 air concentration registered 75 minutes before than the So2 one.
Nox_3: Nox air concentration registered 75 minutes before than the Nox one.
So2_4: So2 air concentration registered 90 minutes before than the So2 one.
Nox_4: Nox air concentration registered 90 minutes before than the Nox one.

Prediction for a bivRegion object

Description

This function takes a bivRegion object and allow to obtain region limits for a given covariate value if cond=TRUE. If not, it can be applied to a new dataset to evaluate which points will be included inside the standarized region (for instance, if we estimate a reference region with healthy patients results, we can know where non healthy patients will be located).

Usage

## S3 method for class 'bivRegion'
predict(object, tau = 0.95, newdata = NULL, cond = FALSE, ...)

Arguments

object

A bivRegion object.

tau

A number or vector defining the desired coverage(s) of the bivariate reference/probabilistic region.

newdata

An optional data frame containing new covariate values, or new observations.

cond

A logical argument, if TRUE, a matrix of values defining the conditional reference region limits is given.

...

Additional prediction parameters.

Value

This function returns reference region limits values for a given covariate value (if cond=TRUE), or which observations falls outside the estimated region (if cond=FALSE).

Predict method for bivRegr

Description

Obtains predictions for a bivRegr object, that is the bivRegr means, variances, and correlation for given covariate values.

Usage

## S3 method for class 'bivRegr'
predict(object, newdata = NULL, ...)

Arguments

object

A bivRegr fit.

newdata

A data frame defining the covariate values for prediction. Default is NULL and the prediction will be done in the original data.

...

Additional predict options.

Value

This function returns prediction of bivRegr mean, variance and correlation models.

Univariate reference curve model

Description

This function obtain univariate conditional quantiles as described in Martinez-Silva et. al (2016).

Usage

refcurv(mu = "y~s(x)", sigma = "~s(x)", data = data)

Arguments

mu

A formula object for the response mean model following the mgcv package structure (see example below).

sigma

a formula object for fitting a model to the response variance (see example below).

data

A data frame containing both the response, and predictor variables.

Details

In the Martinez Silva et. al (2016) the non linear effects of the continuous covariates are estimating through polynomial kernel smoother, in this package we implement the same methodology but using penalized splines in order to reduce computational cost.

Value

This function returns univariate conditional quantiles estimated using a non parametric location scale model.

References

Martinez–Silva, I., Roca–Pardinas, J., & Ordonez, C. (2016). Forecasting SO2 pollution incidents by means of quantile curves based on additive models. Environmetrics, 27(3), 147–157.

Examples

#--- Glycation hemoglobin reference curve depending on age
dm_no <- subset(aegis, aegis$dm == "no")
fit1 <- refcurv(mu = "hba1c~s(age)", sigma = "~s(age)", data = dm_no)
plot(fit1, newdata = data.frame(age = 18:90), tau = c(0.025, 0.05, 0.10, 0.90, 0.95, 0.975))

bivRegion summary method

Description

This function takes an bivRegion object and indicates which observations are located outside the estimated reference region and why.

Usage

## S3 method for class 'bivRegion'
summary(object, tau = 0.95, ...)

Arguments

object

A bivRegion object.

tau

The data coverage proportion previously obtained by the bivRegion function.

...

Additional summary options.

Value

This function indicates the region apparent coverage, and which patients are located outside the estimated reference region.

bivRegr summary function

Description

This function perform a bootstrap procedure in order to obtain confidence intervals for the bivRegr estimated effects. The function allow to paralelize the bootstrap resampling scheme using doParalell, and foreach libraries.

Usage

summary_boot(object, B = 100, parallel = FALSE, cores = NULL)

Arguments

object

A bivRegr fit.

B

A number indicating the bootstrap iterations.

parallel

A logical indicating if bootstrap parallelization must be applied.

cores

If parallel = TRUE, a number indicating computer cores to be used during paralellization. If NULL the function use all cores available but one.

Value

This function returns the bootstrap replicates of the bivRegr sub models. Results might be checked applying plot.summary_boot().

Trivariate reference region estimation

Description

This functions estimate a probabilistic/reference region for trivariate data. It is based on a non parametric kernel density estimation. It can only be applied to a trivRegr object, and for one single tau.

Usage

trivRegion(fit, tau = 0.9)

Arguments

fit

A trivRegr object.

tau

A number defining the desired coverage of the trivariate reference region.

Value

This function return a region containing a given percentage of trivariate data points.

References

Duong, T. (2019) ks: Kernel Smoothing. R package version 1.11.6. https://CRAN.R–project.org/package=ks.

Trivariate regression model

Description

This function estimates the covariates effects on the means vector, and variance covariance matrix of a trivariate variable. Non linear effects might be estimated for continuous covariates using penalized splines.

Usage

trivRegr(f = f, data = data)

Arguments

f

A list of 9 formulas defining the covariates effects in three responses means, in their variances, and in their correlations. The formulas follow the mgcv::gam() structure.

data

A data frame containing the reponses, and predictor variables values.

Value

This function returns the covariates effect on the means, variances, and correlation for a trivariate response variable.

Examples

dm_no <- subset(aegis, aegis$dm == "no")

# Model formulas
mu1 <- fpg ~ s(age)
mu2 <- hba1c ~ s(age)
mu3 <- fru~s(age)
var1 <- ~ s(age)
var2 <- ~ s(age)
var3 <- ~ s(age)
theta12 <- ~ s(age)
theta13 <- ~ s(age)
theta23 <- ~ s(age)
f <- list(mu1, mu2, mu3, var1, var2, var3, theta12, theta13, theta23)

# Model fit
fit <- trivRegr(f, data = dm_no)
# Trivariate region estimation
region <- trivRegion(fit, tau = 0.95)
plot(region, col = 2, planes = TRUE)
plot(region,
  cond = TRUE, newdata = data.frame(age = c(20, 80)),
  xlab = "FPG, mg/dl", ylab = "HbA1c, %", zlab = "Fru, mg/dL"
)