Help for package MiRKAT

Title:

Microbiome Regression-Based Kernel Association Tests

Version:

1.2.3

Maintainer:

Anna Plantinga <amp9@williams.edu>

Description:

Test for overall association between microbiome composition data and phenotypes via phylogenetic kernels. The phenotype can be univariate continuous or binary (Zhao et al. (2015) <doi:10.1016/j.ajhg.2015.04.003>), survival outcomes (Plantinga et al. (2017) <doi:10.1186/s40168-017-0239-9>), multivariate (Zhan et al. (2017) <doi:10.1002/gepi.22030>) and structured phenotypes (Zhan et al. (2017) <doi:10.1111/biom.12684>). The package can also use robust regression (unpublished work) and integrated quantile regression (Wang et al. (2021) <doi:10.1093/bioinformatics/btab668>). In each case, the microbiome community effect is modeled nonparametrically through a kernel function, which can incorporate phylogenetic tree information.

Depends:

R (≥ 3.1.0)

License:

GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.2.3

NeedsCompilation:

Imports:

MASS, CompQuadForm, quantreg, GUniFrac, PearsonDS, lme4, Matrix, permute, mixtools, survival, stats

Suggests:

knitr, vegan, rmarkdown, magrittr, kableExtra

VignetteBuilder:

knitr, rmarkdown

Packaged:

2023-02-15 23:09:07 UTC; anna

Author:

Anna Plantinga [aut, cre], Nehemiah Wilson [aut, ctb], Haotian Zheng [aut, ctb], Tianying Wang [aut, ctb], Xiang Zhan [aut, ctb], Michael Wu [aut], Ni Zhao [aut, ctb], Jun Chen [aut]

Repository:

CRAN

Date/Publication:

2023-02-17 14:50:02 UTC

Small-sample SKAT for correlated (continuous) data ('c' stands for 'correlated'). Called within GLMM-MiRKAT.

Description

Compute the adjusted score statistic and p-value

Usage

CSKAT(lmer.obj, Ks)

Arguments

lmer.obj

A fitted lme4 object (model under H0)

Ks

A kernel matrix or list of kernels, quantifying the similarities between samples.

Value

p.value: Association p-values
Q.adj: Adjusted score statistics

Author(s)

Nehemiah Wilson, Anna Plantinga, Xiang Zhan, Jun Chen.

References

Zhan X, et al. (2018) A small-sample kernel association test for correlated data with application to microbiome association studies. Genet Epidemiol.

D2K

Description

Construct kernel matrix from distance matrix.

Usage

D2K(D)

Arguments

D

An n by n matrix giving pairwise distances or dissimilarites, where n is sample size.

Details

Converts a distance matrix (matrix of pairwise distances) into a kernel matrix for microbiome data. The kernel matrix is constructed as K = -(I-11'/n)D^2(I-11'/n)/2, where D is the pairwise distance matrix, I is the identity matrix, and 1 is a vector of ones.

D^2 represents element-wise square.

To ensure that K is positive semi-definite, a positive semi-definiteness correction is conducted

Value

An n by n kernel or similarity matrix corresponding to the distance matrix given.

Author(s)

Ni Zhao

References

Zhao, Ni, et al. "Testing in microbiome-profiling studies with MiRKAT, the microbiome regression-based kernel association test

Examples

library(GUniFrac)

#Load in data and create a distance matrix
data(throat.tree)
data(throat.otu.tab)
unifracs <- GUniFrac(throat.otu.tab, throat.tree, alpha=c(1))$unifracs
D1 <- unifracs[,,"d_1"]

#Function call
K <- D2K(D1)

The Microbiome Regression-based Kernel Association Test Based on the Generalized Linear Mixed Model

Description

GLMMMiRKAT utilizes a generalized linear mixed model to allow dependence among samples.

Usage

GLMMMiRKAT(
  y,
  X = NULL,
  Ks,
  id = NULL,
  time.pt = NULL,
  model,
  method = "perm",
  slope = FALSE,
  omnibus = "perm",
  nperm = 5000
)

Arguments

y

A numeric vector of Gaussian (e.g., body mass index), Binomial (e.g., disease status, treatment/placebo) or Poisson (e.g., number of tumors/treatments) traits.

X

A vector or matrix of numeric covariates, if applicable (default = NULL).

Ks

A list of n-by-n OTU kernel matrices or one singular n-by-n OTU kernel matrix, where n is sample size.

id

A vector of cluster (e.g., family or subject including repeated measurements) IDs. Defaults to NULL since it is unnecessary for the CSKAT call.

time.pt

A vector of time points for the longitudinal studies. 'time.pt' is not required (i.e., 'time.pt = NULL') for the random intercept model. Default is time.pt = NULL.

model

A string declaring which model ("gaussian", "binomial" or "poisson") is to be used; should align with whether a Gaussian, Binomial, or Poisson trait is being inputted for the y argument.

method

A string declaring which method ("perm" or "davies) will be used to calculate the p-value. Davies is only available for Gaussian traits. Defaults to "perm".

slope

An indicator to include random slopes in the model (slope = TRUE) or not (slope = FALSE). 'slope = FALSE' is for the random intercept model. 'slope = TRUE' is for the random slope model. For the random slope model (slope = TRUE), 'time.pt' is required.

omnibus

A string equal to either "Cauchy" or "permutation" (or nonambiguous abbreviations thereof), specifying whether to use the Cauchy combination test or residual permutation to generate the omnibus p-value.

nperm

The number of permutations used to calculate the p-values and omnibus p-value. Defaults to 5000.

Details

Missing data is not permitted. Please remove all individuals with missing y, X, and Ks prior to input for analysis.

y and X (if not NULL) should be numerical matrices or vectors with the same number of rows.

Ks should either be a list of n by n kernel matrices (where n is sample size) or a single kernel matrix. If you have distance matrices from metagenomic data, each kernel can be constructed through function D2K. Each kernel can also be constructed through other mathematical approaches.

If model="gaussian" and method="davies", CSKAT is called. CSKAT utilizes the same omnibus test as GLMMMiRKAT. See ?CSKAT for more details.

The "method" argument only determines kernel-specific p-values are generated. When Ks is a list of multiple kernels, an omnibus p-value is computed via permutation.

Value

Returns a p-value for each inputted kernel matrix, as well as an overall omnibus p-value if more than one kernel matrix is inputted

p_values

p-value for each individual kernel matrix

omnibus_p

overall omnibus p-value calculated by permutation for the adaptive GLMMMiRKAT analysis

Author(s)

Hyunwook Koh

References

Koh H, Li Y, Zhan X, Chen J, Zhao N. (2019) A distance-based kernel association test based on the generalized linear mixed model for correlated microbiome studies. Front. Genet. 458(10), 1-14.

Examples



## Example with Gaussian (e.g., body mass index) traits
## For non-Gaussian traits, see vignette. 

# Import example microbiome data with Gaussian traits
data(nordata)
otu.tab <- nordata$nor.otu.tab
meta <- nordata$nor.meta

# Create kernel matrices and run analysis 
if (requireNamespace("vegan")) {
 library(vegan)
 D_BC = as.matrix(vegdist(otu.tab, 'bray'))
 K_BC = D2K(D_BC)
 GLMMMiRKAT(y = meta$y, X = cbind(meta$x1, meta$x2), id = meta$id, 
          Ks = K_BC, model = "gaussian", nperm = 500)
} else {
# Computation time is longer for phylogenetic kernels 
 tree <- nordata$nor.tree
 unifracs <- GUniFrac::GUniFrac(otu.tab, tree, alpha=c(1))$unifracs
 D_W <- unifracs[,,"d_1"] 
 K_W = D2K(D_W)
 GLMMMiRKAT(y = meta$y, X = cbind(meta$x1, meta$x2), id = meta$id, 
          Ks = K_W, model = "gaussian", nperm = 500)
}

Kernel RV Coefficient Test (KRV)

Description

Kernel RV coefficient test to evaluate the overall association between microbiome composition and high-dimensional or structured phenotype or genotype.

Usage

KRV(
  y = NULL,
  X = NULL,
  adjust.type = NULL,
  kernels.otu,
  kernel.y,
  omnibus = "kernel_om",
  returnKRV = FALSE,
  returnR2 = FALSE
)

Arguments

y

A numeric n by p matrix of p continuous phenotype variables and sample size n (default = NULL). If it is NULL, a phenotype kernel matrix must be entered for "kernel.y". Defaults to NULL.

X

A numeric n by q matrix, containing q additional covariates (default = NULL). If NULL, an intercept only model is used. If the first column of X is not uniformly 1, then an intercept column will be added.

adjust.type

Possible values are "none" (default if X is null), "phenotype" to adjust only the y variable (only possible if y is a numeric phenotype matrix rather than a pre-computed kernel), or "both" to adjust both the X and Y kernels.

kernels.otu

A numeric OTU n by n kernel matrix or a list of matrices, where n is the sample size. It can be constructed from microbiome data, such as by transforming from a distance metric.

kernel.y

Either a numerical n by n kernel matrix for phenotypes or a method to compute the kernel of phenotype. Methods are "Gaussian" or "linear". A Gaussian kernel (kernel.y="Gaussian") can capture the general relationship between microbiome and phenotypes; a linear kernel (kernel.y="linear") may be preferred if the underlying relationship is close to linear.

omnibus

A string equal to either "Cauchy" or "kernel_om" (or unambiguous abbreviations thereof), specifying whether to use the Cauchy combination test or an omnibus kernel to generate the omnibus p-value.

returnKRV

A logical indicating whether to return the KRV statistic. Defaults to FALSE.

returnR2

A logical indicating whether to return the R-squared coefficient. Defaults to FALSE.

Details

kernels.otu should be a list of numerical n by n kernel matrices, or a single n by n kernel matrix, where n is sample size.

When kernel.y is a method ("Gaussian" or "linear") to compute the kernel of phenotype, y should be a numerical phenotype matrix, and X (if not NULL) should be a numeric matrix of covariates. Both y and X should have n rows.

When kernel.y is a kernel matrix for the phenotype, there is no need to provide X and y, and they will be ignored if provided. In this case, kernel.y and kernel.otu should both be numeric matrices with the same number of rows and columns.

Missing data is not permitted. Please remove all individuals with missing kernel.otu, y (if not NULL), X (if not NULL), and kernel.y (if a matrix is entered) prior to analysis.

Value

If only one candidate kernel matrix is considered, returns a list containing the p-value for the candidate kernel matrix. If more than one candidate kernel matrix is considered, returns a list of two elements:

p_values

P-value for each candidate kernel matrix

omnibus_p

Omnibus p-value

KRV

A vector of kernel RV statistics (a measure of effect size), one for each candidate kernel matrix. Only returned if returnKRV = TRUE

R2

A vector of R-squared statistics, one for each candidate kernel matrix. Only returned if returnR2 = TRUE

Author(s)

Nehemiah Wilson, Haotian Zheng, Xiang Zhan, Ni Zhao

References

Zheng, Haotian, Zhan, X., Plantinga, A., Zhao, N., and Wu, M.C. A Fast Small-Sample Kernel Independence Test for Microbiome Community-Level Association Analysis. Biometrics. 2017 Mar 10. doi: 10.1111/biom.12684.

Liu, Hongjiao, Ling, W., Hua, X., Moon, J.Y., Williams-Nguyen, J., Zhan, X., Plantinga, A.M., Zhao, N., Zhang, A., Durazo-Arzivu, R.A., Knight, R., Qi, Q., Burk, R.D., Kaplan, R.C., and Wu, M.C. Kernel-based genetic association analysis for microbiome phenotypes identifies host genetic drivers of beta-diversity. 2021+

Examples

        
library(GUniFrac)
library(MASS)

data(throat.tree)
data(throat.otu.tab)
data(throat.meta)

## Simulate covariate data 
set.seed(123)
n = nrow(throat.otu.tab)
Sex <- throat.meta$Sex
Smoker <- throat.meta$SmokingStatus
anti <- throat.meta$AntibioticUsePast3Months_TimeFromAntibioticUsage
Male = (Sex == "Male")**2
Smoker = (Smoker == "Smoker") **2
Anti =  (anti != "None")^2
cova = cbind(1, Male, Smoker, Anti)

## Simulate microbiome data
otu.tab.rff <- Rarefy(throat.otu.tab)$otu.tab.rff
unifracs <- GUniFrac(otu.tab.rff, throat.tree, alpha=c(0, 0.5, 1))$unifracs
# Distance matrices
D.weighted = unifracs[,,"d_1"]
D.unweighted = unifracs[,,"d_UW"]
# Kernel matrices 
K.weighted = D2K(D.weighted)
K.unweighted = D2K(D.unweighted)

if (requireNamespace("vegan")) {
 library(vegan)
 D.BC = as.matrix(vegdist(otu.tab.rff, method="bray"))
 K.BC = D2K(D.BC)
} 

# Simulate phenotype data 
rho = 0.2
Va = matrix(rep(rho, (2*n)^2), 2*n, 2*n)+diag(1-rho, 2*n)
Phe = mvrnorm(n, rep(0, 2*n), Va)  
K.y = Phe %*% t(Phe)  # phenotype kernel 

# Simulate genotype data
G = matrix(rbinom(n*10, 2, 0.1), n, 10) 
K.g = G %*% t(G)  # genotype kernel


## Unadjusted analysis (microbiome and phenotype)
KRV(y = Phe, kernels.otu = K.weighted, kernel.y = "Gaussian")  # numeric y 
KRV(kernels.otu = K.weighted, kernel.y = K.y)  # kernel y 

## Adjusted analysis (phenotype only)
KRV(kernels.otu = K.weighted, y = Phe, kernel.y = "linear", X = cova, adjust.type = "phenotype")

if (requireNamespace("vegan")) {
  ## Adjusted analysis (adjust both kernels; microbiome and phenotype) 
  KRV(kernels.otu = K.BC, kernel.y = K.y, X = cova, adjust.type='both')

  ## Adjusted analysis (adjust both kernels; microbiome and genotype)
  KRV(kernels.otu = K.BC, kernel.y = K.g, X = cova, adjust.type='both')
}

Multivariate Microbiome Regression-based Kernel Association Test

Description

Test for association between overall microbiome composition and multiple continuous outcomes.

Usage

MMiRKAT(Y, X = NULL, Ks, returnKRV = FALSE, returnR2 = FALSE)

Arguments

Y

A numerical n by p matrix of p continuous outcome variables, n being sample size.

X

A numerical n by q matrix or data frame, containing q additional covariates that you want to adjust for (Default = NULL). If it is NULL, an intercept only model is fit.

Ks

A list of numerical n by n kernel matrices, or a single n by n kernel matrix, where n is the sample size. Kernels can be constructed from distance matrices (such as Bray-Curtis or UniFrac distances) using the function D2K, or through other mathematical approaches.

returnKRV

A logical indicating whether to return the KRV statistic. Defaults to FALSE.

returnR2

A logical indicating whether to return the R-squared coefficient. Defaults to FALSE.

Details

Missing data is not permitted. Please remove all individuals with missing Y, X, K prior to analysis

The method of generating kernel specific p-values is "davies", which represents an exact method that computes the p-value by inverting the characteristic function of the mixture chisq.

Value

Returns a list of the MMiRKAT p-values for each inputted kernel matrix, labeled with the names of the kernels, if given.

p_values

list of the p-values for each individual kernel matrix inputted

KRV

A vector of kernel RV statistics (a measure of effect size), one for each candidate kernel matrix. Only returned if returnKRV = TRUE

R2

A vector of R-squared statistics, one for each candidate kernel matrix. Only returned if returnR2 = TRUE

Author(s)

Nehemiah Wilson, Haotian Zheng, Xiang Zhan, Ni Zhao

References

Zheng, H., Zhan, X., Tong, X., Zhao, N., Maity,A., Wu, M.C., and Chen,J. A small-sample multivariate kernel machine test for microbiome association studies. Genetic Epidemiology, 41(3), 210-220. DOI: 10.1002/gepi.22030

Examples

 
library(GUniFrac)
if(requireNamespace("vegan")) { library(vegan) }

data(throat.tree)
data(throat.otu.tab)
data(throat.meta)

unifracs <- GUniFrac(throat.otu.tab, throat.tree, alpha=c(0, 0.5, 1))$unifracs

if(requireNamespace("vegan")) {
 BC= as.matrix(vegdist(throat.otu.tab , method="bray"))
 Ds = list(w = unifracs[,,"d_1"], u = unifracs[,,"d_UW"], BC = BC) 
} else {
 Ds = list(w = unifracs[,,"d_1"], u = unifracs[,,"d_UW"]) 
}
Ks = lapply(Ds, FUN = function(d) D2K(d)) 

n = nrow(throat.otu.tab)
Y = matrix(rnorm(n*3, 0, 1), n, 3)

covar = cbind(as.numeric(throat.meta$Sex == "Male"), as.numeric(throat.meta$PackYears))
MMiRKAT(Y = Y, X = covar, Ks = Ks)

Microbiome Regression-based Kernel Association Test

Description

Test for association between microbiome composition and a continuous or dichotomous outcome by incorporating phylogenetic or nonphylogenetic distance between different microbiomes.

Usage

MiRKAT(
  y,
  X = NULL,
  Ks,
  out_type = "C",
  method = "davies",
  omnibus = "permutation",
  nperm = 999,
  returnKRV = FALSE,
  returnR2 = FALSE
)

Arguments

y

A numeric vector of the a continuous or dichotomous outcome variable.

X

A numeric matrix or data frame, containing additional covariates that you want to adjust for. If NULL, a intercept only model is used. Defaults to NULL.

Ks

A list of n by n kernel matrices or a single n by n kernel matrix, where n is the sample size. It can be constructed from microbiome data through distance metric or other approaches, such as linear kernels or Gaussian kernels.

out_type

An indicator of the outcome type ("C" for continuous, "D" for dichotomous).

method

Method used to compute the kernel specific p-value. "davies" represents an exact method that computes the p-value by inverting the characteristic function of the mixture chisq. We adopt an exact variance component tests because most of the studies concerning microbiome compositions have modest sample size. "moment" represents an approximation method that matches the first two moments. "permutation" represents a permutation approach for p-value calculation. Defaults to "davies".

omnibus

A string equal to either "cauchy" or "permutation" (or nonambiguous abbreviations thereof), specifying whether to use the Cauchy combination test or residual permutation to generate the omnibus p-value.

nperm

The number of permutations if method = "permutation" or when multiple kernels are considered. If method = "davies" or "moment", nperm is ignored. Defaults to 999.

returnKRV

A logical indicating whether to return the KRV statistic (a measure of effect size). Defaults to FALSE.

returnR2

A logical indicating whether to return R-squared. Defaults to FALSE.

Details

y and X (if not NULL) should all be numeric matrices or vectors with the same number of rows.

Ks should be a list of n by n matrices or a single matrix. If you have distance metric(s) from metagenomic data, each kernel can be constructed through function D2K. Each kernel can also be constructed through other mathematical approaches.

Missing data is not permitted. Please remove all individuals with missing y, X, Ks prior to analysis

Parameter "method" only concerns with how kernel specific p-values are generated. When Ks is a list of multiple kernels, omnibus p-value is computed through permutation from each individual p-values, which are calculated through method of choice.

Value

Returns a list containing the following elements:

p_values

P-value for each candidate kernel matrix

omnibus_p

Omnibus p-value considering multiple candidate kernel matrices

KRV

Kernel RV statistic (a measure of effect size). Only returned if returnKRV = TRUE.

R2

R-squared. Only returned if returnR2 = TRUE.

Author(s)

Ni Zhao

References

Zhao, N., Chen, J.,Carroll, I. M., Ringel-Kulka, T., Epstein, M.P., Zhou, H., Zhou, J. J., Ringel, Y., Li, H. and Wu, M.C. (2015)). Microbiome Regression-based Kernel Association Test (MiRKAT). American Journal of Human Genetics, 96(5):797-807

Chen, J., Chen, W., Zhao, N., Wu, M~C.and Schaid, D~J. (2016) Small Sample Kernel Association Tests for Human Genetic and Microbiome Association Studies. 40: 5-19. doi: 10.1002/gepi.21934

Davies R.B. (1980) Algorithm AS 155: The Distribution of a Linear Combination of chi-2 Random Variables, Journal of the Royal Statistical Society. Series C , 29, 323-333.

Satterthwaite, F. (1946). An approximate distribution of estimates of variance components. Biom. Bull. 2, 110-114.

Lee S, Emond MJ, Bamshad MJ, Barnes KC, Rieder MJ, Nickerson DA; NHLBI GO Exome Sequencing Project-ESP Lung Project Team, Christiani DC, Wurfel MM, Lin X. (2012) Optimal unified approach for rare variant association testing with application to small sample case-control whole-exome sequencing studies. American Journal of Human Genetics, 91, 224-237.

Zhou, J. J. and Zhou, H.(2015) Powerful Exact Variance Component Tests for the Small Sample Next Generation Sequencing Studies (eVCTest), in submission.

Examples

library(GUniFrac)

data(throat.tree)
data(throat.otu.tab)
data(throat.meta)

unifracs = GUniFrac(throat.otu.tab, throat.tree, alpha = c(1))$unifracs
if (requireNamespace("vegan")) {
 library(vegan)
 BC= as.matrix(vegdist(throat.otu.tab, method="bray"))
 Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"], BC = BC)
} else {
 Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"])
}
Ks = lapply(Ds, FUN = function(d) D2K(d))

covar = cbind(throat.meta$Age, as.numeric(throat.meta$Sex == "Male"))

# Continuous phenotype
n = nrow(throat.meta)
y = rnorm(n)
MiRKAT(y, X = covar, Ks = Ks, out_type="C", method = "davies")

# Binary phenotype 
y = as.numeric(runif(n) < 0.5)
MiRKAT(y, X = covar, Ks = Ks, out_type="D")

Robust MiRKAT (quantile regression)

Description

A more robust version of MiRKAT utlizing a linear model that uses quantile regression.

Usage

MiRKAT.Q(y, X, Ks, omnibus = "kernel_om", returnKRV = FALSE, returnR2 = FALSE)

Arguments

y

A numeric vector of the a continuous or dichotomous outcome variable.

X

A numerical matrix or data frame, containing additional covariates that you want to adjust for. Mustn't be NULL.

Ks

A list of n by n kernel matrices (or a single n by n kernel matrix), where n is the sample size. If you have distance metric from metagenomic data, each kernel can be constructed through function D2K. Each kernel can also be constructed through other mathematical approaches, such as linear or Gaussian kernels.

omnibus

A string equal to either "Cauchy" or "kernel_om" (or nonambiguous abbreviations thereof), specifying whether to use the Cauchy combination test or an omnibus kernel to generate the omnibus p-value.

returnKRV

A logical indicating whether to return the KRV statistic. Defaults to FALSE.

returnR2

A logical indicating whether to return the R-squared coefficient. Defaults to FALSE.

Details

MiRKAT.Q creates a kernel matrix using the linear model created with the function rq, a quantile regression function, then does the KRV analysis on Ks and the newly formed kernel matrix representing the outome traits.

Missing data is not permitted. Please remove all individuals with missing y, X, Ks prior to analysis

Value

Returns p-values for each individual kernel matrix, an omnibus p-value if multiple kernels were provided, and measures of effect size KRV and R2.

p_values

labeled individual p-values for each kernel

omnibus_p

omnibus p_value, calculated as for the KRV test

KRV

A vector of kernel RV statistics (a measure of effect size), one for each candidate kernel matrix. Only returned if returnKRV = TRUE

R2

A vector of R-squared statistics, one for each candidate kernel matrix. Only returned if returnR2 = TRUE

Author(s)

Weija Fu

Examples

    
library(GUniFrac)
library(quantreg)

# Generate data
data(throat.tree)
data(throat.otu.tab)
data(throat.meta)

unifracs = GUniFrac(throat.otu.tab, throat.tree, alpha = c(1))$unifracs
if (requireNamespace("vegan")) {
 library(vegan)
 BC= as.matrix(vegdist(throat.otu.tab, method="bray"))
 Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"], BC = BC)
} else {
 Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"])
}
Ks = lapply(Ds, FUN = function(d) D2K(d))

covar = scale(cbind(throat.meta$Age, as.numeric(throat.meta$Sex == "Male")))

# Continuous phenotype
n = nrow(throat.meta)
y = rchisq(n, 2) + apply(covar, 1, sum) 
MiRKAT.Q(y, X = covar, Ks = Ks)

Robust MiRKAT (robust regression)

Description

A more robust version of MiRKAT utilizing a linear model by robust regression using an M estimator.

Usage

MiRKAT.R(y, X, Ks, omnibus = "kernel_om", returnKRV = FALSE, returnR2 = FALSE)

Arguments

y

A numeric vector of the a continuous or dichotomous outcome variable.

X

A numerical matrix or data frame, containing additional covariates that you want to adjust for Mustn't be NULL

Ks

list of n by n kernel matrices (or a single n by n kernel matrix), where n is the sample size. It can be constructed from microbiome data through distance metric or other approaches, such as linear kernels or Gaussian kernels.

omnibus

A string equal to either "Cauchy" or "kernel_om" (or nonambiguous abbreviations thereof), specifying whether to use the Cauchy combination test or an omnibus kernel to generate the omnibus p-value.

returnKRV

A logical indicating whether to return the KRV statistic. Defaults to FALSE.

returnR2

A logical indicating whether to return the R-squared coefficient. Defaults to FALSE.

Details

MiRKAT.R creates a kernel matrix using the linear model created with the function rlm, a robust regression function, then does the KRV analysis on Ks and the newly formed kernel matrix representing the outome traits.

y and X should all be numerical matrices or vectors with the same number of rows, and mustn't be NULL.

Ks should be a list of n by n matrices or a single matrix. If you have distance metric from metagenomic data, each kernel can be constructed through function D2K. Each kernel may also be constructed through other mathematical approaches.

Missing data is not permitted. Please remove all individuals with missing y, X, Ks prior to analysis

Value

Returns p-values for each individual kernel matrix, an omnibus p-value if multiple kernels were provided, and measures of effect size KRV and R2.

p_values

labeled individual p-values for each kernel

omnibus_p

omnibus p_value, calculated as for the KRV test

KRV

A vector of kernel RV statistics (a measure of effect size), one for each candidate kernel matrix. Only returned if returnKRV = TRUE

R2

A vector of R-squared statistics, one for each candidate kernel matrix. Only returned if returnR2 = TRUE

Author(s)

Weijia Fu

Examples

    


# Generate data
library(GUniFrac)

data(throat.tree)
data(throat.otu.tab)
data(throat.meta)

unifracs = GUniFrac(throat.otu.tab, throat.tree, alpha = c(1))$unifracs
if (requireNamespace("vegan")) {
 library(vegan)
 BC= as.matrix(vegdist(throat.otu.tab, method="bray"))
 Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"], BC = BC)
} else {
 Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"])
}
Ks = lapply(Ds, FUN = function(d) D2K(d))

covar = cbind(throat.meta$Age, as.numeric(throat.meta$Sex == "Male"))

# Continuous phenotype
n = nrow(throat.meta)
y = rchisq(n, 2)
MiRKAT.R(y, X = covar, Ks = Ks)

MiRKAT-iQ

Description

Integrated quantile regression-based kernel association test.

Usage

MiRKAT.iQ(Y, X, K, weight = c(0.25, 0.25, 0.25, 0.25))

Arguments

Y

A numeric vector of the continuous outcome variable.

X

A numeric matrix for additional covariates that you want to adjust for.

K

A list of n by n kernel matrices at a single n by n kernel matrix, where n is the sample size.

weight

A length 4 vector specifying the weight for Cauchy combination, corresponding to wilcoxon/normal/inverselehmann/lehmann functions. The sum of the weight should be 1.

Value

Returns a list containing the p values for single kernels, or the omnibus p-value if multiple candidate kernel matrices are provided.

Author(s)

Tianying Wang, Xiang Zhan.

References

Wang T, et al. (2021) Testing microbiome association using integrated quantile regression models. Bioinformatics (to appear).

Examples


library(GUniFrac)
library(quantreg)
library(PearsonDS)
library(MiRKAT)

data(throat.tree)
data(throat.otu.tab)

## Create UniFrac and Bray-Curtis distance matrices 
unifracs = GUniFrac(throat.otu.tab, throat.tree, alpha = c(1))$unifracs
if (requireNamespace("vegan")) {
 library(vegan)
 BC= as.matrix(vegdist(throat.otu.tab, method="bray"))
 Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"], BC = BC)
} else {
 Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"])
}

## Convert to kernels           
Ks = lapply(Ds, FUN = function(d) D2K(d))
covar = cbind(throat.meta$Age, as.numeric(throat.meta$Sex == "Male"))
n = nrow(throat.meta)
y = rnorm(n)
result = MiRKAT.iQ(y, X = covar, K = Ks)

Microiome Regression-based Kernel Association Test for Survival

Description

Community level test for association between microbiome composition and survival outcomes (right-censored time-to-event data) using kernel matrices to compare similarity between microbiome profiles with similarity in survival times.

Usage

MiRKATS(
  obstime,
  delta,
  X = NULL,
  Ks,
  beta = NULL,
  perm = FALSE,
  omnibus = "permutation",
  nperm = 999,
  returnKRV = FALSE,
  returnR2 = FALSE
)

Arguments

obstime

A numeric vector of follow-up (survival/censoring) times.

delta

Event indicator: a vector of 0/1, where 1 indicates that the event was observed for a subject (so "obstime" is survival time), and 0 indicates that the subject was censored.

X

A vector or matrix of numeric covariates, if applicable (default = NULL).

Ks

A list of or a single numeric n by n kernel matrices or matrix (where n is the sample size).

beta

A vector of coefficients associated with covariates. If beta is NULL and covariates are present, coxph is used to calculate coefficients (default = NULL).

perm

Logical, indicating whether permutation should be used instead of analytic p-value calculation (default=FALSE). Not recommended for sample sizes of 100 or more.

omnibus

nperm

Integer, number of permutations used to calculate p-value if perm==TRUE (default=1000) and to calculate omnibus p-value if omnibus=="permutation".

returnKRV

A logical indicating whether to return the KRV statistic. Defaults to FALSE.

returnR2

A logical indicating whether to return the R-squared coefficient. Defaults to FALSE.

Details

obstime, delta, and X should all have n rows, and the kernel or distance matrix should be a single n by n matrix. If a distance matrix is entered (distance=TRUE), a kernel matrix will be constructed from the distance matrix.

Update in v1.1.0: MiRKATS also utilizes the OMiRKATS omnibus test if more than one kernel matrix is provided by the user. The OMiRKATS omnibus test calculates an overall p-value for the test via permutation.

Missing data is not permitted. Please remove individuals with missing data on y, X or in the kernel or distance matrix prior to using the function.

The Efron approximation is used for tied survival times.

Value

Return value depends on the number of kernel matrices inputted. If more than one kernel matrix is given, MiRKATS returns two items; a vector of the labeled individual p-values for each kernel matrix, as well as an omnibus p-value from the Optimal-MiRKATS omnibus test. If only one kernel matrix is given, then only its p-value will be given, as no omnibus test will be needed.

p_values

individual p-values for each inputted kernel matrix

omnibus_p

overall omnibus p-value

KRV

A vector of kernel RV statistics (a measure of effect size), one for each candidate kernel matrix. Only returned if returnKRV = TRUE

R2

A vector of R-squared statistics, one for each candidate kernel matrix. Only returned if returnR2 = TRUE

Author(s)

Nehemiah Wilson, Anna Plantinga

References

Plantinga, A., Zhan, X., Zhao, N., Chen, J., Jenq, R., and Wu, M.C. MiRKAT-S: a distance-based test of association between microbiome composition and survival times. Microbiome, 2017:5-17. doi: 10.1186/s40168-017-0239-9

Chen, J., Chen, W., Zhao, N., Wu, M~C.and Schaid, D~J. (2016) Small Sample Kernel Association Tests for Human Genetic and Microbiome Association Studies. 40:5-19. doi: 10.1002/gepi.21934

Efron, B. (1977) "The efficiency of Cox's likelihood function for censored data." Journal of the American statistical Association 72(359):557-565.

Davies R.B. (1980) Algorithm AS 155: The Distribution of a Linear Combination of chi-2 Random Variables, Journal of the Royal Statistical Society Series C, 29:323-333

Examples


###################################
# Generate data
library(GUniFrac)


# Throat microbiome data
data(throat.tree)
data(throat.otu.tab)

unifracs = GUniFrac(throat.otu.tab, throat.tree, alpha = c(1))$unifracs
if (requireNamespace("vegan")) {
 library(vegan)
 BC= as.matrix(vegdist(throat.otu.tab, method="bray"))
 Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"], BC = BC)
} else {
 Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"])
}

Ks = lapply(Ds, FUN = function(d) D2K(d))

# Covariates and outcomes
covar <- matrix(rnorm(120), nrow=60)
S <- rexp(60, 3)   # survival time 
C <- rexp(60, 1)   # censoring time 
D <- (S<=C)        # event indicator
U <- pmin(S, C)    # observed follow-up time

MiRKATS(obstime = U, delta = D, X = covar, Ks = Ks, beta = NULL)

Microbiome Regression-Based Kernel Association Test for binary outcomes

Description

Called by MiRKAT if the outcome variable is dichotomous (out_type="D")

This function is called by the exported function MiRKAT if the argument "out_type" of MiRKAT is equal to "D" (for dichotomous).

Each argument of MiRKAT_continuous is given the value of the corresponding argument given by the user to MiRKAT.

Function not exported

Usage

MiRKAT_binary(
  y,
  X = NULL,
  Ks,
  method = "davies",
  family = "binomial",
  omnibus = "permutation",
  nperm = 999,
  returnKRV = FALSE,
  returnR2 = FALSE
)

Arguments

y

A numeric vector of the dichotomous outcome variable

X

A numerical matrix or data frame, containing additional covariates that you want to adjust for (Default = NULL). If it is NULL, a intercept only model was fit.

Ks

A list of n by n kernel matrices (or a single n by n kernel matrix), where n is the sample size. It can be constructed from microbiome data through distance metric or other approaches, such as linear kernels or Gaussian kernels.

method

A string telling R which method to use to compute the kernel specific p-value (default = "davies"). "davies" represents an exact method that computes the p-value by inverting the characteristic function of the mixture chisq. We adopt an exact variance component tests because most of the studies concerning microbiome compositions have modest sample size. "moment" represents an approximation method that matches the first two moments. "permutation" represents a permutation approach for p-value calculation.

family

A string describing the error distribution and link function to be used in the linear model.

omnibus

nperm

the number of permutations if method = "permutation" or when multiple kernels are considered. if method = "davies" or "moment", nperm is ignored.

returnKRV

A logical indicating whether to return the KRV statistic. Defaults to FALSE.

returnR2

A logical indicating whether to return the R-squared coefficient. Defaults to FALSE.

Value

p_values

p-value for each candidate kernel matrix

omnibus_p

omnibus p-value if multiple kernel matrices are considered

KRV

A vector of kernel RV statistics (a measure of effect size), one for each candidate kernel matrix. Only returned if returnKRV = TRUE

R2

A vector of R-squared statistics, one for each candidate kernel matrix. Only returned if returnR2 = TRUE

Author(s)

Ni Zhao

References

Chen, J., Chen, W., Zhao, N., Wu, M~C.and Schaid, D~J. (2016) Small Sample Kernel Association Tests for Human Genetic and Microbiome Association Studies. 40: 5-19. doi: 10.1002/gepi.21934

Davies R.B. (1980) Algorithm AS 155: The Distribution of a Linear Combination of chi-2 Random Variables, Journal of the Royal Statistical Society. Series C , 29, 323-333.

Satterthwaite, F. (1946). An approximate distribution of estimates of variance components. Biom. Bull. 2, 110-114.

Zhou, J. J. and Zhou, H.(2015) Powerful Exact Variance Component Tests for the Small Sample Next Generation Sequencing Studies (eVCTest), in submission.

Microbiome Regression-based Analysis Test for a continuous outcome variable

Description

Inner function for MiRKAT; computes MiRKAT for continuous outcomes. Called by MiRKAT if out_type="C"

Usage

MiRKAT_continuous(
  y,
  X = NULL,
  Ks,
  method,
  omnibus,
  nperm = 999,
  returnKRV = FALSE,
  returnR2 = FALSE
)

Arguments

y

A numeric vector of the continuous outcome variable

X

A numeric matrix or data frame containing additional covariates (default = NULL). If NULL, an intercept only model is used.

Ks

method

A method to compute the kernel specific p-value (Default= "davies"). "davies" represents an exact method that computes the p-value by inverting the characteristic function of the mixture chisq. We adopt an exact variance component tests because most of the studies concerning microbiome compositions have modest sample size. "moment" represents an approximation method that matches the first two moments. "permutation" represents a permutation approach for p-value calculation.

omnibus

nperm

the number of permutations if method = "permutation" or when multiple kernels are considered. If method = "davies" or "moment", nperm is ignored. Defaults to 999.

returnKRV

A logical indicating whether to return the KRV statistic. Defaults to FALSE.

returnR2

A logical indicating whether to return the R-squared coefficient. Defaults to FALSE.

Details

This function is called by the exported function "MiRKAT" when the argument of MiRKAT, out_type, is set equal to "C".

Each argument of MiRKAT_continuous is given the value of the corresponding argument given by the user to MiRKAT.

Function not exported

Value

p_values

p-value for each candidate kernel matrix

omnibus_p

omnibus p-value considering multiple candidate kernel matrices

KRV

A vector of kernel RV statistics (a measure of effect size), one for each candidate kernel matrix. Only returned if returnKRV = TRUE

R2

A vector of R-squared statistics, one for each candidate kernel matrix. Only returned if returnR2 = TRUE

Author(s)

Ni Zhao

References

Chen, J., Chen, W., Zhao, N., Wu, M~C.and Schaid, D~J. (2016) Small Sample Kernel Association Tests for Human Genetic and Microbiome Association Studies. 40: 5-19. doi: 10.1002/gepi.21934

Davies R.B. (1980) Algorithm AS 155: The Distribution of a Linear Combination of chi-2 Random Variables, Journal of the Royal Statistical Society. Series C , 29, 323-333.

Satterthwaite, F. (1946). An approximate distribution of estimates of variance components. Biom. Bull. 2, 110-114.

Zhou, J. J. and Zhou, H.(2015) Powerful Exact Variance Component Tests for the Small Sample Next Generation Sequencing Studies (eVCTest), in submission.

Simulated DEPENDENT data with BINOMIAL traits for correlated regression-based analysis (i.e. CSKAT, GLMMMiRKAT)

Description

Simulated DEPENDENT data with BINOMIAL traits for correlated regression-based analysis (i.e. CSKAT, GLMMMiRKAT)

Usage

data(bindata)

Format

A list containing three data objects for correlated microbiome data with binary response variable (described below).

bin.otu.tab: Simulated OTU data for correlated regression-based analysis; 59 rows and 730 columns, rows being patients and columns being OTUs
bin.meta: Simulated metadata for correlated regression-based analysis; 59 rows and 4 columns, rows being patients and columns being the outcome variable, subject identifier, and covariates to possibly account for in any regression modeling
bin.tree: Simulated rooted phylogenetic tree with 730 tips and 729 nodes

Inner Function for CSKAT, Correlated Sequence Kernel Association Test

Description

Small-sample SKAT for correlated (continuous) data ('c' stands for 'correlated'). Computes the adjusted score statistic and p-value.

Usage

inner.CSKAT(lmer.obj, K)

Arguments

lmer.obj

A fitted lme4 object (model under H0)

K

the kernel matrix, which quantifies the similarities between samples

Value

p.value: association p-value
Q.adj: adjusted score statistic

References

Zhan X, et al. (2018) A small-sample kernel association test for correlated data with application to microbiome association studies. Genet Epidemiol., submitted.

Kernel RV Coefficient Test; Inner Function

Description

Function called when user calls function KRV. For each kernel matrix inputted into KRV, KRV runs inner.KRV on that kernel with the inputted kernel.y outcome matrix.

Usage

inner.KRV(
  y = NULL,
  X = NULL,
  adjust.type,
  kernel.otu,
  kernel.y,
  returnKRV = FALSE,
  returnR2 = FALSE
)

Arguments

y

A numeric n by p matrix of p continuous phenotype variables and sample size n (default = NULL). If it is NULL, a phenotype kernel matrix must be entered for "kernel.y". Defaults to NULL.

X

adjust.type

kernel.otu

A numeric OTU n by n kernel matrix or a list of matrices, where n is the sample size. It can be constructed from microbiome data, such as by transforming from a distance metric.

kernel.y

returnKRV

A logical indicating whether to return the KRV statistic. Defaults to FALSE.

returnR2

A logical indicating whether to return the R-squared coefficient. Defaults to FALSE.

Details

y and X (if not NULL) should all be numerical matrices or vectors with the same number of rows.

Ks should be a list of n by n matrices or a single matrix. If you have distance metric from metagenomic data, each kernel can be constructed through function D2K. Each kernel can also be constructed through other mathematical approaches.

Missing data is not permitted. Please remove all individuals with missing y, X, Ks prior to analysis

Parameter "method" only concerns how kernel specific p-values are generated. When Ks is a list of multiple kernels, omnibus p-value is computed through permutation from each individual p-value, which are calculated through method of choice.

Value

Returns a p-value for the candidate kernel matrix

pv

p-value for the candidate kernel matrix

KRV

KRV statistic for the candidate kernel matrix. Only returned if returnKRV = TRUE.

R2

R-squared for the candidate kernel matrix. Only returned if returnR2 = TRUE.

Author(s)

Haotian Zheng, Xiang Zhan, Ni Zhao

References

Zhan, X., Plantinga, A., Zhao, N., and Wu, M.C. A Fast Small-Sample Kernel Independence Test for Microbiome Community-Level Association Analysis. Biometrics. 2017 Mar 10. doi: 10.1111/biom.12684.

Simulated DEPENDENT data with GAUSSIAN traits for correlated regression-based analysis (i.e. CSKAT, GLMMMiRKAT)

Description

Simulated DEPENDENT data with GAUSSIAN traits for correlated regression-based analysis (i.e. CSKAT, GLMMMiRKAT)

Usage

data(nordata)

Format

A list containing three data objects for correlated microbiome data with continuous response variable (described below).

nor.otu.tab: Simulated OTU data for correlated regression-based analysis; 59 rows and 730 columns, rows being patients and columns being OTUs
nor.meta: Simulated metadata for correlated regression-based analysis; 59 rows and 4 columns, rows being patients and columns being the outcome variable, subject identifier, and covariates to possibly account for in any regression modeling
nor.tree: Simulated rooted phylogenetic tree with 730 tips and 729 nodes

Simulated DEPENDENT data with POISSON (count) traits for correlated regression-based analysis (i.e. CSKAT, GLMMMiRKAT)

Description

Simulated DEPENDENT data with POISSON (count) traits for correlated regression-based analysis (i.e. CSKAT, GLMMMiRKAT)

Usage

data(poisdata)

Format

A list containing three data objects for correlated microbiome data with binary response variable (described below).

pois.otu.tab: Simulated OTU data for correlated regression-based analysis; 59 rows and 730 columns, rows being patients and columns being OTUs
pois.meta: Simulated metadata for correlated regression-based analysis; 59 rows and 4 columns, rows being patients and columns being the outcome variable, subject identifier, and covariates to possibly account for in any regression modeling
pois.tree: Simulated rooted phylogenetic tree with 730 tips and 729 nodes

Simulated metadata for microbiome regression-based analysis

Description

Simulation code can be seen in ?KRV Corresponding OTU matrix is stored in "throat.otu.tab"

Usage

data(throat.meta)

Format

A data frame with 59 rows and 16 columns, rows being participants and columns being different covariates to possibly be accounted for in any utilized linear models.

throat.meta is part of a microbiome data set for studying the effect of smoking on the upper respiratory tract microbiome. This data set comes from the throat microbiome of left body side. It contains 60 subjects consisting of 32 nonsmokers and 28 smokers.

BarcodeSequence: Sequence of DNA that allows for the identification of the specific species of bacteria. See GUniFrac for more details
LinkerPrimerSequence: Sequence of DNA that aids in locating the Barcode Sequence. See GUniFrac for more details
SmokingStatus: whether or not each patient is a "Smoker" or a "NonSmoker"
PatientID: Identifying integer label given to each patient
SampleIndex: Labels each patient as being from this particular sample, so as possibly be able to use multiple samples at once
AirwaySite: Part of body where our samples were taken from from in each participant
SideOfBody: Which side of the body the samples were taken from
SampleType: What kind of sample each one is; should all be patientsamples
RespiratoryDiseaseStatus_severity_timeframe: Whether or not the patient has had a respiratory disease, and if so which one_severity of said disease_whether or not that disease is still active. If there has been no such disease in the patient's medical history, the patient's value is "healthy" in this column
AntibioticUsePast3Months_TimeFromAntibioticUsage: Whether or not the patient has used antibiotics in the past month_if so, how long ago it was. If not antibiotics have been used in the past month, the patient's value is "None" in this column
Age: Age of the patient
Sex: The sex of the patient
PackYears: Unit of measurement measuring the intensity of smoking; average number of packs per day times the number of years the patient has been smoking. If patient has never smoked, their value is 0 for this column
TimeFromLastCig: Minutes since the patient's last cigarette
TimeFromLastMeal: Minutes since the patient's last meal
Description: See Charleston paper and other sources

Simulated OTU data for microbiome regression analysis

Description

Simulated code can be seen in ?KRV Corresponding metadata is stored in "throat.meta"

Usage

data(throat.otu.tab)

Format

60 rows and 856 columns, where rows are patients and columns are OTUs

Simulated rooted phylogenetic tree

Description

Simulation code can be seen in ?KRV

Usage

data(throat.tree)

Format

Phylogenetic tree with 856 tips and 855 internal nodes

Details

Corresponding OTU matrix stored in "throat.otu.tab" See the GUniFrac package for more details