Type: Package
Title: A Survival Tree Based on Stabilized Score Tests for High-dimensional Covariates
Version: 1.5
Author: Takeshi Emura and Wei-Chern Hsu
Maintainer: Takeshi Emura <takeshiemura@gmail.com>
Description: A classification (decision) tree is constructed from survival data with high-dimensional covariates. The method is a robust version of the logrank tree, where the variance is stabilized. The main function "uni.tree" returns a classification tree for a given survival dataset. The inner nodes (splitting criterion) are selected by minimizing the P-value of the two-sample the score tests. The decision of declaring terminal nodes (stopping criterion) is the P-value threshold given by an argument (specified by user). This tree construction algorithm is proposed by Emura et al. (2021, in review).
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.1.0
Depends: survival,compound.Cox
NeedsCompilation: no
Packaged: 2021-03-22 05:50:46 UTC; biouser
Repository: CRAN
Date/Publication: 2021-03-22 06:40:02 UTC

Kaplan-Meier estimator of binary splitting

Description

Given a cut-off-point and selected covariate, return the survival curve for binary classification and the P-value of two sample log-rank test.

Usage

KM.split(t.vec, d.vec, X.mat, x.name, cutoff)

Arguments

t.vec

:Vector of survival times (time to either death or censoring)

d.vec

:Vector of censoring indicators (1=death, 0=censoring)

X.mat

:n by p matrix of covariates, where n is the sample size and p is the number of covariates

x.name

:the name of covariate

cutoff

:cut-off-point

Value

P-value of two sample logrank test and a plot of two KM estimates

Examples

data(Lung,package="compound.Cox")
train_Lung=Lung[which(Lung[,"train"]==TRUE),] #select training data
t.vec=train_Lung[,1]
d.vec=train_Lung[,2]
x.mat=train_Lung[,-c(1,2,3)]
KM.split(t.vec,d.vec,x.mat,x.name="ANXA5",cutoff=1)

Generate a matrix of gene expressions (discrete version of X.pathway() against to Emura (2012)) in the presence of gene pathways

Description

Generate a matrix of gene expressions in the presence of gene pathways, we first produce the matrix by X.pathway(Emura et al. 2012), then we change each value to 1 ~ 4 depend on the quantile.

Usage

X.pathway_discrete.balanced(n, p, q1, q2, rho1 = 0.5, rho2 = 0.5)

Arguments

n

:the number of individuals (sample size)

p

:the number of genes

q1

:the number of genes in the first pathway

q2

:the number of genes in the second pathway

rho1

:the correlation coefficient for the first pathway

rho2

:the correlation coefficient for the second pathway

Value

X n by p matrix of gene expressions

References

Emura T, Chen YH, Chen HY (2012). Survival Prediction Based on Compound Covariate under Cox Proportional Hazard Models. PLoS ONE 7(10): e47627. doi:10.1371/journal.pone.0047627

Examples

## generate 6 gene expressions from 10 individuals
X.pathway_discrete.balanced(n=10,p=6,q1=2,q2=2,rho1=0.5,rho2=0.5)

Generate a matrix of unbalance gene expressions (discrete version of X.pathway() against to Emura (2012)) in the presence of gene pathways

Description

Generate a matrix of gene expressions in the presence of gene pathways, we first produce the matrix by X.pathway(Emura et al. 2012), then we change each value to 1 ~ 3 depend on the quantile and randomly replace a element to 4 in the last p-(q1+q2) column for each row.

Usage

X.pathway_discrete.imbalanced(n, p, q1, q2, rho1 = 0.5, rho2 = 0.5)

Arguments

n

:the number of individuals (sample size)

p

:the number of genes

q1

:the number of genes in the first pathway

q2

:the number of genes in the second pathway

rho1

:the correlation coefficient for the first pathway

rho2

:the correlation coefficient for the second pathway

Value

X n by p matrix of gene expressions

References

Emura T, Chen YH, Chen HY (2012). Survival Prediction Based on Compound Covariate under Cox Proportional Hazard Models. PLoS ONE 7(10): e47627. doi:10.1371/journal.pone.0047627

Examples

## generate 6 gene expressions from 10 individuals
X.pathway_discrete.imbalanced(n=10,p=6,q1=2,q2=2,rho1=0.5,rho2=0.5)

The names of features that are selected in a tree

Description

The function returns the names of features (covariates) that are selected as the internal nodes of a tree. Only the names of the covariates are shown by excluding the cutt-off values.

Usage

feature.selected(tree)

Arguments

tree

:an object made from the "uni.tree" function

Details

The outputs show important features for predicting survival outcomes.

Value

An array of characters that are the names from those covariates selected in the tree

Examples

data(Lung,package="compound.Cox")
train_Lung=Lung[which(Lung[,"train"]==TRUE),] #select training data
t.vec=train_Lung[,1]
d.vec=train_Lung[,2]
x.mat=train_Lung[,-c(1,2,3)]
res=uni.tree(t.vec,d.vec,x.mat,P.value=0.01,d0=0.01,S.plot=FALSE,score=TRUE)
feature.selected(res)

The risk ranks of the samples predicted by a tree

Description

The function returns the ranks (1=the lowest risk, 2=the 2nd lowest risk, ..., k=the highest risk) predicted for the samples.

Usage

risk.classification(tree, X.mat)

Arguments

tree

:an object made from the "uni.tree" function

X.mat

:n by p matrix of covariates from the samples, where n is the sample size and p is the number of covariates

Details

If the tree has k terminal nodes, then the response 1 respresents the lowest risk and k represents the highest risk.

Value

A vector of integers, 1, 2, ..., k, that represent the ranks predicted for the samples.

Examples

data(Lung,package="compound.Cox")
train_Lung=Lung[which(Lung[,"train"]==TRUE),] #select training data
t.vec=train_Lung[,1]
d.vec=train_Lung[,2]
x.mat=train_Lung[,-c(1,2,3)]
res=uni.tree(t.vec,d.vec,x.mat,P.value=0.01,d0=0.01,S.plot=FALSE,score=TRUE)
risk.classification(res,x.mat)

Univariate binary splits by the logrank test

Description

The output is the summary of significance tests for binary splits, where the cut-off values are optimized for each covariate.

Usage

uni.logrank(t.vec, d.vec, X.mat)

Arguments

t.vec

:Vector of survival times (time to either death or censoring)

d.vec

:Vector of censoring indicators (1=death, 0=censoring)

X.mat

:n by p matrix of covariates, where n is the sample size and p is the number of covariates

Details

The output can be used to construct a logrank tree.

Value

A dataframe containing:

Pvalue: the P-value of the two-sample logrank test, where the cut-off value is optimized

cut_off_point: the optimal cutt-off values of the binary splits given a feature

left.sample.size: the sample size of a left child node

right.sample.size: the sample size of a right child node

Examples

data(Lung,package="compound.Cox")
train_Lung=Lung[which(Lung[,"train"]==TRUE),] #select training data
t.vec=train_Lung[,1]
d.vec=train_Lung[,2]
x.mat=train_Lung[,-c(1,2,3)]
uni.logrank(t.vec,d.vec,x.mat)

A survival tree based on stabilized score tests

Description

This function returns a classification (decision) tree for a given survival dataset. The decision of making inner nodes (splitting criterion) is based on the univariate score tests. The decision of declaring terminal nodes (stopping criterion) is the P-value threshold given by an argument. This tree construction algorithm is proposed by Emura et al. (2021).

Usage

uni.tree(
  t.vec,
  d.vec,
  X.mat,
  P.value = 0.01,
  d0 = 0.01,
  S.plot = FALSE,
  score = TRUE
)

Arguments

t.vec

:Vector of survival times (time to either death or censoring)

d.vec

:Vector of censoring indicators (1=death, 0=censoring)

X.mat

:n by p matrix of covariates (features), where n is the sample size and p is the number of covariates

P.value

:the threshold of P-value for stop splitting (stopping criterion)

d0

:A positive constant to stabilize the variance of score statistics (Witten & Tibshirani 2010)

S.plot

:call for plot the KM estimator for each split

score

:TRUE = score test (Emura et al. 2019); FALSE = log-rank test

Details

In order to stabilize the univariate score tests, a small value "d0" is added to the variance of the score statistics (Witten and Tibshirani 2010). d0=0 corresponds to the logrank test. To perform a large number of the score tests, the "compound.Cox" packages (Emura et al.2019) is applied with d0 as a option.

Value

A nested list describing a classification tree, consisting of inner nodes and terminal node.

References

Emura T, Hsu WC, Chou WC (2021). A survival tree based on stabilized score tests for high-dimensional covariates, in review

Emura T, Matsui S, Chen HY (2019). compound.Cox: Univariate Feature Selection and Compound Covariate for Predicting Survival, Computer Methods and Programs in Biomedicine 168: 21-37.

Witten DM, Tibshirani R (2010) Survival analysis with high-dimensional covariates. Stat Method Med Res 19:29-51

Examples

data(Lung,package="compound.Cox")
train_Lung=Lung[which(Lung[,"train"]==TRUE),] #select training data
t.vec=train_Lung[,1]
d.vec=train_Lung[,2]
x.mat=train_Lung[,-c(1,2,3)]
uni.tree(t.vec,d.vec,x.mat,P.value=0.01,d0=0.01,S.plot=FALSE,score=TRUE)