Type: | Package |
Title: | Clustering Analysis Using Survival Tree and Forest Algorithms |
Version: | 1.1.1 |
Date: | 2024-05-15 |
Maintainer: | Lu You <lu.you@epi.usf.edu> |
Description: | An outcome-guided algorithm is developed to identify clusters of samples with similar characteristics and survival rate. The algorithm first builds a random forest and then defines distances between samples based on the fitted random forest. Given the distances, we can apply hierarchical clustering algorithms to define clusters. Details about this method are described at https://github.com/luyouepiusf/SurvivalClusteringTree. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Suggests: | knitr, rmarkdown, tinytest |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.1 |
Imports: | Rcpp, survival, dplyr, grid, gridtext, formula.tools |
LinkingTo: | Rcpp, RcppArmadillo |
VignetteBuilder: | knitr |
NeedsCompilation: | yes |
Packaged: | 2024-05-24 02:58:58 UTC; luyou |
Author: | Lu You [aut, cre] (Created the package. Maintains the package.), Lauric Ferrat [aut] (Added functionality. Revised the package. Wrote the vignette.), Hemang Parikh [aut] (Checked and revised the package.), Yanan Huo [aut] (Revised plotting functions of the package.), Yuting Yang [aut] (Added some data frame features.), Jeffrey Krischer [ctb] (Supervised the medical research. Coauthor of the medical manuscript.), Maria Redondo [ctb] (Principal investigator of the medical research. Coauthor of the medical manuscript.), Richard Oram [ctb] (Coauthor of the medical manuscript.), Andrea Steck [ctb] (Coauthor of the medical manuscript.) |
Repository: | CRAN |
Date/Publication: | 2024-05-24 21:10:25 UTC |
Clustering Analysis Using Survival Tree and Forest Algorithms
Description
An outcome-guided algorithm is developed to identify clusters of samples with similar characteristics and survival rate. The algorithm first builds a random forest and then defines distances between samples based on the fitted random forest. Given the distances, we can apply hierarchical clustering algorithms to define clusters. Details about this method are described at <https://github.com/luyouepiusf/SurvivalClusteringTree>.
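The last step described above, hierarchical clustering on a distance matrix, can be sketched with base R alone. The 4 x 4 matrix `toy_distance` below is made up for illustration and merely stands in for a sample-by-sample distance matrix such as the ones the prediction functions in this package return; the linkage method and the number of clusters are likewise arbitrary choices.

```r
# Toy symmetric distance matrix (hypothetical values) standing in for
# a sample-by-sample distance matrix derived from a fitted forest.
toy_distance <- matrix(
  c(0.0, 0.1, 0.8, 0.9,
    0.1, 0.0, 0.7, 0.8,
    0.8, 0.7, 0.0, 0.2,
    0.9, 0.8, 0.2, 0.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("s", 1:4), paste0("s", 1:4)))

# Convert to a "dist" object and run agglomerative clustering.
a_hclust <- hclust(as.dist(toy_distance), method = "average")

# Cut the dendrogram into two clusters.
a_cluster <- cutree(a_hclust, k = 2)
print(a_cluster)
```

With real data, the toy matrix would be replaced by a distance matrix computed from a fitted tree or forest.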
Package Content
Index of help topics:
SurvivalClusteringTree-package | Clustering Analysis Using Survival Tree and Forest Algorithms |
plot_survival_tree | Visualize the Fitted Survival Tree |
predict_distance_forest | Predict Distances Between Samples Based on a Survival Forest Fit (Data Supplied as a Dataframe) |
predict_distance_forest_matrix | Predict Distances Between Samples Based on a Survival Forest Fit (Data Supplied as Matrices) |
predict_distance_tree | Predict Distances Between Samples Based on a Survival Tree Fit (Data Supplied as a Dataframe) |
predict_distance_tree_matrix | Predict Distances Between Samples Based on a Survival Tree Fit (Data Supplied as Matrices) |
predict_weights | Predict Weights of Samples in Terminal Nodes Based on a Survival Tree Fit (Data Supplied as a Dataframe) |
predict_weights_matrix | Predict Weights of Samples in Terminal Nodes Based on a Survival Tree Fit (Data Supplied as Matrices) |
survival_forest | Build a Survival Forest (Data Supplied as a Dataframe) |
survival_forest_matrix | Build a Survival Forest (Data Supplied as Matrices) |
survival_tree | Build a Survival Tree (Data Supplied as a Dataframe) |
survival_tree_matrix | Build a Survival Tree (Data Supplied as Matrices) |
Maintainer
Lu You <lu.you@epi.usf.edu>
Author(s)
NA
Visualize the Fitted Survival Tree
Description
Visualize the Fitted Survival Tree
Usage
plot_survival_tree(survival_tree, cex = 0.75)
Arguments
survival_tree |
a fitted survival tree object. |
cex |
numeric character expansion factor. |
Value
No return value, called for generating graphical outputs.
Examples
library(survival)
library(SurvivalClusteringTree)
# Fit a survival tree on the lung data, then draw it
a_survival_tree <-
  survival_tree(
    survival_outcome = Surv(time, status == 2) ~ 1,
    numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
    factor_predictor = ~ as.factor(sex),
    data = lung)
plot_survival_tree(a_survival_tree)
Predict Distances Between Samples Based on a Survival Forest Fit (Data Supplied as a Dataframe)
Description
The function predict_distance_forest predicts distances between samples based on a survival forest fit.
Usage
predict_distance_forest(
survival_forest,
numeric_predictor,
factor_predictor,
data,
missing = "omit"
)
Arguments
survival_forest |
a fitted survival forest |
numeric_predictor |
a formula specifying the numeric predictors.
As in |
factor_predictor |
a formula specifying the factor predictors.
As in |
data |
the dataframe (test data) that stores the outcome and predictor variables.
Variables in the global environment will be used if |
missing |
a character value that specifies the handling of missing data.
If |
Details
Predict Distances Between Samples Based on a Survival Forest Fit (Data Supplied as a Dataframe)
Value
A list.
mean_distance is the mean distance matrix.
sum_distance is the matrix that sums the distances between samples.
sum_non_na is the matrix of the number of non-NA distances being averaged.
Examples
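library(survival)
library(SurvivalClusteringTree)
# Fit a small forest (20 bootstrap replications) on the lung data
a_survival_forest <-
  survival_forest(
    survival_outcome = Surv(time, status == 2) ~ 1,
    numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
    factor_predictor = ~ as.factor(sex),
    data = lung, nboot = 20)
# Predict pairwise distances for the same samples
a_distance <-
  predict_distance_forest(
    a_survival_forest,
    numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
    factor_predictor = ~ as.factor(sex),
    data = lung)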
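The three matrices in the returned list appear to be related: if `mean_distance` is the element-wise average of the available distances, it should equal `sum_distance / sum_non_na`. This relationship is an assumption inferred from the descriptions under Value, not something the documentation states. The toy matrices below (made-up values, base R only) show the arithmetic.

```r
# Hypothetical per-tree distance matrices for 3 samples from 2 trees;
# NA marks a pair whose distance is unavailable in that tree.
d1 <- matrix(c(0, 1, 2,
               1, 0, NA,
               2, NA, 0), nrow = 3, byrow = TRUE)
d2 <- matrix(c(0, 3, 2,
               3, 0, 4,
               2, 4, 0), nrow = 3, byrow = TRUE)
trees <- list(d1, d2)

# Sum of the non-missing distances, and how many were non-missing.
toy_sum <- Reduce(`+`, lapply(trees, function(d) ifelse(is.na(d), 0, d)))
toy_count <- Reduce(`+`, lapply(trees, function(d) !is.na(d)))

# Element-wise mean over the available distances (assumed relationship).
toy_mean <- toy_sum / toy_count
print(toy_mean)
```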
Predict Distances Between Samples Based on a Survival Forest Fit (Data Supplied as Matrices)
Description
The function predict_distance_forest_matrix predicts distances between samples based on a survival forest fit.
Usage
predict_distance_forest_matrix(
survival_forest,
matrix_numeric,
matrix_factor,
missing = "omit"
)
Arguments
survival_forest |
a fitted survival forest |
matrix_numeric |
numeric predictors, a numeric matrix.
|
matrix_factor |
factor predictors, a character matrix.
|
missing |
a character value that specifies the handling of missing data.
If |
Details
Predict Distances Between Samples Based on a Survival Forest Fit (Data Supplied as Matrices) (Works for raw matrices)
Value
A list.
mean_distance is the mean distance matrix.
sum_distance is the matrix that sums the distances between samples.
sum_non_na is the matrix of the number of non-NA distances being averaged.
Examples
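library(survival)
library(SurvivalClusteringTree)
# Fit a small forest from matrices of numeric and factor predictors
a_survival_forest <-
  survival_forest_matrix(
    time = lung$time,
    event = lung$status == 2,
    matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
    matrix_factor = data.matrix(lung[, 5, drop = FALSE]),
    nboot = 20)
# Predict pairwise distances for the same samples
a_distance <-
  predict_distance_forest_matrix(
    a_survival_forest,
    matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
    matrix_factor = data.matrix(lung[, 5, drop = FALSE]))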
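The examples above pick predictor columns of `lung` by position. Selecting them by name gives the same matrices and is easier to read. The sketch below relies only on the `survival` package and on the standard column order of `lung`; the object names are illustrative.

```r
library(survival)

# Columns 4 and 6:9 of lung are the numeric predictors used above.
numeric_names <- c("age", "ph.ecog", "ph.karno", "pat.karno", "meal.cal")
matrix_numeric_by_name <- data.matrix(lung[, numeric_names, drop = FALSE])
matrix_numeric_by_index <- data.matrix(lung[, c(4, 6:9), drop = FALSE])

# Column 5 of lung is sex, used above as the factor predictor.
matrix_factor_by_name <- data.matrix(lung[, "sex", drop = FALSE])
matrix_factor_by_index <- data.matrix(lung[, 5, drop = FALSE])

# Both ways of selecting should give identical matrices.
print(identical(matrix_numeric_by_name, matrix_numeric_by_index))
print(identical(matrix_factor_by_name, matrix_factor_by_index))
```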
Predict Distances Between Samples Based on a Survival Tree Fit (Data Supplied as a Dataframe)
Description
The function predict_distance_tree predicts distances between samples based on a survival tree fit.
Usage
predict_distance_tree(
survival_tree,
numeric_predictor,
factor_predictor,
data,
missing = "omit"
)
Arguments
survival_tree |
a fitted survival tree |
numeric_predictor |
a formula specifying the numeric predictors.
As in |
factor_predictor |
a formula specifying the factor predictors.
As in |
data |
the dataframe (test data) that stores the outcome and predictor variables.
Variables in the global environment will be used if |
missing |
a character value that specifies the handling of missing data.
If |
Details
Predict Distances Between Samples Based on a Survival Tree Fit (Data Supplied as a Dataframe)
Value
A list.
node_distance gives the distance matrix between nodes.
ind_distance gives the distance matrix between samples.
ind_weights gives the weights of samples in each node.
Examples
library(survival)
library(SurvivalClusteringTree)
# Fit a survival tree on the lung data
a_survival_tree <-
  survival_tree(
    survival_outcome = Surv(time, status == 2) ~ 1,
    numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
    factor_predictor = ~ as.factor(sex),
    data = lung)
# Predict pairwise distances for the same samples
a_distance <-
  predict_distance_tree(
    a_survival_tree,
    numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
    factor_predictor = ~ as.factor(sex),
    data = lung)
Predict Distances Between Samples Based on a Survival Tree Fit (Data Supplied as Matrices)
Description
The function predict_distance_tree_matrix predicts distances between samples based on a survival tree fit.
Usage
predict_distance_tree_matrix(
survival_tree,
matrix_numeric,
matrix_factor,
missing = "omit"
)
Arguments
survival_tree |
a fitted survival tree |
matrix_numeric |
numeric predictors, a numeric matrix.
|
matrix_factor |
factor predictors, a character matrix.
|
missing |
a character value that specifies the handling of missing data.
If |
Details
Predict Distances Between Samples Based on a Survival Tree Fit (Data Supplied as Matrices) (Works for raw matrices)
Value
A list.
node_distance gives the distance matrix between nodes.
ind_distance gives the distance matrix between samples.
ind_weights gives the weights of samples in each node.
Examples
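library(survival)
library(SurvivalClusteringTree)
# Fit a survival tree from matrices of numeric and factor predictors
a_survival_tree <-
  survival_tree_matrix(
    time = lung$time,
    event = lung$status == 2,
    matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
    matrix_factor = data.matrix(lung[, 5, drop = FALSE]))
# Predict pairwise distances for the same samples
a_distance <-
  predict_distance_tree_matrix(
    a_survival_tree,
    matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
    matrix_factor = data.matrix(lung[, 5, drop = FALSE]))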
Predict Weights of Samples in Terminal Nodes Based on a Survival Tree Fit (Data Supplied as a Dataframe)
Description
The function predict_weights predicts weights of samples in terminal nodes based on a survival tree fit.
Usage
predict_weights(
survival_tree,
numeric_predictor,
factor_predictor,
data,
missing = "omit"
)
Arguments
survival_tree |
a fitted survival tree |
numeric_predictor |
a formula specifying the numeric predictors.
As in |
factor_predictor |
a formula specifying the factor predictors.
As in |
data |
the dataframe (test data) that stores the outcome and predictor variables.
Variables in the global environment will be used if |
missing |
a character value that specifies the handling of missing data.
If |
Details
Predict Weights of Samples in Terminal Nodes Based on a Survival Tree Fit (Data Supplied as a Dataframe)
Value
A weight matrix representing the weights of samples in each node.
Examples
library(survival)
library(SurvivalClusteringTree)
# Fit a survival tree on the lung data
a_survival_tree <-
  survival_tree(
    survival_outcome = Surv(time, status == 2) ~ 1,
    numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
    factor_predictor = ~ as.factor(sex),
    data = lung)
# Predict the weights of the same samples in each node
a_weight <-
  predict_weights(
    a_survival_tree,
    numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
    factor_predictor = ~ as.factor(sex),
    data = lung)
Predict Weights of Samples in Terminal Nodes Based on a Survival Tree Fit (Data Supplied as Matrices)
Description
The function predict_weights_matrix predicts weights of samples in terminal nodes based on a survival tree fit.
Usage
predict_weights_matrix(
survival_tree,
matrix_numeric,
matrix_factor,
missing = "majority"
)
Arguments
survival_tree |
a fitted survival tree |
matrix_numeric |
numeric predictors, a numeric matrix.
|
matrix_factor |
factor predictors, a character matrix.
|
missing |
a character value that specifies the handling of missing data.
If |
Details
Predict Weights of Samples in Terminal Nodes Based on a Survival Tree Fit (Data Supplied as Matrices)
Value
A weight matrix representing the weights of samples in each node.
Examples
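library(survival)
library(SurvivalClusteringTree)
# Fit a survival tree from matrices of numeric and factor predictors
a_survival_tree <-
  survival_tree_matrix(
    time = lung$time,
    event = lung$status == 2,
    matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
    matrix_factor = data.matrix(lung[, 5, drop = FALSE]))
# Predict the weights of the same samples in each node
a_weight <-
  predict_weights_matrix(
    a_survival_tree,
    matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
    matrix_factor = data.matrix(lung[, 5, drop = FALSE]))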
Build a Survival Forest (Data Supplied as a Dataframe)
Description
The function survival_forest builds a survival forest given the survival outcomes and predictors of numeric and factor variables.
Usage
survival_forest(
survival_outcome,
numeric_predictor,
factor_predictor,
weights = NULL,
data,
significance = 0.05,
min_weights = 50,
missing = "omit",
test_type = "univariate",
cut_type = 0,
nboot = 100,
seed = 0
)
Arguments
survival_outcome |
a |
numeric_predictor |
a formula specifying the numeric predictors.
As in |
factor_predictor |
a formula specifying the factor predictors.
As in |
weights |
sample weights, a numeric vector.
|
data |
the dataframe that stores the outcome and predictor variables.
Variables in the global environment will be used if |
significance |
significance threshold, a numeric value.
Stop the splitting algorithm when no splits give a p-value smaller than |
min_weights |
minimum weight threshold, a numeric value.
The weights in a node are greater than |
missing |
a character value that specifies the handling of missing data.
If |
test_type |
a character value that specifies the type of statistical tests.
If |
cut_type |
an integer value that specifies how to cut between two numeric values.
If |
nboot |
an integer value that specifies the number of bootstrap replications. |
seed |
an integer value that specifies the seed. |
Details
Build a Survival Forest (Data Supplied as a Dataframe)
Value
A list containing the information of the survival forest fit.
Examples
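library(survival)
library(SurvivalClusteringTree)
# Fit a small forest (20 bootstrap replications) on the lung data
a_survival_forest <-
  survival_forest(
    survival_outcome = Surv(time, status == 2) ~ 1,
    numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
    factor_predictor = ~ as.factor(sex),
    data = lung, nboot = 20)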
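The `nboot` and `seed` arguments suggest that each tree in the forest is grown on a bootstrap resample and that fixing the seed makes the resampling reproducible. How the package draws its resamples internally is not documented here, so the following base R sketch only illustrates the general idea; `draw_bootstrap_indices` is a hypothetical helper, not a package function, and the sample size of 100 is arbitrary.

```r
# Hypothetical helper: draw `nboot` bootstrap index sets for `n` samples.
draw_bootstrap_indices <- function(n, nboot, seed) {
  set.seed(seed)
  # Each replication samples n row indices with replacement.
  lapply(seq_len(nboot), function(b) sample.int(n, size = n, replace = TRUE))
}

first_draw <- draw_bootstrap_indices(n = 100, nboot = 20, seed = 0)
second_draw <- draw_bootstrap_indices(n = 100, nboot = 20, seed = 0)

# The same seed reproduces the same resamples.
print(identical(first_draw, second_draw))
```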
Build a Survival Forest (Data Supplied as Matrices)
Description
The function survival_forest_matrix builds a survival forest given the survival outcomes and predictors of numeric and factor variables.
Usage
survival_forest_matrix(
time,
event,
matrix_numeric,
matrix_factor,
weights = rep(1, length(time)),
significance = 0.05,
min_weights = 50,
missing = "omit",
test_type = "univariate",
cut_type = 0,
nboot = 100,
seed = 0
)
Arguments
time |
survival times, a numeric vector.
|
event |
survival events, a logical vector.
|
matrix_numeric |
numeric predictors, a numeric matrix.
|
matrix_factor |
factor predictors, a character matrix.
|
weights |
sample weights, a numeric vector.
|
significance |
significance threshold, a numeric value.
Stop the splitting algorithm when no splits give a p-value smaller than |
min_weights |
minimum weight threshold, a numeric value.
The weights in a node are greater than |
missing |
a character value that specifies the handling of missing data.
If |
test_type |
a character value that specifies the type of statistical tests.
If |
cut_type |
an integer value that specifies how to cut between two numeric values.
If |
nboot |
an integer value that specifies the number of bootstrap replications. |
seed |
an integer value that specifies the seed. |
Details
Build a Survival Forest (Data Supplied as Matrices)
Value
A list containing the information of the survival forest fit.
Examples
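library(survival)
library(SurvivalClusteringTree)
# Fit a small forest from matrices of numeric and factor predictors
a_survival_forest <-
  survival_forest_matrix(
    time = lung$time,
    event = lung$status == 2,
    matrix_numeric = data.matrix(lung[, c(4, 6:9), drop = FALSE]),
    matrix_factor = data.matrix(lung[, 5, drop = FALSE]),
    nboot = 20)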
Build a Survival Tree (Data Supplied as a Dataframe)
Description
The function survival_tree builds a survival tree given the survival outcomes and predictors of numeric and factor variables.
Usage
survival_tree(
survival_outcome,
numeric_predictor,
factor_predictor,
weights = NULL,
data,
significance = 0.05,
min_weights = 50,
missing = "omit",
test_type = "univariate",
cut_type = 0
)
Arguments
survival_outcome |
a |
numeric_predictor |
a formula specifying the numeric predictors.
As in |
factor_predictor |
a formula specifying the factor predictors.
As in |
weights |
sample weights, a numeric vector.
|
data |
the dataframe that stores the outcome and predictor variables.
Variables in the global environment will be used if |
significance |
significance threshold, a numeric value.
Stop the splitting algorithm when no splits give a p-value smaller than |
min_weights |
minimum weight threshold, a numeric value.
The weights in a node are greater than |
missing |
a character value that specifies the handling of missing data.
If |
test_type |
a character value that specifies the type of statistical tests.
If |
cut_type |
an integer value that specifies how to cut between two numeric values.
If |
Details
Build a Survival Tree (Data Supplied as a Dataframe)
Value
A list containing the information of the survival tree fit.
Examples
library(survival)
library(SurvivalClusteringTree)
# Fit a survival tree on the lung data
a_survival_tree <-
  survival_tree(
    survival_outcome = Surv(time, status == 2) ~ 1,
    numeric_predictor = ~ age + ph.ecog + ph.karno + pat.karno + meal.cal,
    factor_predictor = ~ as.factor(sex),
    data = lung)
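The outcome in these examples is written as a `Surv` expression on `time` and `status == 2`. In the `lung` data from the `survival` package, `status` is coded 1 for censored and 2 for dead, so the comparison turns it into an event indicator. The sketch below uses only `survival` to show what that expression evaluates to; the object name `an_outcome` is illustrative.

```r
library(survival)

# status == 2 turns the 1/2 coding into a logical event indicator.
an_outcome <- with(lung, Surv(time, status == 2))

# Printed Surv values carry a trailing "+" when the time is censored.
print(an_outcome[1:6])

# Internally the object has two columns: follow-up time and event (0/1).
print(table(event = an_outcome[, "status"]))
```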
Build a Survival Tree (Data Supplied as Matrices)
Description
The function survival_tree_matrix builds a survival tree given the survival outcomes and predictors of numeric and factor variables.
Usage
survival_tree_matrix(
time,
event,
matrix_numeric,
matrix_factor,
weights = rep(1, length(time)),
significance = 0.05,
min_weights = 50,
missing = "omit",
test_type = "univariate",
cut_type = 0
)
Arguments
time |
survival times, a numeric vector.
|
event |
survival events, a logical vector.
|
matrix_numeric |
numeric predictors, a numeric matrix.
|
matrix_factor |
factor predictors, a character matrix.
|
weights |
sample weights, a numeric vector.
|
significance |
significance threshold, a numeric value.
Stop the splitting algorithm when no splits give a p-value smaller than |
min_weights |
minimum weight threshold, a numeric value.
The weights in a node are greater than |
missing |
a character value that specifies the handling of missing data.
If |
test_type |
a character value that specifies the type of statistical tests.
If |
cut_type |
an integer value that specifies how to cut between two numeric values.
If |
Details
Build a Survival Tree (Data Supplied as Matrices)
Value
A list containing the information of the survival tree fit.
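The `significance` argument describes a stopping rule: splitting stops when no candidate split reaches a p-value below the threshold. The tests the package actually runs are selected by `test_type` and are not spelled out here, so the following sketch substitutes an ordinary log-rank test from `survival` for a single hypothetical candidate split (ECOG performance score of 0 versus higher) to show how such a comparison against the threshold might look.

```r
library(survival)

significance <- 0.05

# Hypothetical candidate split: ph.ecog of 0 versus ph.ecog above 0.
# Under R's default na.action, rows with a missing ph.ecog are dropped.
a_test <- survdiff(Surv(time, status == 2) ~ I(ph.ecog > 0), data = lung)

# Log-rank p-value from the chi-squared statistic (1 degree of freedom).
p_value <- pchisq(a_test$chisq, df = 1, lower.tail = FALSE)

# The split would be kept only if it is significant at the threshold.
keep_split <- p_value < significance
print(p_value)
print(keep_split)
```

A tree-growing procedure would repeat such a test over many candidate splits, which is why the choice of test and any multiplicity handling matter in practice.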