Type: | Package |
Title: | Nonparametric Estimation of Time-Dependent ROC, Brier Score, and Survival Difference from Right Censored Time-to-Event Data with or without Competing Risks |
Version: | 2.0 |
Maintainer: | Xiaoyang Li <xli35@mdanderson.org> |
Description: | The tdROC package facilitates the estimation of time-dependent ROC (Receiver Operating Characteristic) curves and the Area Under the time-dependent ROC Curve (AUC) in the context of survival data, accommodating scenarios with right censored data and the option to account for competing risks. In addition to the ROC/AUC estimation, the package also estimates time-dependent Brier score and survival difference. Confidence intervals of various estimated quantities can be obtained from bootstrap. The package also offers plotting functions for visualizing time-dependent ROC curves. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
Depends: | R(≥ 3.6.0) |
Imports: | survival (≥ 3.4), graphics (≥ 4.2.2), stats (≥ 4.2.2), Rcpp (≥ 1.0.10), magrittr (≥ 2.0.3) |
LinkingTo: | Rcpp (≥ 1.0.10) |
NeedsCompilation: | yes |
RoxygenNote: | 7.2.3 |
Packaged: | 2023-12-10 07:18:31 UTC; 43770 |
Author: | Xiaoyang Li [aut, cre], Zhe Yin [aut], Liang Li [aut, ths] |
Repository: | CRAN |
Date/Publication: | 2023-12-13 09:50:02 UTC |
Pipe operator
Description
See magrittr::%>%
for details.
Usage
lhs %>% rhs
Arguments
lhs |
A value or the magrittr placeholder. |
rhs |
A function call using the magrittr semantics. |
Value
The result of calling 'rhs(lhs)'.
Calculate Time-dependent ROC and AUC with competing risk
Description
This function calculates the time-dependent sensitivity and specificity and area under the curve (AUC)
using precalculated weights by td.kw.cr
.
Usage
AUC.cr(X, W.prim, W.cmp, cut.off = NULL, n.grid = 1000, method)
Arguments
X |
a numeric vector of risk score ranging from 0 to 1 in the same length as |
W.prim |
a numeric vector of weight for the primary event you want to study. It has the same length as |
W.cmp |
a numeric vector of weight for the competing event. It has the same length as |
cut.off |
a vector of risk score cut-off values at which sensitivity and specificity will be calculated. Default is |
n.grid |
a positive integer, the number of grid points used when calculating the ROC curve. The default is |
method |
Details
This function read in the risk score value X
, estimated conditional probability for primary event W.prim
,
and estimated conditional probability for competing event W.cmp
to calculate sensitivity and specificity
for a series specified grid points. Based on the definition of controls mentioned in Wu and Li, 2018, we separately
calculate specificity and corresponding AUC for each definition. In addition, this function returns both the
AUC estimated by trapezoidal integration and AUC estimated by nonparametric framework mentioned in Wu and Li, 2018.
Value
Returns a list of the following items:
a list of AUC.A.integral
estimated by trapezoidal integration for definition A,
AUC.A.empirical
estimated by nonparametric framework for definition A (Wu and Li, 2018),
AUC.B.integral
estimated by trapezoidal integration for definition B,
AUC.B.empirical
estimated by nonparametric framework for definition B (Wu and Li, 2018),
and a data frame ROC
with dimension (2+n.grid) x 4
with columns cut.off
, sens
, specA
and specB
.
See Also
Calculate the area under a ROC curve (AUC) by trapezoidal integration
Description
This function reads in a vector of sensitivity and a vector of specificity to calculates the area under the curve (AUC) by trapezoidal integration.
Usage
AUC_calc_integral(sens, spec)
Arguments
sens |
a numerical vector of sensitivity values within the range of (0, 1). |
spec |
a numerical vector of specificity values within the range of (0, 1). |
Value
It returns AUC as a numerical scalar.
Note
This function sorts sens
and 1-spec
in an increasing order.
A 0 and 1 will be added to the two ends of the sorted vectors. The Area Under the Curve (AUC) is obtained by trapezoidal
integration of the area under the piecewise linear curve obtained by connecting
points in sens
and 1-spec
.
Example data: Malignant Melanoma Data
Description
This example dataset is included for illustration. The Melano data is publicly available. In 1962-1977, 205 patients with malignant melanoma (skin cancer) had a radical operation performed at an academic medical center. At the end of the follow-up, 57 died from cancer, 14 died from other causes, and the other 134 patients were alive (censored). This example dataset illustrates the prediction accuracy of competing risk outcomes with baseline age and tumor thickness.
Usage
data(Melano)
Format
A data frame with 312 observations and 4 variables: time (event time/censoring), time), censor (censoring mayoscore4, mayoscore5. The two scores are derived from 4 and 5 covariates respectively.
References
Andersen, P. K. , & Skovgaard, L. T. (2010). Regression with linear predictors. New York, NY: Springer.
Examples
data(Melano)
head(Melano)
Calculate Kernel weights
Description
This function calculate the nearest neighbor kernel weights using uniform kernel weights.
Usage
calc.kw(X, x0, span = 0.1, h = NULL, type = "uniform")
Arguments
X |
the vector of biomarker values from other subjects used to calculate the weights around the center x0. |
x0 |
a scalar as the center around which the kernel weights are calculated. |
span |
a numeric value of the proportion of neighbour observations used, default is 0.1. |
h |
a numeric value of the bandwidth of kernel weights, defualt is NULL. If not specified, the function used the value of |
type |
a character value of the type of kernel function used to calculate kernel weights. Default is "uniform" kernel. Other options are "Epanichnekov" and "normal".
It will only be used when the bandwidth |
Value
Return a vector of kernel weights for each element in X. It has the same length as X.
Note
X must be the vector of ALL risk score values in the data; it cannot be any other vector of arbitrary length.
Example data: Mayo Data
Description
This example dataset is included for illustration. The Mayo PBC data is publicly available and has been used in many statistical researches (e.g., Zheng and Heagerty 2005). This example dataset is a subset of the full PBC data.
Usage
data(mayo)
Format
A data frame with 312 observations and 4 variables:
time
: event time or censoring time
censor
: censoring indicator.
mayoscore4
and mayoscore5
: derived from 4 and 5 covariates respectively.
References
Heagerty, P. J., & Zheng, Y. (2005). Survival model predictive accuracy and ROC curves. Biometrics, 61(1), 92-105.
Examples
data(mayo)
head(mayo)
Plot the time-dependent ROC curve
Description
This function reads in object returned by tdROC()
and plot ROC curve for it.
Usage
plot_tdROC(
x,
lwd = 2,
xlab = "1-specificity",
ylab = "sensitivity",
xlim = c(0, 1),
ylim = c(0, 1),
main = "ROC curve",
col = "black",
abline = T,
...
)
Arguments
x |
the object returned by |
lwd |
user-specified line width. Default is |
xlab |
user-specified label for x-axis. Default is " |
ylab |
user-specified label for y-axis. Default is " |
xlim |
user-specified limit for x axis. Default is |
ylim |
user-specified limit for y axis. Default is |
main |
user-specified title for the plot. Default is " |
col |
user-specified color for ROC curve. Defualt is " |
abline |
user-specified reference diagnol line. Default is |
... |
for future methods |
Value
Returns a plot of ROC curve. If the tdROC object comes with bootstrap result, then the ROC curve will be plotted with confidence interval.
Examples
library(survival)
data(mayo)
dat <- mayo[, c("time", "censor", "mayoscore5")]
fm <- tdROC(
X = dat$mayoscore5, Y = dat$time, delta = dat$censor,
tau = 365 * 6, span = 0.1, nboot = 0, alpha = 0.05, n.grid = 1000, cut.off = 5:9
)
# plot the object "fm" from tdROC()
plot_tdROC(fm)
Plot the time-dependent ROC curve with competing risk
Description
This function reads in object returned by tdROC.cr()
and plot ROC curve for it.
Usage
plot_tdROC_cr(
x,
lwd = 2,
xlab = "1-specificity",
ylab = "sensitivity",
xlim = c(0, 1),
ylim = c(0, 1),
col = c("red", "blue"),
main = "ROC curve",
abline = T,
...
)
Arguments
x |
the object returned by |
lwd |
user-specified line width. Default is |
xlab |
user-specified label for x-axis. Default is " |
ylab |
user-specified label for y-axis. Default is " |
xlim |
user-specified limit for x axis. Default is |
ylim |
user-specified limit for y axis. Default is |
col |
user-specified color for ROC curve. Defualt is " |
main |
user-specified title for the plot. Default is " |
abline |
user-specified reference diagnol line. Default is |
... |
for future methods |
Value
Returns several plots of ROC curve. For competing risk data, there are two definitions of controls introduced by Zheng et al, which was listed below
\text{Definition A:} \text{Case} k:T \le \tau, \delta = k; \text{Control}_A: (T>\tau)\cup (T \le \tau \cap \delta \ne k)
\text{Definition B:} \text{Case} k:T \le \tau, \delta = k; \text{Control}_B: (T>\tau)
For more details about above two definitions, please read details of function tdROC.cr
.
If the tdROC.cr
object comes without bootstrap result, the ROC curve for above two definitions will be plotted together and indicated by the specified col
.
If the tdROC.cr
object with bootstrap result, one more ROC curve with confidence interval will be plotted for each definition.
References
Zheng Y, Cai T, Jin Y, Feng Z. Evaluating prognostic accuracy of biomarkers under competing risk. Biometrics. 2012;68(2):388-396. doi:10.1111/j.1541-0420.2011.01671.x
Examples
library(survival)
data(Melano)
tdROC.cr_res <- tdROC.cr(
X = Melano$thick, Y = Melano$time,
delta = Melano$status, tau = 1800, nboot = 10
)
plot_tdROC_cr(tdROC.cr_res)
Calculate conditional probability of being a case at time tau
Description
This is key function to estimate the weight, the conditional probability of being a case at time tau given the observed time to event, event status, and prognostic risk score, as described in Wu and Li, 2018.
Usage
td.kw.cr(
X,
Y,
delta,
event.code,
tau,
span = 0.1,
h = NULL,
type = "uniform",
epsilon = 0.01
)
Arguments
X |
a numeric vector of risk score for each subject. Higher value of |
Y |
a numeric vector of time to event. Same length with |
delta |
a vector of numeric indicator of event type. The primary event you want to study should be coded as 1,
the competing event should be coded as 2, and censoring should be coded as 0. Same length with |
event.code |
numeric indicator of event (1), or competing event (2), it specifies you are going to calculate the conditional probability for which event. |
tau |
a scalar, the prediction horizon at which the prediction is evaluated. |
span |
a numeric value, the proportion of neighbour observations used in nearest neighbor method, default is 0.1. |
h |
a numeric value, the bandwidth of kernel weights, defualt is |
type |
a character value, indicating the type of kernel function used to calculate kernel weights. Default is " |
epsilon |
the precision parameter for weight calculation using neighborhood approximation. If not specified, default will be calculating weights for all right censored points individually. |
Details
This function read in the risk score value X
, the time-to-event data Y
and censoring indicator delta
to estimate the weight, the conditional probability of being a case at time tau when there is competing event.
The weight estimation serves for the further prediction accuracy estimation, including AUC, Brier score and so on.
Value
Returns the estimated conditional probability of being a case at time tau for the specified event code.
See Also
Calculate the Brier Score
Description
This function reads in a vector of estimated weight and same length biomarker to calculate Brier Score by formula (Wu and Li, 2018). This function is used internally by other functions in this package.
Usage
tdBrier(W, X)
Arguments
W |
a numerical vector of weight estimated by nonparametric weight adjustments (Li et al., 2015). Same length with |
X |
a numerical vector of risk score values. Same length with |
Value
Brier Score as a numerical scalar.
Note
This function estimates brier score by using the formula from Wu and Li, 2018.
Estimate time-dependent prediction accuracy measures, including the ROC, AUC, Brier score, and survival difference, with right-censored survival data.
Description
This is a core function of the ‘tdROC‘ package. It uses the nonparametric weights proposed by Li (Li et al., 2015) to estimate a number of time-dependent prediction accuracy measures for right-censored survival outcomes, including ROC curve, AUC, Brier score, and survival difference. For each measure, the variance can be estimated through bootstrap resampling.
Usage
tdROC(
X,
Y,
delta,
tau,
span = 0.1,
h = NULL,
type = "uniform",
n.grid = 1000,
X.min = NULL,
X.max = NULL,
cut.off = NULL,
nboot = 0,
alpha = 0.05,
epsilon = NULL,
method = "both",
output = "both"
)
Arguments
X |
a numeric vector of risk score in the same length as |
Y |
a numeric vector of time to event in the same length as |
delta |
a vector of binary indicator of event (1) or censoring (0) in the same length as |
tau |
a scalar, the prediction horizon at which the prediction is evaluated. |
span |
a numeric value, the proportion of neighbour observations used in nearest neighbor method, default to 0.1. |
h |
a numeric value, the bandwidth of kernel weights, the defualt is |
type |
a character value, indicating the type of kernel function used to calculate kernel weights. The default is |
n.grid |
a positive integer, the number of grid points used when calculating the ROC curve. The default is |
X.min |
the lower boundary of grid cut-off points for biomarker |
X.max |
the upper boundary of grid cut-off points for biomarker |
cut.off |
a vector of |
nboot |
the number of bootstrap replications to be used for variance estimation. The default is |
alpha |
It is (1 - level of confidence interval)/2, default is |
epsilon |
The precision parameter used in an approximation to the weight calculation when the sample size is large. If a weight corresponding to a specific risk score is already calculated, then the weights corresponding to adjacent risk scores, within the distance specified by epsilon, will be the same under the approximation. This approximation avoids repeated calculation of weights that are almost the same, and hence increases the speed of computation in this situation. The default is NULL, which means no approximation is used. A higher value indicates less precision. |
method |
It is used to specify which method you would like to use to estimate AUC, default to |
output |
It is used to specify which kind of output you want, default to |
Details
This function takes the risk score value X
, the time-to-event data Y
and censoring indicator delta
as input to estimate
a number of time-dependent prediction accuracy measures for right-censored survival outcomes, including ROC curve, AUC, Brier score, and survival difference.
The confidence intervals of above quantities will be estimated by bootstrap.
This function offer two options to estimate AUC. The first one make use of estimated sensitivity and specificity to calculate the AUC via trapezoidal integration
by setting a series of cutoff point. The output will also include corresponding sensitivity and specificity for our plot function. The other one estimate AUC by the empirical estimator
of the proportion of concordance pairs with proposed weight estimator (Li et al, 2015). These two methods will generate quite similar estimates. The option can be set by argument method
.
We also include Brier Score and survival difference to evaluate the calibration metrics. Their definitions are included below.
They can be estimated with the proposed conditional probability weight (Wu and Li, 2018).
Both of them are measures to assess the accuracy of probabilistic predictions X
. The calibration result makes sense only
when the risk score X
is a predicted probability, and should be ignored otherwise.
\text{Brier Score} = E{[1(T \le \tau, \delta = 1) - X]^2}
\text{Survival difference} = E[1(T \le \tau, \delta = 1) - X]
As mentioned in arguments, we introduced a small precision parameter epsilon
to speed up the computation when the sample size is large.
For each subject with a risk score, X_i
, we assess whether there exists a previously processed grid point, X_{grid,m}
where 1\le m \le j
,
within the proximity of X_i
such that |X_i - X_{grid,m}| < \epsilon
. In the absence of such a point, we designate X_i
as a new grid point,
X_{grid,j+1}
, and store the corresponding survfit
object for subsequent weight estimation and mark it as a processed grid point. Conversely,
if a previously processed grid point is found, we directly utilize the stored survfit
object associated with it for weight calculation.
Given that the most time-consuming step in our estimation process is the survfit
computation, this method significantly reduces computing time
without incurring notable bias especially when dealing with large sample sizes.
Value
Returns a list of the following items:
main_res:
a list of AUC.integral
estimated by trapezoidal integration, AUC.empirical
estimated by empirical estimator of the proportion of concordance pairs.
and a data frame ROC
with dimension (2+n.grid) x 3
with columns cut.off
, sens
, and spec
.
calibration_res:
brier score and survival difference estimated based on the formula similar to Wu and Li (2018). When the risk score X
is a biomarker value instead of a predicted cumulative incidence probability, the brier score and survival difference cannot be calculated. In this case, please disregard the calibration results.
boot_res:
a list of bootstrap results, including bAUC
, bAUC2
, bBS
, bSurvDiff
, bROC
.
For bAUC
, bAUC2
, bBS
, bSurvDiff
, each one is a list including corresponding mean, standard deviation, and confidence interval.
bROC
is a data frame with colomns sens.mean
, sens.sd
, sens.lower
, sens.upper
, spec.mean
, spec.sd
, spec.lower
, spec.upper
Examples
library(survival)
data(mayo)
dat <- mayo[, c("time", "censor", "mayoscore5")]
fm <- tdROC(
X = dat$mayoscore5, Y = dat$time, delta = dat$censor,
tau = 365 * 6, span = 0.1, nboot = 0, alpha = 0.05,
n.grid = 1000, cut.off = 5:9
)
# In the following example, We use biomarker mayoscore5 to estimate predicted probability
# tipycally a monotone transformation function such as expit() is used to transform biomarker
# with range out of range into estimated probability between 0 and 1
expit <- function(x){ 1/(1+exp(-x)) }
tdROC(
X = expit(dat$mayoscore5), Y = dat$time, delta = dat$censor,
tau = 365 * 6, span = 0.1, nboot = 0, alpha = 0.05,
n.grid = 1000, cut.off = 5:9
)
tdROC(
X = expit(dat$mayoscore5), Y = dat$time, delta = dat$censor,
tau = 365 * 6, span = 0.1, nboot = 0, alpha = 0.05,
n.grid = 1000, cut.off = 5:9, epsilon = 0.05
)
Estimate time-dependent prediction accuracy measures, including the ROC, AUC, Brier score, and survival probability difference, with competing risk data.
Description
This is a core function of the ‘tdROC‘ package. It uses the nonparametric weights proposed by Wu (Wu and Li, 2018) to estimate a number of time-dependent prediction accuracy measures for right-censored survival outcomes, including ROC curve, AUC, Brier score, and survival difference, with competing risk data. For each measure, the variance can be estimated through bootstrap resampling.
Usage
tdROC.cr(
X,
Y,
delta,
tau,
span = 0.1,
h = NULL,
type = "uniform",
epsilon = 0.01,
cut.off = NULL,
n.grid = 1000,
nboot = 0,
alpha = 0.05,
method = "both",
output = "both"
)
Arguments
X |
a numeric vector of risk score in the same length as |
Y |
a numeric vector of time to event in the same length as |
delta |
a vector of numeric indicator of event type in the same length as |
tau |
a scalar, the prediction horizon at which the prediction is evaluated. |
span |
a numeric value, the proportion of neighbour observations used in nearest neighbor method. The default is 0.1. |
h |
a numeric value, the bandwidth of kernel weights, the defualt is |
type |
a character value, indicating the type of kernel function used to calculate kernel weights. The default is |
epsilon |
The precision parameter used in an approximation to the weight calculation when the sample size is large. If a weight corresponding to a specific risk score is already calculated, then the weights corresponding to adjacent risk scores, within the distance specified by epsilon, will be the same under the approximation. This approximation avoids repeated calculation of weights that are almost the same, and hence increases the speed of computation in this situation. The default is NULL, which means no approximation is used. A higher value indicates less precision. |
cut.off |
a vector of |
n.grid |
a positive integer, the number of grid points used when calculating the ROC curve. The default is |
nboot |
the number of bootstrap replications to be used for variance estimation. The default is |
alpha |
It is (1 - level of confidence interval)/2, default is |
method |
It is used to specify which method you would like to use to estimate AUC, default to |
output |
It is used to specify which kind of output you want, default to |
Details
This function takes the risk score value X
, the time-to-event data Y
and censoring indicator delta
as input to estimate
a number of time-dependent prediction accuracy measures for survival outcomes, including ROC curve, AUC, Brier score, and survival difference, with competing risk.
The confidence intervals of above quantities are estimated by bootstrap.
For competing risk data, there are two definition of controls introduced by Zheng et al, which are listed below
\text{Definition A:} \text{Case} k:T \le \tau, \delta = k; \text{Control}_A: (T>\tau)\cup (T \le \tau \cap \delta \ne k)
\text{Definition B:} \text{Case} k:T \le \tau, \delta = k; \text{Control}_B: (T>\tau)
Based on the definition A, both the event-free subjects and subjects who experience other competing events were included as controls. While definition B include only event-free subjects.
This function offers two options to estimate AUC. The first one make use of estimated sensitivity and specificity to calculate the AUC via trapezoidal integration
by setting a series of cutoff point. For the two different definition, we separately calculate the sensitivity, specificity and AUC. The output will also include the sensitivities
and specificities for our plot function. The other one estimates AUC by the empirical estimator of the proportion of concordance pairs with proposed weight estimator (Wu and Li, 2018).
These two methods generate quite similar estimates. The option can be set by the argument method
.
In addition to the above prediction measures, we include Brier Score and survival difference to evaluate the calibration metrics. Their definitions are included below.
They can be estimated with the proposed conditional probability weight (Wu and Li, 2018).
Both of them are measures to assess the accuracy of probabilistic predictions X
. The calibration result makes sense only
when the risk score X
is a predicted probability, and should be ignored otherwise.
\text{Brier Score} = E{[1(T \le \tau, \delta = 1) - X]^2}
\text{Survival difference} = E[1(T \le \tau, \delta = 1) - X]
This function uses the same approximation as tdROC
with the argument epsilon
Value
Returns a list of the following items:
main_res:
a list of AUC.A.integral
estimated by trapezoidal integration for definition A,
AUC.A.empirical
estimated by empirical estimator for definition A,
AUC.B.integral
estimated by trapezoidal integration for definition B,
AUC.B.empirical
estimated by empirical estimator for definition B,
and a data frame ROC
with dimension (2+n.grid) x 4
with columns cut.off
, sens
, specA
and specB
.
calibration_res:
brier score and survival difference estimated based on the formula similar to Wu and Li (2018). When the risk score X
is a biomarker value instead of a predicted cumulative incidence probability, the brier score and survival difference cannot be calculated. In this case, please disregard the calibration results.
boot_res:
a list of bootstrap results, including bAUC.A.integral
, bAUC.A.empirical
, bAUC.B.integral
, bAUC.B.empirical
, bBS
, bSurvDiff
, bROC
.
For bAUC.A.integral
, bAUC.A.empirical
, bAUC.B.integral
, bAUC.B.empirical
, bBS
, bSurvDiff
, each one is a list including corresponding mean, standard deviation, confidence interval.
bROC
is a data frame with colomns sens.mean
, sens.sd
, sens.lower
, sens.upper
,
specA.mean
, specA.sd
, specA.lower
, specA.upper
, specB.mean
, specB.sd
, specB.lower
, specB.upper
References
Zheng Y, Cai T, Jin Y, Feng Z. Evaluating prognostic accuracy of biomarkers under competing risk. Biometrics. 2012;68(2):388-396. doi:10.1111/j.1541-0420.2011.01671.x
Examples
library(survival)
data(Melano)
expit <- function(x){ 1/(1+exp(-x)) }
tdROC.cr(X = expit(Melano$thick) , Y = Melano$time, delta = Melano$status, tau = 1800, nboot = 10)
Calculate the Survival Difference
Description
This function reads in a vector of estimated weight and same length risk score to calculate survival difference by formula (Wu and Li, 2018).This function is used internally by other functions in this package.
Usage
tdSurvDiff(W, X)
Arguments
W |
a numerical vector of weight estimated by nonparametric weight adjustments (Li et al., 2015). Same length with |
X |
a numerical vector of risk score values. Same length with |
Value
Survival difference as a numerical scalar.