Type: | Package |
Title: | Adaptation of Virtual Twins Method from Jared Foster |
Version: | 1.0.1 |
Date: | 2018-02-03 |
Description: | Research of subgroups in random clinical trials with binary outcome and two treatments groups. This is an adaptation of the Jared Foster method (https://www.ncbi.nlm.nih.gov/pubmed/21815180). |
License: | GPL-3 | file LICENSE |
URL: | https://github.com/prise6/aVirtualTwins |
BugReports: | https://github.com/prise6/aVirtualTwins/issues |
Imports: | rpart, party, methods, randomForest, stats |
Suggests: | caret, knitr, rpart.plot, rmarkdown, e1071 |
Depends: | R (≥ 3.2.0), |
Collate: | 'aVirtualTwins.R' 'data.R' 'object.R' 'difft.R' 'setClass.R' 'predict.R' 'forest.R' 'forest.double.R' 'forest.fold.R' 'forest.one.R' 'forest.wrapper.R' 'formatRCTDataset.R' 'incidences.R' 'object.wrapper.R' 'tools.R' 'tree.R' 'tree.class.R' 'tree.reg.R' 'tree.wrapper.R' |
VignetteBuilder: | knitr |
RoxygenNote: | 6.0.1 |
NeedsCompilation: | no |
Packaged: | 2018-02-03 15:22:29 UTC; prise6 |
Author: | Francois Vieille [aut, cre], Jared Foster [aut] |
Maintainer: | Francois Vieille <vieille.francois@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2018-02-04 16:00:40 UTC |
aVirtualTwins : An adapation of VirtualTwins method created by Jared Foster.
Description
aVirtualTwins is written mainly with reference classes. Briefly, there is three kinds of class :
-
VT.object
class to represent RCT dataset used by aVirtualTwins. To format correctly RCT dataset, useformatRCTDataset
. -
VT.difft
class to compute difference between twins. FamilyVT.forest
extends it to compute twins by random forest.vt.forest
is users function. -
VT.tree
class to find subgroups fromdifft
by CART trees.VT.tree.class
andVT.tree.reg
extend it.vt.tree
is users function.
Details
See http://github.com/prise6/aVirtualTwins for last updates.
Difference between twins
Description
A reference class to represent difference between twin1 and twin2
Details
Difft are calculated depending on the favorable outcome chosen. It is the second level of the outcome. For example, if the outcome is 0 and 1, the favorable outcome is 1. Then,
difft_i = twin1_i - twin2_i if T_i = 1
difft_i = twin2_i - twin1_i if T_i = 0
. So absolute method is :
P(Y = 1 | T = 1) - P(Y = 1 | T =0)
So relative method is :
P(Y = 1 | T = 1)/P(Y = 1 | T =0)
So absolute method is :
logit(P(Y = 1 | T = 1)) - logit(P(Y = 1 | T =0))
Fields
vt.object
VT.object (refClass) representing data
twin1
vector of
E(Y|T = real treatment)
twin2
vector of
E(Y|T = another treatment)
method
Method available to compute difft : c("absolute", "relative", "logit"). Absolute is default value. See details.
difft
vector of difference between twin1 and twin2
Methods
computeDifft()
Compute difference between twin1 and twin2. See details.
See Also
VT.forest
, VT.forest.one
,
VT.forest.double
Difft by Random Forest
Description
An abstract reference class to compute twin via random forests
VT.forest
extends VT.difft
Fields
...
see fields of
VT.difft
Methods
checkModel(model)
Checking model class: Must be : train, RandomForest, randomForest
getFullData()
Return twin1, twin2 and difft in column
run()
Compute twin1 and twin2 estimation. Switch treatment if necessary.
See Also
VT.difft
, VT.forest.one
, VT.forest.double
Difft by double random forest
Description
A reference class to compute twins via double random forests
Details
VT.forest.double
extends VT.forest
.
E(Y|T = 1)
if T_i = 1
is estimated by OOB predictions from
model_trt1
.
E(Y|T = 0)
if T_i = 0
is estimated by OOB predictions from
model_trt0
.
This is what computeTwin1()
does.
Then E(Y|T = 1)
if T_i = 0
is estimated by model_trt1.
Then E(Y|T = 0)
if T_i = 1
is estimated by model_trt1.
This is what computeTwin2()
does.
Fields
model_trt1
a caret/RandomForest/randomForest object for treatment T = 1
model_trt0
a caret/RandomForest/randomForest object for treatment T = 0
...
field from parent class :
VT.forest
Methods
computeTwin1()
Compute twin1 with OOB predictions from double forests. See details.
computeTwin2()
Compute twin2 by the other part of data in the other forest. See details.
See Also
VT.difft
, VT.forest
,
VT.forest.one
Difft via k random forests
Description
A reference class to compute twins via k random forest
Details
VT.forest.fold
extends VT.forest
Twins are estimated by k-fold cross validation. A forest is computed on k-1/k of the data and then used to estimate twin1 and twin2 on 1/k of the left data.
Fields
interactions
logical set TRUE if model has been computed with interactions
fold
numeric, number of fold, i.e. number of forest (k)
ratio
numeric experimental, use to balance sampsize. Defaut to 1.
groups
vector Define which observations belong to which group
...
field from parent class :
VT.forest
Methods
run()
Compute twin1 and twin2 estimation. Switch treatment if necessary.
See Also
VT.difft
, VT.forest
,
VT.forest.one
, VT.forest.double
Difft by one random forest
Description
A reference class to compute twins via one random forest
Details
VT.forest.one
extends VT.forest
.
OOB predictions are used to estimate E(Y|T = real treatment)
. Then,
treatement is switched, it means that 1 becomes 0 and 0 becomes 1. We use
again model
to estimate E(Y|T = the other treatment)
. This is
what computeTwin1()
and computeTwin2()
functions do.
Fields
model
is a caret/RandomForest/randomForest class object
interactions
logical set TRUE if model has been computed with interactions
...
field from parent class :
VT.forest
Methods
computeTwin1()
Compute twin1 with OOB predictions
computeTwin2()
Compute twin2 by switching treatment and applying random forest model
See Also
VT.difft
, VT.forest
, VT.forest.double
VT.object
Description
A Reference Class to deal with RCT dataset
Details
Currently working with binary response only. Continous will come, one day. Two-levels treatment only as well.
data
field should be as described, however if virtual twins won't used
interactions, there is no need to transform factors. See
formatRCTDataset for more details.
Fields
data
Data.frame with format:
Y,T,X_{1}, \ldots, X_{p}
. Y must be two levels factor if type is binary. T must be numeric or integer.screening
Logical, set to
FALSE
Set toTRUE
to usevarimp
in trees computation.varimp
Character vector of important variables to use in trees computation.
delta
Numeric representing the difference of incidence between treatments.
type
Character : binary or continous. Only binary is currently available.
Methods
computeDelta()
Compute delta value.
getData(interactions = F)
Return dataset. If interactions is set to T, return data with treatement interactions
getFormula()
Return formula : Y~T+X1+...+Xp. Usefull for cforest function.
getIncidences(rule = NULL)
Return incidence table of data if rule set to NULL. Otherwise return incidence for the rule.
getX(interactions = T, trt = NULL)
Return predictors (T,X,X*T,X*(1-T)). Or (T,X) if interactions is FALSE. If trt is not NULL, return predictors for T = trt
getXwithInt()
Return predictors with interactions. Use VT.object::getX(interactions = T) instead.
getY(trt = NULL)
Return outcome. If trt is not NULL, return outcome for T = trt.
switchTreatment()
Switch treatment value.
See Also
Examples
## Not run:
# Default use :
vt.o <- VT.object$new(data = my.rct.dataset)
# Getting data
head(vt.o$data)
# or getting predictor with interactions
vt.o$getX(interactions = T)
# or getting X|T = 1
vt.o$getX(trt = 1)
# or getting Y|T = 0
vt.o$getY(0)
# Print incidences
vt.o$getIncidences()
## End(Not run)
VT.predict generic function
Description
VT.predict generic function
Usage
VT.predict(rfor, newdata, type)
## S4 method for signature 'RandomForest,missing,character'
VT.predict(rfor, type = "binary")
## S4 method for signature 'RandomForest,data.frame,character'
VT.predict(rfor, newdata,
type = "binary")
## S4 method for signature 'randomForest,missing,character'
VT.predict(rfor, type = "binary")
## S4 method for signature 'randomForest,data.frame,character'
VT.predict(rfor, newdata,
type = "binary")
## S4 method for signature 'train,ANY,character'
VT.predict(rfor, newdata, type = "binary")
## S4 method for signature 'train,missing,character'
VT.predict(rfor, type = "binary")
Arguments
rfor |
random forest model. Can be train, randomForest or RandomForest class. |
newdata |
Newdata to predict by the random forest model. If missing, OOB predictions are returned. |
type |
Must be binary or continous, depending on the outcome. Only binary is really available. |
Value
vector E(Y=1)
Methods (by class)
-
rfor = RandomForest,newdata = missing,type = character
: rfor(RandomForest) newdata (missing) type (character) -
rfor = RandomForest,newdata = data.frame,type = character
: rfor(RandomForest) newdata (data.frame) type (character) -
rfor = randomForest,newdata = missing,type = character
: rfor(randomForest) newdata (missing) type (character) -
rfor = randomForest,newdata = data.frame,type = character
: rfor(randomForest) newdata (data.frame) type (character) -
rfor = train,newdata = ANY,type = character
: rfor(train) newdata (ANY) type (character) -
rfor = train,newdata = missing,type = character
: rfor(train) newdata (missing) type (character)
Tree to find subgroup
Description
An abstract reference class to compute tree
Details
VT.tree.class
and VT.tree.reg
are children of VT.tree
.
VT.tree.class
and VT.tree.reg
try to find a strong association
between difft
(in VT.difft
object) and RCT variables.
In VT.tree.reg
, a regression tree is computed on difft
values.
Then, thanks to the threshold
it flags leafs of the tree
which
are above the threshold
(when sens
is ">"). Or it flags leafs
which are below the threshold
(when sens
= "<").
In VT.tree.class
, it first flags difft
above or below
(depending on the sens
) the given threshold
. Then a
classification tree is computed to find which variables explain flagged
difft
.
To sum up, VT.tree
try to understand which variables are associated
with a big change of difft
.
Results are shown with getRules()
function. only.leaf
parameter
allows to obtain only the leaf of the tree
. only.fav
parameter
select only favorable nodes. tables
shows incidence table of the rule.
verbose
allow getRules()
to be quiet. And compete
show
also rules with maxcompete
competitors from the tree
.
Fields
vt.difft
VT.difft
objectoutcome
outcome vector from
rpart
functionthreshold
numeric Threshold for difft calculation (c)
screening
Logical. TRUE if using varimp. Default is VT.object screening field
sens
character Sens can be ">" (default) or "<". Meaning :
difft
>threshold
ordifft
<threshold
name
character Names of the tree
tree
rpart Rpart object to construct the tree
Ahat
vector Indicator of beglonging to Ahat
Methods
computeNameOfTree(type)
return label of response variable of the tree
createCompetitors()
Create competitors table
getAhatIncidence()
Return Ahat incidence
getAhatQuality()
Return Ahat quality
getData()
Return data used for tree computation
getIncidences(rule, rr.snd = T)
Return incidence of the rule
getInfos()
Return infos about tree
getRules(only.leaf = F, only.fav = F, tables = T, verbose = T, compete = F)
Return subgroups discovered by the tree. See details.
run(...)
Compute tree with rpart parameters
See Also
Classification tree to find subgroups
Description
See VT.tree
Methods
run(...)
Compute tree with rpart parameters
Regression tree to find subgroups
Description
See VT.tree
Methods
run(...)
Compute tree with rpart parameters
RCT format for Virtual Twins
Description
formatRCTDataset
returns dataset that Virtual Twins is able to
analyze.
Usage
formatRCTDataset(dataset, outcome.field, treatment.field, interactions = TRUE)
Arguments
dataset |
data.frame representing RCT's |
outcome.field |
name of the outcome's field in |
treatment.field |
name of the treatment's field in |
interactions |
logical. If running VirtualTwins with treatment's interactions, set to TRUE (default value) |
Details
This function check these differents topic: Outcome must be binary and a
factor. If numeric with two distincts values, outcome becomes a factor where
the favorable reponse is the second level. Also, outcome is moved on the
first column of dataset
.
Treatment must have two distinct numeric values, 0 : no treatment, 1 : treatment. Treatment is moved to the second column.
Qualitatives variables must be factor. If it has more than two levels, if running VirtualTwins with interaction, it creates dummy variables.
Value
return data.frame with good format (explained in details section) to run VirtualTwins
Examples
## Not run:
data.format <- formatRCTDataset(data, "outcome", "treatment", TRUE)
## End(Not run)
data(sepsis)
data.format <- formatRCTDataset(sepsis, "survival", "THERAPY", T)
Clinical Trial for Sepsis desease
Description
Simulated clinical trial with two groups treatment about sepsis desease. See details.
Usage
data(sepsis)
Format
470 patients and 13 variables.
- survival
binary outcome
- THERAPY
1 for active treatment, 0 for control treatment
- TIMFIRST
Time from first sepsis-organ fail to start drug
- AGE
Patient age in years
- BLLPLAT
Baseline local platelets
- blSOFA
Sum of baselin sofa (cardiovascular, hematology, hepaticrenal, and respiration scores)
- BLLCREAT
Base creatinine
- ORGANNUM
Number of baseline organ failures
- PRAPACHE
Pre-infusion apache-ii score
- BLGCS
Base GLASGOW coma scale score
- BLIL6
Baseline serum IL-6 concentration
- BLADL
Baseline activity of daily living score
- BLLBILI
Baseline local bilirubin
Details
This dataset is taken from SIDES method.
Sepsis
contains simulated data on 470 subjects with a binary outcome
survival, that stores survival status for patient after 28 days of treatment,
value of 1 for subjects who died after 28 days and 0 otherwise. There are 11
covariates, listed below, all of which are numerical variables.
Note that contrary to the original dataset used in SIDES, missing values have
been imputed by random forest (randomForest::rfImpute())
. See file
data-raw/sepsis.R for more details.
True subgroup is PRAPACHE <= 26 & AGE <= 49.80. NOTE: This subgroup is defined with the lower event rate (survival = 1) in treatement arm.
Source
http://biopharmnet.com/subgroup-analysis-software/
Initialize virtual twins data
Description
vt.data
is a wrapper of formatRCTDataset
and
VT.object
. Allows to format your data.frame in order to create
a VT.object object.
Usage
vt.data(dataset, outcome.field, treatment.field, interactions = TRUE, ...)
Arguments
dataset |
data.frame representing RCT's |
outcome.field |
name of the outcome's field in |
treatment.field |
name of the treatment's field in |
interactions |
logical. If running VirtualTwins with treatment's interactions, set to TRUE (default value) |
... |
parameters of |
Value
VT.object
See Also
Examples
data(sepsis)
vt.o <- vt.data(sepsis, "survival", "THERAPY", T)
Create forest to compute difft
Description
vt.forest
is a wrapper of VT.forest.one
,
VT.forest.double
and VT.forest.fold
. With
parameter forest.type, any of these class can be used with its own parameter.
Usage
vt.forest(forest.type = "one", vt.data, interactions = T,
method = "absolute", model = NULL, model_trt1 = NULL,
model_trt0 = NULL, ratio = 1, fold = 10, ...)
Arguments
forest.type |
must be a character. "one" to use VT.forest.one class. "double" to use VT.forest.double. "fold" to use VT.forest.fold. |
vt.data |
|
interactions |
logical. If running VirtualTwins with treatment's interactions, set to TRUE (default value) |
method |
character c("absolute", "relative", "logit"). See
|
model |
allows to give a model you build outside this function. Can be randomForest, train or cforest. Is only used with forest.type = "one". If NULL, a randomForest model is grown inside the function. NULL is default. |
model_trt1 |
see model_trt0 explanation and
|
model_trt0 |
works the same as model parameter. Is only used with
forest.type = "double". If NULL, a randomForest model is grown inside the
function. NULL is default. See |
ratio |
numeric value that allow sampsize to be a bit controlled.
Default to 1. See |
fold |
number of fold you want to construct forest with k-fold method.
Is only used with forest.type = "fold". Default to 5. See
|
... |
randomForest() function parameters. Can be used for any forest.type. |
Value
VT.difft
Examples
data(sepsis)
vt.o <- vt.data(sepsis, "survival", "THERAPY", T)
# inside model :
vt.f <- vt.forest("one", vt.o)
# ...
# your model :
# library(randomForest)
# rf <- randomForest(y = vt.o$getY(),
# x = vt.o$getX(int = T),
# mtry = 3,
# nodesize = 15)
# vt.f <- vt.forest("one", vt.o, model = rf)
# ...
# Can also use ... parameters
vt.f <- vt.forest("one", vt.o, mtry = 3, nodesize = 15)
# ...
Visualize subgroups
Description
Function which uses VT.tree
intern functions. Package
rpart.plot must be loaded. See VT.tree
for details.
Usage
vt.subgroups(vt.trees, only.leaf = T, only.fav = T, tables = F,
verbose = F, compete = F)
Arguments
vt.trees |
|
only.leaf |
logical to select only leaf of trees. TRUE is default. |
only.fav |
logical select only favorable subgroups (meaning with favorable label of the tree). TRUE is default. |
tables |
set to TRUE if tables of incidence must be shown. FALSE is default. |
verbose |
print infos during computation. FALSE is default. |
compete |
print competitors rules thanks to competitors computation of the tree |
Value
data.frame of rules
Examples
data(sepsis)
vt.o <- vt.data(sepsis, "survival", "THERAPY", TRUE)
# inside model :
vt.f <- vt.forest("one", vt.o)
# use classification tree
vt.tr <- vt.tree("class", vt.f, threshold = c(0.01, 0.05))
# show subgroups
subgroups <- vt.subgroups(vt.tr)
# change options you'll be surprised !
subgroups <- vt.subgroups(vt.tr, verbose = TRUE, tables = TRUE)
Trees to find Subgroups
Description
vt.tree
is a wrapper of VT.tree.class
and
VT.tree.reg
. With parameter tree.type, any of these two class
can be used with its own parameter.
Usage
vt.tree(tree.type = "class", vt.difft, sens = ">", threshold = seq(0.5,
0.8, 0.1), screening = NULL, ...)
Arguments
tree.type |
must be a character. "class" for classification tree, "reg" for regression tree. |
vt.difft |
|
sens |
must be a character c(">","<"). See |
threshold |
must be numeric. It can be a unique value or a vector. If
numeric vector, a list is returned. See |
screening |
must be logical. If TRUE, only varimp variables of VT.object is used to create the tree. |
... |
rpart() function parameters. Can be used for any tree.type. |
Details
See VT.tree
, VT.tree.class
and
VT.tree.reg
classes.
Value
VT.tree
or a list of VT.tree
depending on threshold
dimension. See examples.
Examples
data(sepsis)
vt.o <- vt.data(sepsis, "survival", "THERAPY", T)
# inside model :
vt.f <- vt.forest("one", vt.o)
# use classification tree
vt.tr <- vt.tree("class", vt.f, threshold = c(0.01, 0.05))
# return a list
class(vt.tr)
# access one of the tree
tree1 <- vt.tr$tree1
# return infos
# vt.tr$tree1$getInfos()
# vt.tr$tree1$getRules()
# use vt.subgroups tool:
subgroups <- vt.subgroups(vt.tr)