Type: | Package |
Title: | Joint Models of Survival and Multivariate Longitudinal Data |
Version: | 0.4.5 |
Date: | 2024-10-05 |
Description: | Fit joint models of survival and multivariate longitudinal data. The longitudinal data is specified by generalised linear mixed models. The joint models are fit via maximum likelihood using an approximate expectation maximisation algorithm. Bernhardt (2015) <doi:10.1016/j.csda.2014.11.011>. |
License: | GPL-3 |
Depends: | R (≥ 3.6.0), glmmTMB, survival |
Imports: | Rcpp (≥ 1.0.6), MASS, methods, mvtnorm, pracma, reformulas, stats, statmod, xtable |
LinkingTo: | Rcpp, RcppArmadillo |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
LazyData: | true |
URL: | https://github.com/jamesmurray7/gmvjoint |
BugReports: | https://github.com/jamesmurray7/gmvjoint/issues |
NeedsCompilation: | yes |
Packaged: | 2024-10-05 21:00:06 UTC; lshjm44 |
Author: | James Murray [aut, cre] |
Maintainer: | James Murray <james.murray@lshtm.ac.uk> |
Repository: | CRAN |
Date/Publication: | 2024-10-05 22:40:02 UTC |
Joint Models of Survival and Multivariate Longitudinal Data
Description
gmvjoint allows the user to fit joint models of survival and multivariate longitudinal data. The
longitudinal data is specified by generalised linear mixed models (GLMMs). The joint models
are fit via maximum likelihood using an approximate EM algorithm first proposed by Bernhardt et
al. (2015). The GLMMs are specified using the same syntax as for package glmmTMB
Brooks et
al. (2017). The joint models themselves are then the flexible extensions to those in e.g.
Wulfsohn and Tsiatis (1997). The user is able to simulate data under many different response
types.
Author(s)
James Murray <j.murray7@ncl.ac.uk>
References
Bernhardt PW, Zhang D and Wang HJ. A fast EM Algorithm for Fitting Joint Models of a Binary Response to Multiple Longitudinal Covariates Subject to Detection Limits. Computational Statistics and Data Analysis 2015; 85; 37–53
Mollie E. Brooks, Kasper Kristensen, Koen J. van Benthem, Arni Magnusson, Casper W. Berg, Anders Nielsen, Hans J. Skaug, Martin Maechler and Benjamin M. Bolker (2017). glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling. The R Journal, 9(2), 378-400.
Murray, J and Philipson P. A fast approximate EM algorithm for joint models of survival and multivariate longitudinal data.Computational Statistics and Data Analysis 2022
Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics. 1997; 53(1), 330-339.
Primary biliary cirrhosis data
Description
Primary biliary cirrhosis (PBC) data. PBC is a chronic liver disease which affects the bile
ducts of the liver, complications of which can ultimately lead to death. The longitudinal
profile of numerous biomarkers were observed for 312 patients at the Mayo Clinic between 1974
and 1984 with patients assigned to either the active (D-penicillamine, n=154 (50.6
placebo treatment arm (Murtaugh 1994). The data is publicly available in numerous places,
including joineRML
and survival
. The presence of many longitudinal biomarkers
of clinical interest as well as an event-time has lead to the PBC data becoming
widely used in literature.
Usage
data('PBC')
Format
data.frame
with 312 patients and 19 variables:
id
Subject identifier
survtime
Survival time in years
drug
Binary indicator covariate: was the patient assigned active (
drug=1
) or placebo?sex
Binary indicator covariate: Takes value 1 if the subject is female, and zero if male.
time
Time of visit (0=baseline).
ascites
Binary response variable. Takes value 1 if accumulation of fluid in abdomen ("ascites") present.
hepatomegaly
Binary response variable. Takes value 1 if enlarged liver ("hepatomegaly") present.
spiders
Binary response variable. Takes value 1 if malformed blood vessels in skin ("hepatomegaly") present.
edema
Factor variable describing edema therapy, see
pbcseq
.serBilir
Serum bilirubin (measured in mg/dl).
serChol
Serum cholesterol (measured in mg/dl).
album
Serum albumin (measured in mg/dl).
alkaline
Alkaline phosphotase (measured in U/liter).
SGOT
Aspartate aminotransferase (measured in U/liter).
platelets
Platelet count per cubic ml/1000.
histologic
Histologic stage of disease, see
pbcseq
.status
Survival status,
status=1
if the subject experienced mortality and=0
if censored.age
Standardised age at baseline visit.
Details
Nine longitudinal biomarkers exist with varying degrees of completeness in the data.
Source
References
Murtaugh PA, Dickson ER, Van Dam GM, Malinchoc M, Grambsch PM, Langworthy AL, Gips CH. Primary biliary cirrhosis: Prediction of short-term survival based on repeated patient visits. Hepatology 1994; 20(1); 126-134.
Receiver Operator Characteristics (ROC) for a joint
model.
Description
Using longitudinal information available up to a time, establish diagnostic capabilities (ROC, AUC and Brier score) of a fitted joint model.
Usage
ROC(fit, data, Tstart, delta, control = list(), progress = TRUE, boot = FALSE)
Arguments
fit |
a joint model fit by the |
data |
the data to which the original |
Tstart |
The start of the time window of interest, |
delta |
scalar denoting the length of time interval to check for failure times. |
control |
list of control arguments to be passed to |
progress |
should a progress bar be shown, showing the current progress of the ROC
function (
to |
boot |
logical. Not currently used, legacy argument. |
Value
A list of class ROC.joint
consisting of:
Tstart
numeric denoting the start of the time window of interest; all dynamic predictions generated used longitudinal information up-to time
T_{\mathrm{start}}
.delta
scalar which denotes length of interval to check, such that the window is defined by
[T_{\mathrm{start}}, T_{\mathrm{start}}, + \delta]
.candidate.u
candidate vector of failure times to calculate dynamic probability of surviving for each subject alive in
data
at timeT_{\mathrm{start}}
.window.failures
numeric denoting the number of observed failures in
[T_{\mathrm{start}}, T_{\mathrm{start}}, + \delta]
.Tstart.alive
numeric denoting the risk set at
Tstart
.metrics
a
data.frame
containing probabilisticthresholds
with:TP
true positives;FN
false negatives;FP
false positives;TN
true negatives;TPR
true positive rate (sensitivity);FPR
false positive rate (1-specificity);Acc
accuracy;PPV
positive predictive value (precision);NPV
negative predictive value;F1s
F1 score andJ
Youden's J statistic.- AUC
the area under the curve.
- BrierScore
The Brier score.
- PE
The predicted error (taking into account censoring), loss function: square.
- MH.acceptance
Raw acceptance percentages for each subject sampled.
- MH.acceptance.bar
mean acceptance of M-H scheme across all subjects.
- simulation.info
list containing information about call to
dynPred
.
Author(s)
James Murray (j.murray7@ncl.ac.uk).
See Also
dynPred
, and plot.ROC.joint
.
Examples
data(PBC)
PBC$serBilir <- log(PBC$serBilir)
long.formulas <- list(serBilir ~ drug * time + (1 + time|id))
surv.formula <- Surv(survtime, status) ~ drug
family <- list('gaussian')
fit <- joint(long.formulas, surv.formula, PBC, family)
(roc <- ROC(fit, PBC, Tstart = 8, delta = 2, control = list(nsim = 25)))
plot(roc)
Anova for joint models
Description
Perform a likelihood ratio test between two (nested) joint
models. The user must decide whether the models are truly nested.
Usage
## S3 method for class 'joint'
anova(object, object2, ...)
Arguments
object |
a joint model fit by the |
object2 |
a joint model fit by the |
... |
additional arguments (none used). |
Value
A list of class anova.joint
with elements
mod0
the name of
object
.l0
the log-likelihood of the nested model, i.e. fit under the null.
AIC0
AIC for
object
.BIC0
BIC for
object
.mod1
the name of
object2
.l1
the log-likelihood under the alternative hypothesis.
AIC1
AIC for
object2
.BIC1
BIC for
object2
.LRT
likelihood ratio test statistic.
p
the p-value of
LRT
.warnSurv
internal - logical value for printing difference in survival models.
warnRanefs
internal - logical value for printing difference in random effects specifications.
Author(s)
James Murray (j.murray7@ncl.ac.uk)
See Also
joint
and logLik.joint
.
Examples
rm(list=ls())
data(PBC)
# Compare quadratic vs linear time specification for log(serum bilirubin) -----
PBC$serBilir <- log(PBC$serBilir)
long.formulas1 <- list(serBilir ~ drug * time + (1 + time|id))
long.formulas2 <- list(serBilir ~ drug * (time + I(time^2)) + (1 + time + I(time^2)|id))
surv.formula <- Surv(survtime, status) ~ drug
family <- list('gaussian')
# Fit the two competing models (fit is nested in fit2) ------------------------
fit <- joint(long.formulas1, surv.formula, PBC, family,
control = list(verbose = FALSE))
fit2 <- joint(long.formulas2, surv.formula, PBC, family, control = list(verbose = FALSE))
anova(fit, fit2)
# Quadratic terms improve fit significantly.
Bootstrapping a joint
object
Description
Use an existing model fit by joint
along with the data object originally
used and obtain a mean estimate, standard errors and 95% confidence interval using the
bootstrap. The original data is resampled by subject, not by observation.
Usage
boot.joint(
fit,
data,
boot.size = NULL,
nboot = 100L,
replace = TRUE,
progress = TRUE,
use.MLEs = TRUE,
control = list()
)
Arguments
fit |
a joint model fit by the |
data |
the original data used to fit the above joint model. |
boot.size |
integer, specifies the number of subjects to resample in the bootstrapping
approach. The default value is |
nboot |
integer, specifies the number of bootstrap samples, default value is
|
replace |
logical, should sampling be done with replacement? Defaults to
|
progress |
logical, should a text progress bar showing overall progress be shown
and updated after each successful bootstrapped model fit? Defaults to |
use.MLEs |
logical, should the MLEs of the |
control |
a list of control arguments, with same possible arguments as shown in
|
Value
A list of class boot.joint
which contains the MLEs from supplied joint
object, as well as the bootstrapped summaries and some model/computation information.
Author(s)
James Murray (j.murray7@ncl.ac.uk).
See Also
Examples
# Bivariate fit on PBC data -----------------------------------------
data(PBC)
# Subset data and remove NAs
PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time',
'albumin', 'platelets'))
PBC <- na.omit(PBC)
# Specify bivariate fit
long.formulas <- list(
albumin ~ time*drug + (1 + time|id),
platelets ~ time * drug + (1 + time|id)
)
surv.formula <- Surv(survtime, status) ~ drug
fit <- joint(long.formulas, surv.formula, PBC, family = list('gaussian', 'poisson'))
# Set 50 bootstraps, with lower absolute tolerance and convergence of 'either'.
BOOT <- boot.joint(fit, PBC, nboot = 50L, control = list(tol.abs = 5e-3, conv = 'either'),
use.MLEs = TRUE)
BOOT # Print to console via S3 method
Obtain conditional distribution of the random effects
Description
Obtain the conditional distribution of the random effects of a joint
model
fit. This is achieved by a Metropolis scheme. Approximate normality across random effects is
expected, and could be useful in diagnosing potential issues surrounding model fits.
Usage
cond.ranefs(fit, burnin = 500L, N = 3500L, tune = 2)
Arguments
fit |
a joint model fit by the |
burnin |
Number of burn-in iterations to discard, defaults to 500. |
N |
Number of MC iterations to carry out post burn-in, defaults to 3500. |
tune |
Tuning parameter, problem-specific, defaults to 2. |
Value
A list of class cond.b.joint
containing:
- walks
A list of length
n
containing the history ofb_i
post burn-in.- acceptance
A numeric vector containing the acceptance rate for each sampled subject.
- M
The ModelInfo list from
joint
. Used by S3 methods for classcond.b.joint
.- bhats
Posterior estimates at MLEs for the random effects. Same as
ranef(joint)
.- Sigmahats
The covariances of
bhats
.- D
The MLE estimate for the variance-covariance matrix of random effects from
fit
.- q
Dimension of random effects.
- K
Number of responses.
- qnames
The names of the random effects as determined by call to
joint
.- burnin
The amount of burn-in used.
- N
Number of MC iterations.
- tune
tuning parameter used
- nobs
The number of observations for each subject for each response.
- elapsed.time
Time taken for
cond.ranefs
to complete.
See Also
Examples
dat <- simData()$data
long.formulas <- list(Y.1 ~ time + cont + bin + (1 + time|id),
Y.2 ~ time + cont + bin + (1 + time|id))
surv.formula <- Surv(survtime, status) ~ bin
fit <- joint(long.formulas, surv.formula, dat, list("gaussian","gaussian"))
cond.b <- cond.ranefs(fit, burnin = 50L, N = 1000, tune = 2)
cond.b
plot(cond.b) # Overall
plot(cond.b, id = 1) # Plot the first subject (see plot.cond.b.joint).
Dynamic predictions for survival sub-model in a multivariate joint model.
Description
Calculates individualised conditional survival probabilities for subjects
during a period of follow-up using a joint
model fit along with requisite longitudinal
process history.
Note that this function is largely designed for use within the ROC function which assesses discriminatory power of the joint model, however it does function by itself with proper use of its arguments.
Usage
dynPred(
data,
id,
fit,
u = NULL,
nsim = 200,
progress = TRUE,
scale = NULL,
df = NULL
)
Arguments
data |
the data to which the original |
id |
subject identifier, i.e. for which subject is the conditional survival probabilities desired? |
fit |
a joint model fit by the |
u |
a numeric |
nsim |
how many Monte Carlo simulations should be carried out? Defaults to
|
progress |
a logical, if |
scale |
numeric scales the variance-covariance parameter in the proposal distribution for
the Metropolis-Hastings algorithm. Defaults to |
df |
numeric denotes the degrees of freedom of the proposed |
Details
Dynamic predictions for the time-to-event process based on information available
on the subject's longitudinal process up to given time t
are calculated by Monte Carlo
simulation outlined in Rizopoulos (2011). For a subject last observed at time t
, the
probability that they survive until future time u
is
Pr(T_i \ge u | T \ge t; \boldsymbol{Y}_i, \boldsymbol{b}_i; \boldsymbol{\Omega}) \approx
\frac{S(u|\hat{\boldsymbol{b}}_i; \boldsymbol{\Omega})}
{S(t|\hat{\boldsymbol{b}}_i; \boldsymbol{\Omega})}
where T_i
is the true failure time for subject i
, \boldsymbol{Y}_i
their
longitudinal measurements up to time t
, and S()
the survival function.
\boldsymbol{\Omega}
is drawn from the multivariate normal distribution with mean
\hat{\boldsymbol{\Omega}}
and its variance taken from a fitted joint
object.
\hat{\boldsymbol{b}}
is drawn from the t
distribution by means of a
Metropolis-Hastings algorithm with nsim
iterations.
Value
A list of class dynPred
which consists of three items:
pi
A
data.frame
which contains each candidate failure time (supplied byu
), with the mean, median and 2.5% and 97.5% quantiles of probability of survival until this failure time.pi.raw
A
matrix
of withnsim
rows andlength(u)
columns, each row represents thel
th conditional survival probability of survival eachu
survival time. This is largely for debugging purposes.- MH.accept
The acceptance rate of the Metropolis-Hastings algorithm on the random effects.
Author(s)
James Murray (j.murray7@ncl.ac.uk).
References
Bernhardt PW, Zhang D and Wang HJ. A fast EM Algorithm for Fitting Joint Models of a Binary Response to Multiple Longitudinal Covariates Subject to Detection Limits. Computational Statistics and Data Analysis 2015; 85; 37–53
Rizopoulos D. Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. Biometrics 2011; 67: 819–829.
See Also
ROC
and plot.dynPred
.
Examples
data(PBC)
PBC$serBilir <- log(PBC$serBilir)
# Focus in on id 81, who fails at around 7 years of follow-up. \code{dynPred} allows us to
# infer how the model believes their survival probability would've progressed (ignoring the
# true outcome at start time).
# Univariate -----------------------------------------------------------
long.formulas <- list(serBilir ~ drug * time + (1 + time|id))
surv.formula <- Surv(survtime, status) ~ drug
family <- list('gaussian')
fit <- joint(long.formulas, surv.formula, PBC, family)
preds <- dynPred(PBC, id = 81, fit = fit, u = NULL, nsim = 200,
scale = 2)
preds
plot(preds)
# Bivariate ------------------------------------------------------------
# Does introduction of albumin affect conditional survival probability?
long.formulas <- list(
serBilir ~ drug * time + I(time^2) + (1 + time + I(time^2)|id),
albumin ~ drug * time + (1 + time|id)
)
fit <- joint(long.formulas, surv.formula, data = PBC, family = list("gaussian", "gaussian"))
bi.preds <- dynPred(PBC, id = 81, fit = fit, u = NULL, nsim = 200,
scale = fit$coeffs$D/sqrt(fit$ModelInfo$n))
bi.preds
plot(bi.preds) # Appears to level-off dramatically; perhaps indicative of this id's albumin
# levels, or acceleration in serBilir trajectory around 8.5 years.
Extract AIC from a joint model fit.
Description
Extract AIC from a joint model fit.
Usage
## S3 method for class 'joint'
extractAIC(fit, scale, k = 2, conditional = FALSE, ...)
Arguments
fit |
A fitted |
scale |
See |
k |
Numeric specifying the "weight" of degrees of freedom (default |
conditional |
Should AIC of conditional or observed log-likelihood be used? Defaults
to |
... |
additional arguments (none used). |
Value
A numeric vector of length 2, with first and second element giving
df
The degrees of freedom for the fitted model.
AIC
The Akaike Information Criterion for the fitted model.
Obtain joint model fitted values
Description
returns the fitted values from a joint
object. Note that the
linear predictor for each k=1,\dots,K
response is returned.
Usage
## S3 method for class 'joint'
fitted(object, as = "matrix", ...)
Arguments
object |
a joint model fit by the |
as |
should the fitted values be returned as a |
... |
Additional arguments (none used). |
Value
A matrix (or list) with a column (or list entry) for each of the fitted linear
predictors with class fitted.joint
.
Author(s)
James Murray (j.murray7@ncl.ac.uk).
See Also
Examples
# Bivariate fit on PBC data -----------------------------------------
data(PBC)
# Subset data and remove NAs
PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time',
'albumin', 'platelets'))
PBC <- na.omit(PBC)
# Specify bivariate fit
long.formulas <- list(
albumin ~ time*drug + (1 + time|id),
platelets ~ time * drug + (1 + time|id)
)
surv.formula <- Surv(survtime, status) ~ drug
fit <- joint(long.formulas, surv.formula, PBC, family = list('gaussian', 'poisson'))
fitted(fit)
Extract fixed effects from a joint
object.
Description
Extract fixed effects from a joint
object.
Usage
## S3 method for class 'joint'
fixef(object, what = c("long", "surv"), ...)
Arguments
object |
a joint model fit by the |
what |
character string. Should the |
... |
additional arguments (none used). |
Value
A vector containing requested fixed effects.
Author(s)
James Murray (j.murray7@ncl.ac.uk).
See Also
Examples
# Univariate fit on PBC data -------------------------------------------
data(PBC)
# Subset data and remove NAs
PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time',
'albumin'))
PBC <- na.omit(PBC)
# Specify simple univariate fit
long.formulas <- list(
albumin ~ time + (1 + time|id)
)
surv.formula <- Surv(survtime, status) ~ drug
fit <- joint(long.formulas, surv.formula, PBC, family = list('gaussian'))
fixef(fit, 'long')
fixef(fit, 'surv')
Fit a joint model to time-to-event and multivariate longitudinal data
Description
Fit a joint model to time-to-event and multivariate longitudinal data
Usage
joint(
long.formulas,
surv.formula,
data,
family,
disp.formulas = NULL,
control = list()
)
Arguments
long.formulas |
A list of formula objects specifying the |
surv.formula |
A formula specifying the time-to-event sub-model. Must be usable by
|
data |
A |
family |
A list of length |
disp.formulas |
An optional list of length |
control |
A list of control values:
|
Details
Function joint
fits a joint model to time-to-event data and multivariate
longitudinal data. The longitudinal data can be specified by numerous models encompassing
a fairly wide range of data. This joint model fit is achieved by the use of an approximate
EM algorithm first proposed in Bernhardt et al. (2015), and later used in the 'classic'
multivariate joint model in Murray and Philipson (2022). Each longitudinal response is
modelled by
h_k(E[Y_{ik}|b_{ik};\Omega]) = X_{ik}\beta_k + Z_{ik}b_{ik}
where h_k
is a known, monotonic link function. An association is induced between the
K
th response and the hazard \lambda_i(t)
by:
\lambda_i(t)=\lambda_0(t)\exp\{S_i^T\zeta + \sum_{k=1}^K\gamma_kW_k(t)^Tb_{ik}\}
where \gamma_k
is the association parameter and W_k(t)
is the vector function of
time imposed on the K
th random effects structure (i.e. intercept-and-slope; spline).
Value
An object with class joint
. See joint.object
for information.
Family specification
Currently, five families are available for implementation, spanning continuous, binary and count data types:
'gaussian'
Normally distributed. The identity link is used. A term
\sigma_k
will be estimated, denoting the variance of this response'binomial'
For binary data types, a logit link is used.
'poisson'
For count data types where dispersion is either non-consequential or ignored. A log link is used.
'genpois'
For count data types where dispersion is at least of some secondary interest. A log link is used. A term
\sigma_k
is estimated, denoting the dispersion,\varphi
of the response. This follows interpretation of Zamani & Ismail (2012):\varphi>0
: Over-dispersion;\varphi<0
: Under-dispersion.Var[Y]=(1+\varphi)^2\mu
.'Gamma'
For continuous data where a Gamma distribution might be sensible. The log link is used. A term
\sigma_k
is be estimated, denoting the (log) shape of the distribution, which is then reported as\varphi_k=\exp\{\sigma_k\}
."negbin"
For count data types where overdispersion is modelled. A log link is used. A term
\sigma_k
is estimated, which is then reported as\varphi_k=\exp\{\sigma_k\}
which is the overdispersion. The variance of the response isVar[Y]=\mu+\mu^2/\varphi
.
For families "negbin"
, "Gamma"
, "genpois"
, the user can define the
dispersion model desired in disp.formulas
. For the "negbin"
and "Gamma"
cases, we define \varphi_i=\exp\{W_i\sigma_i\}
(i.e. the exponent of the linear
predictor of the dispersion model; and for "genpois"
the identity of the linear
is used.
Dispersion models
The disp.formulas
in the function call allows the user to model the dispersion for
a given sub-model if wanted. The default value disp.formulas = NULL
simply imposes
an 'intercept only' model. If the k
th item in disp.formulas
corresponds to
a longitudinal sub-model with no dispersion term, then it is simply ignored. With this in
mind then, if a dispersion model is only required for, say, one sub-model, then the
corresponding item in this list of models should be specified as such, with the others set to
~1
.
Standard error estimation
We follow the approximation of the observed empirical information matrix detailed by
Mclachlan and Krishnan (2008), and later used in joineRML
(Hickey et al., 2018).
These are only calculated if post.process=TRUE
. Generally, these SEs are well-behaved,
but their reliability will depend on multiple factors: Sample size; number of events;
collinearity of REs of responses; number of observed times, and so on. Some more discussion/
references are given in vcov.joint
.
Convergence of the algorithm
A few convergence criteria (specified by control$conv
) are available:
abs
Convergence reached when maximum absolute change in parameter estimates is
<tol.abs
.rel
Convergence reached when maximum absolute relative change in parameter estimates is
<tol.rel
. A small amount (tol.den
) is added to the denominator to eschew numerical issues if parameters are nearly zero.either
Convergence is reached when either
abs
orrel
are met.sas
Assess convergence for parameters
|\Omega_a|
<tol.thr
by theabs
criterion, elserel
. This is the default.
Note that the baseline hazard is updated at each EM iteration, but is not monitored for convergence.
Author(s)
James Murray (j.murray7@ncl.ac.uk).
References
Bernhardt PW, Zhang D and Wang HJ. A fast EM Algorithm for Fitting Joint Models of a Binary Response to Multiple Longitudinal Covariates Subject to Detection Limits. Computational Statistics and Data Analysis 2015; 85; 37–53
Hickey GL, Philipson P, Jorgensen A, Kolamunnage-Dona R. joineRML
: a joint model and
software package for time-to-event and multivariate longitudinal outcomes.
BMC Med. Res. Methodol. 2018; 50
McLachlan GJ, Krishnan T. The EM Algorithm and Extensions. Second Edition. Wiley-Interscience; 2008.
Murray, J and Philipson P. A fast approximate EM algorithm for joint models of survival and multivariate longitudinal data.Computational Statistics and Data Analysis 2022; 170; 107438
Zamani H and Ismail N. Functional Form for the Generalized Poisson Regression Model, Communications in Statistics - Theory and Methods 2012; 41(20); 3666-3675.
See Also
summary.joint
, logLik.joint
, boot.joint
,
extractAIC.joint
, fixef.joint
, ranef.joint
,
vcov.joint
, joint.object
and xtable.joint
. For
data simulation see simData
.
Examples
# 1) Fit on simulated bivariate data, (1x gaussian, 1x poisson) --------
beta <- do.call(rbind, replicate(2, c(2, -0.1, 0.1, -0.2), simplify = FALSE))
gamma <- c(0.3, -0.3)
D <- diag(c(0.25, 0.09, 0.25, 0.05))
family <- list('gaussian', 'poisson')
data <- simData(ntms = 10, beta = beta, D = D, n = 100,
family = family, zeta = c(0, -0.2),
sigma = list(0.16, 0), gamma = gamma)$data
# Specify formulae and target families
long.formulas <- list(
Y.1 ~ time + cont + bin + (1 + time|id), # Gaussian
Y.2 ~ time + cont + bin + (1 + time|id) # Poisson
)
surv.formula <- Surv(survtime, status) ~ bin
fit <- joint(long.formulas, surv.formula, data, family)
# 2) Fit on PBC data -----------------------------------------------------
data(PBC)
# Subset data and remove NAs
PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time',
'serBilir', 'albumin', 'spiders', 'platelets'))
PBC <- na.omit(PBC)
# Specify GLMM sub-models, including interaction and quadratic time terms
long.formulas <- list(
log(serBilir) ~ drug * (time + I(time^2)) + (1 + time + I(time^2)|id),
albumin ~ drug * time + (1 + time|id),
platelets ~ drug * time + (1 + time|id),
spiders ~ drug * time + (1|id)
)
surv.formula <- Surv(survtime, status) ~ drug
fit <- joint(long.formulas, surv.formula, PBC,
family = list("gaussian", "gaussian", "poisson", "binomial"),
control = list(verbose = TRUE))
fit
# 3) Fit with dispersion models ----------------------------------------
beta <- do.call(rbind, replicate(2, c(2, -0.1, 0.1, -0.2), simplify = FALSE))
gamma <- c(0.3, -0.3)
D <- diag(c(0.25, 0.09, 0.25, 0.05))
family <- list('negbin', 'poisson') # As an example; only requires one dispersion model.
sigma <- list(c(1, 0.2), 0) # Need to specify the model in simData call too.
disp.formulas = list(~time, ~1) # Even though poisson doesn't model dispersion, need to
# populate this element in disp.formulas!
# Simulate some data
data <- simData(ntms = 10, beta = beta, D = D, n = 500,
family = family, zeta = c(0, -0.2), sigma = sigma,
disp.formulas = disp.formulas, gamma = gamma)$data
# Now fit using joint
long.formulas <- list(
Y.1 ~ time + cont + bin + (1+time|id),
Y.2 ~ time + cont + bin + (1+time|id)
)
fit <- joint(
long.formulas, Surv(survtime, status) ~ bin,
data, family, disp.formulas = disp.formulas
)
fit
summary(fit)
Fitted joint
object
Description
An object returned by the joint
function, with class joint
a fitted joint model. Objects of this class currently have methods for: logLik
,
print
, ranef
, fixef
, summary
, AIC
, and vcov
.
Usage
joint.object
Format
An object of class NULL
of length 0.
Value
A list with the following components.
coeffs
A list containing parameter estimates:
D
The variance-covariance matrix of the random effects.
beta
Vector of fixed effects for longitudinal processes.
sigma
List of dispersion parameters, families with no dispersion parameter are returned as an unnamed zero value.
gamma
Vector of association parameters.
zeta
Vector of time-invariant survival coefficients.
hazard
A matrix containing unique failure times
ft
, their hazard contributionhaz
and the number of events at that failure timenev
.ModelInfo
A list containing information on the model fit:
ResponseInfo
A vector containing response names with (family) reported.
Resps
A vector containing response names only.
family
A list of families fit.
K
An integer specifying the number of longitudinal sub-models.
Pcounts
A list containing informations about the number of parameters/random effects:
P
A vector of length K containing the number of fixed effects for each response (in order).
Pd
A vector of length K containing the number of dispersion parameters for each response (in order) 0 denotes no parameter for that response.
q
An integer denoting the number of random effects.
vD
An integer denoting the number of unique variance-covariance parameters estimated.
long.formulas
A list of
long.formulas
(i.e. fromjoint
call).disp.formulas
A list of
disp.formulas
(i.e. fromjoint
call). If nodisp.formulas
are supplied tojoint
, then this is populated by a list ofK
"~1
". The environment is set toparent.frame
in this case to avoid memory overheads in returned objects.surv.formula
Formula object from
joint
call.survtime
The name of the event time used in
surv.formula
.status
The name of the event indicator used in
surv.formula
.control
List of control parameters used, see
joint
.convergence.criteria
List of parameters relating to the stopping rule.
inds
A list of length two, named
R
andCpp
, each of which contains the indices for fixed effects\beta
for each response, or the random effectsb
for the named platform.n
Number of subjects.
nobs
A vector containing total number of observations for each response.
mi
A
K
xn
matrix containing the number of observations for subjecti
for thek
th response.nev
Number of events.
id.assign
A list containing the original ids of subjects in the
data
supplied tojoint
, and the id assigned to them for use in subsequent functions.
Hessian
The (approximated) Hessian found at MLEs. Only returned if control argument
post.process=TRUE
.
vcov
The full variance-covariance matrix between parameters. Only returned if control argument
post.process=TRUE
.SE
A named vector of approximated standard error for each estimated parameter. Only returned if control argument
post.process=TRUE
.logLik
log-likelihood evaluated at parameter estimates. Only returned if control argument
post.process=TRUE
.REs
The random effects, with subject-specific variance matrices attributed. If control argumnet
post.process=TRUE
then these are found at MLEs (i.e. are posterior estimates), otherwise they are taken from the final EM iteration.elapsed.time
Named numeric containing breakdown of elapsed time for
joint
fit.dmats
A list of data matrices on each of the longitudinal and survival processes for each subject.
Author(s)
James Murray (j.murray7@ncl.ac.uk).
See Also
Log-likelihood for joint model.
Description
Calculate joint log-likelihood, degrees of freedom, AIC and BIC of joint model fit.
Usage
## S3 method for class 'joint'
logLik(object, conditional = FALSE, ...)
Arguments
object |
a |
conditional |
Logical. Should the conditional or observed data log-likelihood be returned? See details. |
... |
additional arguments (none used). |
Details
Calculate the log-likelihood of a joint model of survival and multivariate longitudinal
data (i.e. a joint
object). The argument conditional
manages whether
or not the log-likelihood conditional on the random effects, or simply
the observed data log-likelihood is returned (the default, conditional = FALSE
).
If conditional = TRUE
, then the log-likelihood conditional on the random
effects is returned. That is
\log f(T_i, \Delta_i, Y_i|b_i;\Omega) =
\log f(Y_i|b_i; \Omega) + \log f(T_i, \Delta_i|b_i; \Omega) + \log f(b_i|\Omega)
If conditional = FALSE
, then the observed data log-likelihood is returned i.e.
\log\int f(Y_i|b_i; \Omega)f(T_i, \Delta_i|b_i; \Omega)f(b_i|\Omega)db_i.
Additionally, the degrees of freedom, \nu
is given by
\nu = \code{length(vech(D))} + \sum_{k=1}^K\{P_k + P_{\sigma_k}\} + P_s,
where P_k
denotes the number of coefficients estimated for the k
th response,
and P_{\sigma_k}
the number of dispersion parameters estimated. P_s
denotes
the number of survival coefficients, i.e. the length of c(zeta, gamma)
. Finally,
all covariance parameters are captured in length(vech(D))
.
With the degrees of freedom, we can additionally compute AIC and BIC, which are defined in no special way; and are calculated using the observed data log-likelihood.
Value
Returns an object of class logLik
, a number which is the log-likelihood
of the fitted model object
. This has multiple attributes: df
which is the
degrees of freedom, df.residual
; the number of residual degrees of freedom;
AIC
and BIC
which are the Akaike or Bayes information criterion evaluated at
either the conditional or observed log-likelihood (as requested by argument
conditional
).
Author(s)
James Murray (j.murray7@ncl.ac.uk)
References
Henderson R, Diggle P, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics 2000; 1(4); 465-480.
Wulfsohn MS, Tsiatis AA. A joint model for survival and longitudinal data measured with error. Biometrics 1997; 53(1); 330-339.
See Also
extractAIC.joint
and anova.joint
Examples
# Bivariate simulated data (2x Gaussian)
data <- simData(n = 100,
D = diag(c(.25, .04, .2, .02)),
gamma = c(0.4, -0.2), theta = c(-2, .2))$data
fit <- joint(list(
Y.1 ~ time + cont + bin + (1 + time|id),
Y.2 ~ time + cont + bin + (1 + time|id)
), Surv(survtime, status) ~ cont + bin,
data = data,
family = list('gaussian', 'gaussian'))
logLik(fit)
Parsing the survival formula and constructing all survival-related data objects.
Description
Creates a set of survival data and fits a coxph
model
using a survival formula and a data set.
Usage
parseCoxph(surv.formula, data, center = TRUE)
Arguments
surv.formula |
A formula readable by 'coxph'. |
data |
a set of data containing covariate information for variables named by ‘surv.formula'. Can be of any ’completeness', as the function returns a reduced set. |
center |
Should the covariate matrices be mean-centered before being returned?
defaults to |
Value
A list with class parseCoxph
containing:
survdata
reduced version of
data
, with only one row per subject, with covariates specified bysurv.formula
along with survival time and failure status.Smat
matrix containing all requisite survival covariates (one row per subject).
ph
the model fit from
coxph
.Delta
list of failure indicators for each of the unique subjects.
n
number of unique subjects.
ft
vector of unique failure times.
nev
vector containing number of failures at each failure time
ft
.survtime
the name of the
time
variable insurv.formula
.status
the name of the
event
variable insurv.formula
.
Examples
data = simData()$data
parseCoxph(Surv(survtime, status) ~ bin, data = data)
Plot receiver operator characteristics.
Description
Produces a simple plot showing the true positive rate (sensitivity) against
the false positive rate (1-specificy) for a dynamic prediction routine on a joint
model
along a specified time interval.
Usage
## S3 method for class 'ROC.joint'
plot(x, legend = TRUE, show.Youden = TRUE, show.F1 = FALSE, ...)
Arguments
x |
an object with class |
legend |
should a legend displaying the number in risk set; number of failures in interval;
area under the ROC curve and median Brier score be added to the bottom-right corner of the ROC
plot? Default is |
show.Youden |
should a line be drawn showing optimal cut-point using Youden's J statistic?
Defaults to |
show.F1 |
should a line be drawn showing optimal cut-point using the F-score?
Defaults to |
... |
additional arguments (none used). |
Author(s)
James Murray (j.murray7@ncl.ac.uk).
See Also
Plot posterior distribution of the random effects for a joint
model.
Description
Plot posterior distribution of the random effects for a joint
model.
Usage
## S3 method for class 'cond.b.joint'
plot(
x,
id = NULL,
show.cov = TRUE,
nrow = NULL,
ncol = NULL,
title = NULL,
...
)
Arguments
x |
object of class |
id |
integer, if specified this shows the plots the posterior density for the chosen
id (as they appeared in original |
show.cov |
logical, should the 'true' (normal) density be overlaid on these plots? If
|
nrow |
integer specifying the number of rows to use in paneled plot. |
ncol |
integer specifying the number of columns to use in paneled plot. Note that
both |
title |
optional character string to specify the title. This is placed at the top middle of the graphics device. |
... |
Additional arguments, these are passed to the plotting of the individual densities. |
Examples
# Bivariate Gaussian
dat <- simData(n = 100)$data
long.formulas <- list(Y.1 ~ time + cont + bin + (1 + time|id),
Y.2 ~ time + cont + bin + (1 + time|id))
surv.formula <- Surv(survtime, status) ~ bin
fit <- joint(long.formulas, surv.formula, dat, list("gaussian","gaussian"))
cond.b <- cond.ranefs(fit, burnin = 500L, N = 4500L)
# Posterior for entire sample with dummy title
plot(cond.b, title = "Example title")
# Posterior for a randomly selected id
dummy.id <- sample(1:100, 1)
# Should show good agreement between true posterior and approximate normal.
plot(cond.b, id = dummy.id, title = paste0("id: ", dummy.id))
Plot conditional survival probabilities.
Description
Produces a simple plot of the probability the subject survives given failure times along with the 95% confidence interval if probabilities are generated from a simulation scheme, and simply the first-order estimate otherwise.
Usage
## S3 method for class 'dynPred'
plot(x, what = c("median", "mean"), ...)
Arguments
x |
an object with class |
what |
should the |
... |
additional arguments (none used). |
Author(s)
James Murray (j.murray7@ncl.ac.uk).
See Also
Plot joint model residuals
Description
Plot residuals obtained by a joint model (obtained by joint
).
If the residuals.joint
object represents the longitudinal process, a simple (paneled)
plot is produced (one for each response). If the residual object contains the Cox-Snell
residuals then several plots are produced (interactively): The KM estimate of survival
function of said residuals and then repeated for each survival covariate in the model call
to joint
(if requested).
Usage
## S3 method for class 'residuals.joint'
plot(x, strata = FALSE, ...)
Arguments
x |
an object with class |
strata |
logical, should strata (for the survival sub-model only). Defaults to
|
... |
additional arguments (none used). |
Author(s)
James Murray (j.murray7@ncl.ac.uk).
See Also
Extract random effects from a joint
object.
Description
Return the random effects \hat{\boldsymbol{b}}
which maximises the complete
data log-likelihood at the MLEs \hat{\Omega}
.
Usage
## S3 method for class 'joint'
ranef(object, Var = FALSE, ...)
Arguments
object |
a joint model fit by the |
Var |
logical, should the estimated variance of the random effects at |
... |
additional arguments (none used). |
Value
A matrix
containing required random effects effects. If Var=TRUE
,
instead a list is returned with first element the matrix
of random effects and second a
matrix
of the variances \hat{\Sigma}
. Note that these are posterior modes
of the random effects. Conditional distribution can be found by cond.ranefs
.
Author(s)
James Murray (j.murray7@ncl.ac.uk).
See Also
Examples
# Univariate fit on PBC data -----------------------------------------
data(PBC)
# Subset data and remove NAs
PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time',
'albumin'))
PBC <- na.omit(PBC)
# Specify univariate fit
long.formulas <- list(
albumin ~ time*drug + (1 + time|id)
)
surv.formula <- Surv(survtime, status) ~ drug
fit <- joint(long.formulas, surv.formula, PBC, family = list('gaussian'))
b <- ranef(fit, FALSE)
Obtain joint model residuals
Description
returns the Pearson residuals values from a joint
object.
Usage
## S3 method for class 'joint'
residuals(
object,
what = c("longit", "surv"),
type = c("response", "pearson"),
...
)
Arguments
object |
a joint model fit by |
what |
character string. Should the |
type |
character. The residual type for |
... |
Additional arguments (none used). |
Value
a named list of length K
of class residuals.joint
containing
residuals produced by the joint model for each of the k=1,\dots,K
responses,
along with the fitted values as an attribute.
Author(s)
James Murray (j.murray7@ncl.ac.uk).
See Also
fitted.joint
plot.residuals.joint
Examples
# Trivariate fit on PBC data -----------------------------------------
data(PBC)
# Subset data and remove NAs
PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time',
'albumin', 'hepatomegaly', 'platelets'))
PBC <- na.omit(PBC)
# Specify trivariate fit
long.formulas <- list(
albumin ~ time*drug + (1 + time|id),
platelets ~ time * drug + (1 + time|id),
hepatomegaly ~ time * drug + (1|id)
)
surv.formula <- Surv(survtime, status) ~ drug
fit <- joint(long.formulas, surv.formula, PBC,
family = list('gaussian', 'poisson', 'binomial'))
R <- residuals(fit, type = 'pearson')
plot(R)
plot(residuals(fit, what = "surv"))
Simulate realisations from a generalised poisson distribution
Description
Simulate realisations from a generalised poisson distribution
Usage
rgenpois(mu, phi)
Arguments
mu |
A numeric vector of rates |
phi |
A numeric specifying the dispersion |
Details
Follows the "GP-1" implementation of the generalised Poisson distribution outlined
in Zamani & Ismail (2012). The variance of produced Y
is (1+\varphi)^2\mu
. As such
the dispersion parameter is bounded (i.e. not in positive reals as with CMP distribution).
Value
An appropriately-dimensioned vector of count data.
References
Zamani H and Ismail N. Functional Form for the Generalized Poisson Regression Model, Communications in Statistics - Theory and Methods 2012; 41(20); 3666-3675.
Simulate data from a multivariate joint model
Description
Simulate multivariate longitudinal and survival data from a joint model
specification, with potential mixture of response families. Implementation is similar
to existing packages (e.g. joineR
, joineRML
).
Usage
simData(
n = 250,
ntms = 10,
fup = 5,
family = list("gaussian", "gaussian"),
sigma = list(0.16, 0.16),
beta = rbind(c(1, 0.1, 0.33, -0.5), c(1, 0.1, 0.33, -0.5)),
D = NULL,
gamma = c(0.5, -0.5),
zeta = c(0.05, -0.3),
theta = c(-4, 0.2),
cens.rate = exp(-3.5),
regular.times = TRUE,
dof = Inf,
random.formulas = NULL,
disp.formulas = NULL,
return.ranefs = FALSE
)
Arguments
n |
the number of subjects |
ntms |
the number of time points |
fup |
the maximum follow-up time, such that t = [0, ..., fup] with length |
family |
a |
sigma |
a |
beta |
a |
D |
a positive-definite matrix specifying the variance-covariance matrix for the random effects. If not supplied an identity matrix is assumed. |
gamma |
a |
zeta |
a vector of length 2 specifying the coefficients for the baseline covariates in the survival sub-model, in the order of continuous and binary. |
theta |
parameters to control the failure rate, see baseline hazard. |
cens.rate |
parameter for |
regular.times |
logical, if |
dof |
integer, specifies the degrees of freedom of the multivariate t-distribution
used to generate the random effects. If specified, this t-distribution is used. If left
at the default |
random.formulas |
allows user to specify if an intercept-and-slope ( |
disp.formulas |
allows user to specify the dispersion model simulated. Intended use is
to allow swapping between intercept only (the default) and a time-varying one ( |
return.ranefs |
a logical determining whether the true random effects should be
returned. This is largely for internal/simulation use. Default |
Details
simData
simulates data from a multivariate joint model with a mixture of
families for each k=1,\dots,K
response. The specification of family
changes
requisite dispersion parameter sigma
, if applicable. The family
list can
(currently) contain:
"gaussian"
Simulated with identity link, corresponding item in
sigma
will be the variance."poisson"
Simulated with log link, corresponding dispersion in
sigma
can be anything, as it doesn't impact simulation."binomial"
Simulated with logit link, corresponding dispersion in
sigma
can be anything, as it doesn't impact simulation."negbin"
Simulated with a log link, corresponding item in
sigma
will be the overdispersion defined on the log scale. Simulated variance is\mu+\mu^2/\varphi
."genpois"
Simulated with a log link, corresponding item in
sigma
will be the dispersion. Values < 0 correspond to under-dispersion, and values > 0 over- dispersion. Seergenpois
for more information. Simulated variance is(1+\varphi)^2\mu
."Gamma"
Simulated with a log link, corresponding item in
sigma
will be the shape parameter, defined on the log-scale.
Therefore, for families "negbin"
, "Gamma"
, "genpois"
, the user can
define the dispersion model desired in disp.formulas
, which creates a data matrix
W
. For the "negbin"
and "Gamma"
cases, we define
\varphi_i=\exp\{W_i\sigma_i\}
(i.e. the exponent of the linear predictor of the
dispersion model); and for "genpois"
the identity of the linear is used.
Value
A list of two data.frame
s: One with the full longitudinal data, and another
with only survival data. If return.ranefs=TRUE
, a matrix of the true b
values is
also returned. By default (i.e. no arguments provided), a bivariate Gaussian set of joint
data is returned.
Baseline hazard
When simulating the survival time, the baseline hazard is a Gompertz distribution controlled
by theta=c(x,y)
:
\lambda_0(t) = \exp{x + yt}
where y
is the shape parameter, and the scale parameter is \exp{x}
.
Author(s)
James Murray (j.murray7@ncl.ac.uk).
References
Austin PC. Generating survival times to simulate Cox proportional hazards models with time-varying covariates. Stat Med. 2012; 31(29): 3946-3958.
See Also
Examples
# 1) A set of univariate data ------------------------------------------
beta <- c(2.0, 0.33, -0.25, 0.15)
# Note that by default arguments are bivariate, so need to specify the univariate case
univ.data <- simData(beta = beta,
gamma = 0.15, sigma = list(0.2), family = list("gaussian"),
D = diag(c(0.25, 0.05)))
# 2) Univariate data, with failure rate controlled ---------------------
# In reality, many facets contribute to the simulated failure rate, in
# this example, we'll just atler the baseline hazard via 'theta'.
univ.data.highfail <- simData(beta = beta,
gamma = 0.15, sigma = list(0.0), family = list("poisson"),
D = diag(c(0.40, 0.08)), theta = c(-2, 0.1))
# 3) Trivariate (K = 3) mixture of families with dispersion parameters -
beta <- do.call(rbind, replicate(3, c(2, -0.1, 0.1, -0.2), simplify = FALSE))
gamma <- c(0.3, -0.3, 0.3)
D <- diag(c(0.25, 0.09, 0.25, 0.05, 0.25, 0.09))
family <- list('gaussian', 'genpois', 'negbin')
sigma <- list(.16, 1.5, log(1.5))
triv.data <- simData(ntms=15, family = family, sigma = sigma, beta = beta, D = D,
gamma = gamma, theta = c(-3, 0.2), zeta = c(0,-.2))
# 4) K = 4 mixture of families with/out dispersion ---------------------
beta <- do.call(rbind, replicate(4, c(2, -0.1, 0.1, -0.2), simplify = FALSE))
gamma <- c(-0.75, 0.3, -0.6, 0.5)
D <- diag(c(0.25, 0.09, 0.25, 0.05, 0.25, 0.09, 0.16, 0.02))
family <- list('gaussian', 'poisson', 'binomial', 'gaussian')
sigma <- list(.16, 0, 0, .05) # 0 can be anything here, as it is ignored internally.
mix.data <- simData(ntms=15, family = family, sigma = sigma, beta = beta, D = D, gamma = gamma,
theta = c(-3, 0.2), zeta = c(0,-.2))
# 5) Bivariate joint model with two dispersion models. -----------------
disp.formulas <- list(~time, ~time) # Two time-varying dispersion models
sigma <- list(c(0.00, -0.10), c(0.10, 0.15)) # specified in form of intercept, slope
D <- diag(c(.25, 0.04, 0.50, 0.10))
disp.data <- simData(family = list("genpois", "negbin"), sigma = sigma, D = D,
beta = rbind(c(0, 0.05, -0.15, 0.00), 1 + c(0, 0.25, 0.15, -0.20)),
gamma = c(1.5, 1.5),
disp.formulas = disp.formulas, fup = 5)
# 6) Trivariate joint model with mixture of random effects models ------
# It can be hard to e.g. fit a binomial model on an intercept and slope, since e.g.
# glmmTMB might struggle to accurately fit it (singular fits, etc.). To that end, could
# swap the corresponding random effects specification to be an intercept-only.
family <- list("gaussian", "binomial", "gaussian")
# A list of formulae, even though we want to change the second sub-model's specification
# we need to specify the rest of the items, too (same as disp.formulas, sigma).
random.formulas <- list(~time, ~1, ~time)
beta <- rbind(c(2, -0.2, 0.5, -0.25), c(0, 0.5, 1, -1), c(-2, 0.2, -0.5, 0.25))
# NOTE that the specification of RE matrix will need to match.
D <- diag(c(0.25, 0.09, 1, 0.33, 0.05))
# Simulate data, and return REs as a sanity check...
mix.REspec.data <- simData(beta = beta, D = D, family = family,
gamma = c(-0.5, 1, 0.5), sigma = list(0.15, 0, 0.15),
random.formulas = random.formulas, return.ranefs = TRUE)
Summary of an joint
object.
Description
Generate summary of a fitted multivariate joint model.
Usage
## S3 method for class 'joint'
summary(object, ...)
Arguments
object |
a joint model fit by the |
... |
additional arguments (none used). |
Value
Object of class summary.joint
.
Author(s)
James Murray j.murray7@ncl.ac.uk
See Also
joint
and joint.object
Examples
# Simple univariate on log(serum bilirubin) ----------------------------
data(PBC)
long.formulas <- list(
log(serBilir) ~ drug * (time + I(time^2)) + (1 + time + I(time^2)|id)
)
surv.formula <- Surv(survtime, status) ~ sex + drug
fit <- joint(long.formulas = long.formulas,
surv.formula = surv.formula,
data = PBC, family = list("gaussian"))
summary(fit)
# Bivariate with a dispersion model ------------------------------------
PBC <- na.omit(PBC[,c('id', 'survtime', 'status', 'sex',
'drug', 'platelets', 'albumin', 'time')])
long.formula <- list(
platelets ~ time * drug + (1 + time|id),
albumin ~ time * drug + (1 + time|id)
)
surv.formula <- Surv(survtime, status) ~ sex + drug
fit <- joint(long.formula, surv.formula, PBC,
family = list("negbin", "gaussian"),
disp.formula = list(~time, ~1))
summary(fit)
Extract the variance-covariance matrix from a joint
fit.
Description
Extract the variance-covariance matrix from a joint
fit.
Usage
## S3 method for class 'joint'
vcov(object, corr = FALSE, ...)
Arguments
object |
a joint model fit by the |
corr |
should the correlation matrix be returned instead of the variance-covariance? |
... |
extra arguments (none used). |
Details
Uses the observed-empirical approximation of information matrix (Mclachlan & Krishnan, 2008). The standard errors for the baseline hazard are not estimated.
Value
A variance-covariance matrix for the joint model object.
Methodology
Many competing ways exist for obtaining the observed information matrix in an EM algorithm.
In the context of joint modelling, the observed empirical approximation of the information
matrix has been used previously (joineRML
, Hickey et al. 2018). Elsewhere,
estimation of the observed information in a semi-parametric setting is outlined neatly in
Xu et al. (2014). Here, they advocate for approximation of this information matrix by
numerical differentiation of the profile Fisher Score vector. We do not consider this
methodology owing to its computational expense. That is, for each element of \Omega
which is perturbed by some small amount \tilde{\Omega}^{p}
, we must re-calculate
\hat{b}_i
and \hat{\Sigma}_i
.
Author(s)
James Murray j.murray7@ncl.ac.uk
References
Hickey GL, Philipson P, Jorgensen A, Kolamunnage-Dona R. joineRML
: a joint model and
software package for time-to-event and multivariate longitudinal outcomes.
BMC Med. Res. Methodol. 2018; 50
McLachlan GJ, Krishnan T. The EM Algorithm and Extensions. Second Edition. Wiley-Interscience; 2008.
Xu C, Baines PD, Wang J. Standard error estimation using the EM algorithm for the joint modeling of survival and longitudinal data. Biostatistics 2014; 15(4).
Examples
# Univariate fit on PBC data -------------------------------------------
data(PBC)
# Subset data and remove NAs
PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time',
'albumin'))
PBC <- na.omit(PBC)
# Specify univariate fit
long.formulas <- list(
albumin ~ time + (1 + time|id)
)
surv.formula <- Surv(survtime, status) ~ drug
fit <- joint(long.formulas, surv.formula, PBC, family = list('gaussian'))
vcov(fit)
Print an LaTeX-ready xtable
for a joint
object.
Description
Prints an xtable
output for a fitted joint
object to the console,
or to a specified save location
Usage
## S3 method for class 'joint'
xtable(
x,
caption = NULL,
label = NULL,
align = NULL,
digits = NULL,
display = NULL,
auto = FALSE,
p.val = FALSE,
max.row = NULL,
dp = 3,
vcov = FALSE,
capture = FALSE,
capture.location = "",
hlines = "middle-bottom",
booktabs = TRUE,
size = "footnotesize",
...
)
Arguments
x |
a joint model fit by the |
caption |
character, specifies the |
label |
character, specifies the |
align |
character, specifies the |
digits |
integer, specifies the |
display |
character, specifies the |
auto |
logical, specifies the |
p.val |
logical, should p-values be returned? Defaults to |
max.row |
integer, the number of rows after which the table is ‘broken’ vertically
and merged horizontally; useful for long tables. Defaults to |
dp |
integer, the number of decimal places to round the estimate, standard error and
confidence intervals to; defaults to |
vcov |
logical, should the half-vectorisation of the block diagonal of covariance
matrix be reported? Default is |
capture |
logical, should the printed |
capture.location |
character, if |
hlines |
character, specifies which horizontal lines are used in the outputted
LaTeX table. Supply a character string which contains |
booktabs |
logical, if |
size |
character, LaTeX size to be placed before the tabular environment, defaults
to |
... |
additional arguments, none used. |
Value
A LaTeX-ready xtable
print-out of the joint model. A list containing
constituent tables is also returned invisibly, along with the final xtable
output.
Author(s)
James Murray (j.murray7@ncl.ac.uk).
See Also
Examples
# Bivariate joint model ------------------------------------------------
require(xtable)
data <- simData(n = 100)$data
long.formula <- list(
Y.1 ~ time + cont + bin + (1 + time|id),
Y.2 ~ time + cont + bin + (1 + time|id)
)
surv.formula <- Surv(survtime, status) ~ cont + bin
family <- list("gaussian", "gaussian")
fit <- joint(long.formula, surv.formula, data, family)
xtable(fit)
# Example of arguments: add dummy caption, add p-values.
xtable(fit, p.val = TRUE, dp = 4, caption = "This is a caption")
# Change size, place horizontal lines everywhere
xtable(fit, size = "normalsize", hlines = c("top-middle-bottom"))
# Make a wider table without booktabs
xtable(fit, booktabs = FALSE, max.row = 6)