clusteredMSM provides nonparametric analysis of
clustered multistate process data, based on Bakoyannis (2021) and
Bakoyannis & Bandyopadhyay (2022). It exposes one user-facing
function, patp(), that:
The package depends only on survival.
clusteredMSM expects data in interval
format: one row per mutually-exclusive time interval per
subject, with columns for interval start time (Tstart), end
time (Tstop), starting state (Sstart), and
ending state (Sstop). The column names are flexible — they
are passed by name through the formula.
Censoring is encoded as Sstart == Sstop on the
final row of a subject’s record. Within each subject,
intervals must be temporally contiguous
(Tstop[k] == Tstart[k+1]) and state contiguous
(Sstop[k] == Sstart[k+1]).
library(clusteredMSM)
# 1=Healthy, 2=Ill, 3=Dead. Allowed: 1->2, 1->3, 2->3.
tmat <- trans_mat(
list(c(2, 3), 3, integer(0)),
names = c("Healthy", "Ill", "Dead")
)
tmat
#> to
#> from Healthy Ill Dead
#> Healthy NA 1 2
#> Ill NA NA 3
#> Dead NA NA NAA small example dataset in interval format:
mydata <- data.frame(
pid = c(1, 1, 2, 3, 4, 4),
site = c(1, 1, 1, 2, 2, 2),
treatment = c("A", "A", "A", "B", "B", "B"),
t0 = c(0.0, 1.5, 0.0, 0.0, 0.0, 1.0),
t1 = c(1.5, 3.0, 2.0, 1.0, 1.0, 2.5),
s0 = c(1, 2, 1, 1, 1, 2),
s1 = c(2, 3, 1, 3, 2, 3)
)
mydata
#> pid site treatment t0 t1 s0 s1
#> 1 1 1 A 0.0 1.5 1 2
#> 2 1 1 A 1.5 3.0 2 3
#> 3 2 1 A 0.0 2.0 1 1
#> 4 3 2 B 0.0 1.0 1 3
#> 5 4 2 B 0.0 1.0 1 2
#> 6 4 2 B 1.0 2.5 2 3Subject 1: H -> I -> D. Subject 2: censored healthy at t = 2.0
(s0 == s1). Subject 3: died at t = 1.0. Subject 4: H ->
I -> D.
fit <- patp(
msm(t0, t1, s0, s1) ~ 1,
data = mydata,
tmat = tmat,
id = "pid",
cluster = "site",
h = 1, j = 2, s = 0,
B = 1000, cband = TRUE,
seed = 1
)
fitfit$curves contains the time-indexed point estimate of
\[\begin{eqnarray*}
P(X(t) = j \,|\, X(s) = h) &=& P(X(t) = 2 \,|\, X(0) = 1) \\
&=& P(X(t) = 2),
\end{eqnarray*}\] (second equality due to the fact that all start
at state = 1 in the classical illness-death model), standard error,
pointwise confidence interval, and (with cband = TRUE)
simultaneous band limits.
If the formula’s right-hand side names a binary grouping variable,
patp() automatically estimates both curves and conducts a
Kolmogorov-Smirnov-type test:
tt <- patp(
msm(t0, t1, s0, s1) ~ treatment,
data = mydata,
tmat = tmat,
id = "pid",
cluster = "site",
h = 1, j = 2, s = 0,
B = 1000,
seed = 1
)
ttThe $curves slot now has one row block per group. The
$test slot contains the observed K-S statistic and the
bootstrap p-value.
For processes with cyclic transitions (e.g., illness-death with
recovery), use the long event format directly. The same
patp() call works:
tmat_rec <- trans_mat(
list(c(2, 3), c(1, 3), integer(0)),
names = c("Healthy", "Ill", "Dead")
)
# Subject who went Healthy -> Ill -> Healthy -> censored:
recovery_data <- data.frame(
pid = c(1, 1, 1),
t0 = c(0.0, 1.0, 2.0),
t1 = c(1.0, 2.0, 3.5),
s0 = c(1, 2, 1),
s1 = c(2, 1, 1) # last row: censored healthy
)
patp(msm(t0, t1, s0, s1) ~ 1,
data = recovery_data, tmat = tmat_rec,
id = "pid",
h = 1, j = 2, s = 0,
B = 500, seed = 1)When s > 0 and the Markov assumption is questionable,
use LMAJ = TRUE:
When cluster size is potentially informative (e.g., sicker centers contribute more patients), use the weighted estimator:
patp(msm(t0, t1, s0, s1) ~ 1,
data = mydata, tmat = tmat,
id = "pid", cluster = "site",
h = 1, j = 2, s = 0,
weighted = TRUE, B = 1000, seed = 1)weighted = TRUE requires the cluster
argument.
B = 0 doesSetting B = 0 returns the point estimate without any
bootstrap inference. This is permitted only for one-sample formulas;
two-sample formulas always require B > 0 (since the
entire point of a two-sample analysis is the test). Use
B = 0 for fast exploratory work, then switch to
B = 1000 for inference.
Bakoyannis G (2021). Nonparametric analysis of nonhomogeneous multistate processes with clustered observations. Biometrics 77(2):533-546. doi:10.1111/biom.13327
Bakoyannis G, Bandyopadhyay D (2022). Nonparametric tests for multistate processes with clustered data. Annals of the Institute of Statistical Mathematics 74(5):837-867. doi:10.1007/s10463-021-00819-x
Putter H, Spitoni C (2018). Non-parametric estimation of transition probabilities in non-Markov multi-state models: the landmark Aalen-Johansen estimator. Statistical Methods in Medical Research 27(7):2081-2092. doi:10.1177/0962280216674497