FastSurvival provides fast alternatives to three standard functions
in the survival package (survfit(),
survdiff(), coxph()) together with a clinical
trial data simulator (simdata_fast()). All functions are
designed for repeated evaluation inside simulation loops, and the core
computations are implemented in C++ via Rcpp.
This vignette has three parts:
survfit_fast(), survdiff_fast(), and
coxph_fast() return the same numerical results as the
corresponding functions in the survival package.survival package using
microbenchmark.simdata_fast() (nsim = 100) and summarize the
results of applying the three analysis functions to each simulated
trial.survival packageWe use the ovarian dataset bundled with the
survival package for the agreement checks.
ord <- order(ovarian$futime)
t_s <- ovarian$futime[ord]
e_s <- ovarian$fustat[ord]
g_s <- ovarian$rx[ord]survfit_fast() versus survfit() +
summary()We evaluate the Kaplan-Meier estimate at the landmark time
t = 500.
# FastSurvival
fit_fast <- survfit_fast(t_s, e_s, t_eval = 500, conf.type = "log")
fit_fast
#> Kaplan-Meier estimate at a single time point
#>
#> survival std.err lower 95% upper 95%
#> t = 500 0.5961 0.0999 0.4291 0.8279
#>
#> Confidence interval type: log
# survival
fit_ref <- survfit(Surv(futime, fustat) ~ 1, data = ovarian, conf.type = "log")
summary(fit_ref, times = 500)
#> Call: survfit(formula = Surv(futime, fustat) ~ 1, data = ovarian, conf.type = "log")
#>
#> time n.risk n.event survival std.err lower 95% CI upper 95% CI
#> 500 12 10 0.596 0.0999 0.429 0.828The survival estimate, standard error, and confidence interval at
t = 500 agree between the two implementations.
survdiff_fast() versus survdiff()# FastSurvival (two-sided chi-square statistic)
sd_fast <- survdiff_fast(ovarian$futime, ovarian$fustat, ovarian$rx,
control = 1, side = 2)
sd_fast
#> Log-rank test (two-group)
#>
#> N = 26
#>
#> Observed Expected (O-E)^2/E (O-E)^2/V
#> control 7 5.2335 0.5962 1.0627
#> treatment 5 6.7665 0.4612 1.0627
#>
#> Chi-square = 1.063 on 1 df, p-value = 0.3026
# survival
sd_ref <- survdiff(Surv(futime, fustat) ~ rx, data = ovarian)
sd_ref
#> Call:
#> survdiff(formula = Surv(futime, fustat) ~ rx, data = ovarian)
#>
#> N Observed Expected (O-E)^2/E (O-E)^2/V
#> rx=1 13 7 5.23 0.596 1.06
#> rx=2 13 5 6.77 0.461 1.06
#>
#> Chisq= 1.1 on 1 degrees of freedom, p= 0.3
cat("survdiff_fast chi-square:", as.numeric(sd_fast), "\n")
#> survdiff_fast chi-square: 1.06274
cat("survdiff chi-square:", sd_ref$chisq, "\n")
#> survdiff chi-square: 1.06274The observed and expected counts, the chi-square statistic, and the p-value agree.
coxph_fast() versus coxph()coxph() treats rx as numeric with
rx = 1 as the reference level, so we set
control = 1 for a consistent comparison.
coxph_fast() is based on the Pike-Halley Estimator (Homma,
2025), a closed-form approximation to the Cox partial likelihood
maximizer; the residual difference is of order O_p(n^{-3/2}).
# FastSurvival
cx_fast <- coxph_fast(ovarian$futime, ovarian$fustat, ovarian$rx, control = 1)
cx_fast
#> Pike-Halley Estimator (closed-form hazard ratio)
#>
#> Coefficients:
#> coef exp(coef) se(coef) z Pr(>|z|)
#> group -0.5964 0.5508 0.5868 -1.016 0.31
#>
#> Hazard ratio and 95% Wald confidence interval:
#> exp(coef) exp(-coef) lower .95 upper .95
#> group 0.5508 1.8155 0.1744 1.7399
# survival
cx_ref <- summary(coxph(Surv(futime, fustat) ~ rx, data = ovarian))
cx_ref$coefficients
#> coef exp(coef) se(coef) z Pr(>|z|)
#> rx -0.59638 0.5508019 0.5869895 -1.015998 0.3096304
cx_ref$conf.int
#> exp(coef) exp(-coef) lower .95 upper .95
#> rx 0.5508019 1.815535 0.1743207 1.740371
cat("coxph_fast HR :", cx_fast["exp(coef)"], "\n")
#> coxph_fast HR : 0.5508019
cat("coxph HR :", cx_ref$coefficients[, "exp(coef)"], "\n")
#> coxph HR : 0.5508019The hazard ratio, standard error, and Wald confidence interval agree to the expected order of accuracy.
The following benchmark compares FastSurvival against
the survival package using microbenchmark on
the ovarian dataset (n = 26). Because the FastSurvival
functions are designed for use inside simulation loops where the data
are generated in sorted order, we set presorted = TRUE and
sort the input once outside the benchmark.
library(microbenchmark)
ord <- order(ovarian$futime)
t_s <- ovarian$futime[ord]
e_s <- ovarian$fustat[ord]
g_s <- ovarian$rx[ord]
bm <- microbenchmark(
survfit_fast = survfit_fast(t_s, e_s, t_eval = 500),
survfit = summary(survfit(Surv(futime, fustat) ~ 1, data = ovarian),
times = 500),
survdiff_fast = survdiff_fast(t_s, e_s, g_s, control = 1, side = 2,
presorted = TRUE),
survdiff = survdiff(Surv(futime, fustat) ~ rx, data = ovarian),
coxph_fast = coxph_fast(t_s, e_s, g_s, control = 1, presorted = TRUE),
coxph = coxph(Surv(futime, fustat) ~ rx, data = ovarian),
times = 100
)
print(bm)
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> survfit_fast 13.3 24.00 34.196 29.80 39.10 96.9 100
#> survfit 910.6 1172.15 1454.820 1315.40 1613.55 4482.6 100
#> survdiff_fast 7.8 18.05 24.337 23.50 28.60 51.7 100
#> survdiff 714.7 922.60 1108.567 992.70 1187.30 5118.9 100
#> coxph_fast 13.9 24.45 35.933 35.05 42.75 109.4 100
#> coxph 1053.9 1400.95 1743.869 1522.45 2067.00 4252.7 100The exact timings depend on the hardware and the random fluctuation
inherent in microbenchmarks, but on the ovarian dataset all
three FastSurvival functions are roughly 40 times faster than their
counterparts in the survival package. The speed gain
typically grows with the sample size; on a phase-3 trial sized dataset
(n = 600, event rate 80%) the speed gains observed in development reach
roughly 50x for survfit_fast(), 40x for
survdiff_fast(), and 30x for coxph_fast().
We now generate nsim = 100 trials with
simdata_fast() and apply the three analysis functions to
each simulated dataset.
We consider a two-group parallel trial with 100 patients per group, 12-month accrual, exponential survival with median 18 months in the control group and 24 months in the treatment group, and a small constant dropout hazard.
set.seed(42)
df <- simdata_fast(
nsim = 100,
n = c(100, 100),
a.time = c(0, 12),
a.rate = 200 / 12,
e.median = list(18, 24),
d.hazard = list(0.01, 0.01),
seed = 42
)
head(df)
#> sim group accrual_time surv_time dropout_time tte event calendar_time
#> 1 1 1 10.900445 8.599582 48.405809 8.599582 1 19.500027
#> 2 1 1 3.967349 14.797314 23.985600 14.797314 1 18.764663
#> 3 1 1 11.411225 21.757974 38.095037 21.757974 1 33.169199
#> 4 1 1 2.453655 6.568218 197.635604 6.568218 1 9.021873
#> 5 1 1 4.113958 15.996459 13.580202 13.580202 0 17.694159
#> 6 1 1 1.103466 1.092207 7.924918 1.092207 1 2.195673For each of the 100 simulated trials, we run:
survfit_fast() on the treatment group to obtain the
Kaplan-Meier estimate at the landmark time t = 18
months;survdiff_fast() to obtain the two-sided chi-square
statistic and p-value;coxph_fast() to obtain the hazard ratio estimate and
its Wald confidence interval.nsim <- 100L
t_land <- 18
km_mat <- matrix(NA_real_, nrow = nsim, ncol = 4L,
dimnames = list(NULL,
c("surv", "std.err", "lower", "upper")))
ld_vec <- numeric(nsim) # log-rank chi-square
hr_mat <- matrix(NA_real_, nrow = nsim, ncol = 5L,
dimnames = list(NULL,
c("coef", "exp(coef)", "se(coef)",
"lower .95", "upper .95")))
for (s in seq_len(nsim)) {
d <- df[df$sim == s, ]
# Treatment-group KM at t = 18 (sort once)
d_trt <- d[d$group == 2L, ]
ord_t <- order(d_trt$tte)
km_mat[s, ] <- unclass(
survfit_fast(d_trt$tte[ord_t], d_trt$event[ord_t], t_eval = t_land)
)
# Pooled sort for log-rank and Cox
ord_p <- order(d$tte)
t_p <- d$tte[ord_p]
e_p <- d$event[ord_p]
g_p <- d$group[ord_p]
ld_vec[s] <- as.numeric(
survdiff_fast(t_p, e_p, g_p, control = 1L, side = 2L, presorted = TRUE)
)
hr_mat[s, ] <- unclass(
coxph_fast(t_p, e_p, g_p, control = 1L, presorted = TRUE)
)
}# Landmark survival in the treatment group at t = 18
km_summary <- c(
mean_surv = mean(km_mat[, "surv"]),
sd_surv = sd(km_mat[, "surv"]),
mean_std_err = mean(km_mat[, "std.err"])
)
km_summary
#> mean_surv sd_surv mean_std_err
#> 0.60114248 0.05336939 0.05108699
# Log-rank test: empirical power at alpha = 0.05 (two-sided)
p_logrank <- pchisq(ld_vec, df = 1L, lower.tail = FALSE)
power_empir <- mean(p_logrank < 0.05)
cat(sprintf("Empirical power (log-rank, two-sided, alpha = 0.05): %.2f\n",
power_empir))
#> Empirical power (log-rank, two-sided, alpha = 0.05): 0.47
# Hazard ratio summary
hr_summary <- c(
mean_HR = mean(hr_mat[, "exp(coef)"]),
sd_HR = sd(hr_mat[, "exp(coef)"]),
mean_log_HR = mean(hr_mat[, "coef"]),
sd_log_HR = sd(hr_mat[, "coef"]),
cover_95 = mean(hr_mat[, "lower .95"] <= (18 / 24) &
hr_mat[, "upper .95"] >= (18 / 24))
)
hr_summary
#> mean_HR sd_HR mean_log_HR sd_log_HR cover_95
#> 0.7487242 0.1274661 -0.3031164 0.1655298 0.9300000Under exponential survival with medians 18 and 24 months in the
control and treatment groups, the true hazard ratio is
(log(2) / 24) / (log(2) / 18) = 18 / 24 = 0.75. Across the
100 simulated trials, the average treatment-group landmark survival at
t = 18 months was around 0.60, consistent with
exp(-log(2) * 18 / 24) after accounting for censoring from
the administrative cutoff and dropout. The average hazard ratio estimate
from coxph_fast() was close to the true value 0.75, and the
empirical coverage of the 95% Wald confidence interval was near the
nominal 95% level. The empirical power of the two-sided log-rank test at
alpha = 0.05 was around 0.5, reflecting the modest sample size (200
patients total) relative to the modest treatment effect (HR = 0.75)
under the chosen accrual and dropout settings.
Homma, G. (2025). One step from Pike to Cox: a closed-form hazard ratio estimator. Manuscript under review.
Collett, D. (2014). Modelling Survival Data in Medical Research (3rd ed.). Chapman and Hall/CRC.