Getting Started with CIMEHR

Introduction

Electronic Health Record (EHR) data are longitudinal where patients visit at irregular times, and even when a visit occurs, an outcome or biomarker may not be measured. CIMEHR (Clinical Informative Missingness for Electronic Health Record data) provides methods for longitudinal EHR analyses where patients visit irregularly and outcomes may be selectively measured at visits. The primary methods jointly characterize three linked processes:

  1. the visiting process: when a patient has an EHR visit;
  2. the observation process: whether the outcome/biomarker is recorded at a visit;
  3. the longitudinal outcome process: the outcome trajectory among the target population.

The main estimator is CIMEHR(). Two extensions, CIMEHR_timevarying_integral() and CIMEHR_timevarying_ou(), allow more flexible time-varying/serially correlated observation-process structures.

In addition, the R package CIMEHR provides simple benchmark methods and semiparametric joint models for analyses affected by informative visiting and informative observation.

This vignette starts with the data format, then moves from simpler baseline methods to the more complex CIMEHR estimators. Examples are kept small (a 500-subject subset of the bundled dataset) so each chunk runs in a few seconds.

library(CIMEHR)

1. Input data format

The analysis functions expect long-format data, with one row per subject-visit or subject-time record. The CIMEHR estimators expect at least the following column roles, with default column names shown:

Column role Default name Meaning
Subject identifier id Unique subject identifier
Visit time time Visit time; NA indicates a row without a visit
Outcome Y Longitudinal outcome; usually NA when the outcome was not recorded
Observation indicator R 1 if the outcome was recorded at the visit, 0 otherwise
Censoring/follow-up end C End of follow-up for each subject
Covariates user-named Baseline or model covariates

A note on time units. The methods do not assume a specific unit for time and C — months, weeks, days, or years all work, as long as a single unit is used consistently within a dataset. The choice affects only the interpretation of the slope on time in the longitudinal outcome model: that coefficient is the expected change in outcome per one unit of time. The bundled sim_ehr_data was generated with time and C in months over a 120-month (10-year) follow-up window, parameter values calibrated to match an outpatient HbA1c cadence. For your own data, choose the unit that gives a clinically meaningful slope (typically months or years for chronic-disease EHR cohorts) and document it alongside your analysis.

All package methods should be given long/panel data. The CIMEHR-family functions accept custom column names through id_col, time_col, y_col, r_col, and censor_col, so renaming your data is not required:

fit <- CIMEHR(
  data       = my_data,
  id_col     = "patient_id",
  time_col   = "visit_time",
  y_col      = "hba1c",
  r_col      = "measured",
  censor_col = "followup_end",
  covars_visit_XV               = c("age", "female"),
  covars_outcome_fixed_XY       = c("age", "female"),
  covars_outcome_random_link_ZY = "female",
  covars_obs_fixed_XO           = c("age", "female"),
  covars_obs_random_link_ZO     = "female"
)

1.1 Converting short/wide data to long data

If your dataset has one row per subject and repeated visit/outcome columns, reshape it first:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)

wide_dat <- tibble::tibble(
  id = 1:2,
  Age = c(0.4, -0.2),
  Gender = c(1, 0),
  NSES = c(0.2, -0.4),
  C = c(60, 60),
  time_1 = c(5, 8),  time_2 = c(20, NA), time_3 = c(45, 50),
  Y_1 = c(1.2, NA), Y_2 = c(1.6, NA), Y_3 = c(2.0, 1.1)
)

long_dat <- wide_dat %>%
  pivot_longer(
    cols = matches("^(time|Y)_\\d+$"),
    names_to = c(".value", "visit"),
    names_pattern = "(time|Y)_(\\d+)"
  ) %>%
  mutate(R = as.integer(!is.na(Y))) %>%
  arrange(id, time)

head(long_dat)
#> # A tibble: 6 × 9
#>      id   Age Gender  NSES     C visit  time     Y     R
#>   <int> <dbl>  <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <int>
#> 1     1   0.4      1   0.2    60 1         5   1.2     1
#> 2     1   0.4      1   0.2    60 2        20   1.6     1
#> 3     1   0.4      1   0.2    60 3        45   2       1
#> 4     2  -0.2      0  -0.4    60 1         8  NA       0
#> 5     2  -0.2      0  -0.4    60 3        50   1.1     1
#> 6     2  -0.2      0  -0.4    60 2        NA  NA       0

2. Example dataset

The package includes sim_ehr_data, a long-format simulated EHR cohort designed to look like an outpatient HbA1c extract. The bundled dataset is one replicate from the package’s Monte Carlo simulation framework (see Section 8, Simulation study), specifically the continuous_phi0 scenario (focal Z = NSES, phi = 0). The columns are listed below. Latent quantities used internally by the data-generating process (the shared factor, the visiting frailty, and per-process random effects) are not included, since they are not observable in real EHR data.

Column Type Description
id integer Subject identifier
Age continuous Standardized age
Gender binary 1 = male, 0 = female
Marital binary 1 = married, 0 = not
Black binary 1 = Black race, 0 = otherwise
Hispanic binary 1 = Hispanic ethnicity, 0 = otherwise
NSES continuous Standardized neighborhood median household income
time continuous Visit time on [0, 120] months; NA for subjects with no visits
R binary 1 if HbA1c was recorded at the visit, 0 otherwise
log_HbA1c continuous Natural log of HbA1c; recommended outcome for analysis. NA when R = 0
C continuous End-of-follow-up time (120 months for all subjects)
data("sim_ehr_data")
nrow(sim_ehr_data)
#> [1] 16945
head(sim_ehr_data)
#>   id       Age Gender Marital Black Hispanic       NSES     time R log_HbA1c
#> 1  1 0.7868359      1       1     0        1 -0.7088425 19.80177 1  1.749098
#> 2  1 0.7868359      1       1     0        1 -0.7088425 36.48679 1  1.739825
#> 3  1 0.7868359      1       1     0        1 -0.7088425 46.89528 1  1.765329
#> 4  1 0.7868359      1       1     0        1 -0.7088425 50.41513 1  1.804394
#> 5  1 0.7868359      1       1     0        1 -0.7088425 53.87571 0        NA
#> 6  1 0.7868359      1       1     0        1 -0.7088425 58.62913 1  1.759420
#>     C
#> 1 120
#> 2 120
#> 3 120
#> 4 120
#> 5 120
#> 6 120

For the rest of this vignette, we work with a 500-subject subset to keep the executable examples quick:

set.seed(1)
sub_ids <- sample(unique(sim_ehr_data$id), 500)
ehr500  <- sim_ehr_data[sim_ehr_data$id %in% sub_ids, ]
nrow(ehr500)
#> [1] 4497
length(unique(ehr500$id))
#> [1] 500

The true simulation parameters are stored as an attribute and can be passed to method_comparisons() to populate true-value, bias, and RMSE columns:

true_params <- attr(sim_ehr_data, "true_params")
true_params$beta
#> intercept       Age    Gender   Marital     Black  Hispanic      NSES 
#>     -2.00      0.50      0.10     -0.03      0.15      0.05     -0.50
true_params$gamma
#> intercept       Age    Gender   Marital     Black  Hispanic      NSES 
#>     -2.80      0.50     -0.10     -0.05      0.15     -0.05      0.30

3. Method comparison

This section compares the statistical methods for longitudinal EHR data implemented in (or referenced by) the package. Methods are differentiated by whether they account for informative visiting (IP) and informative observation (IO), the specific scope of the shared latent structure linking these processes to the outcome, and the adjustment mechanism applied. Symbols: ✓ = explicitly handled or modeled; ✗ = ignored or not handled.

Abbreviations: LMM, linear mixed model; VA/OA, visit-adjusted / observation-adjusted; JMVL, joint modeling of visiting and longitudinal data; IIRR, inverse intensity rate ratio; MAR, missing at random; MI, multiple imputation; LI, linear increment.

Methods implemented in CIMEHR

The following methods have function implementations in the package:

Method Function name
Summary statistics Summary_stat()
Standard LMM (Laird and Ware, 1982) Linear_mixed_model()
Liang (Liang et al., 2009) Joint_modeling_visiting_and_longitudinal_Liang()
IIRR-Weighting (Bůžková and Lumley, 2007) Inverse_intensity_rate_ratio_weighting()
IIRR-Balanced (Yiu and Su, 2025) Inverse_intensity_rate_ratio_balancing()
Pairwise Likelihood (Chen et al., 2015) Pairwise_likelihood()
Multiple Imputation (Rubin, 1987) Multiple_imputation_IP()
Linear Increment (Aalen and Gunnes, 2010) Linear_increment_IP()
EHRJoint (Du et al., 2025) EHRJoint()
Proposed (Yang, Shi, and Mukherjee, 2026) CIMEHR(), CIMEHR_timevarying_integral(), CIMEHR_timevarying_ou()

VA-LMM, OA-LMM, JMVL-G, and Adapted-Liang appear in the comparison tables below for context but are not implemented as separate functions; VA-LMM and OA-LMM are available through Linear_mixed_model() via the visit_adjust argument.

Outcome-only

Method IP IO Shared Latent Adjustment Mechanism Package
Summary statistics None Reduces the longitudinal series to a single scalar (e.g., min, mean, median, max) for regression.
Standard LMM (Laird and Ware, 1982) None Conditions on observed covariates (valid only under MAR). lme4
VA-LMM (Goldstein et al., 2016) None Includes visit count as a fixed covariate proxy. lme4
OA-LMM None Includes observation count as a fixed covariate proxy. lme4

IP-only

Method IP IO Shared Latent Adjustment Mechanism Package
Liang (Liang et al., 2009) Visiting & Outcome Joint likelihood estimation via shared frailty linking visiting and outcome. IrregLong
JMVL-G (Gasparini et al., 2020) Visiting & Outcome Joint likelihood modeling inter-visit times via shared random effects. merlin
IIRR-Weighting (Bůžková and Lumley, 2007) None Inverse intensity weighting based on visiting intensity.
IIRR-Balanced (Yiu and Su, 2025) None Inverse intensity weighting adjusted for covariate balance.
Pairwise Likelihood (Chen et al., 2015) None Constructs a likelihood from all within-subject pairs of observations; conditioning on observed visit times eliminates the visit intensity from the likelihood.

Imputation + IP-only

Method IP IO Shared Latent Adjustment Mechanism Package
Multiple Imputation (MI) (Rubin, 1987) Determined by IP-only Multiple imputation based on posterior draws, followed by IP-only methods. mice
Linear Increment Methods (LI) (Aalen and Gunnes, 2010) Determined by IP-only Deterministic linear interpolation between observed visits to impute missing values, followed by IP-only methods. slim

IP+IO

Method IP IO Shared Latent Adjustment Mechanism Package
Adapted-Liang (Liang et al., 2009) Visiting & Outcome Joint model structure defined, but observation coefficients are constrained to zero.
EHRJoint (Du et al., 2025) Visiting & Outcome Tripartite model; observation process modeled via covariates only (no latent link).
Proposed (this package) All three processes Tripartite model; full shared latent structure links Visiting, Observation, and Outcome. CIMEHR

IP = informative visiting / informative presence adjustment; IO = informative observation adjustment. The “Proposed” row corresponds to CIMEHR(); CIMEHR_timevarying_integral() and CIMEHR_timevarying_ou() extend this framework with more flexible observation-process modeling.

4. Simpler benchmark methods first

We define a covariate vector once and reuse it. The focal coefficient for the bundled dataset is NSES.

covars_all <- c("Age", "Gender", "Marital", "Black", "Hispanic", "NSES")
out_form   <- log_HbA1c ~ Age + Gender + Marital + Black + Hispanic + NSES + time

4.1 Summary-statistic regression

Summary_stat() is a naive baseline that collapses each subject’s observed outcomes to a summary statistic and fits a cross-sectional regression.

fit_ss <- Summary_stat(
  ehr500,
  outcome = "log_HbA1c",
  formula = ~ Age + Gender + Marital + Black + Hispanic + NSES
)
summary(fit_ss)
#> 
#> Summary-Statistic Regression -- Summary
#> ============================================================ 
#> 
#> Fixed Effect (Coefficients):
#> ------------------------------------------------------------ 
#>             Estimate Std. Error  t value Pr(>|t|)
#> (Intercept)   1.6524     0.0057 291.3986   0.0000
#> Age           0.0214     0.0031   6.8069   0.0000
#> Gender        0.0055     0.0062   0.8802   0.3792
#> Marital      -0.0056     0.0064  -0.8751   0.3820
#> Black         0.0145     0.0066   2.2113   0.0275
#> Hispanic      0.0005     0.0075   0.0679   0.9459
#> NSES         -0.0183     0.0033  -5.4568   0.0000

4.2 Linear mixed model

Linear_mixed_model() wraps nlme::lme() and supports standard, observation-adjusted, and visit-adjusted variants.

fit_lmm <- Linear_mixed_model(
  ehr500,
  fixed         = out_form,
  random        = ~ 1 | id,
  obs_indicator = "R"
)
summary(fit_lmm)
#> 
#> Linear Mixed Model -- Summary
#> ============================================================ 
#> 
#> Fixed Effect
#> ------------------------------------------------------------
#>               Value Std.Error   DF  t-value p-value  CI.low CI.high
#> (Intercept)  1.6270    0.0058 1386 282.0957  0.0000  1.6157  1.6383
#> Age          0.0218    0.0031  440   7.0345  0.0000  0.0157  0.0279
#> Gender       0.0044    0.0061  440   0.7168  0.4739 -0.0076  0.0163
#> Marital     -0.0049    0.0063  440  -0.7897  0.4301 -0.0172  0.0073
#> Black        0.0129    0.0064  440   2.0076  0.0453  0.0003  0.0255
#> Hispanic     0.0006    0.0074  440   0.0822  0.9345 -0.0138  0.0150
#> NSES        -0.0191    0.0033  440  -5.7985  0.0000 -0.0255 -0.0126
#> time         0.0004    0.0000 1386  16.4099  0.0000  0.0004  0.0005

A visit-adjusted variant adds the running visit count as a fixed covariate before observation-indicator filtering:

fit_va <- Linear_mixed_model(
  ehr500,
  fixed         = out_form,
  random        = ~ 1 | id,
  obs_indicator = "R",
  visit_adjust  = "VA"
)
summary(fit_va)
#> 
#> Linear Mixed Model -- Summary
#> ============================================================ 
#> 
#> Fixed Effect
#> ------------------------------------------------------------
#>               Value Std.Error   DF  t-value p-value  CI.low CI.high
#> (Intercept)  1.6274    0.0058 1385 282.9958  0.0000  1.6161  1.6387
#> Age          0.0208    0.0032  440   6.5825  0.0000  0.0146  0.0269
#> Gender       0.0047    0.0061  440   0.7818  0.4348 -0.0071  0.0166
#> Marital     -0.0047    0.0062  440  -0.7498  0.4538 -0.0169  0.0076
#> Black        0.0128    0.0064  440   2.0029  0.0458  0.0003  0.0254
#> Hispanic     0.0010    0.0073  440   0.1355  0.8923 -0.0134  0.0153
#> NSES        -0.0196    0.0033  440  -5.9581  0.0000 -0.0261 -0.0132
#> time         0.0004    0.0000 1385   8.9270  0.0000  0.0003  0.0005
#> n_visits     0.0005    0.0003 1385   1.5831  0.1136 -0.0001  0.0012

4.3 Pairwise likelihood

Pairwise_likelihood() fits a pairwise composite likelihood model for observed outcomes. The outcome column is read from y_col, so we set y_col = "log_HbA1c" explicitly here (the default is "Y").

obs_data <- ehr500[!is.na(ehr500$R) & ehr500$R == 1, ]

fit_pl <- Pairwise_likelihood(
  data        = obs_data,
  formula     = out_form,
  y_col       = "log_HbA1c",
  pair_sample = 1000
)
summary(fit_pl)
#> 
#> Pairwise Likelihood -- Summary
#> ============================================================ 
#> 
#>   Observations: 1834 | Clusters: 447 | Pairs: 1000
#>   Converged: TRUE | Log-likelihood: -74.13
#> 
#> Fixed Effect (Coefficients)
#> ------------------------------------------------------------
#>          Estimate Std.Error  CI.low CI.high
#> Age        0.0237    0.0059  0.0121  0.0352
#> Gender    -0.0017    0.0113 -0.0238  0.0204
#> Marital   -0.0132    0.0122 -0.0372  0.0107
#> Black      0.0159    0.0120 -0.0076  0.0395
#> Hispanic   0.0162    0.0145 -0.0122  0.0446
#> NSES      -0.0236    0.0060 -0.0353 -0.0119
#> time       0.0004    0.0002  0.0001  0.0007

5. Visiting-process and joint methods

5.1 Inverse intensity rate ratio weighting

fit_iirr <- Inverse_intensity_rate_ratio_weighting(
  ehr500,
  outcome    = "log_HbA1c",
  visit_covs = covars_all
)
summary(fit_iirr)
#> 
#> IIRR Weighting Estimator -- Summary
#> ============================================================ 
#> 
#> Note: Standard errors are not available for this method.
#> Consider bootstrap for inference.
#> 
#> Visiting Process -- Fixed Effect (gamma)
#> ------------------------------------------------------------
#>          Estimate
#> Age        0.2879
#> Gender    -0.1568
#> Marital   -0.0318
#> Black      0.1063
#> Hispanic  -0.2235
#> NSES      -0.1132
#> 
#> Longitudinal Outcome -- Fixed Effect (beta)
#> ------------------------------------------------------------
#>             Estimate
#> (Intercept)   1.6311
#> Age           0.0214
#> Gender        0.0000
#> Marital      -0.0095
#> Black         0.0143
#> Hispanic      0.0022
#> NSES         -0.0210
#> time          0.0005

5.2 Inverse intensity rate ratio balancing

Inverse_intensity_rate_ratio_balancing() extends IIRR with covariate-balancing weights. The balance_covs argument defaults to visit_covs when omitted. The phi argument controls a sensitivity tuning parameter.

fit_iirr_bal <- Inverse_intensity_rate_ratio_balancing(
  ehr500,
  outcome    = "log_HbA1c",
  visit_covs = covars_all
)
#> Note: initial value of fn function contains non-finite values (starting at index=2)
#>   Check initial x and/or correctness of function. Check initial x and/or correctness of function. Falling back to Maximum Likelihood Estimation (MLE) weights.
summary(fit_iirr_bal)
#> 
#> IIRR Balancing-Weights Estimator -- Summary
#> ============================================================ 
#> 
#> Note: Standard errors are not available for this method.
#> Consider bootstrap for inference.
#> 
#>   Sensitivity parameter (phi): 0.0000
#> 
#> Visiting Process -- Fixed Effect (gamma)
#> ------------------------------------------------------------
#>          Estimate
#> Age             0
#> Gender          0
#> Marital         0
#> Black           0
#> Hispanic        0
#> NSES            0
#> 
#> Outcome -- Fixed Effect (beta) -- Balancing Weights
#> ------------------------------------------------------------
#>             Estimate
#> (Intercept)   1.6326
#> Age           0.0205
#> Gender       -0.0003
#> Marital      -0.0137
#> Black         0.0132
#> Hispanic      0.0048
#> NSES         -0.0204
#> time          0.0005
#> 
#> Outcome -- Fixed Effect (beta) -- MLE Weights
#> ------------------------------------------------------------
#>             Estimate
#> (Intercept)   1.6326
#> Age           0.0205
#> Gender       -0.0003
#> Marital      -0.0137
#> Black         0.0132
#> Hispanic      0.0048
#> NSES         -0.0204
#> time          0.0005

5.3 Liang–Lu–Ying joint visiting/outcome model

Joint_modeling_visiting_and_longitudinal_Liang() fits a joint model for the visiting process and the longitudinal outcome, returning model-based standard errors for the visiting stage and a clustered sandwich estimator for the outcome stage (conditional on the Stage 1 nuisance estimates).

fit_liang <- Joint_modeling_visiting_and_longitudinal_Liang(
  ehr500,
  outcome     = "log_HbA1c",
  visit_covs  = covars_all,
  long_covs   = covars_all,
  random_covs = "NSES"
)
summary(fit_liang)
#> 
#> JMVL (Liang) Joint Model -- Summary
#> ============================================================ 
#> 
#> Standard errors: model-based (visiting) and clustered sandwich (outcome).
#> The outcome sandwich is conditional on the Stage 1 nuisance estimates.
#> 
#> Visiting Process -- Fixed Effect (gamma)
#> ------------------------------------------------------------
#>          Estimate Std.Error  CI.low CI.high
#> Age        0.2879    0.0245  0.2398  0.3360
#> Gender    -0.1568    0.0468 -0.2486 -0.0650
#> Marital   -0.0318    0.0482 -0.1263  0.0628
#> Black      0.1063    0.0485  0.0112  0.2014
#> Hispanic  -0.2235    0.0595 -0.3402 -0.1069
#> NSES      -0.1132    0.0245 -0.1613 -0.0652
#> 
#>   Frailty variance: 0.2331
#> 
#> Longitudinal Outcome -- Fixed Effect (beta)
#> ------------------------------------------------------------
#>          Estimate Std.Error  CI.low CI.high
#> Age        0.0363    0.0968 -0.1535  0.2261
#> Gender    -0.0109    0.1953 -0.3936  0.3718
#> Marital   -0.0182    0.2022 -0.4144  0.3781
#> Black      0.0209    0.2011 -0.3734  0.4151
#> Hispanic   0.0074    0.2298 -0.4429  0.4577
#> NSES      -0.0134    0.0960 -0.2015  0.1746
#> 
#> Longitudinal Outcome -- Random Link (theta)
#> ------------------------------------------------------------
#>           Estimate Std.Error  CI.low CI.high
#> Intercept   0.0386    0.3162 -0.5811  0.6583
#> NSES       -0.0032    0.3501 -0.6893  0.6829

5.4 EHRJoint

EHRJoint() is a tripartite joint model. Its observation process uses a Generalized Linear Model (GLM) with a probit link by default (set obs_link = "logit" to switch). The fitted object returns model-based standard errors for the visiting and observation stages and a clustered sandwich estimator for the outcome stage.

fit_ej <- EHRJoint(
  ehr500,
  outcome     = "log_HbA1c",
  visit_covs  = covars_all,
  long_covs   = covars_all,
  random_covs = "NSES"
)
summary(fit_ej)
#> 
#> EHRJoint Model -- Summary
#> ============================================================ 
#> 
#> Model note: visiting process uses a visit-count/frailty-compensation component.
#> Observation process uses a Generalized Linear Model (GLM) with a probit link.
#> Standard errors: model-based (visiting, observation) and clustered sandwich (outcome).
#> 
#> Visiting Process -- Fixed Effect (gamma)
#> ------------------------------------------------------------
#>          Estimate Std.Error  CI.low CI.high
#> Age        0.4469    0.0160  0.4155  0.4783
#> Gender    -0.1552    0.0301 -0.2142 -0.0962
#> Marital   -0.0739    0.0310 -0.1347 -0.0131
#> Black      0.0731    0.0318  0.0107  0.1354
#> Hispanic  -0.1013    0.0361 -0.1720 -0.0306
#> NSES       0.2861    0.0150  0.2567  0.3154
#> 
#>   Frailty variance: 0.2564
#> 
#> Observation Process -- Fixed Effect (alpha)
#> ------------------------------------------------------------
#>             Estimate Std.Error  CI.low CI.high
#> (Intercept)  -0.0406    0.0377 -0.1144  0.0333
#> Age          -0.1599    0.0218 -0.2026 -0.1173
#> Gender        0.0171    0.0405 -0.0622  0.0964
#> Marital       0.0412    0.0417 -0.0405  0.1229
#> Black         0.1058    0.0423  0.0229  0.1888
#> Hispanic     -0.1295    0.0487 -0.2249 -0.0340
#> NSES         -0.5511    0.0222 -0.5947 -0.5076
#> 
#> Longitudinal Outcome -- Fixed Effect (beta)
#> ------------------------------------------------------------
#>          Estimate Std.Error  CI.low CI.high
#> Age       -0.0090    0.1150 -0.2344  0.2164
#> Gender    -0.0418    0.2215 -0.4759  0.3923
#> Marital   -0.0759    0.2316 -0.5299  0.3781
#> Black      0.0782    0.2260 -0.3649  0.5212
#> Hispanic   0.1220    0.2706 -0.4083  0.6524
#> NSES       0.0248    0.1000 -0.1712  0.2208
#> 
#> Longitudinal Outcome -- Random Link (theta)
#> ------------------------------------------------------------
#>             Estimate Std.Error  CI.low CI.high
#> (Intercept)  -0.6611    0.4011 -1.4473  0.1251
#> NSES         -0.9458    0.3824 -1.6953 -0.1964

6. Primary CIMEHR methods

6.1 Base CIMEHR

CIMEHR() jointly models informative visiting, informative observation, and the longitudinal outcome. Its visiting process uses Andersen–Gill partial likelihood with log-normal frailty, and its observation process uses a probit model with latent-random-link random effects when specified.

fit_cimehr <- CIMEHR(
  data                          = ehr500,
  y_col                         = "log_HbA1c",
  covars_visit_XV               = covars_all,
  covars_outcome_fixed_XY       = covars_all,
  covars_outcome_random_link_ZY = "NSES",
  covars_obs_fixed_XO           = covars_all,
  covars_obs_random_link_ZO     = "NSES"
)
print(fit_cimehr) # Print function only prinnts the outcome process estimates
#> 
#> CIMEHR -- Outcome Fixed Effect (beta)
#> ================================================== 
#>          Estimate Std.Error  CI.low CI.high
#> Age        0.0182    0.0984 -0.1747  0.2111
#> Gender     0.0040    0.1954 -0.3790  0.3869
#> Marital   -0.0034    0.2040 -0.4031  0.3964
#> Black      0.0096    0.2038 -0.3899  0.4091
#> Hispanic   0.0050    0.2320 -0.4497  0.4597
#> NSES      -0.0135    0.1027 -0.2147  0.1877
#> 
#> Use summary() for all process estimates, or
#> summary(x, specify = "visiting" / "observation" / "outcome") for one layer.
summary(fit_cimehr)
#> 
#> CIMEHR Joint Model -- Summary
#> ============================================================ 
#> 
#> Model note: visiting process uses Andersen-Gill partial likelihood with log-normal frailty where estimated.
#> Observation process uses a probit link; the OU variant adds OU serial correlation via PCL.
#> 
#> Visiting Process -- Fixed Effect (gamma)
#> ------------------------------------------------------------
#>          Estimate Std.Error  CI.low CI.high
#> Age        0.4469    0.0306  0.3870  0.5068
#> Gender    -0.1552    0.0602 -0.2733 -0.0371
#> Marital   -0.0739    0.0627 -0.1968  0.0490
#> Black      0.0731    0.0653 -0.0550  0.2011
#> Hispanic  -0.1013    0.0779 -0.2539  0.0514
#> NSES       0.2861    0.0305  0.2264  0.3458
#> 
#>   Frailty variance (sigma^2_zeta): 0.2283
#> 
#> Observation Process -- Fixed Effect (alpha)
#> ------------------------------------------------------------
#>             Estimate Std.Error  CI.low CI.high
#> (Intercept)   0.1167    0.0872 -0.0543  0.2876
#> Age          -0.2577    0.0469 -0.3496 -0.1658
#> Gender       -0.0118    0.0917 -0.1915  0.1678
#> Marital       0.0423    0.0996 -0.1529  0.2375
#> Black         0.1705    0.1032 -0.0318  0.3727
#> Hispanic     -0.1320    0.1163 -0.3598  0.0959
#> NSES         -0.6846    0.0450 -0.7727 -0.5964
#> 
#> Observation Process -- Random Link (delta)
#> ------------------------------------------------------------
#>             Estimate Std.Error  CI.low CI.high
#> (Intercept)  -0.3224    0.0700 -0.4597 -0.1851
#> NSES         -0.2488    0.0755 -0.3969 -0.1008
#> 
#>   Random-effect SD (Sigma_q):
#> (Intercept)        NSES 
#>      0.8768      0.0000 
#> 
#> Longitudinal Outcome -- Fixed Effect (beta)
#> ------------------------------------------------------------
#>          Estimate Std.Error  CI.low CI.high
#> Age        0.0182    0.0984 -0.1747  0.2111
#> Gender     0.0040    0.1954 -0.3790  0.3869
#> Marital   -0.0034    0.2040 -0.4031  0.3964
#> Black      0.0096    0.2038 -0.3899  0.4091
#> Hispanic   0.0050    0.2320 -0.4497  0.4597
#> NSES      -0.0135    0.1027 -0.2147  0.1877
#> 
#> Longitudinal Outcome -- Random Link (theta)
#> ------------------------------------------------------------
#>           Estimate Std.Error  CI.low CI.high
#> Intercept   0.0187    0.1395 -0.2547  0.2921
#> NSES        0.0054    0.1512 -0.2908  0.3017

6.1.1 Stage-specific summaries and coefficient extraction

You can print one process at a time and extract a particular coefficient programmatically:

summary_visiting(fit_cimehr)
#> 
#> CIMEHR Joint Model -- Summary
#> ============================================================ 
#> 
#> Model note: visiting process uses Andersen-Gill partial likelihood with log-normal frailty where estimated.
#> Observation process uses a probit link; the OU variant adds OU serial correlation via PCL.
#> Layer requested: visiting
#> 
#> Visiting Process -- Fixed Effect (gamma)
#> ------------------------------------------------------------
#>          Estimate Std.Error  CI.low CI.high
#> Age        0.4469    0.0306  0.3870  0.5068
#> Gender    -0.1552    0.0602 -0.2733 -0.0371
#> Marital   -0.0739    0.0627 -0.1968  0.0490
#> Black      0.0731    0.0653 -0.0550  0.2011
#> Hispanic  -0.1013    0.0779 -0.2539  0.0514
#> NSES       0.2861    0.0305  0.2264  0.3458
#> 
#>   Frailty variance (sigma^2_zeta): 0.2283
summary_observation(fit_cimehr)
#> 
#> CIMEHR Joint Model -- Summary
#> ============================================================ 
#> 
#> Model note: visiting process uses Andersen-Gill partial likelihood with log-normal frailty where estimated.
#> Observation process uses a probit link; the OU variant adds OU serial correlation via PCL.
#> Layer requested: observation
#> 
#> Observation Process -- Fixed Effect (alpha)
#> ------------------------------------------------------------
#>             Estimate Std.Error  CI.low CI.high
#> (Intercept)   0.1167    0.0872 -0.0543  0.2876
#> Age          -0.2577    0.0469 -0.3496 -0.1658
#> Gender       -0.0118    0.0917 -0.1915  0.1678
#> Marital       0.0423    0.0996 -0.1529  0.2375
#> Black         0.1705    0.1032 -0.0318  0.3727
#> Hispanic     -0.1320    0.1163 -0.3598  0.0959
#> NSES         -0.6846    0.0450 -0.7727 -0.5964
#> 
#> Observation Process -- Random Link (delta)
#> ------------------------------------------------------------
#>             Estimate Std.Error  CI.low CI.high
#> (Intercept)  -0.3224    0.0700 -0.4597 -0.1851
#> NSES         -0.2488    0.0755 -0.3969 -0.1008
#> 
#>   Random-effect SD (Sigma_q):
#> (Intercept)        NSES 
#>      0.8768      0.0000
summary_outcome(fit_cimehr)
#> 
#> CIMEHR Joint Model -- Summary
#> ============================================================ 
#> 
#> Model note: visiting process uses Andersen-Gill partial likelihood with log-normal frailty where estimated.
#> Observation process uses a probit link; the OU variant adds OU serial correlation via PCL.
#> Layer requested: outcome
#> 
#> Longitudinal Outcome -- Fixed Effect (beta)
#> ------------------------------------------------------------
#>          Estimate Std.Error  CI.low CI.high
#> Age        0.0182    0.0984 -0.1747  0.2111
#> Gender     0.0040    0.1954 -0.3790  0.3869
#> Marital   -0.0034    0.2040 -0.4031  0.3964
#> Black      0.0096    0.2038 -0.3899  0.4091
#> Hispanic   0.0050    0.2320 -0.4497  0.4597
#> NSES      -0.0135    0.1027 -0.2147  0.1877
#> 
#> Longitudinal Outcome -- Random Link (theta)
#> ------------------------------------------------------------
#>           Estimate Std.Error  CI.low CI.high
#> Intercept   0.0187    0.1395 -0.2547  0.2921
#> NSES        0.0054    0.1512 -0.2908  0.3017

extract_coefficient(fit_cimehr, stage = "outcome",     parameter = "NSES")
#>        NSES 
#> -0.01349089
extract_coefficient(fit_cimehr, stage = "visiting",    parameter = "NSES")
#>      NSES 
#> 0.2860714
extract_coefficient(fit_cimehr, stage = "observation", parameter = "NSES")
#>       NSES 
#> -0.6845686

6.2 CIMEHR with Gauss–Hermite quadrature

CIMEHR_timevarying_integral() uses Gauss–Hermite quadrature and online filtering for a more flexible observation process. Analytic standard errors are not currently exposed for this variant, so bootstrap inference is recommended (see Section 9).

fit_integral <- CIMEHR_timevarying_integral(
  data                          = ehr500,
  y_col                         = "log_HbA1c",
  covars_visit_XV               = covars_all,
  covars_outcome_fixed_XY       = covars_all,
  covars_outcome_random_link_ZY = "NSES",
  covars_obs_fixed_XO           = covars_all,
  covars_obs_random_link_ZO     = "NSES",
  gh_points_U = 7,
  gh_points_q = 5
)

6.3 CIMEHR with OU pairwise composite likelihood

CIMEHR_timevarying_ou() models serial correlation in the observation process using an Ornstein–Uhlenbeck process and pairwise composite likelihood.

fit_ou <- CIMEHR_timevarying_ou(
  data                          = ehr500,
  y_col                         = "log_HbA1c",
  covars_visit_XV               = covars_all,
  covars_outcome_fixed_XY       = covars_all,
  covars_outcome_random_link_ZY = "NSES",
  covars_obs_fixed_XO           = covars_all,
  covars_obs_random_link_ZO     = "NSES",
  pair_type = "adjacent"
)

7. Comparing methods with method_comparisons()

The helper method_comparisons() fits any user-specified combination of methods and returns separate tables for the visiting, observation, and outcome processes. If a method does not estimate a given process, that entry is shown as NA.

available_comparison_methods()
#>  [1] "CIMEHR"                                        
#>  [2] "CIMEHR_timevarying_integral"                   
#>  [3] "CIMEHR_timevarying_ou"                         
#>  [4] "EHRJoint"                                      
#>  [5] "Inverse_intensity_rate_ratio_balancing"        
#>  [6] "Inverse_intensity_rate_ratio_weighting"        
#>  [7] "Joint_modeling_visiting_and_longitudinal_Liang"
#>  [8] "Linear_increment"                              
#>  [9] "Linear_mixed_model"                            
#> [10] "Multiple_imputation"                           
#> [11] "Pairwise_likelihood"                           
#> [12] "Summary_stat"

The full call below fits five methods at once on the 500-subject subset. It takes a minute or two to run; if you are building the vignette and want to skip the runtime, change eval = TRUE to eval = FALSE in the chunk header.

cmp_subset <- method_comparisons(
  data    = ehr500,
  methods = c("Summary_stat", "Linear_mixed_model",
              "Inverse_intensity_rate_ratio_weighting",
              "Joint_modeling_visiting_and_longitudinal_Liang",
              "CIMEHR"),
  method_args = list(
    Summary_stat = list(
      outcome = "log_HbA1c",
      formula = ~ Age + Gender + Marital + Black + Hispanic + NSES
    ),
    Linear_mixed_model = list(
      fixed = out_form, random = ~ 1 | id, obs_indicator = "R"
    ),
    Inverse_intensity_rate_ratio_weighting = list(
      outcome = "log_HbA1c", visit_covs = covars_all
    ),
    Joint_modeling_visiting_and_longitudinal_Liang = list(
      outcome = "log_HbA1c",
      visit_covs = covars_all, long_covs = covars_all, random_covs = "NSES"
    ),
    CIMEHR = list(
      y_col                         = "log_HbA1c",
      covars_visit_XV               = covars_all,
      covars_outcome_fixed_XY       = covars_all,
      covars_outcome_random_link_ZY = "NSES",
      covars_obs_fixed_XO           = covars_all,
      covars_obs_random_link_ZO     = "NSES"
    )
  ),
  printSE = TRUE,
  print95CI = TRUE,
  true_value_verbose = TRUE
)

print(cmp_subset)
#> 
#> ---------------------------------
#> visiting process
#> ---------------------------------
#>                                          Method Parameter of interest Estimate
#>                                    Summary_stat                  <NA>       NA
#>                              Linear_mixed_model                  <NA>       NA
#>          Inverse_intensity_rate_ratio_weighting                   Age   0.2879
#>          Inverse_intensity_rate_ratio_weighting                Gender  -0.1568
#>          Inverse_intensity_rate_ratio_weighting               Marital  -0.0318
#>          Inverse_intensity_rate_ratio_weighting                 Black   0.1063
#>          Inverse_intensity_rate_ratio_weighting              Hispanic  -0.2235
#>          Inverse_intensity_rate_ratio_weighting                  NSES  -0.1132
#>  Joint_modeling_visiting_and_longitudinal_Liang                   Age   0.2879
#>  Joint_modeling_visiting_and_longitudinal_Liang                Gender  -0.1568
#>  Joint_modeling_visiting_and_longitudinal_Liang               Marital  -0.0318
#>  Joint_modeling_visiting_and_longitudinal_Liang                 Black   0.1063
#>  Joint_modeling_visiting_and_longitudinal_Liang              Hispanic  -0.2235
#>  Joint_modeling_visiting_and_longitudinal_Liang                  NSES  -0.1132
#>                                          CIMEHR                   Age   0.4469
#>                                          CIMEHR                Gender  -0.1552
#>                                          CIMEHR               Marital  -0.0739
#>                                          CIMEHR                 Black   0.0731
#>                                          CIMEHR              Hispanic  -0.1013
#>                                          CIMEHR                  NSES   0.2861
#>      SE             95% CI True value    Bias   RMSE
#>      NA               <NA>         NA      NA     NA
#>      NA               <NA>         NA      NA     NA
#>      NA               <NA>       0.50 -0.2121 0.2121
#>      NA               <NA>      -0.10 -0.0568 0.0568
#>      NA               <NA>      -0.05  0.0182 0.0182
#>      NA               <NA>       0.15 -0.0437 0.0437
#>      NA               <NA>      -0.05 -0.1735 0.1735
#>      NA               <NA>       0.30 -0.4132 0.4132
#>  0.0245    (0.2398, 0.336)       0.50 -0.2121 0.2121
#>  0.0468  (-0.2486, -0.065)      -0.10 -0.0568 0.0568
#>  0.0482  (-0.1263, 0.0628)      -0.05  0.0182 0.0182
#>  0.0485   (0.0112, 0.2014)       0.15 -0.0437 0.0437
#>  0.0595 (-0.3402, -0.1069)      -0.05 -0.1735 0.1735
#>  0.0245 (-0.1613, -0.0652)       0.30 -0.4132 0.4132
#>  0.0306    (0.387, 0.5068)       0.50 -0.0531 0.0531
#>  0.0602 (-0.2733, -0.0371)      -0.10 -0.0552 0.0552
#>  0.0627   (-0.1968, 0.049)      -0.05 -0.0239 0.0239
#>  0.0653   (-0.055, 0.2011)       0.15 -0.0769 0.0769
#>  0.0779  (-0.2539, 0.0514)      -0.05 -0.0513 0.0513
#>  0.0305   (0.2264, 0.3458)       0.30 -0.0139 0.0139
#> 
#> ---------------------------------
#> observation process
#> ---------------------------------
#>                                          Method Parameter of interest Estimate
#>                                    Summary_stat                  <NA>       NA
#>                              Linear_mixed_model                  <NA>       NA
#>          Inverse_intensity_rate_ratio_weighting                  <NA>       NA
#>  Joint_modeling_visiting_and_longitudinal_Liang                  <NA>       NA
#>                                          CIMEHR           (Intercept)   0.1167
#>                                          CIMEHR                   Age  -0.2577
#>                                          CIMEHR                Gender  -0.0118
#>                                          CIMEHR               Marital   0.0423
#>                                          CIMEHR                 Black   0.1705
#>                                          CIMEHR              Hispanic  -0.1320
#>                                          CIMEHR                  NSES  -0.6846
#>                                          CIMEHR           (Intercept)  -0.3224
#>                                          CIMEHR                  NSES  -0.2488
#>      SE             95% CI True value    Bias   RMSE
#>      NA               <NA>         NA      NA     NA
#>      NA               <NA>         NA      NA     NA
#>      NA               <NA>         NA      NA     NA
#>      NA               <NA>         NA      NA     NA
#>  0.0872  (-0.0543, 0.2876)         NA      NA     NA
#>  0.0469 (-0.3496, -0.1658)      -0.30  0.0423 0.0423
#>  0.0917  (-0.1915, 0.1678)      -0.10  0.0882 0.0882
#>  0.0996  (-0.1529, 0.2375)       0.00  0.0423 0.0423
#>  0.1032  (-0.0318, 0.3727)       0.05  0.1205 0.1205
#>  0.1163  (-0.3598, 0.0959)      -0.05 -0.0820 0.0820
#>  0.0450 (-0.7727, -0.5964)      -0.60 -0.0846 0.0846
#>      NA               <NA>         NA      NA     NA
#>      NA               <NA>         NA      NA     NA
#> 
#> ---------------------------------
#> outcome process
#> ---------------------------------
#>                                          Method Parameter of interest Estimate
#>                                    Summary_stat           (Intercept)   1.6524
#>                                    Summary_stat                   Age   0.0214
#>                                    Summary_stat                Gender   0.0055
#>                                    Summary_stat               Marital  -0.0056
#>                                    Summary_stat                 Black   0.0145
#>                                    Summary_stat              Hispanic   0.0005
#>                                    Summary_stat                  NSES  -0.0183
#>                              Linear_mixed_model           (Intercept)   1.6270
#>                              Linear_mixed_model                   Age   0.0218
#>                              Linear_mixed_model                Gender   0.0044
#>                              Linear_mixed_model               Marital  -0.0049
#>                              Linear_mixed_model                 Black   0.0129
#>                              Linear_mixed_model              Hispanic   0.0006
#>                              Linear_mixed_model                  NSES  -0.0191
#>                              Linear_mixed_model                  time   0.0004
#>          Inverse_intensity_rate_ratio_weighting           (Intercept)   1.6311
#>          Inverse_intensity_rate_ratio_weighting                   Age   0.0214
#>          Inverse_intensity_rate_ratio_weighting                Gender   0.0000
#>          Inverse_intensity_rate_ratio_weighting               Marital  -0.0095
#>          Inverse_intensity_rate_ratio_weighting                 Black   0.0143
#>          Inverse_intensity_rate_ratio_weighting              Hispanic   0.0022
#>          Inverse_intensity_rate_ratio_weighting                  NSES  -0.0210
#>          Inverse_intensity_rate_ratio_weighting                  time   0.0005
#>  Joint_modeling_visiting_and_longitudinal_Liang                   Age   0.0363
#>  Joint_modeling_visiting_and_longitudinal_Liang                Gender  -0.0109
#>  Joint_modeling_visiting_and_longitudinal_Liang               Marital  -0.0182
#>  Joint_modeling_visiting_and_longitudinal_Liang                 Black   0.0209
#>  Joint_modeling_visiting_and_longitudinal_Liang              Hispanic   0.0074
#>  Joint_modeling_visiting_and_longitudinal_Liang                  NSES  -0.0134
#>                                          CIMEHR                   Age   0.0182
#>                                          CIMEHR                Gender   0.0040
#>                                          CIMEHR               Marital  -0.0034
#>                                          CIMEHR                 Black   0.0096
#>                                          CIMEHR              Hispanic   0.0050
#>                                          CIMEHR                  NSES  -0.0135
#>                                          CIMEHR             Intercept   0.0187
#>                                          CIMEHR                  NSES   0.0054
#>      SE             95% CI True value    Bias   RMSE
#>  0.0057   (1.6413, 1.6635)         NA      NA     NA
#>  0.0031   (0.0152, 0.0276)       0.50 -0.4786 0.4786
#>  0.0062  (-0.0067, 0.0176)       0.10 -0.0945 0.0945
#>  0.0064  (-0.0181, 0.0069)      -0.03  0.0244 0.0244
#>  0.0066   (0.0017, 0.0274)       0.15 -0.1355 0.1355
#>  0.0075  (-0.0141, 0.0151)       0.05 -0.0495 0.0495
#>  0.0033 (-0.0248, -0.0117)      -0.50  0.4817 0.4817
#>  0.0058   (1.6157, 1.6383)         NA      NA     NA
#>  0.0031   (0.0157, 0.0279)       0.50 -0.4782 0.4782
#>  0.0061  (-0.0076, 0.0163)       0.10 -0.0956 0.0956
#>  0.0063  (-0.0172, 0.0073)      -0.03  0.0251 0.0251
#>  0.0064    (3e-04, 0.0255)       0.15 -0.1371 0.1371
#>  0.0074   (-0.0138, 0.015)       0.05 -0.0494 0.0494
#>  0.0033 (-0.0255, -0.0126)      -0.50  0.4809 0.4809
#>  0.0000     (4e-04, 5e-04)         NA      NA     NA
#>      NA               <NA>         NA      NA     NA
#>      NA               <NA>       0.50 -0.4786 0.4786
#>      NA               <NA>       0.10 -0.1000 0.1000
#>      NA               <NA>      -0.03  0.0205 0.0205
#>      NA               <NA>       0.15 -0.1357 0.1357
#>      NA               <NA>       0.05 -0.0478 0.0478
#>      NA               <NA>      -0.50  0.4790 0.4790
#>      NA               <NA>         NA      NA     NA
#>  0.0968  (-0.1535, 0.2261)       0.50 -0.4637 0.4637
#>  0.1953  (-0.3936, 0.3718)       0.10 -0.1109 0.1109
#>  0.2022  (-0.4144, 0.3781)      -0.03  0.0118 0.0118
#>  0.2011  (-0.3734, 0.4151)       0.15 -0.1291 0.1291
#>  0.2298  (-0.4429, 0.4577)       0.05 -0.0426 0.0426
#>  0.0960  (-0.2015, 0.1746)      -0.50  0.4866 0.4866
#>  0.0984  (-0.1747, 0.2111)       0.50 -0.4818 0.4818
#>  0.1954   (-0.379, 0.3869)       0.10 -0.0960 0.0960
#>  0.2040  (-0.4031, 0.3964)      -0.03  0.0266 0.0266
#>  0.2038  (-0.3899, 0.4091)       0.15 -0.1404 0.1404
#>  0.2320  (-0.4497, 0.4597)       0.05 -0.0450 0.0450
#>  0.1027  (-0.2147, 0.1877)      -0.50  0.4865 0.4865
#>  0.1395  (-0.2547, 0.2921)         NA      NA     NA
#>      NA               <NA>         NA      NA     NA
cmp_subset$tables$outcome
#>                                            Method Parameter of interest
#> 1                                    Summary_stat           (Intercept)
#> 2                                    Summary_stat                   Age
#> 3                                    Summary_stat                Gender
#> 4                                    Summary_stat               Marital
#> 5                                    Summary_stat                 Black
#> 6                                    Summary_stat              Hispanic
#> 7                                    Summary_stat                  NSES
#> 8                              Linear_mixed_model           (Intercept)
#> 9                              Linear_mixed_model                   Age
#> 10                             Linear_mixed_model                Gender
#> 11                             Linear_mixed_model               Marital
#> 12                             Linear_mixed_model                 Black
#> 13                             Linear_mixed_model              Hispanic
#> 14                             Linear_mixed_model                  NSES
#> 15                             Linear_mixed_model                  time
#> 16         Inverse_intensity_rate_ratio_weighting           (Intercept)
#> 17         Inverse_intensity_rate_ratio_weighting                   Age
#> 18         Inverse_intensity_rate_ratio_weighting                Gender
#> 19         Inverse_intensity_rate_ratio_weighting               Marital
#> 20         Inverse_intensity_rate_ratio_weighting                 Black
#> 21         Inverse_intensity_rate_ratio_weighting              Hispanic
#> 22         Inverse_intensity_rate_ratio_weighting                  NSES
#> 23         Inverse_intensity_rate_ratio_weighting                  time
#> 24 Joint_modeling_visiting_and_longitudinal_Liang                   Age
#> 25 Joint_modeling_visiting_and_longitudinal_Liang                Gender
#> 26 Joint_modeling_visiting_and_longitudinal_Liang               Marital
#> 27 Joint_modeling_visiting_and_longitudinal_Liang                 Black
#> 28 Joint_modeling_visiting_and_longitudinal_Liang              Hispanic
#> 29 Joint_modeling_visiting_and_longitudinal_Liang                  NSES
#> 30                                         CIMEHR                   Age
#> 31                                         CIMEHR                Gender
#> 32                                         CIMEHR               Marital
#> 33                                         CIMEHR                 Black
#> 34                                         CIMEHR              Hispanic
#> 35                                         CIMEHR                  NSES
#> 36                                         CIMEHR             Intercept
#> 37                                         CIMEHR                  NSES
#>         Estimate           SE             95% CI True value        Bias
#> 1   1.652435e+00 0.0056707035   (1.6413, 1.6635)         NA          NA
#> 2   2.140045e-02 0.0031439222   (0.0152, 0.0276)       0.50 -0.47859955
#> 3   5.459117e-03 0.0062018680  (-0.0067, 0.0176)       0.10 -0.09454088
#> 4  -5.582944e-03 0.0063798400  (-0.0181, 0.0069)      -0.03  0.02441706
#> 5   1.452324e-02 0.0065677511   (0.0017, 0.0274)       0.15 -0.13547676
#> 6   5.072482e-04 0.0074696992  (-0.0141, 0.0151)       0.05 -0.04949275
#> 7  -1.827476e-02 0.0033489993 (-0.0248, -0.0117)      -0.50  0.48172524
#> 8   1.627023e+00 0.0057676288   (1.6157, 1.6383)         NA          NA
#> 9   2.179495e-02 0.0030983099   (0.0157, 0.0279)       0.50 -0.47820505
#> 10  4.361431e-03 0.0060842291  (-0.0076, 0.0163)       0.10 -0.09563857
#> 11 -4.946489e-03 0.0062635374  (-0.0172, 0.0073)      -0.03  0.02505351
#> 12  1.290570e-02 0.0064283184    (3e-04, 0.0255)       0.15 -0.13709430
#> 13  6.044320e-04 0.0073507684   (-0.0138, 0.015)       0.05 -0.04939557
#> 14 -1.907900e-02 0.0032903401 (-0.0255, -0.0126)      -0.50  0.48092100
#> 15  4.431722e-04 0.0000270064     (4e-04, 5e-04)         NA          NA
#> 16  1.631125e+00           NA               <NA>         NA          NA
#> 17  2.137436e-02           NA               <NA>       0.50 -0.47862564
#> 18 -2.736955e-05           NA               <NA>       0.10 -0.10002737
#> 19 -9.474384e-03           NA               <NA>      -0.03  0.02052562
#> 20  1.428874e-02           NA               <NA>       0.15 -0.13571126
#> 21  2.197782e-03           NA               <NA>       0.05 -0.04780222
#> 22 -2.097999e-02           NA               <NA>      -0.50  0.47902001
#> 23  4.634290e-04           NA               <NA>         NA          NA
#> 24  3.630713e-02 0.0968361196  (-0.1535, 0.2261)       0.50 -0.46369287
#> 25 -1.089798e-02 0.1952628475  (-0.3936, 0.3718)       0.10 -0.11089798
#> 26 -1.815188e-02 0.2021551009  (-0.4144, 0.3781)      -0.03  0.01184812
#> 27  2.085107e-02 0.2011319982  (-0.3734, 0.4151)       0.15 -0.12914893
#> 28  7.409450e-03 0.2297644846  (-0.4429, 0.4577)       0.05 -0.04259055
#> 29 -1.344890e-02 0.0959530255  (-0.2015, 0.1746)      -0.50  0.48655110
#> 30  1.818674e-02 0.0984255929  (-0.1747, 0.2111)       0.50 -0.48181326
#> 31  3.986490e-03 0.1953825128   (-0.379, 0.3869)       0.10 -0.09601351
#> 32 -3.362864e-03 0.2039689000  (-0.4031, 0.3964)      -0.03  0.02663714
#> 33  9.584645e-03 0.2038442431  (-0.3899, 0.4091)       0.15 -0.14041536
#> 34  5.007985e-03 0.2319961318  (-0.4497, 0.4597)       0.05 -0.04499202
#> 35 -1.349089e-02 0.1026605627  (-0.2147, 0.1877)      -0.50  0.48650911
#> 36  1.871638e-02 0.1394998001  (-0.2547, 0.2921)         NA          NA
#> 37  5.422782e-03           NA               <NA>         NA          NA
#>          RMSE
#> 1          NA
#> 2  0.47859955
#> 3  0.09454088
#> 4  0.02441706
#> 5  0.13547676
#> 6  0.04949275
#> 7  0.48172524
#> 8          NA
#> 9  0.47820505
#> 10 0.09563857
#> 11 0.02505351
#> 12 0.13709430
#> 13 0.04939557
#> 14 0.48092100
#> 15         NA
#> 16         NA
#> 17 0.47862564
#> 18 0.10002737
#> 19 0.02052562
#> 20 0.13571126
#> 21 0.04780222
#> 22 0.47902001
#> 23         NA
#> 24 0.46369287
#> 25 0.11089798
#> 26 0.01184812
#> 27 0.12914893
#> 28 0.04259055
#> 29 0.48655110
#> 30 0.48181326
#> 31 0.09601351
#> 32 0.02663714
#> 33 0.14041536
#> 34 0.04499202
#> 35 0.48650911
#> 36         NA
#> 37         NA

If a method fails, method_comparisons() keeps going by default, stores the error in cmp$errors, and prints NA rows for that method.

8. Simulation study

A Monte Carlo simulation study is provided in simulations/simulation_study.R. The script crosses two focal-covariate types (continuous NSES and binary Gender) with four values of the Ornstein-Uhlenbeck serial-correlation parameter (phi \(\in\) {0, 0.001, 0.1, 0.3}), producing eight scenarios. Each scenario runs N_SIM = 1000 Monte Carlo replications; each replication generates a fresh dataset of n = 2000 participants on the [0, 120] window via sim_data_gen() and fits four methods:

For each method × scenario × parameter (beta on each of Age, Gender, Marital, Black, Hispanic, NSES), the script reports bias, empirical SE, mean analytic SE, SE ratio, RMSE, and 95% CI coverage. Coverage is reported as --- for Inverse_intensity_rate_ratio_weighting() because that method does not return analytic standard errors; bootstrap inference would be needed there.

The bundled sim_ehr_data used throughout this vignette is itself one replicate from this framework (the continuous_phi0 scenario, sim_id = 1), so the data-generating spec used by the study and the spec of the example dataset are identical aside from phi and the focal covariate.

To run the study:

Rscript simulations/simulation_study.R
Rscript simulations/simulation_study.R --cores 8 --nsim 100
Rscript simulations/simulation_study.R --scenarios continuous_phi0,binary_phi0.3

9. Bootstrap inference

CIMEHR(), CIMEHR_timevarying_ou(), EHRJoint(), and Joint_modeling_visiting_and_longitudinal_Liang() already return analytic standard errors (sandwich or model-based). bootstrap() is recommended for the methods that do not (CIMEHR_timevarying_integral(), Inverse_intensity_rate_ratio_weighting(), Inverse_intensity_rate_ratio_balancing()), and as an optional robustness check or for propagating Stage 1 nuisance uncertainty for the methods that do. The bootstrap resamples subjects with replacement and refits the selected method on each replicate.

bs <- bootstrap(
  fit_iirr,
  data = ehr500,
  B    = 200,
  seed = 1
)
print(bs)
summary(bs)

For CIMEHR-family methods, forward covariate specifications through covars:

bs_cimehr <- bootstrap(
  fit_cimehr,
  data = ehr500,
  B    = 200,
  seed = 1,
  covars = list(
    covars_visit_XV               = covars_all,
    covars_outcome_fixed_XY       = covars_all,
    covars_outcome_random_link_ZY = "NSES",
    covars_obs_fixed_XO           = covars_all,
    covars_obs_random_link_ZO     = "NSES"
  )
)

10. Simulating custom data

sim_data_gen() accepts a covariates list spec; per-process coefficient vectors are keyed by the names declared there. The default spec is the generic two-covariate Z/X form used in the package’s internal tests, but the EHR-style spec used to build sim_ehr_data is an instructive example:

ehr_covs <- list(
  Age      = list(type = "continuous", dist = "uniform",
                  min = 18, max = 99, standardize = TRUE),
  Gender   = list(type = "binary", prob = 0.5),
  Marital  = list(type = "binary", prob = 0.4),
  Black    = list(type = "binary", prob = 0.30),
  Hispanic = list(type = "binary", prob = 0.20),
  NSES     = list(type = "continuous", dist = "normal",
                  mean = 0, sd = 1, standardize = FALSE)
)

dat <- sim_data_gen(
  n          = 500,
  time.start = 0,
  time.end   = 120,
  seed       = 42,
  covariates = ehr_covs,
  gamma = c(intercept = -2.8, Age = 0.5, Gender = -0.10, Marital = -0.05,
            Black = 0.15, Hispanic = -0.05, NSES = 0.3),
  sigma_zeta = 0.5,
  alpha = c(intercept = 0.2, Age = -0.3, Gender = -0.10, Marital = 0.00,
            Black = 0.05, Hispanic = -0.05, NSES = -0.6),
  obs_random_covs = "NSES",
  delta   = c(intercept = -0.3, NSES = -0.3),
  sigma_q = c(intercept =  0.5, NSES =  0.2),
  phi     = 0,
  beta = c(intercept = -2.0, Age = 0.5, Gender = 0.10,
           Marital = -0.03, Black = 0.15, Hispanic = 0.05,
           NSES = -0.5),
  beta_t              = 0.01,
  outcome_random_covs = "NSES",
  theta   = c(intercept = 0.4, NSES = 0.0),
  Sigma_b = matrix(c(0.5, 0, 0, 0.8), 2, 2),
  sigma_y = 0.7,
  verbose = FALSE
)

head(dat$long_data)
#>   id      Age Gender Marital Black Hispanic       NSES  Age_raw     time R
#> 1  1 1.450938      0       1     0        1 -1.5515833 92.09929 21.61704 1
#> 2  1 1.450938      0       1     0        1 -1.5515833 92.09929 76.51161 1
#> 3  1 1.450938      0       1     0        1 -1.5515833 92.09929 83.24414 1
#> 4  2 1.526923      0       0     1        0 -0.3881794 93.90311 21.83727 0
#> 5  2 1.526923      0       0     1        0 -0.3881794 93.90311 43.17168 0
#> 6  2 1.526923      0       0     1        0 -0.3881794 93.90311 78.57681 0
#>            Y   C
#> 1 -0.6228701 120
#> 2 -1.2704788 120
#> 3 -0.5537626 120
#> 4         NA 120
#> 5         NA 120
#> 6         NA 120

References

Aalen, O. O., Borgan, Ø., and Gjessing, H. K. (2008). Survival and event history analysis: A process point of view. Springer.

Aalen, O. O. and Gunnes, N. (2010). A dynamic approach for reconstructing missing longitudinal data using the linear increments model. Biostatistics, 11, 453–472.

Bůžková, P. and Lumley, T. (2007). Longitudinal data analysis for generalized linear models with follow-up dependent on outcome-related variables. Canadian Journal of Statistics, 35, 485–500.

Chen, Y., Ning, J., and Cai, C. (2015). Regression analysis of longitudinal data with irregular and informative observation times. Biostatistics, 16, 727–739.

Du, J., Shi, X., and Mukherjee, B. (2025). A new statistical approach for joint modeling of longitudinal outcomes measured in electronic health records with clinically informative presence and observation processes.

Gasparini, A., Abrams, K. R., Barrett, J. K., Major, R. W., Sweeting, M. J., Brunskill, N. J., and Crowther, M. J. (2020). Mixed-effects models for health care longitudinal data with an informative visiting process: A Monte Carlo simulation study. Statistica Neerlandica, 74, 5–23.

Goldstein, B. A., Bhavsar, N. A., Phelan, M., and Pencina, M. J. (2016). Controlling for informed presence bias due to the number of health encounters in an electronic health record. American Journal of Epidemiology, 184, 847–855.

Laird, N. M. and Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics, 38, 963–974.

Liang, Y., Lu, W., and Ying, Z. (2009). Joint modeling and analysis of longitudinal data with informative observation times. Biometrics, 65, 377–384.

Pullenayegum, E. M. (2026). IrregLong: Analysis of Longitudinal Data with Irregular Observation Times. R package version 0.4.1.

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. John Wiley & Sons, New York.

Yiu, S. and Su, L. (2025). Accommodating informative visit times for analysing irregular longitudinal data: A sensitivity analysis approach with balancing weights estimators. Journal of the Royal Statistical Society Series C: Applied Statistics, 74, 824–843.

Yang, C.-H., Shi, X., and Mukherjee, B. (2026). Joint modeling of longitudinal EHR data with shared random effects for informative visiting and observation processes. arXiv:2602.15374.