RobustFlow is an R package for auditing temporal drift, subgroup disparities, and robustness in longitudinal decision systems.
It is built around three methodological contributions:
| Metric | Definition | Key function |
|---|---|---|
| Drift Intensity Index (DII) | Frobenius norm of consecutive transition matrix differences | compute_drift() |
| Bias Amplification Index (BAI) | Change in disparity gap from first to last time point | compute_bai() |
| Temporal Fragility Index (TFI) | Minimum hidden-bias perturbation to nullify a trend conclusion | compute_tfi_simple() |
An interactive Shiny application (run_app()) provides a
seven-tab workflow with point-and-click access to all analyses plus
downloadable HTML reports, CSV bundles, and reproducible R scripts.
We construct a panel that mimics a study of mathematics risk status across five elementary school waves, with three SES groups.
set.seed(2024)
n_children <- 200
n_waves <- 5
# Simulate SES group with unequal sizes
ses <- sample(
c("Low SES", "Mid SES", "High SES"),
size = n_children,
replace = TRUE,
prob = c(0.35, 0.40, 0.25)
)
# Risk probability increases over waves for Low SES,
# decreases slightly for High SES
risk_prob <- function(ses_grp, wave) {
base <- switch(ses_grp,
"Low SES" = 0.55,
"Mid SES" = 0.35,
"High SES" = 0.20
)
trend <- switch(ses_grp,
"Low SES" = 0.04,
"Mid SES" = 0.01,
"High SES" = -0.02
)
pmin(pmax(base + trend * (wave - 1), 0.02), 0.98)
}
rows <- lapply(seq_len(n_children), function(i) {
data.frame(
child_id = i,
wave = seq_len(n_waves),
ses_group = ses[i],
school_id = sample(seq_len(20), 1),
risk_math = rbinom(n_waves, 1,
vapply(seq_len(n_waves),
function(w) risk_prob(ses[i], w),
numeric(1)))
)
})
ecls_demo <- do.call(rbind, rows)
head(ecls_demo, 8)
#> child_id wave ses_group school_id risk_math
#> 1 1 1 High SES 5 0
#> 2 1 2 High SES 5 0
#> 3 1 3 High SES 5 0
#> 4 1 4 High SES 5 0
#> 5 1 5 High SES 5 0
#> 6 2 1 Mid SES 14 0
#> 7 2 2 Mid SES 14 0
#> 8 2 3 Mid SES 14 1validate_panel_data() checks variable existence, sorts
the data, and reports panel balance and missingness.
validated <- validate_panel_data(
data = ecls_demo,
id = "child_id",
time = "wave",
decision = "risk_math",
group = "ses_group",
cluster = "school_id"
)
cat(sprintf(
"Individuals: %d\nWaves: %d\nBalanced: %s\n",
validated$n_ids,
validated$n_times,
validated$balanced
))
#> Individuals: 200
#> Waves: 5
#> Balanced: TRUEbuild_paths() constructs each child’s complete risk
trajectory and returns aggregate path frequencies, the pooled transition
matrix, and Shannon entropy.
paths <- build_paths(
data = validated$data,
id = "child_id",
time = "wave",
decision = "risk_math"
)
# Top 10 paths
head(paths$path_counts, 10)
#> path n pct
#> 1 0->0->0->0->0 28 14.0
#> 2 0->0->0->1->0 16 8.0
#> 3 0->0->1->0->0 11 5.5
#> 4 1->0->0->0->0 11 5.5
#> 5 0->0->0->0->1 10 5.0
#> 6 0->0->0->1->1 8 4.0
#> 7 0->0->1->0->1 8 4.0
#> 8 1->1->0->1->1 8 4.0
#> 9 0->1->0->0->0 7 3.5
#> 10 0->1->0->0->1 7 3.5Shannon entropy measures trajectory diversity — higher values mean more heterogeneous individual histories.
df_paths <- head(paths$path_counts, 10)
df_paths$path <- factor(df_paths$path, levels = rev(df_paths$path))
ggplot(df_paths, aes(x = path, y = n)) +
geom_col(fill = "#2c7bb6") +
coord_flip() +
labs(x = "Path", y = "Count",
title = "Top 10 Decision Paths") +
theme_minimal()Top 10 most common five-wave risk trajectories.
compute_drift() estimates a period-specific transition
matrix for each consecutive wave pair and computes the Frobenius norm of
their difference:
\[DII_t = \| P_t - P_{t-1} \|_F\]
A larger DII signals greater structural instability between two periods.
drift <- compute_drift(
data = validated$data,
id = "child_id",
time = "wave",
decision = "risk_math",
normalize = TRUE
)
drift$summary
#> time DII
#> 1 1 NA
#> 2 2 NA
#> 3 3 0.089542
#> 4 4 0.140190
#> 5 5 0.095655
cat(sprintf("Mean DII: %.4f\n", drift$mean_dii))
#> Mean DII: 0.1085
cat(sprintf("Period of max drift: wave %s\n", drift$max_dii_period))
#> Period of max drift: wave 4ggplot(drift$summary, aes(x = time, y = DII)) +
geom_line(color = "#1a9641", linewidth = 1) +
geom_point(size = 2.5) +
labs(x = "Wave", y = "DII",
title = "Drift Intensity Index Over Time") +
theme_minimal()
#> Warning: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_line()`).
#> Warning: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_point()`).DII over time. Higher values indicate greater structural change in risk transitions.
compute_group_gaps() computes the at-risk rate for each
SES group at each wave and the pairwise gap (alphabetically first group
minus second).
compute_bai() then summarizes whether that gap widens or
narrows:
\[BAI = Gap_T - Gap_1\]
gaps <- compute_group_gaps(
data = validated$data,
time = "wave",
decision = "risk_math",
group = "ses_group",
focal_value = 1
)
gaps$gap_df
#> time gap
#> 1 1 -0.277778
#> 2 2 -0.534722
#> 3 3 -0.347222
#> 4 4 -0.479167
#> 5 5 -0.590277ggplot(gaps$long_format,
aes(x = time, y = rate, color = group, group = group)) +
geom_line(linewidth = 1) +
geom_point(size = 2.5) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1),
limits = c(0, 1)) +
labs(x = "Wave", y = "At-Risk Rate", color = "SES Group",
title = "Math Risk Trajectories by SES Group") +
theme_minimal() +
theme(legend.position = "bottom")At-risk rates by SES group across five waves.
ggplot(gaps$gap_df, aes(x = time, y = gap)) +
geom_line(color = "#762a83", linewidth = 1) +
geom_hline(yintercept = 0, linetype = "dashed", color = "grey40") +
geom_point(size = 2.5) +
labs(x = "Wave", y = "Gap",
title = "Disparity Gap Over Time") +
theme_minimal()Disparity gap (first two alphabetic SES groups) over time.
bai <- compute_bai(gaps$gap)
cat(sprintf("BAI: %.4f\n", bai$bai))
#> BAI: -0.3125
cat(sprintf("Direction: %s\n", bai$direction))
#> Direction: convergence
cat(sprintf("Gap at t=1: %.4f | Gap at t=T: %.4f\n",
bai$gap_start, bai$gap_end))
#> Gap at t=1: -0.2778 | Gap at t=T: -0.5903The standardized version divides by the SD of the gap series:
bai_std <- compute_bai(gaps$gap, standardize = TRUE)
cat(sprintf("Standardized BAI: %.4f\n", bai_std$bai))
#> Standardized BAI: -2.3995compute_tfi_simple() estimates how much hidden bias —
modelled as a scalar attenuation \(u\)
— would be required to nullify the observed longitudinal trend in DII.
The adjusted slope under perturbation \(u\) is:
\[\hat{\beta}(u) = \hat{\beta}_{obs} \times (1 - u)\]
The TFI is the minimum \(u\) at which the trend reverses or reaches zero.
dii_vals <- drift$summary$DII[!is.na(drift$summary$DII)]
tfi <- compute_tfi_simple(dii_vals)
tfi$summary_table
#> Metric Value
#> 1 Observed slope 0.003056
#> 2 TFI 1
#> 3 Interpretation Moderately robustsc <- tfi$sensitivity_curve
ggplot(sc, aes(x = perturbation, y = adjusted_effect)) +
geom_line(color = "#4575b4", linewidth = 1) +
geom_hline(yintercept = 0, linetype = "dashed", color = "#d73027",
linewidth = 0.8) +
labs(x = "Hidden Bias Parameter (u)",
y = "Adjusted Slope",
title = "TFI Sensitivity Curve") +
theme_minimal()TFI sensitivity curve. The red dashed line marks the null effect threshold.
| TFI value | Interpretation |
|---|---|
Inf |
Conclusion cannot be nullified by scalar attenuation — highly robust |
> 0.5 |
Moderately robust |
0.1 – 0.5 |
Requires scrutiny — modest hidden bias could reverse conclusions |
< 0.1 |
Fragile — small bias sufficient to nullify the trend |
generate_r_script(
id_var = "child_id",
time_var = "wave",
decision_var = "risk_math",
group_var = "ses_group",
cluster_var = "school_id",
focal_value = 1,
output_file = "my_robustflow_analysis.R"
)The generated script is a self-contained .R file that a
collaborator can run to reproduce the full analysis without the Shiny
app.
The app loads in your default browser and provides:
| Tab | Content |
|---|---|
| Data | Upload, variable mapping, balance diagnostics, missingness |
| Paths | Path frequencies, transition matrix, entropy |
| Drift | Prevalence over time, DII trend, period-specific matrices |
| Disparities | Group trajectories, gap evolution, BAI |
| Robustness | TFI, sensitivity curve |
| Intervention | Highest-risk transitions, disparity-generating steps |
| Report | HTML report, CSV bundle, reproducible R script |
If you use RobustFlow in published research, please cite:
Hait, S. (2025). RobustFlow: Robustness and drift auditing for
longitudinal decision systems. R package version 0.1.0.
https://github.com/subirhait/RobustFlow
sessionInfo()
#> R version 4.5.1 (2025-06-13 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26200)
#>
#> Matrix products: default
#> LAPACK version 3.12.1
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> time zone: America/Chicago
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggplot2_4.0.2 RobustFlow_0.1.1
#>
#> loaded via a namespace (and not attached):
#> [1] config_0.3.2 plotly_4.11.0 sass_0.4.10 generics_0.1.4
#> [5] tidyr_1.3.2 digest_0.6.37 magrittr_2.0.4 evaluate_1.0.5
#> [9] attempt_0.3.1 grid_4.5.1 RColorBrewer_1.1-3 golem_0.5.1
#> [13] fastmap_1.2.0 jsonlite_2.0.0 promises_1.3.3 httr_1.4.7
#> [17] purrr_1.2.1 viridisLite_0.4.2 scales_1.4.0 lazyeval_0.2.2
#> [21] jquerylib_0.1.4 cli_3.6.5 shiny_1.11.1 rlang_1.2.0
#> [25] withr_3.0.2 cachem_1.1.0 yaml_2.3.10 tools_4.5.1
#> [29] dplyr_1.2.1 httpuv_1.6.16 DT_0.34.0 vctrs_0.7.1
#> [33] R6_2.6.1 mime_0.13 lifecycle_1.0.5 htmlwidgets_1.6.4
#> [37] pkgconfig_2.0.3 pillar_1.11.1 bslib_0.9.0 later_1.4.4
#> [41] gtable_0.3.6 glue_1.8.0 data.table_1.17.8 Rcpp_1.1.0
#> [45] xfun_0.53 tibble_3.3.1 tidyselect_1.2.1 rstudioapi_0.17.1
#> [49] knitr_1.51 farver_2.1.2 xtable_1.8-4 htmltools_0.5.8.1
#> [53] rmarkdown_2.31 labeling_0.4.3 compiler_4.5.1 S7_0.2.0