SAS IF-Style Data Step Logic for R — Improving readability and consistency in SDTM & ADaM derivations.
sasif lets clinical programmers write one condition that
governs multiple variable assignments in a single block
— just like SAS IF ... THEN DO. No repeated conditions. No
logic drift. No maintenance burden.
With `dplyr::case_when()`, every variable assignment repeats the same condition:
```r
# ❌ Traditional R (case_when): condition repeated for EVERY variable
library(dplyr)

adsl <- adsl %>% mutate(
  SAFFL   = case_when(ACTARMCD == "TRTA" ~ "Y"),
  SAFFLN  = case_when(ACTARMCD == "TRTA" ~ 1),
  TRT01A  = case_when(ACTARMCD == "TRTA" ~ ACTARMCD),
  TRT01AN = case_when(ACTARMCD == "TRTA" ~ 1),
  TRTSDT  = case_when(ACTARMCD == "TRTA" ~ as.Date(RFSTDTC, "%Y-%m-%d")),
  TRTEDT  = case_when(ACTARMCD == "TRTA" ~ as.Date(RFENDTC, "%Y-%m-%d")),
  ITTFL   = case_when(ACTARMCD == "TRTA" ~ "Y"),
  FASFL   = case_when(ACTARMCD == "TRTA" ~ "Y"),
  RANDFL  = case_when(ACTARMCD == "TRTA" ~ "Y"),
  PPFL    = case_when(ACTARMCD == "TRTA" ~ "Y")
  # condition repeated 10 times: high QC risk if any one of them diverges
)
```

If the condition ever changes, you must find and update it on every line. Miss one and that derivation silently diverges from the rest. This is a real QC risk in regulated clinical trial data.
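The drift problem is easy to reproduce. This small base-R illustration uses made-up data in a hypothetical `toy` table. It uses base `ifelse()` in place of `case_when()` so it runs without extra packages, and the mechanism is the same. The condition for `SAFFLN` has been widened to a second arm, but the condition for `SAFFL` has not. Nothing errors; the paired flags simply stop agreeing.

```r
# Made-up data: two subjects in TRTA, one in TRTB
toy <- data.frame(
  USUBJID  = c("01", "02", "03"),
  ACTARMCD = c("TRTA", "TRTB", "TRTA"),
  stringsAsFactors = FALSE
)

# Original condition, still used for SAFFL
toy$SAFFL  <- ifelse(toy$ACTARMCD == "TRTA", "Y", NA_character_)
# Condition later widened for SAFFLN only; the SAFFL line was missed
toy$SAFFLN <- ifelse(toy$ACTARMCD %in% c("TRTA", "TRTB"), 1, NA_real_)

# No error is raised, but the paired flags now disagree for some subjects
drift <- toy$USUBJID[is.na(toy$SAFFL) != is.na(toy$SAFFLN)]
print(drift)
```

Here `drift` contains subject `"02"`, who is flagged numerically but not as a character flag.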
One condition. All assignments grouped together. Just like SAS:
```r
# ✅ sasif: condition written ONCE, governs all assignments
library(sasif)
library(data.table)

setDT(adsl)  # sasif requires a data.table (see Note below)

ADSL <- data_step(adsl,
  if_do(ACTARMCD == "TRTA",
    SAFFL   = "Y",
    SAFFLN  = 1,
    TRT01A  = ACTARMCD,
    TRT01AN = 1,
    TRTSDT  = as.Date(RFSTDTC, "%Y-%m-%d"),
    TRTEDT  = as.Date(RFENDTC, "%Y-%m-%d"),
    ITTFL   = "Y",
    FASFL   = "Y",
    RANDFL  = "Y",
    PPFL    = "Y"
  )
)
```

Clean. Readable. Audit-friendly. Identical in intent to SAS `IF ... THEN DO`.
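An `IF ... THEN DO` block evaluates its condition once and applies that single row mask to every assignment. The base-R function below, `if_do_sketch()`, is a hypothetical illustration of that idea only. It is not how `sasif` is implemented. Three of its behaviours are assumptions for illustration: new columns start as `NA`, rows where the condition is `NA` count as not matched, and existing values are kept on unmatched rows.

```r
# Hypothetical sketch of IF ... THEN DO semantics in base R (not sasif's code).
# The condition is evaluated ONCE; every assignment reuses the same row mask.
if_do_sketch <- function(.df, .cond, ...) {
  values <- list(...)
  hit <- .cond & !is.na(.cond)   # assumption: NA condition = not matched
  for (col in names(values)) {
    # Recycle scalars to full length; rep() keeps classes such as Date
    v <- rep(values[[col]], length.out = nrow(.df))
    if (!col %in% names(.df)) {
      # New column: all NA, same type/class as the assigned value
      .df[[col]] <- v[rep(NA_integer_, nrow(.df))]
    }
    .df[[col]][hit] <- v[hit]    # assign only where the condition held
  }
  .df
}

# Made-up example data
toy <- data.frame(
  ACTARMCD = c("TRTA", "PBO", NA),
  RFSTDTC  = c("2024-01-15", "2024-02-01", "2024-03-10"),
  stringsAsFactors = FALSE
)

out <- if_do_sketch(toy, toy$ACTARMCD == "TRTA",
  SAFFL  = "Y",
  SAFFLN = 1,
  TRTSDT = as.Date(toy$RFSTDTC, "%Y-%m-%d")
)
print(out)
```

Only the first row receives values. The `PBO` row and the row with a missing `ACTARMCD` keep `NA` in all three new columns. The condition appears exactly once, so there is nothing to drift.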
```r
# Install from CRAN
install.packages("sasif")

# Install development version from GitHub
# install.packages("devtools")
devtools::install_github("chandrt23-lang/sasif")
```

**Note:** `sasif` requires a `data.table` as input. Convert with `setDT(df)` (from the data.table package) before calling `data_step()`.
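Here is a minimal sketch of that conversion using data.table's own functions, with a made-up `dm` table. `setDT()` converts a `data.frame` in place, by reference and without a copy. `as.data.table()` instead returns a converted copy and leaves the original untouched.

```r
library(data.table)

# Hypothetical input that arrived as a plain data.frame
dm <- data.frame(USUBJID = c("01", "02"), AGE = c(34, 71))

# Option 1: as.data.table() returns a converted copy; `dm` is unchanged
dm_dt <- as.data.table(dm)
stopifnot(is.data.table(dm_dt), !is.data.table(dm))

# Option 2: setDT() converts `dm` itself by reference (no copy made)
setDT(dm)
stopifnot(is.data.table(dm))
```

`setDT()` avoids duplicating large SDTM/ADaM tables in memory. Use `as.data.table()` when other code still expects the original `data.frame`.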
| Function | SAS Equivalent | Description |
|---|---|---|
| `data_step()` | DATA step | Opens a SAS-style processing block on a `data.table` |
| `if_do()` | IF ... THEN DO | One condition governs multiple variable assignments |
| `else_if_do()` | ELSE IF ... THEN DO | Secondary condition, skipped if a prior block matched |
| `else_do()` | ELSE DO | Default assignments when all prior conditions are FALSE |
| `delete_if()` | DELETE | Removes rows where the condition is TRUE |
| `if_independent()` | Multiple standalone IF | Each condition is evaluated independently, not as a chain |
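A chain and a set of independent IFs differ in which rows each block may touch. The hypothetical base-R sketch below (not `sasif` code) shows this. The chain tracks a "still unmatched" mask, so the first TRUE condition wins. Independent conditions each see every row.

One caveat for SAS programmers concerns missing values. In SAS, a missing numeric value compares lower than any number, so `AGE <= 45` is true when `AGE` is missing. In R, `NA <= 45` is `NA`. This sketch treats `NA` as "not matched", which is an assumption; check the sasif documentation for how the package itself handles missing values.

```r
# Hypothetical illustration (not sasif code): chain vs independent IFs
AGE <- c(30, 50, 80, NA)

# Chain: IF AGE <= 45 / ELSE IF AGE <= 70 / ELSE; first TRUE condition wins
agecat    <- rep(NA_character_, length(AGE))
unmatched <- rep(TRUE, length(AGE))   # rows not yet claimed by an earlier branch

hit <- unmatched & !is.na(AGE) & AGE <= 45
agecat[hit] <- "YOUNG"
unmatched <- unmatched & !hit

hit <- unmatched & !is.na(AGE) & AGE <= 70
agecat[hit] <- "MIDDLE"
unmatched <- unmatched & !hit

agecat[unmatched] <- "OLD"            # ELSE: everything left over, including NA AGE

# Independent IFs: each condition is checked against every row, so flags can overlap
over40 <- ifelse(!is.na(AGE) & AGE > 40, "Y", NA_character_)
over60 <- ifelse(!is.na(AGE) & AGE > 60, "Y", NA_character_)

print(agecat)  # agecat is c("YOUNG", "MIDDLE", "OLD", "OLD")
```

Under this sketch's NA rule, the missing-AGE row falls through to `OLD`, whereas an equivalent SAS step would place it in `YOUNG`. Since neither result is usually intended, an explicit branch for missing values is the safer design in either language.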
The most common use case is one treatment condition driving 10 variable assignments at once:

```r
ADSL <- data_step(adsl,
  if_do(ACTARMCD == "TRTA",
    SAFFL   = "Y",
    SAFFLN  = 1,
    TRT01A  = ACTARMCD,
    TRT01AN = 1,
    TRTSDT  = as.Date(RFSTDTC, "%Y-%m-%d"),
    TRTEDT  = as.Date(RFENDTC, "%Y-%m-%d"),
    ITTFL   = "Y",
    FASFL   = "Y",
    RANDFL  = "Y",
    PPFL    = "Y"
  )
)
```

In a mutually exclusive chain, the first matching condition wins and the rest are skipped. Both `AGECAT` (character) and `AGECATN` (numeric) are derived together:
```r
ADSL <- data_step(adsl,
  if_do(AGE <= 45,
    AGECAT  = "YOUNG",
    AGECATN = 1
  ),
  else_if_do(AGE <= 70,
    AGECAT  = "MIDDLE",
    AGECATN = 2
  ),
  else_do(
    AGECAT  = "OLD",
    AGECATN = 3
  )
)
```

Compare this to data.table's nested `fifelse()`, which forces you to repeat the condition separately for each variable:
```r
# ❌ data.table nested fifelse: condition repeated per variable
library(data.table)

adsl[, `:=`(
  AGECAT  = fifelse(AGE <= 45, "YOUNG", fifelse(AGE <= 70, "MIDDLE", "OLD")),
  AGECATN = fifelse(AGE <= 45, 1L, fifelse(AGE <= 70, 2L, 3L))
)]
```

Derive both the category label and its numeric code from one condition block:
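```r
out <- data_step(adlb,
  if_do(LBTESTCD == "ALB" & AVAL < ANRLO,
    ALBCAT  = "LOW",
    ALBCATN = 1
  ),
  else_if_do(LBTESTCD == "ALB" & AVAL > ANRHI,
    ALBCAT  = "HIGH",
    ALBCATN = 2
  ),
  # Restrict the fallback to albumin so other tests are not labelled NORMAL
  else_if_do(LBTESTCD == "ALB",
    ALBCAT  = "NORMAL",
    ALBCATN = 3
  )
)
```

Flag adverse events that started on or after the treatment start date and no later than the treatment end date: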
```r
ADAE <- data_step(adae,
  if_do(ASTDT >= TRTSDT & ASTDT <= TRTEDT,
    TRTEMFL = "Y",
    TRTEMA  = AEDECOD
  )
)
```

Assign treatment label, numeric code, and start date together per arm:
```r
ADSL <- data_step(adsl,
  if_do(ACTARMCD == "TRTA",
    TRT01A  = "Treatment A",
    TRT01AN = 1,
    TRTSDT  = as.Date(RFSTDTC, "%Y-%m-%d")
  ),
  else_if_do(ACTARMCD == "TRTB",
    TRT01A  = "Treatment B",
    TRT01AN = 2,
    TRTSDT  = as.Date(RFSTDTC, "%Y-%m-%d")
  ),
  else_do(
    TRT01A  = "Placebo",
    TRT01AN = 99,
    TRTSDT  = as.Date(RFSTDTC, "%Y-%m-%d")
  )
)
```

Use `if_independent()` when conditions are not mutually exclusive. Each condition is evaluated on its own, so multiple flags can apply to the same row:
```r
out <- data_step(adlb,
  if_independent(AVAL < ANRLO, LOWNFL = "Y"),
  if_independent(AVAL > ANRHI, HINFL = "Y"),
  if_independent(LBTESTCD == "ALB", ALBFL = "Y")
)
```

**Important:** Do not mix `if_do()` chains with `if_independent()` on the same variable. `if_independent()` runs after the chain and will overwrite its values. Use one approach consistently per variable.
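This base-R illustration with made-up values reproduces the overwrite the note warns about. Following the note, it assumes that independent assignments run after the chain. The "ELEVATED" rule is a hypothetical example, not part of sasif.

```r
# Made-up lab values; normal range 3.5 to 5.0
AVAL  <- c(3.0, 4.2, 6.1)
ANRLO <- 3.5
ANRHI <- 5.0

# Step 1: the chain (IF / ELSE IF / ELSE) gives each row exactly one category
rngcat <- ifelse(AVAL < ANRLO, "LOW", ifelse(AVAL > ANRHI, "HIGH", "NORMAL"))

# Step 2: an independent IF on the SAME variable runs afterwards (hypothetical rule)
rngcat[AVAL > 4.0] <- "ELEVATED"

# Rows 2 and 3 lose the chain's NORMAL / HIGH values: the last writer wins
print(rngcat)
```

The chain's result for rows 2 and 3 is silently replaced. Keeping each variable under a single mechanism avoids this.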
Remove screen failures and unscheduled visits explicitly:
```r
# Remove screen failure subjects
ADSL <- data_step(adsl,
  delete_if(ACTARMCD == "SCRNFAIL")
)

# Remove records with missing test codes and unscheduled visits
ADLB <- data_step(adlb,
  delete_if(is.na(LBTESTCD)),
  delete_if(VISIT == "UNSCHEDULED")
)
```

| Feature | sasif | case_when() | data.table fifelse() |
|---|---|---|---|
| One condition → multiple variables | ✅ Natural | ❌ Repeated per variable | ❌ Repeated per variable |
| IF / ELSE IF / ELSE chain | ✅ Native | ⚠️ Simulated | ⚠️ Nested |
| SAS programmer readability | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ |
| Risk of condition drift across variables | Low ✅ | High ⚠️ | High ⚠️ |
| Vectorized performance | ✅ | ✅ | ✅ |
| Audit-friendly derivation flow | ✅ | ⚠️ | ⚠️ |
Use `sasif` for:

- Deriving multiple flags (`SAFFL`, `ITTFL`, `FASFL`, `RANDFL`) from one condition
- Porting SAS `IF ... THEN DO` blocks directly to R

`sasif` is focused on conditional derivation logic. Use standard R packages for:

- `RETAIN` / stateful accumulation → use data.table or base R
- `LAG` / previous-row values → use `shift()` in data.table
- `ARRAY` processing → use `lapply` or data.table column operations
- BY-group processing → use data.table `by=` syntax

A formal IQ/OQ/PQ Validation Document is available for GxP-regulated environments, covering all 6 functions in accordance with:
Contact the maintainer to request the validation document.
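The SAS patterns listed earlier as out of scope (`LAG`, `RETAIN`, BY-group processing) map onto standard data.table idioms. This is a minimal sketch with a made-up `ex` table:

- `shift(..., type = "lag")` gives the previous value.
- `cumsum()` plays the role of a retained running total.
- `by = USUBJID` restarts both per subject, which behaves much like SAS BY-group processing.

```r
library(data.table)

# Made-up exposure records
ex <- data.table(
  USUBJID  = c("01", "01", "01", "02", "02"),
  VISITNUM = c(1, 2, 3, 1, 2),
  DOSE     = c(10, 20, 20, 5, 5)
)
setorder(ex, USUBJID, VISITNUM)  # order rows within each subject, like PROC SORT

ex[, `:=`(
  # Previous DOSE within subject (SAS LAG() does not reset at BY boundaries on its own)
  PREVDOSE = shift(DOSE, n = 1L, type = "lag"),
  # Running total within subject, like a RETAINed accumulator
  CUMDOSE  = cumsum(DOSE),
  # TRUE on each subject's first record, like FIRST.USUBJID
  FIRSTFL  = seq_len(.N) == 1L
), by = USUBJID]

print(ex)
```

`PREVDOSE` is `NA` on each subject's first record, because the lag restarts at every `USUBJID`.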
```r
citation("sasif")
```

MIT © Thiyagarajan Chandrasekaran