sasif


SAS IF-Style Data Step Logic for R — Improving readability and consistency in SDTM & ADaM derivations.

sasif lets clinical programmers write one condition that governs multiple variable assignments in a single block, just like SAS IF ... THEN DO. The condition is written once, so it cannot drift between variables and there is only one place to update it.


The Core Problem

With dplyr's case_when(), each variable assignment carries its own copy of the condition:

# ❌ Traditional R (case_when): condition repeated for EVERY variable
library(dplyr)
adsl <- adsl %>% mutate(
  SAFFL   = case_when(ACTARMCD == "TRTA" ~ "Y"),
  SAFFLN  = case_when(ACTARMCD == "TRTA" ~ 1),
  TRT01A  = case_when(ACTARMCD == "TRTA" ~ ACTARMCD),
  TRT01AN = case_when(ACTARMCD == "TRTA" ~ 1),
  TRTSDT  = case_when(ACTARMCD == "TRTA" ~ as.Date(RFSTDTC, "%Y-%m-%d")),
  TRTEDT  = case_when(ACTARMCD == "TRTA" ~ as.Date(RFENDTC, "%Y-%m-%d")),
  ITTFL   = case_when(ACTARMCD == "TRTA" ~ "Y"),
  FASFL   = case_when(ACTARMCD == "TRTA" ~ "Y"),
  RANDFL  = case_when(ACTARMCD == "TRTA" ~ "Y"),
  PPFL    = case_when(ACTARMCD == "TRTA" ~ "Y")
  # condition repeated 10 times — high QC risk if any diverge
)

If one condition is ever updated, you must find and change it in every single line. Miss one and your derivation silently diverges. This is a real QC risk in regulated clinical trial data.


The sasif Solution

One condition. All assignments grouped together. Just like SAS:

# ✅ sasif — condition written ONCE, governs all assignments
library(sasif)

ADSL <- data_step(adsl,
  if_do(ACTARMCD == "TRTA",
    SAFFL   = "Y",
    SAFFLN  = 1,
    TRT01A  = ACTARMCD,
    TRT01AN = 1,
    TRTSDT  = as.Date(RFSTDTC, "%Y-%m-%d"),
    TRTEDT  = as.Date(RFENDTC, "%Y-%m-%d"),
    ITTFL   = "Y",
    FASFL   = "Y",
    RANDFL  = "Y",
    PPFL    = "Y"
  )
)

Clean. Readable. Audit-friendly. Identical in intent to SAS IF ... THEN DO.


Installation

# Install from CRAN
install.packages("sasif")

# Install development version from GitHub
# install.packages("devtools")
devtools::install_github("chandrt23-lang/sasif")

Note: sasif requires a data.table as input. Convert with setDT(df) before calling data_step().
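
As a minimal sketch of the conversion step mentioned in the note, the snippet below turns a plain data.frame into a data.table with setDT(). The toy adsl data is hypothetical and only there to make the snippet runnable; nothing here calls sasif itself.

```r
library(data.table)

# Hypothetical toy ADSL, for illustration only
adsl <- data.frame(
  USUBJID  = c("01-001", "01-002"),
  ACTARMCD = c("TRTA", "TRTB"),
  stringsAsFactors = FALSE
)

# setDT() converts the object in place (by reference), without copying it
setDT(adsl)

# The object is now a data.table and can be passed to data_step()
is_dt <- is.data.table(adsl)
```

If the original data.frame has to stay untouched, as.data.table(df) returns a converted copy instead.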


Functions

| Function | SAS Equivalent | Description |
|---|---|---|
| data_step() | DATA step | Opens a SAS-style processing block on a data.table |
| if_do() | IF ... THEN DO | One condition governs multiple variable assignments |
| else_if_do() | ELSE IF ... THEN DO | Secondary condition; skipped if a prior block matched |
| else_do() | ELSE DO | Default assignments when all prior conditions are FALSE |
| delete_if() | DELETE | Removes rows where condition is TRUE |
| if_independent() | Multiple standalone IF | Each condition evaluated independently, not as a chain |

Clinical Examples

Example 1 — ADSL: Multiple Population Flags in One Block

The most common use case — one treatment condition drives 10 variable assignments simultaneously:

ADSL <- data_step(adsl,
  if_do(ACTARMCD == "TRTA",
    SAFFL   = "Y",
    SAFFLN  = 1,
    TRT01A  = ACTARMCD,
    TRT01AN = 1,
    TRTSDT  = as.Date(RFSTDTC, "%Y-%m-%d"),
    TRTEDT  = as.Date(RFENDTC, "%Y-%m-%d"),
    ITTFL   = "Y",
    FASFL   = "Y",
    RANDFL  = "Y",
    PPFL    = "Y"
  )
)

Example 2 — ADSL: Age Category with IF / ELSE IF / ELSE

Mutually exclusive chain — first matching condition wins, others are skipped. Both AGECAT (character) and AGECATN (numeric) are derived together:

ADSL <- data_step(adsl,
  if_do(AGE <= 45,
    AGECAT  = "YOUNG",
    AGECATN = 1
  ),
  else_if_do(AGE <= 70,
    AGECAT  = "MIDDLE",
    AGECATN = 2
  ),
  else_do(
    AGECAT  = "OLD",
    AGECATN = 3
  )
)

Compare this to data.table nested fifelse() — which forces you to repeat the condition separately for each variable:

# ❌ data.table nested fifelse — condition repeated per variable
adsl[, `:=`(
  AGECAT  = fifelse(AGE <= 45, "YOUNG", fifelse(AGE <= 70, "MIDDLE", "OLD")),
  AGECATN = fifelse(AGE <= 45, 1L,      fifelse(AGE <= 70, 2L,       3L))
)]
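
To make the "first matching condition wins" rule concrete, here is a rough sketch of the same chain written in plain data.table with an explicit mask of already-matched rows. It illustrates the semantics only and is not taken from sasif's implementation; the toy ages are hypothetical and assumed to be non-missing.

```r
library(data.table)

# Hypothetical ages, assumed non-missing
adsl <- data.table(AGE = c(30, 45, 60, 70, 85))

# Rows already claimed by an earlier branch of the chain
matched <- rep(FALSE, nrow(adsl))

# IF: AGE <= 45
c1 <- adsl$AGE <= 45 & !matched
adsl[c1, `:=`(AGECAT = "YOUNG", AGECATN = 1)]
matched <- matched | c1

# ELSE IF: AGE <= 70, only for rows not matched so far
c2 <- adsl$AGE <= 70 & !matched
adsl[c2, `:=`(AGECAT = "MIDDLE", AGECATN = 2)]
matched <- matched | c2

# ELSE: everything that is still unmatched
adsl[!matched, `:=`(AGECAT = "OLD", AGECATN = 3)]
```

Each branch only touches rows that no earlier branch has claimed, which is why a subject aged 45 ends up as "YOUNG" even though 45 also satisfies AGE <= 70.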

Example 3 — ADLB: Lab Categorisation (Character + Numeric Together)

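Derive both the category label and its numeric code from one condition block. Because else_do() applies to every row that did not match an earlier condition, non-ALB records also receive "NORMAL" here; subset adlb to ALB records first if that is not intended: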

out <- data_step(adlb,
  if_do(LBTESTCD == "ALB" & AVAL < ANRLO,
    ALBCAT  = "LOW",
    ALBCATN = 1
  ),
  else_if_do(LBTESTCD == "ALB" & AVAL > ANRHI,
    ALBCAT  = "HIGH",
    ALBCATN = 2
  ),
  else_do(
    ALBCAT  = "NORMAL",
    ALBCATN = 3
  )
)

Example 4 — ADAE: Treatment-Emergent Flag

Flag adverse events that started on or after the treatment start date and on or before the treatment end date:

ADAE <- data_step(adae,
  if_do(ASTDT >= TRTSDT & ASTDT <= TRTEDT,
    TRTEMFL = "Y",
    TRTEMA  = AEDECOD
  )
)

Example 5 — ADSL: Multi-Arm Treatment Assignment

Assign treatment label, numeric code, and start date together per arm:

ADSL <- data_step(adsl,
  if_do(ACTARMCD == "TRTA",
    TRT01A  = "Treatment A",
    TRT01AN = 1,
    TRTSDT  = as.Date(RFSTDTC, "%Y-%m-%d")
  ),
  else_if_do(ACTARMCD == "TRTB",
    TRT01A  = "Treatment B",
    TRT01AN = 2,
    TRTSDT  = as.Date(RFSTDTC, "%Y-%m-%d")
  ),
  else_do(
    TRT01A  = "Placebo",
    TRT01AN = 99,
    TRTSDT  = as.Date(RFSTDTC, "%Y-%m-%d")
  )
)

Example 6 — ADLB: Independent Flags (if_independent)

Use if_independent() when conditions are not mutually exclusive — each is evaluated on its own, so multiple flags can apply to the same row:

out <- data_step(adlb,
  if_independent(AVAL < ANRLO,  LOWNFL = "Y"),
  if_independent(AVAL > ANRHI,  HINFL  = "Y"),
  if_independent(LBTESTCD == "ALB", ALBFL = "Y")
)

Important: Do not mix if_do() chains with if_independent() on the same variable — if_independent() runs after the chain and will overwrite it. Use one approach consistently per variable.
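
For comparison, the independent evaluation described above behaves like three separate data.table updates, sketched below on hypothetical lab values. This is an illustration of the semantics, not sasif's own code.

```r
library(data.table)

# Hypothetical lab records, for illustration only
adlb <- data.table(
  LBTESTCD = c("ALB", "ALB", "ALT"),
  AVAL     = c(2.9, 4.0, 80),
  ANRLO    = c(3.5, 3.5, 7),
  ANRHI    = c(5.0, 5.0, 56)
)

# Each update is evaluated on its own; none of them blocks the others
adlb[AVAL < ANRLO,      LOWNFL := "Y"]
adlb[AVAL > ANRHI,      HINFL  := "Y"]
adlb[LBTESTCD == "ALB", ALBFL  := "Y"]
```

The first row is both below range and an ALB record, so it carries LOWNFL and ALBFL at the same time. In an IF / ELSE IF chain only the first of those would have been set.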


Example 7 — DELETE: Remove Unwanted Rows

Remove screen failures and unscheduled visits explicitly:

# Remove screen failure subjects
ADSL <- data_step(adsl,
  delete_if(ACTARMCD == "SCRNFAIL")
)

# Remove records with missing test codes and unscheduled visits
ADLB <- data_step(adlb,
  delete_if(is.na(LBTESTCD)),
  delete_if(VISIT == "UNSCHEDULED")
)
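
As a point of reference, the same row removal can be sketched in plain data.table. The toy records are hypothetical, and how delete_if() itself treats a condition that evaluates to NA is not stated here; this sketch assumes such rows are kept.

```r
library(data.table)

# Hypothetical lab records, for illustration only
adlb <- data.table(
  USUBJID  = c("01", "02", "03", "04"),
  LBTESTCD = c("ALB", NA, "ALT", "ALB"),
  VISIT    = c("WEEK 1", "WEEK 1", "UNSCHEDULED", NA)
)

# TRUE where a row should be removed; NA where the condition is undetermined
to_drop <- is.na(adlb$LBTESTCD) | adlb$VISIT == "UNSCHEDULED"

# Keep every row whose condition is not definitely TRUE (assumption: NA rows stay)
kept <- adlb[!(to_drop %in% TRUE)]
```

Subject 04 has a missing VISIT, so the comparison yields NA rather than TRUE and the row is kept under this assumption.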

Why Not case_when() or fifelse()?

| Feature | sasif | case_when() | data.table fifelse() |
|---|---|---|---|
| One condition → multiple variables | ✅ Natural | ❌ Repeated per variable | ❌ Repeated per variable |
| IF / ELSE IF / ELSE chain | ✅ Native | ⚠️ Simulated | ⚠️ Nested |
| SAS programmer readability | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ |
| Risk of condition drift across variables | Low ✅ | High ⚠️ | High ⚠️ |
| Vectorized performance | | | |
| Audit-friendly derivation flow | | ⚠️ | ⚠️ |

When to Use sasif

When NOT to Use sasif

sasif is focused on conditional derivation logic. Use standard R packages for:


Validation

A formal IQ/OQ/PQ Validation Document is available for GxP-regulated environments, covering all 6 functions in accordance with:

Contact the maintainer to request the validation document.


Getting Help


Citation

citation("sasif")

License

MIT © Thiyagarajan Chandrasekaran