| Type: | Package |
| Title: | 'REDCap' Data Management |
| Version: | 1.0.0 |
| Maintainer: | João Carmezim <jcarmezim@igtp.cat> |
| Description: | REDCap Data Management - 'REDCap' (Research Electronic Data CAPture; https://projectredcap.org) is a web application developed at Vanderbilt University, designed for creating and managing online surveys and databases and the REDCap API is an interface that allows external applications to connect to REDCap remotely, and is used to programmatically retrieve or modify project data or settings within REDCap, such as importing or exporting data. REDCapDM is an R package that allows users to manage data exported directly from REDCap or using an API connection. This package includes several functions designed for pre-processing data, generating reports of queries such as outliers or missing values, and following up on previously identified queries. |
| License: | MIT + file LICENSE |
| URL: | https://bruigtp.github.io/REDCapDM/, https://doi.org/10.1186/s12874-024-02178-6 |
| BugReports: | https://github.com/bruigtp/REDCapDM/issues |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Imports: | dplyr, janitor, openxlsx, purrr, REDCapR, rlang, stringr, tibble, tidyr, tidyselect, labelled, utils, stringi, cli, forcats, lifecycle |
| Suggests: | knitr, rmarkdown, kableExtra, testthat (≥ 3.0.0), mockery |
| VignetteBuilder: | knitr |
| Depends: | R (≥ 4.1) |
| LazyData: | true |
| Config/testthat/edition: | 3 |
| Config/Needs/website: | rmarkdown |
| NeedsCompilation: | no |
| Packaged: | 2025-11-26 10:02:20 UTC; jcarmezim |
| Author: | João Carmezim [aut, cre], Pau Satorra [aut], Judith Peñafiel [aut], Esther García [aut], Natàlia Pallarès [aut], Cristian Tebé [aut] |
| Repository: | CRAN |
| Date/Publication: | 2025-11-26 10:20:02 UTC |
Managing REDCap Data: The R package REDCapDM / REDCapDM: A Toolkit for Managing REDCap Data in R
Description
The REDCapDM package provides tools to import, process, and manage REDCap data within R. It supports data retrieval through the REDCap API or directly from exported files and includes a robust set of functions for data transformation, validation, and discrepancy management. Designed for efficient workflow integration, REDCapDM simplifies the handling of REDCap datasets, making it easier to ensure data quality and consistency.
Key Features:
-
Flexible Data Import: Import data directly from REDCap using API connections or process exported REDCap files.
-
Data Transformation: Streamline the cleaning and preparation of raw datasets for analysis.
-
Query Management: Identify and track data discrepancies, missing events, and manage resolution reports.
Core Functions:
-
redcap_data: Reads data exported from REDCap or retrieved through the REDCap API into R. -
rd_transform: One-step pipeline to clean and preprocess the raw REDCap data.-
rd_dates: Standardize date and datetime fields. -
rd_delete_vars: Remove specified variables (by name or pattern). -
rd_recalculate: Recompute calculated fields and compare with REDCap values. -
rd_factor: Replace numeric multiple-choice columns with their factor version. -
rd_checkbox: Expand checkbox responses with custom labels and rename 'var___1' columns (REDCap style) to 'var_option'. -
rd_split: Splits a REDCap dataset by form or event. -
rd_insert_na: Manually set specified variables to missing based on a logical filter. -
rd_rlogic: Translate REDCap branching or calculation logic into R syntax. -
rd_dictionary: Update dictionary (translation of REDCap logic into R syntax) to reflect transformed data and logic.
-
-
rd_query: Identifies discrepancies (queries) in the dataset for validation. -
rd_event: Detects missing events in longitudinal datasets. -
check_queries: Compares historical and current query reports to track changes and additions. -
rd_export: Exports a summary report of identified queries to an Excel (.xlsx) file.
Examples
## Not run:
# Install REDCapDM from CRAN:
install.packages("REDCapDM")
# Install the latest version of REDCapDM from GitHub:
remotes::install_github("bruigtp/REDCapDM")
## End(Not run)
Handle Project Arguments
Description
This helper function processes the project argument in the several other functions.
It extracts data, dictionary, event form, and results from the project object while handling
potential duplication and providing warnings when arguments are provided redundantly.
Usage
check_proj(project, data = NULL, dic = NULL, event_form = NULL)
Arguments
project |
A list or object containing |
data |
A data frame (optional) that may be overridden if provided in the |
dic |
A data dictionary (optional) that may be overridden if provided in the |
event_form |
An optional event-form object that may be overridden if provided in the |
Check for Changes Between Two Query Reports
Description
Compare an older query report (old) with a newer one (new) and classify each
query into one of four statuses:
-
Pending — the same query is present in both reports (no change detected),
-
Solved — the query was present in the old report but is absent from the new report,
-
New — the query appears in the new report but was not present in the old report,
-
Miscorrected — a special case where a query in the new report is marked as
Newbut shares the sameIdentifierandDescriptionas an existing record (suggesting a re-issued or modified query for the same identifier).
The function returns a detailed comparison table of queries with a Modification factor (one of the four statuses) and an HTML summary table showing counts per status.
Usage
check_queries(old, new, report_title = NULL, return_viewer = TRUE)
Arguments
old |
Data frame containing the previous (older) query report. Must include
|
new |
Data frame containing the newer query report. Must include
|
report_title |
Optional single string used as the caption for the HTML summary table.
Defaults to |
return_viewer |
Logical; if |
Details
Requirements:
Both
oldandnewmust be data frames.Both data frames must contain at least the following character columns:
Identifier,Description, andQuery.A
Codecolumn is optional; if present it will be preserved and considered for sorting and output. IfCodeexists innew, it is removed at the beginning of the routine to avoid conflicts with re-assigned codes.
The function merges the two reports, constructs composite keys used for comparison, classifies each row into a modification state, detects and re-labels Miscorrected cases, reassigns a Code per Identifier to keep codes consistent, and returns a detailed dataset plus an optional HTML summary viewer.
Value
A list with two elements:
queriesA data frame containing all queries present in either
oldornew. A factor columnModificationindicates the state for each row (levels:Pending,Solved,Miscorrected,New). The function also reassignsCodevalues so codes are consistent perIdentifier.resultsIf
return_viewer = TRUE, an HTMLknitr::kable(styled withkableExtra) summarising totals per state. Ifreturn_viewer = FALSE, this isNULL.
Notes and edge cases
-
Column types: If
Identifier,DescriptionorQueryare factors, they will be used in the comparison — it is recommended to convert them to character prior to callingcheck_queries()to avoid factor-level mismatches. -
Sorting: When
Identifiervalues contain a dash (e.g."100-20"), the function attempts to split into numericcenterandidparts for logical ordering. OtherwiseIdentifieris coerced to numeric for ordering. -
Miscorrected detection: A
Miscorrectedlabel is assigned when more than one row shares the sameIdentifier + Descriptioncomposite and a row is otherwise classified asNew— this signals a likely re-issued or modified query for an existing identifier.
Examples
# Minimal reproducible example
old <- data.frame(
Identifier = c("100-1", "100-2", "200-1"),
Description = c("age check", "weight check", "lab miss"),
Query = c("is.na(age)", "is.na(weight)", "missing lab result"),
Code = c("100-1-1", "100-2-1", "200-1-1"),
stringsAsFactors = FALSE
)
new <- data.frame(
Identifier = c("100-1", "200-1", "300-1"),
Description = c("age check", "lab miss", "new query"),
Query = c("is.na(age)", "missing lab result (clarify)", "is.na(x)"),
stringsAsFactors = FALSE
)
res <- check_queries(old = old, new = new, report_title = "My Query Comparison")
# detailed table
head(res$queries)
# HTML summary (if in an RMarkdown or interactive viewer)
res$results
Change Checkboxes Names to Option Names
Description
This function updates the names of checkboxes in the dataset and dictionary to reflect the names of their options.
Usage
checkbox_names(data, dic, labels, checkbox_labels = c("No", "Yes"))
Arguments
data |
Dataset containing the REDCap data. |
dic |
Dataset containing the REDCap dictionary. |
labels |
Named character vector with the names of the variables in the data and their corresponding REDCap labels. |
checkbox_labels |
Character vector specifying the names for the two options of each checkbox variable. The default is |
Subset of COVICAN's Database
Description
A random sample of the COVICAN study. An international, multicentre cohort study of cancer patients with COVID-19 to describe the epidemiology, risk factors, and clinical outcomes of co-infections and superinfections in onco-haematological patients with COVID-19.
Usage
data(covican)
Format
A data frame with 342 rows and 56 columns
- record_id:
Identifier of each record. This information does not match the real data.
- redcap_event_name:
Auto-generated name of the events
- redcap_data_access_group:
Auto-generated name of each center. This information does not match the real data.
- inc_1:
Inclusion criteria of 'Patients older than 18 years' (0 = No ; 1 = Yes)
- inc_2:
Inclusion criteria of 'Cancer patients' (0 = No ; 1 = Yes)
- inc_3:
Inclusion criteria of 'Diagnosed of COVID-19' (0 = No ; 1 = Yes)
- exc_1:
Exclusion criteria of 'Solid tumour remission >1 year' (0 = No ; 1 = Yes)
- screening_fail_crit:
Indicator of non-compliance with inclusion and exclusion criteria (0 = compliance ; 1 = non-compliance)
- d_birth:
Date of birth (y-m-d). This date does not correspond to the original.
- d_admission:
Date of first visit (y-m-d). This date does not correspond to the original.
- age:
Age in years
- dm:
Indicator of diabetes (0 = No ; 1 = Yes)
- type_dm:
Type of diabetes (1 = No complications ; 2 = End-organ diabetes-related disease)
- copd:
Indicator of chronic obstructive pulmonary disease (0 = No ; 1 = Yes)
- fio2:
Fraction of inspired oxygen in percentage
- available_analytics:
Indicator of blood test available (0 = No ; 1 = Yes)
- potassium:
Potassium in mmol/L
- resp_rate:
Respiratory rate in bpm
- leuk_lymph:
Indicator of leukemia or lymphoma (0 = No ; 1 = Yes)
- acute_leuk:
Indicator of acute leukemia (0 = No ; 1 = Yes)
- type_underlying_disease...:
Checkbox with the type of underlying disease (0 = Haematological cancer ; 1 = Solid tumour)
- underlying_disease_hemato...:
Checkbox with the type of underlying disease (1 = Acute myeloid leukemia ; 2 = Myelodysplastic syndrome ; 3 = Chronic myeloid leukaemia ; 4 = Acute lymphoblastic leukaemia ; 5 = Hodgkin lymphoma ; 6 = Non Hodgkin lymphoma ; 7 = Multiple myeloma ; 8 = Myelofibrosis ; 9 = Aplastic anaemia ; 10 = Chronic lymphocytic leukaemia ; 11 = Amyloidosis ; 12 = Other)
- urine_culture:
Indicator of urine culture: (0 = Not done ; 1 = Done)
- ....factor:
Labels of the different variables
Note
List with three data frames: the first one with the data, the second one with the dictionary (codebook) of the REDCap project and the last one with the instrument-event mappings of the REDCap project.
References
Gudiol, C., Durà-Miralles, X., Aguilar-Company, J., Hernández-Jiménez, P., Martínez-Cutillas, M., Fernandez-Avilés, F., Machado, M., Vázquez, L., Martín-Dávila, P., de Castro, N., Abdala, E., Sorli, L., Andermann, T. M., Márquez-Gómez, I., Morales, H., Gabilán, F., Ayaz, C. M., Kayaaslan, B., Aguilar-Guisado, M., Herrera, F. Royo-Cebrecos C, Peghin M, González-Rico C, Goikoetxea J, Salgueira S, Silva-Pinto A, Gutiérrez-Gutiérrez B, Cuellar S, Haidar G, Maluquer C, Marin M, Pallarès N, Carratalà J. (2021). Co-infections and superinfections complicating COVID-19 in cancer patients: A multicentre, international study. The Journal of infection, 83(3), 306–313. https://doi.org/10.1016/j.jinf.2021.07.014
Fill Rows with Values from One Event
Description
This function fills all rows in the dataset with the value of a particular variable in a specified event. It is an auxiliary function used in the rd_rlogic function.
Usage
fill_data(which_event, which_var, data)
Arguments
which_event |
String specifying the name of the event. |
which_var |
String specifying the name of the variable. |
data |
Dataset containing the REDCap data. |
Transform Checkbox Variables in a REDCap Project
Description
This function is used to process checkbox variables in a REDCap dataset. By default, it changes their default categories ("Unchecked" and "Checked") to new ones ("No" and "Yes). Optionally, the function can also evaluate the branching logic for checkbox fields and adjust the data and dictionary accordingly.
Usage
rd_checkbox(
project = NULL,
data = NULL,
dic = NULL,
event_form = NULL,
checkbox_labels = c("No", "Yes"),
checkbox_names = TRUE,
na_logic = "none"
)
Arguments
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
checkbox_labels |
Character vector of length 2 for labels of unchecked/checked values. Default: |
checkbox_names |
Logical. If |
na_logic |
Controls how missing values are set based on branching logic. Must be one of |
Details
Checkbox columns are expected in REDCap wide format (
field___code).Branching logic evaluation requires
event_formfor longitudinal projects.Names are cleaned and truncated to 60 characters; uniqueness is enforced.
Fields that cannot be evaluated are listed in
results.
Value
A list with:
- data
Transformed dataset with checkbox fields as factors and optionally renamed.
- dictionary
Updated dictionary with checkbox fields expanded and optionally renamed.
- event_form
The
event_formpassed in (if applicable).- results
Summary of transformations and any fields needing review.
Examples
# Basic usage with a project object
res <- rd_checkbox(covican)
# With custom labels
res <- rd_checkbox(data = covican$data,
dic = covican$dictionary,
checkbox_labels = c("Not present", "Present"))
# Keep original checkbox names
res <- rd_checkbox(covican, checkbox_names = FALSE)
# Longitudinal project with NA logic
res <- rd_checkbox(data = covican$data,
dic = covican$dictionary,
event_form = covican$event_form,,
na_logic = "eval")
Transform Dates and Datetimes in REDCap Data
Description
Converts date and datetime fields in a REDCap dataset to appropriate R classes.
Usage
rd_dates(project = NULL, data = NULL, dic = NULL, event_form = NULL)
Arguments
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
Details
The function performs the following tasks:
Detects date and datetime fields from the REDCap dictionary (
date_*anddatetime_*validation types).Converts date fields to
Dateclass.Converts datetime fields to
POSIXctclass, treating empty strings asNA.
Value
A list with the following elements:
- data
The transformed dataset with date and datetime fields formatted as
DateandPOSIXct.- dictionary
The original REDCap dictionary passed to the function.
- event_form
The original event-form mapping (if applicable).
- results
A summary of the transformations performed.
Examples
result <- rd_dates(covican)
transformed_data <- result$data
Delete Variables from REDCap Dataset and Dictionary
Description
Deletes selected variables from a REDCap dataset and its dictionary, keeping them consistent and preserving variable labels.
Usage
rd_delete_vars(
project = NULL,
data = NULL,
dic = NULL,
event_form = NULL,
vars = NULL,
pattern = NULL
)
Arguments
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
vars |
Optional. A character vector of variable names to remove from both the dataset and dictionary. |
pattern |
Optional. A character vector of regular expression patterns. Variables matching these patterns will be removed from the dataset and dictionary. |
Details
Ensure that at least one of
varsorpatternis specified.Removes specified variables and their factor versions (e.g.,
variable.factor) from the dataset.Removes matching variables from the dictionary.
Warns about factor versions of variables matching patterns, recommending use of
rd_factor()if necessary.
Value
A list containing:
- data
The updated dataset with specified variables removed.
- dictionary
The updated REDCap dictionary.
- event_form
The original event-form mapping (if applicable).
- results
A summary message describing the variable removal operation.
Examples
# Delete specific variables by name
result <- rd_delete_vars(
project = covican,
vars = c("potassium", "leuk_lymph")
)
# Delete variables matching patterns
result <- rd_delete_vars(
data = covican$data,
dic = covican$dictionary,
pattern = c("_complete$", "_other$")
)
Transform the data dictionary and handle branching logic
Description
Updates a REDCap data dictionary by converting branching logic and calculation expressions into valid R expressions. This ensures that conditional display rules and calculated fields in the dictionary can be programmatically evaluated with the dataset. Any variables that cannot be converted are reported in the results.
Usage
rd_dictionary(project = NULL, data = NULL, dic = NULL, event_form = NULL)
Arguments
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
Details
The function performs the following tasks:
Evaluates and transforms branching logic expressions into valid R expressions using
rd_rlogic.Evaluates and converts calculation expressions for fields of type
calc.Generates a results summary table listing any variables that could not be converted.
Value
A list with the following elements:
- data
The original dataset passed to the function.
- dictionary
The updated data dictionary with modified branching logic and calculations.
- event_form
The original event-form mapping (if applicable).
- results
A summary of the transformations, including any variables with unconverted branching logic.
Examples
## Not run:
result <- rd_dictionary(covican)
print(result$results)
updated_dic <- result$dictionary
## End(Not run)
Identify Missing Events in REDCap Data
Description
Helps identify records in a REDCap longitudinal project that are missing one or more specified events. Because REDCap typically omits empty events from exports, an event that contains no data for a record will not appear. This function finds those absent events and returns a per-record query table and a summarized HTML report.
Usage
rd_event(
project = NULL,
data = NULL,
dic = NULL,
event_form = NULL,
event,
filter = NA,
query_name = NA,
addTo = NA,
report_title = NA,
report_zeros = FALSE,
link = list()
)
Arguments
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
event |
Character vector with one or more REDCap event names to check for missing records. |
filter |
Optional. A single character string containing a filter expression to subset the dataset before checking for missing events. Example: |
query_name |
Optional character vector describing each query. Defaults to a standard format: |
addTo |
Optional data frame from a previous query report to which the new results can be appended. |
report_title |
Optional string specifying the title of the final report. Defaults to |
report_zeros |
Logical, include variables with zero queries in the report. Default is |
link |
Optional list containing project information ( |
Value
A named list with two elements:
queriesA data frame listing records missing the specified events. Columns:
Identifier,DAG,Event,Instrument,Field,Repetition,Description,Query,Code, and optionallyLink. If no queries are found this will be an empty data frame with the expected columns.resultsAn HTML table (knitr::kable styled with kableExtra) summarising the number of missing records per event. Returned as
knitr::kable(HTML).
Examples
# Minimal reproducible example
data0 <- data.frame(
record_id = c("100-1", "100-2", "200-1"),
redcap_event_name = c("baseline_arm_1", "baseline_arm_1", "follow_up_arm_1"),
redcap_event_name.factor = factor(c("Baseline", "Baseline", "Follow-up")),
stringsAsFactors = FALSE
)
# Suppose we want to check that every record has the follow-up event
res <- rd_event(
data = data0,
dic = data.frame(), # placeholder dictionary
event = "follow_up_arm_1",
report_zeros = TRUE
)
# Records missing the event:
res$queries
# HTML summary (in RMarkdown or Viewer)
res$results
Export Queries to an Excel File
Description
Export a query dataset (e.g., from rd_query or rd_event) to an .xlsx file. The function can optionally convert a column of URLs into Excel hyperlinks and apply password protection to the worksheet.
Usage
rd_export(
project = NULL,
queries = NULL,
column = NULL,
sheet_name = NULL,
path = NULL,
password = NULL
)
Arguments
project |
A list containing the dataframe of queries and results (expected |
queries |
A data frame of identified queries. |
column |
Name of the column containing URLs to convert into hyperlinks. If |
sheet_name |
Name of the Excel sheet in the resulting |
path |
File path for saving the |
password |
Optional password to protect the worksheet from edits. |
Value
An .xlsx file written to the specified path.
Examples
## Not run:
rd_export(
queries = my_queries,
column = "Link",
sheet_name = "My Queries",
path = "queries.xlsx"
)
## End(Not run)
Convert Variables to Factors in a REDCap Dataset
Description
Converts variables in a REDCap dataset with associated .factor columns into actual factor variables, while allowing the exclusion of specific variables. Ensures consistency with the dataset and preserves variable labels.
Usage
rd_factor(
project = NULL,
data = NULL,
dic = NULL,
event_form = NULL,
exclude = NULL
)
Arguments
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
exclude |
Optional character vector of variable names (use original names without the |
Details
The function looks for columns ending in .factor and replaces the original variable values with those .factor values (converted to factors). It preserves variable labels. The exclude argument must contain base variable names (no .factor suffix); if any .factor names are passed to exclude the function will throw an informative error. The columns redcap_event_name, redcap_repeat_instrument and redcap_data_access_group (and their .factor counterparts) are handled specially to avoid altering event or access-group data.
Value
A list with the following elements:
- data
The transformed dataset with
.factorcolumns applied as factors.- dictionary
The dictionary used (unchanged).
- event_form
The event-form mapping used (if applicable).
- results
A brief text summary of the transformation.
Examples
## Not run:
result <- rd_factor(covican)
result <- rd_factor(covican, exclude = c("available_analytics", "urine_culture"))
transformed_data <- result$data
## End(Not run)
Insert Missing Values Using a Filter
Description
Sets selected variables to NA when a filter condition is satisfied. Useful for managing checkboxes or other fields without explicit gatekeeper questions.
Usage
rd_insert_na(
project = NULL,
data = NULL,
dic = NULL,
event_form = NULL,
vars,
filter
)
Arguments
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
vars |
Character vector of variable names to set to |
filter |
A single logical expression (as string). Rows where the filter evaluates to |
Details
Each variable is only updated in rows/events where both the variable and filter are present.
For longitudinal projects,
event_formmust be provided for proper event-level filtering.Only one filter expression is allowed.
Variables and filter columns must exist in both
dataanddictionary.
Value
A list with:
- data
The dataset with
NAinserted where the filter applies.- dictionary
The unchanged dictionary.
- event_form
The
event_formpassed in (if applicable).- results
Summary message of the changes applied.
Examples
# Set 'potassium' to NA where age < 65
## Not run:
data <- rd_insert_na(
data = covican$data,
dic = covican$dictionary,
vars = "potassium",
filter = "age < 65"
)
table(data$potassium)
## End(Not run)
Identification of Queries
Description
Detects and summarizes queries in a REDCap dataset based on specified expressions, filters, or a defined branching logic. Useful for identifying missing values, out-of-range values, or values that do not meet predefined criteria.
Usage
rd_query(
project = NULL,
variables = NA,
expression = NA,
negate = FALSE,
event = NA,
filter = NA,
addTo = NA,
variables_names = NA,
query_name = NA,
instrument = NA,
report_title = NA,
report_zeros = FALSE,
by_dag = FALSE,
link = list(),
data = NULL,
dic = NULL,
event_form = NULL
)
Arguments
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
variables |
Character vector of variable names to check for queries. |
expression |
Character vector of R expressions to evaluate for each variable. |
negate |
Logical, if |
event |
Required for longitudinal projects to avoid overestimation. REDCap event(s) to analyze. |
filter |
Optional string of filters to apply to the dataset, such as the branching logic of a variable. |
addTo |
Optional data frame from a previous query report to which the new results can be appended. |
variables_names |
Optional character vector of descriptions for each variable. Defaults to the variables labels in the dictionary. |
query_name |
Optional character vector describing each query. Defaults to a standard format: |
instrument |
Optional REDCap instrument(s) for each variable. Defaults to the instrument reported in the dictionary. |
report_title |
Optional string specifying the title of the final report. Defaults to |
report_zeros |
Logical, include variables with zero queries in the report. Default is |
by_dag |
Logical, split results by Data Access Group (DAG). Default is |
link |
Optional list containing project information ( |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
Details
The function performs the following steps:
Applies user-specified expressions to the selected variables to detect queries.
Optionally negates the expressions to find values that do not satisfy the condition.
Handles REDCap branching logic, converting it into R-compatible expressions for evaluation.
Applies additional user-specified filters before identifying queries.
Generates structured query results with metadata including:
Identifier (record_id)
DAG (if present)
Event and Instrument
Field, Repetition, Description, Query statement
Optional link to REDCap entry
Optionally combines results with previous query outputs using
addTo.Produces a summarized report, optionally including variables with zero queries.
Provides warnings for variables with branching logic that could not be automatically evaluated.
Value
A list containing:
- queries
A data frame or a list of data frames (if
by_dag = TRUE) with detailed query information for each record.- results
A formatted report (HTML table) summarizing total queries per variable, event, and DAG if applicable.
Examples
## Not run:
# Identify missing values for multiple variables
result <- rd_query(covican,
variables = c("copd", "age"),
expression = c("is.na(x)", "x %in% NA"),
event = "baseline_visit_arm_1"
)
result$results
# Identify values exceeding a threshold
result <- rd_query(covican,
variables = "age",
expression = "x > 20",
event = "baseline_visit_arm_1"
)
# Apply a filter to select subset of data
result <- rd_query(covican,
variables = "potassium",
expression = "is.na(x)",
event = "baseline_visit_arm_1",
filter = "available_analytics == '1'"
)
## End(Not run)
Recalculate and Verify Calculated Fields in REDCap Data
Description
Recalculates fields marked as calculated in the REDCap dictionary, compares them with the original values, and reports discrepancies. If any differences are found, new recalculated fields are added to the dataset and dictionary with _recalc appended to the names.
Usage
rd_recalculate(
project = NULL,
data = NULL,
dic = NULL,
event_form = NULL,
exclude = NULL
)
Arguments
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
exclude |
Optional. Character vector of field names to exclude from recalculation. |
Details
Fields of type
calcin the dictionary are recalculated.Recalculated values are compared with the original values.
If differences exist, new fields
[field_name]_recalcare added to the dataset and dictionary.Works for single-event projects; for longitudinal projects,
event_formmust be provided.Fields with incomplete branching logic or smart variables may fail to recalculate.
Value
A list with:
- data
The dataset with new
_recalcfields for any differing calculated fields.- dictionary
Updated dictionary including the new
_recalcfields.- event_form
The
event_formpassed in (if applicable).- results
Summary report of the recalculation process, including tables of discrepancies.
Examples
# Recalculate all calculated fields
## Not run:
results <- rd_recalculate(
data = covican$data,
dic = covican$dictionary,
event_form = covican$event_form
)
## End(Not run)
# Recalculate but exclude some variables
## Not run:
results <- rd_recalculate(
project = covican,
exclude = c("age", "screening_fail_crit")
)
## End(Not run)
Translate REDCap Logic to R Logic
Description
Converts a REDCap logic expression into R-compatible logic. Processes one logic expression (logic) for one target variable (var) at a time. Supports common REDCap operators (and, or, =, <, >, etc.) and handles event-specific logic in longitudinal projects. Logic involving smart variables or repeated instruments may require manual review.
Usage
rd_rlogic(
project = NULL,
data = NULL,
dic = NULL,
event_form = NULL,
logic,
var
)
Arguments
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
logic |
A single REDCap logic string (e.g., |
var |
A single string specifying the target variable the logic applies to. |
Details
Translates REDCap operators and functions into R equivalents:
-
=→==,<>→!=,and→&,or→|. Converts functions like
if(),rounddown(),datediff(),sum()to R equivalents.
-
Handles date transformations and empty strings (
'') →NA.Adjusts logic for longitudinal data using
event_formif provided.Evaluates the translated R logic against the dataset and returns the results.
Logic with repeated instruments, smart variables, or multiple events per variable may require manual inspection.
Value
A list with:
- rlogic
The translated R-compatible logic as a string.
- eval
The evaluation of the translated logic on the provided dataset, filtered by event if applicable.
Examples
# Translate a single REDCap logic expression for one variable
covican |>
rd_rlogic(
logic = "if([exc_1]='1' or [inc_1]='0' or [inc_2]='0' or [inc_3]='0', 1, 0)",
var = "screening_fail_crit"
)
Split a REDCap dataset by form or event
Description
Splits a REDCap dataset into separate datasets by form or event using the data dictionary. Supports both longitudinal and non-longitudinal projects and can return wide or long formats for repeated measures.
Usage
rd_split(
project = NULL,
data = NULL,
dic = NULL,
event_form = NULL,
which = NULL,
by = "form",
wide = FALSE
)
Arguments
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
which |
Optional. A single form or event to extract. If not provided, all forms or events are returned. |
by |
Character. Criteria to split the dataset: |
wide |
Logical. If |
Details
Handles checkbox variables and REDCap default variables (
_complete,_timestamp) appropriately.For form-based splits in longitudinal projects, uses
event_formto map variables to events.Wide format expands repeated instances into multiple columns per record.
Filtering by
whichallows extracting a single form or event.Projects with repeated instruments are handled by filtering on the
redcap_repeat_instrumentvariable.
Value
Depending on which and wide:
- data
A
data.frameor a list ofdata.framesrepresenting the split datasets.- dictionary
The original REDCap dictionary.
- event_form
The original event-form mapping (if applicable).
- results
A summary message of the splitting operation.
Examples
# Split by form and return wide format
result <- covican |>
rd_split(by = "form", wide = TRUE)
print(result)
# Split by event (long format)
result <- covican |>
rd_split(by = "event")
print(result)
Transformation of the Raw Data
Description
Transforms the raw REDCap data read by the redcap_data function. The function runs in one-step pipeline all functions dedicated to processing the data and returns the transformed data and dictionary, along with a summary of each step done.
Usage
rd_transform(
project = NULL,
data = NULL,
dic = NULL,
event_form = NULL,
checkbox_labels = c("No", "Yes"),
na_logic = "none",
exclude_recalc = NULL,
exclude_factor = NULL,
delete_vars = NULL,
delete_pattern = NULL,
final_format = "raw",
which_event = NULL,
which_form = NULL,
wide = NULL
)
Arguments
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
checkbox_labels |
Character vector of length 2 for labels of unchecked/checked values. Default: |
na_logic |
Controls how missing values are set based on the branching logic of a checkbox. Must be one of |
exclude_recalc |
Optional. Character vector of field names to exclude from recalculation. |
exclude_factor |
Optional character vector of variable names (use original names without the |
delete_vars |
Optional. A character vector of variable names to remove from both the dataset and dictionary. |
delete_pattern |
Optional. A character vector of regular expression patterns. Variables matching these patterns will be removed from the dataset and dictionary. |
final_format |
Character string indicating the final format of the data. Options are |
which_event |
Character. If |
which_form |
Character. If |
wide |
Logical. If |
Value
A list with elements:
- data
Transformed data (data.frame or nested list when split).
- dictionary
Updated dictionary data.frame.
- event_form
Event–form mapping (if applicable).
- results
Character summary of transformation steps performed.
Examples
# Minimal usage (project object or data + dictionary)
trans <- rd_transform(covican)
# Custom checkbox labels
trans <- rd_transform(covican,
checkbox_labels = c("Not present", "Present"))
# Return only a single form (wide)
trans <- rd_transform(covican,
final_format = "by_form",
which_form = "laboratory_findings",
wide = TRUE)
Recalculate REDCap Calculated Fields
Description
This function recalculates each calculated field if the logic can be transcribed to R. Note that calculated fields containing smart-variables or variables from other events cannot be transcribed.
The function returns the dataset and dictionary with the recalculated variables appended (named as the original field plus _recalc), along with a summary table of the recalculation results.
Usage
recalculate(data, dic, event_form = NULL, exclude_recalc = NULL)
Arguments
data |
Data frame containing data from REDCap. |
dic |
Data frame containing the dictionary read from REDCap. |
event_form |
Data frame containing the correspondence of each event with each form. |
exclude_recalc |
Character vector with the names of the variables that should not be recalculated. Useful for projects with time-consuming recalculations for certain calculated fields. |
Read REDCap data
Description
Import REDCap data into R either from REDCap's exported R file or directly via the REDCap API. The function returns a list with the dataset, the project dictionary (metadata) and, for longitudinal projects, the instrument–event mapping (event_form) when available.
Usage
redcap_data(
data_path = NA,
dic_path = NA,
event_path = NA,
uri = NA,
token = NA,
filter_field = NULL,
survey_fields = FALSE
)
Arguments
data_path |
Path to exported R file (use with |
dic_path |
Path to the dictionary file (CSV or XLSX; use with |
event_path |
Path to the event-form mapping file (CSV or XLSX) for longitudinal projects (downloadable via the |
uri |
REDCap API base URI (use with |
token |
REDCap API token (use with |
filter_field |
Optional character vector of field names to request from the API. |
survey_fields |
Logical; include survey-related fields when pulling via API. Default |
Details
Two import modes are supported:
-
Exported files — use
data_path(REDCap R export) anddic_path(dictionary CSV/XLSX). -
API — use
uriandtokento pull data and metadata directly from REDCap.
If the project is longitudinal, provide event_path (instrument–event mapping)
or the function will attempt to fetch it from the API when using API mode.
Steps for using exported data in REDCap:
Use the REDCap Export Data function and choose R Statistical Software format.
REDCap generates:
A CSV file with observations.
An R script to format variables for import.
Ensure the exported files, dictionary, and event mapping (if any) are in the same directory.
Value
A list with:
-
data: Imported dataset. -
dictionary: Variable dictionary (project metadata). -
event_form: Event-form mapping for longitudinal projects (if applicable).
Note
To use other package functions effectively, include the dic_path argument to load the project dictionary.
Use either exported-files mode (
data_path+dic_path) or API mode (uri+token) — not both.For exported files, REDCap's R export is required for
data_path. Dictionary and event files must be CSV or XLSX.
Examples
## Not run:
# From exported files
out <- redcap_data(
data_path = "project_export.r",
dic_path = "project_dictionary.csv",
event_path = "instrument_event_map.csv"
)
# From API
out_api <- redcap_data(
uri = "https://redcap.example.org/api/",
token = "REPLACE_WITH_TOKEN"
)
## End(Not run)
Round Numbers to a Specified Number of Digits —-
Description
This function rounds numeric values to the specified number of decimal digits,
mimicking the behavior of the base R round() function but implemented manually.
Usage
round(x, digits)
Arguments
x |
A numeric vector to be rounded. |
digits |
Integer indicating the number of decimal places to round to. |
Value
A numeric vector rounded to the specified number of digits.
Examples
round(3.14159, 2)
round(c(-2.718, 3.14159), 1)
Creation of a Data Frame with Variables from All Forms of a Specified Event
Description
This function generates a nested dataset filtered by each event, containing only the variables associated with each event. It uses the provided data, dictionary, and event-form mapping. You can choose to return data for a specific event.
Usage
split_event(data, dic, event_form, which = NULL)
Arguments
data |
Data frame containing data from REDCap. |
dic |
Data frame containing the dictionary read from REDCap. |
event_form |
Data frame containing the correspondence of each event with each form. |
which |
Character string specifying an event if only data for that event is desired. |
Creation of a Data Frame with Variables from a Specified Form
Description
This function generates a nested dataset containing only the variables associated with each form, using the provided data, dictionary, and event-form mapping. You can choose to return data for a specific form.
Usage
split_form(data, dic, event_form = NULL, which = NULL, wide = FALSE)
Arguments
data |
Data frame containing data from REDCap. |
dic |
Data frame containing the dictionary read from REDCap. |
event_form |
Data frame containing the correspondence of each event with each form. |
which |
Character string specifying a form if only data for that form is desired. |
wide |
Logical indicating if the dataset should be returned in a wide format ( |
Convert Variables to Factors
Description
This function converts all variables in the dataset to factors, except those specified in the exclude parameter.
Usage
to_factor(data, dic, exclude = NULL)
Arguments
data |
Data frame containing the REDCap data. |
dic |
Data frame containing the REDCap dictionary. |
exclude |
Character vector specifying the names of variables that should not be converted to factors. If |
Transformation of Checkboxes with Branching Logic
Description
This function inspects all the checkboxes in the study to determine if they have a branching logic. If a branching logic is present and its result is missing, the function will input a missing value into the checkbox. If checkbox_na is TRUE, the function will additionally input a missing value when the branching logic isn't satisfied, not just when it is missing. If a branching logic cannot be found or the logic cannot be transcribed due to the presence of smart variables, the variable is added to a list of reviewable variables that will be printed.
The function returns the dataset with the transformed checkboxes and a table summarizing the results.
Usage
transform_checkboxes(data, dic, event_form = NULL, checkbox_na = FALSE)
Arguments
data |
Data frame containing data from REDCap. |
dic |
Data frame containing the dictionary read from REDCap. |
event_form |
Data frame containing the correspondence of each event with each form. |
checkbox_na |
Logical indicating if values of checkboxes with branching logic should be set to missing only when the branching logic is missing ( |