The CohortContrast package is designed to facilitate cohort exploration. It accepts a target cohort of any size from your OMOP CDM instance and allows you to provide a custom control cohort or generate one using matching or inverse controls. The package performs concept-level enrichment analysis and provides a handy GUI for visualization and concept mapping.
The package also includes bundled example studies that can be used to explore the file structure and downstream workflows without connecting to a live database:
exampleStudies <- c("lc500", "lc500s")
exampleRoot <- system.file("example", "st", package = "CohortContrast")
data.frame(
study = exampleStudies,
mode = vapply(
unname(file.path(exampleRoot, exampleStudies)),
function(path) CohortContrast::checkDataMode(path)$mode,
character(1)
),
row.names = NULL
)
#> study mode
#> 1 lc500 patient
#> 2 lc500s summaryThese built-in studies are useful for trying the Viewer and for reproducing the examples used throughout the package documentation.
The CDMConnector
package is used to establish the database connection for running
CohortContrast. You can configure the connection either by
reading credentials from a .Renviron file or explicitly
writing them in your script.
################################################################################
#
# Initiate the database connection
#
#################################################################################
port <- as.integer(Sys.getenv("DB_PORT"))
cdmSchema <-
Sys.getenv("OHDSI_CDM") # Schema which contains the OHDSI Common Data Model
cdmVocabSchema <-
Sys.getenv("OHDSI_VOCAB") # Schema which contains the OHDSI Common Data Model vocabulary tables.
cdmResultsSchema <-
Sys.getenv("OHDSI_RESULTS") # Schema which contains "cohort" table (is not mandatory)
writeSchema <-
Sys.getenv("OHDSI_WRITE") # Schema for temporary tables, will be deleted
writePrefix <- "cc_"
db = DBI::dbConnect(
RPostgres::Postgres(),
dbname = Sys.getenv("DB_NAME"),
host = Sys.getenv("DB_HOST"),
user = Sys.getenv("DB_USERNAME"),
password = Sys.getenv("DB_PASSWORD"),
port = port
)
cdm <- CDMConnector::cdmFromCon(
con = db,
cdmSchema = cdmSchema,
achillesSchema = cdmResultsSchema,
writeSchema = c(schema = writeSchema, prefix = writePrefix)
)If you plan to use the interactive viewer
(runCohortContrastViewer()) or summary precompute
(precomputeSummary()), configure Python and install
required Python packages:
If you are deploying on an offline or air-gapped environment, see the supplementary article Air-gapped server setup:
vignette("a10_air_gapped_server_setup", package = "CohortContrast")For a study-output interpretation example using bundled data:
vignette("a11_interpreting_results_lc500", package = "CohortContrast")vignette("a12_interpreting_summary_results_lc500s", package = "CohortContrast")Let´s say we want to explore our cohort of lung cancer patients. We can import this target cohort multiple ways:
If your cohort is defined in ATLAS, you can use its unique cohort ID. Ensure the cohort is generated within ATLAS on the CDM instance.
################################################################################
#
# Create target table from OMOP CDM instance (ATLAS's cohort id)
#
#################################################################################
cohortsTableName = 'cohort'
targetCohortId = 1403
targetTable <- CohortContrast::cohortFromCohortTable(cdm = cdm, db = db,
tableName = cohortsTableName, schemaName = cdmResultsSchema, cohortId = targetCohortId)If you have the JSON expression of the cohort (exportable from
ATLAS), you can import the cohort directly. The package also ships
example JSON cohorts in inst/example/example_json/, for
example inst/example/example_json/lung/cohort.json
(installed path shown below).
################################################################################
#
# Create target table from a JSON
#
#################################################################################
pathToJSON = '/path/to/atlas_json'
targetTable <- CohortContrast::cohortFromJSON(pathToJSON = pathToJSON, cdm = cdm)# Example bundled with CohortContrast:
exampleJsonDir <- system.file("example/example_json/lung", package = "CohortContrast")
list.files(exampleJsonDir, pattern = ".json$", full.names = TRUE)
# [1] ".../example_json/lung/cohort.json"
targetTable <- CohortContrast::cohortFromJSON(pathToJSON = exampleJsonDir, cdm = cdm)We can import the target table from a CSV. The file should have the same columns as present inside a cohort table (cohort_definition_id, subject_id, cohort_start_date, cohort_end_date).
################################################################################
#
# Create target table from a CSV
#
#################################################################################
pathToCsv = '/path/to/cohort.csv'
targetTable <- CohortContrast::cohortFromCSV(pathToCsv = pathToCsv, cohortId = 2)We can also use a table in our memory.
################################################################################
#
# Create target table
#
#################################################################################
# Create a cohort table with a 1-year follow-up window after cohort start
data <- tibble::tribble(
~cohort_definition_id, ~subject_id, ~cohort_start_date,
1, 4804, as.Date('1997-03-23'),
1, 4861, as.Date('1982-06-02'),
1, 1563, as.Date('1977-06-25'),
1, 2830, as.Date('2006-08-11'),
1, 1655, as.Date('2004-09-29'),
2, 5325, as.Date('1982-06-02'),
2, 3743, as.Date('1997-03-23'),
2, 2980, as.Date('2004-09-29'),
2, 1512, as.Date('2006-08-11'),
2, 2168, as.Date('1977-06-25')
) |>
dplyr::mutate(cohort_end_date = cohort_start_date + 365)
targetTable <- CohortContrast::cohortFromDataTable(data = data, cohortId = 2)For cases 2-4 you do not have to specify the cohortId
parameter inside the function call, but when multiple cohorts present it
is advised.
The control cohort is a cohort that the target cohort is compared against. This means we will check the proportions of each concept occurrence between the two cohorts. The result of the analysis is heavily dependent on both of them, therefore they should be selected with care.
The control cohort can be generated the same way the target cohort
has been shown to be generated in examples 1-4. But there are a few
automatic ways the package CohortContrast provides:
One of the scientific approaches is to select matches from the
database (based on age and sex). There are parameters such as
ratio which shown how many controls we want for each case.
Also min (at least that many matches error otherwise) and
max (at maximum n matches) parameters can be used.
################################################################################
#
# Create control cohort table based on matches
#
#################################################################################
controlTable = CohortContrast::createControlCohortMatching(cdm = cdm, targetTable = targetTable, ratio = 4)Sometimes it makes the most sense to use inverse controls (the same subjects during observation period not described in target cohort). This is the case for example if we want to see the contrast after a diagnosis.
################################################################################
#
# Create control cohort table based on inverse controls
#
#################################################################################
controlTable = CohortContrast::createControlCohortInverse(cdm = cdm, targetTable = targetTable)If you have constructed the cohorts by hand it is strongly advised to check and resolve overlap conflicts as well as conflicts with observation period inside the OMOP CDM.
################################################################################
#
# Resolve conflicts
#
#################################################################################
targetTable = CohortContrast::resolveCohortTableOverlaps(cdm = cdm, cohortTable = targetTable)
controlTable = CohortContrast::resolveCohortTableOverlaps(cdm = cdm, cohortTable = controlTable)