Data:
Mmat_champs: Uncertainty-quantified misclassification matrix estimates of computer-coded verbal autopsy (CCVA) algorithms based on the CHAMPS data (will be updated periodically)
Example Data: Individual-level cause of death data in COMSA-Mozambique (Public Version)
comsamoz_public_openVAout: Specific (high-resolution) causes
comsamoz_public_broad: Broad (low-resolution) causes
Functions:
vacalibration(): Main function for VA-calibration
Other functions:
cause_map(): Maps specific causes to broad causes
modular.vacalib(): Implements the modular VA-calibration (see Section 3.8 in Pramanik et al. (2025))
For the purpose of illustration, comsamoz_public_openVAout and comsamoz_public_broad contain publicly available (deidentified) individual-level cause of death (COD) data for neonates aged 0–27 days from the Countrywide Mortality Surveillance for Action project in Mozambique (COMSA-Mozambique). The causes in these example data are obtained using the InSilicoVA algorithm.
comsamoz_public_openVAout contains the specific (high-resolution) CODs assigned by the InSilicoVA algorithm to 2016 WHO Verbal Autopsy Questionnaire data for neonates in COMSA-Mozambique. It is obtained using the codeVA() function in the openVA package.
data(comsamoz_public_openVAout) # load data in R environment
class(comsamoz_public_openVAout) # list
names(comsamoz_public_openVAout) # different components
comsamoz_public_openVAout$age_group # age group
comsamoz_public_openVAout$va_algo # algorithm
head(comsamoz_public_openVAout$data) # head of the specific COD data
# for these 6 individuals, the causes of deaths are "Other and unspecified neonatal CoD",
# "Birth asphyxia", "Neonatal sepsis", "Birth asphyxia", "Birth asphyxia", "Neonatal sepsis"
comsamoz_public_broad contains the broad (low-resolution) CODs for the same deaths as in the specific COD data comsamoz_public_openVAout. It is obtained using the cause_map() function in this package.
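To make the idea of collapsing specific causes into broad causes concrete, here is a minimal base-R sketch. It does not call cause_map(); the lookup table toy_map is a hypothetical illustration, and the mapping the package actually applies may differ. The six specific causes are the ones listed in the comments above.

```r
# Specific causes for the six individuals shown in the comments above.
specific_cod <- c("Other and unspecified neonatal CoD", "Birth asphyxia",
                  "Neonatal sepsis", "Birth asphyxia", "Birth asphyxia",
                  "Neonatal sepsis")

# Hypothetical lookup table, for illustration only; the mapping that
# cause_map() really uses is not reproduced here.
toy_map <- c("Other and unspecified neonatal CoD" = "other",
             "Birth asphyxia"                     = "ipre",
             "Neonatal sepsis"                    = "sepsis_meningitis_inf")

# Named-vector lookup: each specific cause is replaced by its broad cause.
broad_cod <- unname(toy_map[specific_cod])

# Number of deaths per broad cause.
broad_counts <- table(broad_cod)
print(broad_counts)
```

A named vector keeps the mapping in one place, so any specific cause missing from the table shows up as NA instead of being silently dropped.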
comsamoz_public_openVAout and comsamoz_public_broad have the same format (a list with components "data", "age_group", "va_algo", and "version").
Broad causes are as below for each age group:
neonate: "congenital_malformation", "pneumonia", "sepsis_meningitis_inf" (sepsis/meningitis/infections), "ipre" (intrapartum-related events), "other", and "prematurity".
child: "malaria", "pneumonia", "diarrhea", "severe_malnutrition", "hiv", "injury", "other", "other_infections", and "nn_causes" (neonatal causes; consists of IPRE, congenital malformation, and prematurity).
Mmat_champs stores estimates of misclassification matrices for different computer-coded verbal autopsy (CCVA) algorithms, age groups, and countries based on the COD data from the CHAMPS project.
CHAMPS Data: The Child Health and Mortality Prevention Surveillance (CHAMPS) Network gathers premortem clinical and laboratory data, along with postmortem verbal autopsy (VA) and minimally invasive tissue sampling (MITS), from sites in Bangladesh, Ethiopia, Kenya, Mali, Mozambique, Sierra Leone, and South Africa. A panel of physicians and scientists uses the diagnostic test results and clinical records to ascertain a causal chain. This yields a limited set of paired COD data with two diagnoses per death: a gold standard (CHAMPS cause from here on) and one based on VA.
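As a rough illustration of what paired COD data look like, the base-R sketch below cross-tabulates a CHAMPS cause against a VA cause and row-normalises the counts. The eight deaths are hypothetical, not CHAMPS data, and this raw ratio is only a naive stand-in: the estimates in Mmat_champs come from the modeling framework of Pramanik et al. (2025), not from this calculation.

```r
# Hypothetical paired causes for eight deaths (not real CHAMPS data).
causes <- c("pneumonia", "sepsis_meningitis_inf", "ipre")
champs_cause <- factor(c("pneumonia", "pneumonia", "pneumonia", "pneumonia",
                         "ipre", "ipre",
                         "sepsis_meningitis_inf", "sepsis_meningitis_inf"),
                       levels = causes)
va_cause <- factor(c("pneumonia", "pneumonia", "sepsis_meningitis_inf", "ipre",
                     "ipre", "ipre",
                     "sepsis_meningitis_inf", "pneumonia"),
                   levels = causes)

# Cross-tabulate: rows are CHAMPS causes, columns are VA-predicted causes.
paired_counts <- table(CHAMPS = champs_cause, VA = va_cause)

# Row-normalise: each row is the share of deaths with that CHAMPS cause
# that VA assigned to each cause, so every row sums to 1.
raw_misclass <- prop.table(paired_counts, margin = 1)
print(round(raw_misclass, 2))
```

With so few paired deaths per cause, raw ratios like these are very noisy, which is one reason the package reports uncertainty-quantified estimates.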
Estimates: We model these data using the efficient country-specific misclassification matrix modeling framework proposed in Pramanik et al. (2025). Mmat_champs stores estimates from this modeling for
two age groups: "neonate" for 0-27 days and "child" for 1-59 months.
three CCVA algorithms: "eava" for EAVA, "insilicova" for InSilicoVA, and "interva" for InterVA.
seven countries: "Bangladesh", "Ethiopia", "Kenya", "Mali", "Mozambique", "Sierra Leone", and "South Africa". It also has an estimate labeled "other" for all countries outside CHAMPS.
Mmat_champs is a nested list. For example, Mmat_champs$neonate$eava$postsumm$Mozambique contains posterior summaries of the misclassification estimates for neonates based on the EAVA algorithm in Mozambique. Similarly, Mmat_champs$neonate$eava$postmean$Mozambique and Mmat_champs$neonate$eava$asDirich$Mozambique contain the posterior mean and the Dirichlet approximation of the posterior, respectively.
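The nesting order (age group, algorithm, format, country) can be navigated either with `$` or, when the names are held in variables, with `[[ ]]`. The sketch below uses a small stand-in list, toy_Mmat, with hypothetical values so that it runs without the package; only the nesting order is taken from the description above.

```r
# Toy stand-in with the nesting order described above:
# age group -> algorithm -> format -> country. Values are hypothetical.
toy_Mmat <- list(
  neonate = list(
    eava = list(
      postmean = list(
        Mozambique = matrix(c(0.7, 0.3,
                              0.2, 0.8),
                            nrow = 2, byrow = TRUE,
                            dimnames = list(c("pneumonia", "other"),
                                            c("pneumonia", "other")))
      )
    )
  )
)

# Fixed names: chain `$`.
by_dollar <- toy_Mmat$neonate$eava$postmean$Mozambique

# Names stored in variables: chain `[[ ]]`.
age <- "neonate"
algo <- "eava"
ctry <- "Mozambique"
by_bracket <- toy_Mmat[[age]][[algo]]$postmean[[ctry]]

# Both routes reach the same matrix.
print(identical(by_dollar, by_bracket))
```

The `[[ ]]` form is convenient when the age group and algorithm are read from another object instead of being typed out.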
For any age group, algorithm, and country, the posterior estimates are stored in three formats:
"postsumm": Array of posterior summary X CHAMPS broad cause X VA broad cause. Posterior summaries are mean (posterior mean), min (minimum), 2.5% (2.5th percentile), 25% (25th percentile), 50% (50th percentile), 75% (75th percentile), 97.5% (97.5th percentile), and max (maximum). For example, Mmat_champs$neonate$eava$postsumm$Mozambique[,"pneumonia",] gives the posterior summaries for CHAMPS cause "pneumonia". Rows are posterior summaries. Columns are VA-predicted broad causes.
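The three-dimensional layout can be easier to see on a small example. The sketch below builds a toy array with the same dimension order (posterior summary X CHAMPS broad cause X VA broad cause) and slices it the way the example above does. The array toy_postsumm and its entries are hypothetical placeholder integers, not probabilities, and only a subset of the summaries and causes is used.

```r
summ_names <- c("mean", "2.5%", "50%", "97.5%")
causes <- c("pneumonia", "ipre", "other")

# Placeholder integers; only the dimension layout mirrors the description.
toy_postsumm <- array(
  seq_len(length(summ_names) * length(causes)^2),
  dim = c(length(summ_names), length(causes), length(causes)),
  dimnames = list(summ_names, causes, causes)
)

# Fix the second dimension (CHAMPS cause): the result is a matrix with
# posterior summaries in rows and VA-predicted causes in columns.
pneu <- toy_postsumm[, "pneumonia", ]
print(dim(pneu))

# One summary across all VA causes for that CHAMPS cause.
print(pneu["50%", ])

# Fix the first dimension instead: a CHAMPS cause X VA cause matrix.
mean_mat <- toy_postsumm["mean", , ]
print(dim(mean_mat))
```

Leaving an index position empty keeps that whole dimension, which is why the slice with only the middle index fixed drops to two dimensions.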
"postmean"
: Matrix of CHAMPS broad cause X VA broad cause. Contains the posterior mean of the misclassification matrix. Mmat_champs$neonate$eava$postmean$Mozambique["pneumonia",] gives the posterior means for CHAMPS cause "pneumonia".
"asDirich": Matrix of CHAMPS broad cause X VA broad cause. Stores the concentration (or scale) parameters of the Dirichlet distribution that best approximates the posterior distribution of the misclassification matrix based on the CHAMPS data.
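To show what a row of Dirichlet concentration parameters encodes, the base-R sketch below takes one hypothetical parameter vector, computes the mean it implies (alpha divided by its sum, a standard property of the Dirichlet distribution), and draws samples by normalising independent Gamma variables. The values in alpha are made up for illustration and are not taken from Mmat_champs.

```r
set.seed(1)

# Hypothetical concentration parameters for one CHAMPS cause
# (one row of an asDirich-style matrix); not package values.
alpha <- c(pneumonia = 12, sepsis_meningitis_inf = 6, other = 2)

# Mean of a Dirichlet distribution: alpha / sum(alpha).
dirich_mean <- alpha / sum(alpha)

# Sample from the Dirichlet by normalising independent Gamma(alpha_k, 1) draws.
n_draws <- 5000
gam <- matrix(rgamma(n_draws * length(alpha),
                     shape = rep(alpha, each = n_draws)),
              nrow = n_draws)
draws <- gam / rowSums(gam)   # each row now sums to 1
colnames(draws) <- names(alpha)

print(round(dirich_mean, 3))
print(round(colMeans(draws), 3))  # should be close to dirich_mean
```

Larger concentration parameters with the same proportions give the same mean but a tighter distribution, so the parameters carry the uncertainty as well as the point estimate.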
Mmat_champs$neonate$eava$asDirich$Mozambique["pneumonia",] holds the Dirichlet parameters that best approximate the misclassification posterior for CHAMPS cause "pneumonia".
vacalibration() is the main function for implementing VA-calibration, where VA-only data can be input either as specific causes (e.g., comsamoz_public_openVAout), as broad causes (e.g., comsamoz_public_broad), or as broad-cause-specific death counts.
calib_out_specific = vacalibration(va_data = setNames(list(comsamoz_public_openVAout$data),
list(comsamoz_public_openVAout$va_algo)),
age_group = comsamoz_public_openVAout$age_group,
country = "Mozambique")
Below is how we can compare the uncalibrated CSMF estimates with the posterior summaries of the calibrated CSMF estimates:
# calib_out_broad and calib_out_deathcount are not defined above; they are assumed to come from
# analogous vacalibration() calls on the broad-cause data and on broad-cause-specific death counts
#################################### uncalibrated ####################################
round(calib_out_specific$p_uncalib, 3) # specific cause
round(calib_out_broad$p_uncalib, 3) # broad cause
round(calib_out_deathcount$p_uncalib, 3) # broad-cause-specific death count
#################################### calibrated ####################################
round(calib_out_specific$pcalib_postsumm["insilicova",,], 3) # specific cause
round(calib_out_broad$pcalib_postsumm["insilicova",,], 3) # broad cause
round(calib_out_deathcount$pcalib_postsumm["insilicova",,], 3) # broad-cause-specific death count
# default
calib_out_specific = vacalibration(va_data = setNames(list(comsamoz_public_openVAout$data),
list(comsamoz_public_openVAout$va_algo)),
age_group = comsamoz_public_openVAout$age_group,
country = "Mozambique")
# misclassification estimates provided by user
calib_out_specific_mmat = vacalibration(va_data = setNames(list(comsamoz_public_openVAout$data),
list(comsamoz_public_openVAout$va_algo)),
Mmat.asDirich = setNames(list(Mmat_champs[[comsamoz_public_openVAout$age_group]][[comsamoz_public_openVAout$va_algo]]$asDirich[["Mozambique"]]),
list(comsamoz_public_openVAout$va_algo)),
age_group = comsamoz_public_openVAout$age_group,
country = "Mozambique")
Below is a comparison of the uncalibrated and calibrated CSMF estimates under the default and the user-provided misclassification estimates:
#################################### uncalibrated ####################################
round(calib_out_specific$p_uncalib, 3) # default
round(calib_out_specific_mmat$p_uncalib, 3) # user provided misclassification estimate
#################################### calibrated ####################################
round(calib_out_specific$pcalib_postsumm["insilicova",,], 3) # default
round(calib_out_specific_mmat$pcalib_postsumm["insilicova",,], 3) # user provided misclassification estimate
For example, suppose the following are broad-cause-specific death counts based on EAVA and InSilicoVA among neonates in Mozambique:
va_data_example = list("eava" = c("congenital_malformation" = 40, "pneumonia" = 175,
"sepsis_meningitis_inf" = 265, "ipre" = 220,
"other" = 30, "prematurity" = 170),
"insilicova" = c("congenital_malformation" = 5, "pneumonia" = 145,
"sepsis_meningitis_inf" = 370, "ipre" = 330,
"other" = 60, "prematurity" = 290))
These counts can be passed to vacalibration() in the same way as above. When multiple algorithms are provided, vacalibration() by default performs algorithm-specific calibration and an ensemble calibration that combines all algorithms to provide a calibrated CSMF estimate for the population.
calib_out_ensemble = vacalibration(va_data = va_data_example,
age_group = "neonate", country = "Mozambique")
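As a quick cross-check on the uncalibrated values, the raw CSMFs implied by the death counts above can be computed by hand in base R. This sketch simply divides each count by the algorithm's total; it is assumed, but not confirmed here, that this matches what vacalibration() reports in p_uncalib. The object names va_counts and raw_csmf are introduced only for this illustration.

```r
# Same death counts as in the example above.
va_counts <- list(
  eava = c(congenital_malformation = 40, pneumonia = 175,
           sepsis_meningitis_inf = 265, ipre = 220,
           other = 30, prematurity = 170),
  insilicova = c(congenital_malformation = 5, pneumonia = 145,
                 sepsis_meningitis_inf = 370, ipre = 330,
                 other = 60, prematurity = 290)
)

# Total deaths per algorithm.
total_deaths <- sapply(va_counts, sum)
print(total_deaths)

# Raw cause-specific mortality fractions: counts divided by the total.
# Rows are algorithms, columns are broad causes; each row sums to 1.
raw_csmf <- t(sapply(va_counts, function(x) x / sum(x)))
print(round(raw_csmf, 3))
```

Note that the two algorithms are summarised over different numbers of deaths in this example, so the fractions, not the counts, are the comparable quantities.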
Here is a comparison of the uncalibrated estimates with the algorithm-specific and ensemble calibrated estimates:
round(calib_out_ensemble$p_uncalib, 3) # uncalibrated
round(calib_out_ensemble$pcalib_postsumm["eava",,], 3) # EAVA-specific calibration
round(calib_out_ensemble$pcalib_postsumm["insilicova",,], 3) # InSilicoVA-specific calibration
round(calib_out_ensemble$pcalib_postsumm["ensemble",,], 3) # Ensemble calibration
Set ensemble = FALSE to turn off ensemble calibration in vacalibration().