Title: | Multivariate Age-Period-Cohort (MAPC) Modeling for Health Data |
Version: | 0.1.0 |
Description: | Bayesian multivariate age-period-cohort (MAPC) models for analyzing health data, with support for model fitting, visualization, stratification, and model comparison. Inference focuses on identifiable cross-strata differences, as described by Riebler and Held (2010) <doi:10.1093/biostatistics/kxp037>. Methods for handling complex survey data via the 'survey' package are included, as described in Mercer et al. (2014) <doi:10.1016/j.spasta.2013.12.001>. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
VignetteBuilder: | knitr |
Imports: | dplyr, tidyselect, fastDummies, stringr, rlang, tidyr, ggplot2, viridis, scales, purrr, grid, gridExtra, ggpubr, tibble, survey |
URL: | https://github.com/LarsVatten/MAPCtools |
BugReports: | https://github.com/LarsVatten/MAPCtools/issues |
Suggests: | testthat (≥ 3.0.0), knitr, rmarkdown, INLA |
Additional_repositories: | https://inla.r-inla-download.org/R/stable/ |
Config/testthat/edition: | 3 |
Depends: | R (≥ 3.5) |
NeedsCompilation: | no |
Packaged: | 2025-06-23 15:06:36 UTC; lavat |
Author: | Lars Vatten [aut, cre] |
Maintainer: | Lars Vatten <lavatt99@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-06-25 15:40:02 UTC |
Aggregate binomial data
Description
Aggregates binomial data using sufficient statistics for binomial samples.
For a sample \boldsymbol{y} = \{y_1, \dots, y_n\}
with y_i \sim \text{Bin}(n, p)
, the sample is aggregated into the sufficient statistic
s = \sum_{i=1}^n y_i
.
Usage
abinomial(data)
Arguments
data |
Binomial data vector. |
Value
Aggregated binomial data.
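As a minimal base-R sketch of the aggregation described above (it does not call abinomial(), whose exact return structure is not specified here), assume each observation is a 0/1 outcome with a common success probability. The variable names are illustrative only.

```r
# Hypothetical 0/1 outcomes sharing a common success probability
y <- c(1, 0, 1, 1, 0, 1)

# Sufficient statistic: the total number of successes
s <- sum(y)

# Number of observations that were collapsed into s
n_obs <- length(y)
```

The pair (s, n_obs) carries the same likelihood information about the success probability as the full vector y.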
Add 1-indexed age, period and cohort indices via match()
Description
Add 1-indexed age, period and cohort indices via match()
Usage
add_APC_by_match(df, age_name, period_name, age_order, period_order, M = 1)
Arguments
df |
Data frame |
age_name |
Name of the age (or age-group) column (string). |
period_name |
Name of the period (e.g. year) column (string). |
age_order |
Character vector giving the desired ordering of age levels |
period_order |
Vector (numeric or character) giving the desired ordering of periods |
M |
Grid factor, defined as the ratio of age interval width to period interval width. |
Value
Data frame with new columns: age_index, period_index, cohort_index
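The match()-based indexing can be illustrated with a small base-R sketch. It only shows the idea for the age and period indices; the data frame and column names below are made up for illustration and the function's internals may differ.

```r
# Hypothetical data with a categorical age group and a numeric year
df <- data.frame(
  age_group = c("20-24", "25-29", "20-24", "30-34"),
  year      = c(2001, 2000, 2002, 2001)
)

# Desired orderings of the age and period levels
age_order    <- c("20-24", "25-29", "30-34")
period_order <- 2000:2002

# match() returns the 1-based position of each value in the ordering
age_index    <- match(df$age_group, age_order)
period_index <- match(df$year, period_order)
```

Values that do not appear in the supplied ordering would come back as NA, so it is worth checking for missing indices afterwards.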
Add cohort column to data frame
Description
Adds a column for birth cohorts to a data frame, derived from specified age and period columns through the relation cohort = period - age
.
Usage
add_cohort_column(data, age, period, cohort_name = "cohort")
Arguments
data |
Data frame with age and period column. |
age |
Age column in |
period |
Period column in |
cohort_name |
Name of the cohort column to be created. Defaults to |
Value
Data frame with additional column for birth cohorts added.
Add cohort index column to data frame
Description
Adds a column for cohort indices to a data frame, derived from specified age and period index columns through the relationship cohort index = period index - age index + max(age index)
.
Usage
add_cohort_index(data, age_index, period_index, cohort_name = "cohort_index")
Arguments
data |
Data frame with age and period columns. |
age_index |
Age index column in |
period_index |
Period index column in |
cohort_name |
Name of the cohort index column to be created. Defaults to |
Value
Data frame with additional column for cohort indices.
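The relationship stated in the description can be checked with a short base-R sketch. The index vectors are illustrative and assume equal age and period interval widths.

```r
# Hypothetical 1-indexed age and period indices on a 3 x 2 grid
age_index    <- c(1, 2, 3, 1, 2, 3)
period_index <- c(1, 1, 1, 2, 2, 2)

# cohort index = period index - age index + max(age index)
cohort_index <- period_index - age_index + max(age_index)
```

Adding max(age index) shifts the result so that the oldest age group in the first period gets cohort index 1.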
Aggregate Gaussian data
Description
Aggregates Gaussian data using sufficient statistics for Gaussian samples.
For a sample \boldsymbol{y} = \{y_1, \dots, y_n\}
with y_i \sim \mathcal{N}(\mu, (s_i \tau)^{-1})
, i=1, \dots, n
, the sample is aggregated into the sufficient statistic
(v, \frac{1}{2} \sum_{i=1}^n \log(s_i), m, n, \bar{y})
,
with
m = \sum_{i=1}^n s_i
\quad
\bar{y} = \frac{1}{m} \sum_{i=1}^n s_i y_i
\quad
v = \frac{1}{m} \sum_{i=1}^n s_i y_i^2 - \bar{y}^2
.
For a short derivation of the sufficient statistic, attach the INLA package (library(INLA)
) and run inla.doc("agaussian")
.
Usage
agaussian(data, precision.scale = NULL)
Arguments
data |
Gaussian data, must be a numeric vector. |
precision.scale |
Scales for the precision of each Gaussian observation. |
Value
Aggregated Gaussian data, in an inla.mdata
object, which is compatible with the agaussian
family in INLA.
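The sufficient statistic from the description can be computed by hand in base R. This is a sketch of the formulas only; it does not build an inla.mdata object, and the data values are made up.

```r
# Hypothetical observations and precision scales
y <- c(1, 2, 3)
s <- c(1, 1, 2)

n     <- length(y)                      # number of observations
m     <- sum(s)                         # total precision scale
y_bar <- sum(s * y) / m                 # precision-weighted mean
v     <- sum(s * y^2) / m - y_bar^2     # precision-weighted variance
half_log_s <- 0.5 * sum(log(s))         # (1/2) * sum of log scales

# Collected in the order given in the description
stats <- c(v = v, half_log_s = half_log_s, m = m, n = n, y_bar = y_bar)
```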
Aggregate data across an entire data frame using sufficient statistics
Description
Aggregates specified columns of a data frame into summarizing statistics, preserving the potentially complex structure returned by aggregator functions (like data frames or inla.mdata objects) within list-columns. Aggregation is performed according to sufficient statistics for the specified distribution of the columns. Possible distributions: Gaussian, binomial. This function aggregates the entire data frame into a single row result.
Usage
aggregate_df(
data,
gaussian = NULL,
gaussian.precision.scales = NULL,
binomial = NULL
)
Arguments
data |
A data frame. |
gaussian |
Gaussian columns in |
gaussian.precision.scales |
Scales for the precision of Gaussian observations.
|
binomial |
Binomial columns in |
Value
A single-row data frame (tibble) containing:
- A column n with the total number of rows in the input data.
- For each specified column in gaussian and binomial, a corresponding list-column (named e.g. colname_gaussian, colname_binomial). Each element of these list-columns can be accessed by using the $ operator twice, e.g. through data$colname_gaussian$Y1 for the first element of the Gaussian summary.
Aggregate grouped data using aggregate_df
Description
Aggregates a grouped data frame into summarizing statistics within groups by
applying the aggregate_df
function to each group.
Aggregation is performed according to sufficient statistics for the specified
distribution of the columns to be aggregated.
Usage
aggregate_grouped_df(
data,
by,
gaussian = NULL,
gaussian.precision.scales = NULL,
binomial = NULL
)
Arguments
data |
Data frame to be grouped and aggregated. |
by |
Columns in |
gaussian |
Gaussian columns in |
gaussian.precision.scales |
Scales for the precision of Gaussian observations. |
binomial |
Binomial columns in |
Value
Aggregated data frame (tibble), with one row per group, containing
grouping variables, count n
per group, and aggregated list-columns for
specified variables as returned by aggregate_df
.
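The idea of aggregating within groups can be sketched in base R for a binomial column. This is not the package's implementation (which returns list-columns); the data frame and column names are hypothetical.

```r
# Hypothetical data: a grouping column and a 0/1 outcome
df <- data.frame(
  region = c("north", "north", "south", "south", "south"),
  event  = c(1, 0, 1, 1, 0)
)

# Split the outcome by group, then reduce each group to (n, s)
groups <- split(df$event, df$region)
agg <- data.frame(
  region = names(groups),
  n = vapply(groups, length, integer(1)),  # rows per group
  s = vapply(groups, sum, numeric(1)),     # successes per group
  row.names = NULL
)
```

The result has one row per group, mirroring the one-row-per-group shape described under Value.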
Aggregate multinomial data. Used in aggregate_df
.
Description
Aggregates multinomial data into sufficient statistics for multinomial samples.
Converts input data to character before processing.
For a sample \boldsymbol{y} = \{y_1, \dots, y_n\}
with y_i \in \{1, \dots, K\}
, P(y_i = k) = p_k, k=1, \dots, K
, the sample is aggregated into the sufficient statistic
\boldsymbol{s} = (s_1, \dots, s_{K-1})
where
s_k = \sum_{i=1}^n \mathbb{I}(y_i = k)
for k = 1, \dots, K-1
.
(The last category is omitted due to the sum-to-one constraint)
Usage
amultinomial(data, col_name, all_categories = NULL)
Arguments
data |
A vector containing the multinomial observations (will be coerced to character). |
col_name |
A character string giving the name of the column (used mainly for context in error messages). |
all_categories |
A character vector with the names or levels of all possible categories in the multinomial distribution (must include all observed values after coercion to character). |
Value
A one-row data frame containing counts for each of the first K - 1
categories.
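A base-R sketch of the counting step, assuming the categories are supplied in a fixed order and the last one is dropped. The category labels are illustrative, and amultinomial() itself may arrange its output differently.

```r
# Hypothetical observations, coerced to character as described above
y <- as.character(c("low", "high", "mid", "low", "low"))
all_categories <- c("low", "mid", "high")

# Count every category, including any with zero observations
counts <- table(factor(y, levels = all_categories))

# Drop the last category, which is determined by the sum-to-one constraint
s <- counts[-length(counts)]
```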
Create NA structure across age, period and cohort groups based on strata
Description
Creates a data frame where age, period, and cohort values are placed into
columns specific to their stratum (defined by stratify_var
), with other
strata combinations marked as NA. This structure is often useful for
specific modeling approaches, like certain Age-Period-Cohort (APC) models.
Optionally includes unique indices for random effects.
Usage
as.APC.NA.df(data, stratify_by, age, period, cohort, include.random = FALSE)
Arguments
data |
Data frame with age, period, cohort, and stratification columns. |
stratify_by |
Stratification variable column. This column will be used to create the stratum-specific NA structure. It should ideally be a factor or character vector. |
age |
Age column in |
period |
Name of the period column (must be a numeric/integer column). |
cohort |
Name of the cohort column (must be a numeric/integer column). |
include.random |
Logical. Whether to include a unique index ('random') for each combination of age, period, and stratum, potentially for use as random effect identifiers in models. Defaults to FALSE. |
Value
A data frame containing the original age, period, cohort, and stratify_by columns, plus:
- Dummy indicator columns for each level of stratify_by (e.g., Region_North, Region_South if Region was a stratifying variable).
- Stratum-specific age, period, and cohort columns (e.g., age_Region_North, period_Region_North, cohort_Region_North), containing the respective value if the row belongs to that stratum, and NA otherwise.
- If include.random = TRUE, a column named random with unique integer indices. The rows are ordered primarily by the stratification variable levels. This is useful for defining random components in MAPC models.
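The stratum-specific NA structure can be sketched in base R for the age column alone. The example reuses the Region naming from the description; the loop is an illustration, not the function's actual code.

```r
# Hypothetical data with one time index and a stratification variable
df <- data.frame(
  age    = c(1, 2, 1, 2),
  Region = c("North", "North", "South", "South")
)

# One age column per stratum: the value inside the stratum, NA elsewhere
for (lev in unique(df$Region)) {
  df[[paste0("age_Region_", lev)]] <- ifelse(df$Region == lev, df$age, NA)
}
```

Each row then contributes to exactly one stratum-specific age column, which is what lets a model assign separate effects per stratum.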
Add 1-indexed APC columns to data frame, handling numeric or categorical age/period
Description
Add 1-indexed APC columns to data frame, handling numeric or categorical age/period
Usage
as.APC.df(data, age, period, age_order = NULL, period_order = NULL, M = 1)
Arguments
data |
Data frame with age and period columns. |
age |
Age column in |
period |
Period column in |
age_order |
(Optional) Character vector giving the desired order of age levels.
If NULL and the |
period_order |
(Optional) Vector (numeric or character) giving the desired order of periods.
If NULL and |
M |
Grid factor, defined as the ratio of age interval width to period interval width. Defaults to 1 (i.e. assuming equal sized age and period increments). |
Value
The data frame with new columns age_index, period_index, and cohort_index, sorted by (age_index, period_index).
Aggregate binomial data
Description
Aggregates binomial data into sufficient statistics for binomial samples. Uses abinomial
.
Usage
binomial_aggregator(data, col_name)
Arguments
data |
Data frame with binomial data column. |
col_name |
Binomial data column. |
Value
Aggregated binomial data column.
Check if a set of columns is missing from a data frame. For use in aggregate_df
.
Description
Check if a set of columns is missing from a data frame. For use in aggregate_df
.
Usage
check_cols_exist(cols, df)
Arguments
cols |
String or vector of strings that is the name of the columns. |
df |
Data frame |
Value
Nothing. Throws an error if any of the columns are missing.
Clamp a numeric value within bounds
Description
Internal helper to restrict values between a lower and upper bound.
Usage
clamp(x, lower, upper)
Arguments
x |
A numeric vector. |
lower |
Lower bound. |
upper |
Upper bound. |
Value
A numeric vector with values clamped.
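One common way to write such a helper in base R uses pmin() and pmax(). The function name clamp_sketch is hypothetical and the package's own implementation may differ.

```r
# Restrict each element of x to the interval [lower, upper]
clamp_sketch <- function(x, lower, upper) {
  pmin(pmax(x, lower), upper)
}

clamped <- clamp_sketch(c(-2, 0.5, 7), lower = 0, upper = 1)
```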
Compute dynamic pretty breaks for continuous x-axis
Description
Internal helper to compute breaks on a numeric scale, ensuring the number of breaks is in a desired range and aligned with data bounds.
Usage
dynamic_pretty_breaks(x, target_n = 8, max_breaks = 12, min_breaks = 3)
Arguments
x |
A numeric vector. |
target_n |
Target number of breaks. |
max_breaks |
Maximum allowed number of breaks. |
min_breaks |
Minimum allowed number of breaks. |
Value
A numeric vector of break points.
Compute dynamic pretty breaks for discrete x-axis
Description
Internal helper to select a well-spaced subset of factor levels or unique strings for axis labeling on a discrete scale.
Usage
dynamic_pretty_discrete_breaks(
x,
target_n = 8,
max_breaks = 12,
min_breaks = 3
)
Arguments
x |
A factor or character vector. |
target_n |
Target number of breaks. |
max_breaks |
Maximum allowed number of breaks. |
min_breaks |
Minimum allowed number of breaks. |
Value
A character vector of selected breaks.
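A rough base-R sketch of picking evenly spaced labels from a discrete axis follows. It only honours target_n and ignores max_breaks and min_breaks, so it should be read as an illustration of the idea rather than the helper's actual logic; pick_discrete_breaks is a hypothetical name.

```r
# Select roughly target_n evenly spaced levels, always keeping the endpoints
pick_discrete_breaks <- function(x, target_n = 8) {
  lev <- if (is.factor(x)) levels(x) else unique(as.character(x))
  if (length(lev) <= target_n) {
    return(lev)
  }
  idx <- unique(round(seq(1, length(lev), length.out = target_n)))
  lev[idx]
}

breaks <- pick_discrete_breaks(as.character(1990:2019), target_n = 5)
```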
Find expected groups based on distinct values across a set of variables
Description
Given a data frame and a set of discrete (or factor) variables, returns all combinations of their observed levels and the list of levels.
Usage
expected_groups(data, vars)
Arguments
data |
A data frame whose columns you want to examine. |
vars |
Character vector of column names in |
Value
A named list with two elements:
- grid
A data.frame where each row is one combination of the variable levels (equivalent to what expand.grid would produce).
- levels
A named list; for each variable in vars it gives the sorted unique values (or factor levels) observed in data.
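The two returned elements can be reproduced in base R with lapply() and expand.grid(). The data below are hypothetical, and the sketch does not claim to match the function's handling of factors or missing values.

```r
# Hypothetical data with two discrete variables
df <- data.frame(
  sex       = c("F", "M", "F"),
  education = c(1, 2, 2)
)
vars <- c("sex", "education")

# Sorted unique values per variable
levels_list <- lapply(df[vars], function(col) sort(unique(col)))

# All combinations of those values, one row per expected group
grid <- expand.grid(levels_list, stringsAsFactors = FALSE)

result <- list(grid = grid, levels = levels_list)
```

Comparing the rows of grid with the groups actually present in the data is one way to spot empty cells before model fitting.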
Fit a multivariate age-period-cohort model
Description
Fit a Bayesian multivariate age-period-cohort model, and obtain posteriors for identifiable cross-strata contrasts. The method is based on Riebler and Held (2010) doi:10.1093/biostatistics/kxp037. For handling complex survey data, we follow Mercer et al. (2014) doi:10.1016/j.spasta.2013.12.001, implemented using the survey package.
Usage
fit_MAPC(
data,
response,
family,
apc_format,
stratify_by,
reference_strata = NULL,
age,
period,
grid.factor = 1,
apc_prior = "rw1",
extra.fixed = NULL,
extra.random = NULL,
extra.models = NULL,
extra.hyper = NULL,
include.random = FALSE,
binomial.n = NULL,
poisson.offset = NULL,
inla_formula = NULL,
lincombs = NULL,
survey.design = NULL,
apc_hyperprior = NULL,
control.compute = list(dic = TRUE, waic = TRUE, cpo = TRUE),
verbose = FALSE
)
Arguments
data |
A data frame containing the age, period, response, and stratification variables.
Age and period are assumed to be on the raw scale, not transformed to 1-indexed index columns.
Factor/character columns are handled, as long as they are properly sorted by |
response |
A string naming the response (outcome) variable in |
family |
A string indicating the likelihood family. The default is |
apc_format |
A specification of the APC structure, with options:
Note: It is also possible to specify models with only one or two time effects, by omitting the letters corresponding to the time effects to be excluded. |
stratify_by |
A string naming the column in |
reference_strata |
Level of |
age |
Name of the age variable in |
period |
Name of the period variable in |
grid.factor |
(Optional) Grid factor, defined as the ratio of age interval width to period interval width; defaults to 1. |
apc_prior |
(Optional) A string specifying the prior for the age, period, and cohort effects (e.g. |
extra.fixed |
(Optional) If desired, the user can specify additional fixed effects to be added. This is passed as a character argument,
specifying the name of the variable to be added. Multiple variables can be added by passing a character vector of names.
Defaults to |
extra.random |
(Optional) If desired, the user can specify additional random effects to be added. This is passed as a character argument,
specifying the name of the variable to be added. Multiple variables can be added by passing a character vector of names.
Defaults to |
extra.models |
(Optional) If the user specifies one or more additional random effects to be added in |
extra.hyper |
(Optional) If the user specifies one or more additional random effects to be added in |
include.random |
(Optional) Logical; if |
binomial.n |
(Optional) For the |
poisson.offset |
(Optional) For the |
inla_formula |
(Optional) If desired, the user can pass their own INLA-compatible formula to define the model. If not, a formula is generated automatically, with the models and priors defined. |
lincombs |
(Optional) If desired, the user can pass their own INLA-compatible linear combinations to be computed by the |
survey.design |
(Optional) In the case of complex survey data, explicit handling of unequal sampling probabilities can be required.
The user can pass a |
apc_hyperprior |
(Optional) If the user wants non-default hyperpriors for the time effects, this can be achieved by passing the entire
prior specification as a string. If e.g. |
control.compute |
(Optional) A list of control variables passed to the |
verbose |
(Optional) This argument is passed along to the |
Details
This function works as a wrapper around the inla() function from the INLA package, which executes the model fitting procedures using Integrated Nested Laplace Approximations.
The returned object is of class mapc. S3 methods are available for:
- print(): Displays a concise summary of the model, including the APC format used, CPU time, number of estimated parameters (fixed, random, hyperparameters, linear combinations), and model fit scores (DIC, WAIC, log-score).
- summary(): Prints detailed posterior summaries of all estimated components, including fixed effects, random effects, hyperparameters, and linear combinations, as estimated by the inla() function.
- plot(): Visualizes model estimates of cross-strata contrast trends, using precomputed plots stored in the object. The available plots depend on the APC format that was used. You can control which effects to plot using the which argument (e.g. which="age" or which=c("age", "period")).
Value
A named list containing the following elements:
- model_fit
An object of class "inla", containing posterior densities, posterior summaries, measures of model fit etc. See the documentation for the inla() function for details.
- plots
A named list of plots for each time effect. Extract them as plots$age, plots$period, plots$cohort.
References
Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using Integrated Nested Laplace Approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 319-392. doi:10.1111/j.1467-9868.2008.00700.x See also https://www.r-inla.org for more information about the INLA method and software.
See Also
fit_all_MAPC
for fitting multiple models at once,
and the function inla()
from the INLA
package for the estimation machinery.
For complex survey data, see svydesign
for the creation of a survey design object which can be passed to survey.design
.
Examples
data("toy_data")
fit <- fit_MAPC(
data = toy_data,
response = count,
family = "poisson",
apc_format = "ApC",
stratify_by = education,
reference_strata = 1,
age = age,
period = period
)
# Print concise summary of the MAPC fit and the estimation procedure
print(fit)
# Plot estimated cross-strata contrast trends
plot(fit)
# Optional: view full summary of the model (can be long)
# summary(fit)
Fit all configurations of MAPC models using INLA
Description
Fits all configurations of shared vs. stratum-specific time effects:
- APc
Shared age and period effects, stratum-specific cohort effects.
- ApC
Shared age and cohort effects, stratum-specific period effects.
- aPC
Shared period and cohort effects, stratum-specific age effects.
- Apc
Shared age effects, stratum-specific period and cohort effects.
- aPc
Shared period effects, stratum-specific age and cohort effects.
- apC
Shared cohort effects, stratum-specific age and period effects.
Uses the fit_MAPC
function.
The multivariate APC model is based on Riebler and Held (2010) doi:10.1093/biostatistics/kxp037.
For handling complex survey data, we follow Mercer et al. (2014) doi:10.1016/j.spasta.2013.12.001,
implemented using the survey package.
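The naming convention of the formats listed above (upper-case letters for shared effects, lower-case letters for stratum-specific effects) can be decoded with a small base-R helper. parse_apc_format is a hypothetical name used for illustration and is not part of the package.

```r
# Split an APC format string into shared and stratum-specific time effects
parse_apc_format <- function(apc_format) {
  letters_used <- strsplit(apc_format, "")[[1]]
  effect_names <- c(a = "age", p = "period", c = "cohort")
  is_shared <- letters_used == toupper(letters_used)
  list(
    shared = unname(effect_names[tolower(letters_used[is_shared])]),
    stratum_specific = unname(effect_names[tolower(letters_used[!is_shared])])
  )
}

parsed <- parse_apc_format("ApC")
```

For "ApC" this yields shared age and cohort effects and a stratum-specific period effect, matching the list in the description.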
Usage
fit_all_MAPC(
data,
response,
family,
stratify_by,
reference_strata = NULL,
age = "age",
period = "period",
grid.factor = 1,
all_models = c("apC", "aPc", "Apc", "aPC", "ApC", "APc"),
extra.fixed = NULL,
extra.random = NULL,
extra.models = NULL,
extra.hyper = NULL,
apc_prior = "rw1",
include.random = FALSE,
binomial.n = NULL,
poisson.offset = NULL,
apc_hyperprior = NULL,
survey.design = NULL,
control.compute = list(dic = TRUE, waic = TRUE, cpo = TRUE),
track.progress = FALSE,
verbose = FALSE
)
Arguments
data |
A data frame containing the age, period, response, and stratification variables.
Age and period are assumed to be on the raw scale, not transformed to 1-indexed index columns.
Factor/character columns are handled, as long as they are properly sorted by |
response |
A string naming the response (outcome) variable in |
family |
A string indicating the likelihood family. The default is |
stratify_by |
The column in |
reference_strata |
Level of |
age |
The age column in |
period |
The period column in |
grid.factor |
(Optional) Grid factor, defined as the ratio of age interval width to period interval width; defaults to 1. |
all_models |
(Optional) Character vectors of valid APC-formats (e.g. |
extra.fixed |
(Optional) If desired, the user can specify additional fixed effects to be added. This is passed as a character argument,
specifying the name of the variable to be added. Multiple variables can be added by passing a character vector of names.
Defaults to |
extra.random |
(Optional) If desired, the user can specify additional random effects to be added. This is passed as a character argument,
specifying the name of the variable to be added. Multiple variables can be added by passing a character vector of names.
Defaults to |
extra.models |
(Optional) If the user specifies one or more additional random effects to be added in |
extra.hyper |
(Optional) If the user specifies one or more additional random effects to be added in |
apc_prior |
(Optional) A string specifying the prior for the age, period, and cohort effects (e.g. |
include.random |
(Optional) Logical; if |
binomial.n |
(Optional) For the |
poisson.offset |
(Optional) For the |
apc_hyperprior |
(Optional) If the user wants non-default hyperpriors for the time effects, this can be achieved by passing the entire
prior specification as a string. If e.g. |
survey.design |
(Optional) In the case of complex survey data, explicit handling of unequal sampling probabilities can be required.
The user can pass a |
control.compute |
(Optional) A list of control variables passed to the |
track.progress |
(Optional) Whether to report progress of the estimation of models in the console; defaults to |
verbose |
(Optional) This argument is passed along to the |
Details
The returned object is of class all_mapc, which is a container for multiple mapc model fits (each typically fitted with a different APC format).
It also contains a model_selection element, which holds plots summarizing comparative fit metrics (DIC, WAIC and log-scores).
The following S3 methods are available:
- print(): Prints a compact summary for each individual model fit.
- summary(): Calls summary() on each contained mapc object, providing detailed posterior summaries.
- plot(): Displays model comparison plots (DIC/WAIC/log-score comparisons).
These methods are intended to streamline multi-model workflows and allow quick comparison of results across model specifications.
Value
A named list of mapc
objects, one for each configuration of shared vs. stratum-specific time effects: APc, ApC, aPC, Apc, aPc, apC.
References
Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using Integrated Nested Laplace Approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 319-392. doi:10.1111/j.1467-9868.2008.00700.x See also https://www.r-inla.org for more information about the INLA method and software.
See Also
fit_MAPC
for fitting a single model (more flexible; can pass your own formula and lincombs),
and the function inla()
from the INLA
package for the estimation machinery.
For complex survey data, see svydesign
for the creation of a survey design object which can be passed to survey.design
.
Examples
data("toy_data")
fits <- fit_all_MAPC(
data = toy_data,
response = count,
family = "poisson",
stratify_by = education,
reference_strata = 1,
age = age,
period = period,
apc_prior = "rw2",
include.random = TRUE
)
# Print concise summary of the models and estimation procedure
print(fits)
# Plot comparison plots, based on comparative fit metrics
plot(fits)
# Optional: view full summary of all models (can be long)
# summary(fits)
Aggregate Gaussian data
Description
Aggregates a numerical data frame column using sufficient statistics for Gaussian samples, into an inla.mdata object compatible with the agaussian likelihood in INLA.
Uses agaussian.
Usage
gaussian_aggregator(y, precision.scale = NULL)
Arguments
y |
Gaussian column. |
precision.scale |
Scales for the precision of each Gaussian observation. |
Value
Aggregated Gaussian column as an inla.mdata
object.
Generate MAPC formula for INLA
Description
Based on APC-format, generate the proper formula to pass to INLA for fitting MAPC models.
Usage
generate_MAPC_formula(
df,
APC_format,
response,
stratify_var,
age = "age",
period = "period",
cohort = "cohort",
intercept = FALSE,
apc_prior = "rw1",
apc_hyper = NULL,
random_term = TRUE,
extra.fixed = NULL,
extra.random = NULL,
extra.models = NULL,
extra.hyper = NULL
)
Arguments
df |
Data frame for which MAPC models should be fit |
APC_format |
A string where lower-case letters indicate stratum-specific time effects and upper-case letters indicate shared time effects. |
response |
A string, name of the column in |
stratify_var |
Stratification variable. At least one time effect should be stratum-specific, and at least one should be shared. |
age |
Name of age column |
period |
Name of period column |
cohort |
Name of cohort column |
intercept |
Boolean, indicating if an overall intercept should be included in the formula. |
apc_prior |
Which prior model to use for the time effects. |
apc_hyper |
If the user wants non-default hyperpriors for the random time effects, this can be achieved by passing the entire
prior specification as a string. If e.g. |
random_term |
Logical, indicating whether a random term should be included in the model. |
extra.fixed |
Name of additional fixed effects. |
extra.random |
Name of additional random effects. |
extra.models |
Models for additional random effects. Supported |
extra.hyper |
If the user wants non-default hyperpriors for the additional random effects, this can be achieved by passing the entire
prior specification as a string. If e.g. |
Value
A formula object that can be passed to INLA to fit the desired MAPC model.
Generate Age-Period-Cohort Linear Combinations for INLA
Description
Constructs a set of linear combinations (contrasts) for age, period, and/or cohort effects
across different strata, relative to a specified reference strata, suitable for use with
inla.make.lincomb
from the INLA
package.
Usage
generate_apc_lincombs(
apc_format,
data,
strata,
reference_strata,
age = "age",
period = "period",
cohort = "cohort"
)
Arguments
apc_format |
Character string containing any combination of
e.g. |
data |
A |
strata |
String giving the name of the factor column in |
reference_strata |
String indicating which level of |
age |
String name of the column in |
period |
String name of the column in |
cohort |
String name of the column in |
Details
For each specified dimension (a
, p
, c
), the function loops over all
unique values of age, period, or cohort in the data, and over all strata levels except
the reference. It then constructs a contrast that subtracts the effect in the reference
stratum from the effect in the other strata at each index.
Value
A named list
of linear combination objects as returned by
inla.make.lincomb()
(INLA
function). Each element corresponds to one contrast,
with names of the form “Age = x, Strata = y vs ref”, “Period = x, Strata = y vs ref”,
or “Cohort = x, Strata = y vs ref”, depending on apc_format
.
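The looping scheme described in Details can be sketched in base R without INLA. The sketch only builds the contrast names and the +1/-1 weights; the effect labels such as age_mid are hypothetical, and turning the weights into real linear combinations would still require inla.make.lincomb().

```r
# Hypothetical age indices and strata levels
ages      <- c(1, 2, 3)
strata    <- c("low", "mid", "high")
reference <- "low"

# Every age index paired with every non-reference stratum
non_ref <- setdiff(strata, reference)
combos  <- expand.grid(age = ages, stratum = non_ref, stringsAsFactors = FALSE)

# Contrast: effect in the stratum minus effect in the reference, at each index
weights <- lapply(seq_len(nrow(combos)), function(i) {
  w <- c(1, -1)
  names(w) <- c(paste0("age_", combos$stratum[i]), paste0("age_", reference))
  list(index = combos$age[i], weights = w)
})
names(weights) <- sprintf("Age = %s, Strata = %s vs %s",
                          combos$age, combos$stratum, reference)
```

With three age indices and two non-reference strata this produces six contrasts.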
Function for finding longest consecutive run of non-missing indices
Description
Function for finding longest consecutive run of non-missing indices
Usage
longest_run(x)
Arguments
x |
A vector of indices for which to find the longest consecutive run of indices |
Value
A named list containing the following elements:
from
The first index of the longest run.
to
The last index of the longest run.
step
The stepsize of the longest run.
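One way to find such a run in base R uses diff() and rle(). This sketch assumes the step size is the smallest gap between sorted indices, which may not match how longest_run() determines it; longest_run_sketch is a hypothetical name.

```r
# Find the longest run of equally spaced indices, ignoring NAs
longest_run_sketch <- function(x) {
  x <- sort(unique(x[!is.na(x)]))
  if (length(x) < 2) {
    return(list(from = x[1], to = x[1], step = NA))
  }
  d <- diff(x)
  step <- min(d)                      # assumption: smallest gap is the step
  runs <- rle(d == step)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  ok <- which(runs$values)            # runs where consecutive gaps equal step
  best <- ok[which.max(runs$lengths[ok])]
  list(from = x[starts[best]], to = x[ends[best] + 1], step = step)
}

run <- longest_run_sketch(c(1, 2, 3, 5, 6, 7, 8, 10))
```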
Make a table of model fit scores
Description
Generates a table summarizing the provided model fit scores.
Usage
model_selection_criteria_table(
dic_scores = NULL,
waic_scores = NULL,
log_scores = NULL
)
Arguments
dic_scores |
Named list of DIC scores. Names correspond to the model. |
waic_scores |
Named list of WAIC scores. Names correspond to the model. |
log_scores |
Named list of log-scores (derived from CPO scores). Names correspond to the model. |
Value
A table summarizing the model fit scores, as a ggplot
object.
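A log-score is commonly computed from CPO values as the negative mean of their logarithms. Whether the package uses exactly this convention is an assumption; the CPO values below are made up.

```r
# Hypothetical conditional predictive ordinates for two observations
cpo <- c(0.5, 0.25)

# Negative mean log CPO; smaller values indicate better predictive fit
log_score <- -mean(log(cpo))
```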
Aggregate multinomial data
Description
Aggregates multinomial data into sufficient statistics for multinomial samples. Uses amultinomial
.
Usage
multinomial_aggregator(df, col_name, all_categories = NULL)
Arguments
df |
Data frame with multinomial data column. |
col_name |
Name of column with multinomial data. |
all_categories |
Character vector of names of categories in multinomial distribution. |
Value
Aggregated multinomial data column (a one-row data frame).
Count number of groups across a set of variables in a data frame
Description
Counts number of groups across specified grouping and stratification variables in a data frame. At least one grouping or stratification variable must be provided.
Usage
number_of_groups(df, group_by)
Arguments
df |
A data frame with grouping and/or stratification variables. |
group_by |
Variables in the data frame that define a grouping of the data. |
Value
Number of distinct groups and strata in the data frame.
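Counting distinct groups amounts to counting unique combinations of the grouping columns. A base-R sketch with hypothetical data:

```r
# Hypothetical data with two grouping variables
df <- data.frame(
  sex    = c("F", "M", "F", "M"),
  region = c("N", "N", "N", "S")
)
group_by <- c("sex", "region")

# Number of distinct combinations of the grouping variables
n_groups <- nrow(unique(df[group_by]))
```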
Count number of strata across a set of variables in a data frame
Description
Counts number of strata across specified stratification variables in a data frame. At least one stratification variable must be provided.
Usage
number_of_strata(df, stratify_by)
Arguments
df |
A data frame with grouping and/or stratification variables. |
stratify_by |
Variables in the data frame that define a stratification of the data. |
Value
Number of distinct strata in the data frame.
Plot counts of observations across bins of a numeric variable, optionally stratified
Description
Bins a specified numeric variable into intervals, counts observations per value of a specified variable and bin groups, and plots lines for each bin group using ggplot2. If a stratification variable is provided, counts are calculated per strata and plotted as separate colored lines. If an additional stratification variable is provided, separate plot windows are created for each level.
Usage
plot_binned_counts(
data,
x,
bin_by,
stratify_by = NULL,
for_each = NULL,
n_bins = 8,
bin_width = NULL,
title = "Observation counts",
subtitle = NULL,
legend_title = NULL,
x_lab = NULL,
y_lab = NULL,
viridis_color_option = "D"
)
Arguments
data |
Data frame containing all input variables. |
x |
Variable in |
bin_by |
Numeric variable in |
stratify_by |
(Optional) Stratification variable. If
supplied, counts are computed for each combination of |
for_each |
(Optional) Additional stratification variable. If supplied,
separate plot windows are created per level of |
n_bins |
(Optional) Number of bins to create across |
bin_width |
(Optional) Width of the bins created across |
title |
(Optional) Plot title; defaults to |
subtitle |
(Optional) Plot subtitle; defaults to |
legend_title |
(Optional) Legend title; defaults to name of |
x_lab |
(Optional) Label for the x-axis; defaults to the name of |
y_lab |
(Optional) Label for the y-axis; defaults to the name of |
viridis_color_option |
(Optional) Option for color gradient; defaults to "D". Options are "A", "B", "C", "D", "E", "F", "G", "H". See viridis for information, or experiment yourself. |
Value
If for_each
is not supplied, a ggplot object showing counts
per x
and bin groups, optionally faceted by stratify_by
. If for_each
is supplied, a named list of such plots.
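The counting that underlies the plot can be sketched in base R with cut() and table(). The data are hypothetical and the function's own binning rules (for example how bin_width is applied) may differ.

```r
# Hypothetical data: period on the x-axis, age to be binned
df <- data.frame(
  period = c(2000, 2000, 2001, 2001, 2001, 2002),
  age    = c(21, 34, 45, 58, 23, 60)
)

# Split age into two equal-width bins
age_bin <- cut(df$age, breaks = 2)

# Observation counts per period and age bin, in long format
counts <- as.data.frame(table(period = df$period, age_bin = age_bin))
```

Each row of counts corresponds to one point on one line of the resulting plot.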
See Also
plot_counts_1D
, plot_counts_2D
,
plot_counts_with_mean
, ggplot
Examples
data("toy_data")
# Counts by period, binned by age
plot_binned_counts(toy_data, x = period,
bin_by = age, n_bins = 4)
# Counts by period, binned by age, stratified by education levels
plot_binned_counts(toy_data, period,
bin_by = age, n_bins = 4,
stratify_by = education)
# Counts by period, binned by age, stratified by education levels, for each sex
plot_binned_counts(toy_data, period,
bin_by = age, n_bins = 4,
stratify_by = education, for_each = sex)
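The binning and counting step described above can be illustrated with base R. This is a minimal sketch, not the package's implementation: the data frame `df` is a hypothetical stand-in for `toy_data`, and the use of `cut()` with equal-width intervals is an assumption about how `n_bins` behaves.

```r
# Hypothetical stand-in for toy_data: only the two columns used here.
set.seed(1)
df <- data.frame(
  age    = sample(20:59, 500, replace = TRUE),
  period = sample(1990:2019, 500, replace = TRUE)
)

# Split age into 4 equal-width intervals (the role played by n_bins = 4).
df$age_bin <- cut(df$age, breaks = 4, include.lowest = TRUE)

# Count observations for each combination of period and age bin.
counts <- as.data.frame(
  table(period = df$period, age_bin = df$age_bin),
  responseName = "n"
)
head(counts)
```

Each row of `counts` corresponds to one point on one line of the plot: the x position is `period`, the line is identified by `age_bin`, and the height is `n`.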
Plot counts of observations across a single variable, optionally stratified
Description
Computes the number of observations at each value of a specified variable and creates a line plot of these counts using ggplot2. If a stratification variable is provided, counts are calculated per stratum and plotted as separate colored lines. If an additional stratification variable is provided, separate plot windows are created for each level.
Usage
plot_counts_1D(
data,
x,
stratify_by = NULL,
for_each = NULL,
title = "Observation counts",
subtitle = NULL,
legend_title = NULL,
x_lab = NULL,
y_lab = NULL,
viridis_color_option = "D"
)
Arguments
data |
Data frame containing all input variables. |
x |
Variable in |
stratify_by |
(Optional) Stratification variable. If
supplied, counts are computed for each combination of |
for_each |
(Optional) Additional stratification variable. If supplied,
separate plot windows are created per level of |
title |
(Optional) Plot title; defaults to |
subtitle |
(Optional) Plot subtitle; defaults to |
legend_title |
(Optional) Legend title; defaults to name of |
x_lab |
(Optional) Label for the x-axis; defaults to the name of |
y_lab |
(Optional) Label for the y-axis; defaults to the name of |
viridis_color_option |
(Optional) Option for color gradient; defaults to "D". Options are "A", "B", "C", "D", "E", "F", "G", "H". See the viridis documentation for details. |
Value
A ggplot object displaying counts across the variable supplied in x
,
optionally stratified by stratify_by
. If for_each
is supplied, separate plots are created in separate windows for each level.
Visuals can be modified with ggplot2.
See Also
plot_counts_2D
, plot_binned_counts
,
plot_counts_with_mean
, ggplot
Examples
data("toy_data")
# Counts by age
plot_counts_1D(toy_data, x = age)
# Counts by age, stratified by education level
plot_counts_1D(toy_data, x = age,
stratify_by = education)
# Count by age, stratified by education level, for each sex
plot_counts_1D(toy_data, x = age,
stratify_by = education, for_each = sex)
Plot counts of observations across two dimensions, optionally stratified
Description
Computes the number of observations for each combination of two specified variables, and displays the result as a heatmap using ggplot2. If a stratification variable is provided, counts are calculated per stratum and stratum-specific heatmaps are displayed in individual panels. If an additional stratification variable is provided, separate plot windows are created for each level.
Usage
plot_counts_2D(
data,
x,
y,
stratify_by = NULL,
for_each = NULL,
color_gradient = c("blue", "beige", "red"),
title = "Observation counts",
legend_title = NULL,
subtitle = NULL,
x_lab = NULL,
y_lab = NULL
)
Arguments
data |
Data frame containing all input variables. |
x |
Variable in |
y |
Variable in |
stratify_by |
(Optional) Stratification variable. If
supplied, counts are computed for each combination of |
for_each |
(Optional) Additional stratification variable. If supplied,
separate plot windows are created per level of |
color_gradient |
(Optional) Color gradient for the heatmap. Specified as a character vector of three colors, representing: c(<low_counts>, <middle_counts>, <high_counts>).
Defaults to |
title |
(Optional) Plot title; defaults to |
legend_title |
(Optional) Legend title for color gradient; defaults to "Count". |
subtitle |
(Optional) Plot subtitle; defaults to |
x_lab |
(Optional) Label for the x-axis; defaults to the name of |
y_lab |
(Optional) Label for the y-axis; defaults to the name of |
Value
If for_each
is not supplied, a ggplot object
showing a heatmap of counts for each x
-y
combination, optionally
faceted by stratify_by
. If for_each
is supplied, a named list
of such ggplot
objects, one per unique value of for_each
.
See Also
plot_counts_1D
, plot_binned_counts
,
plot_counts_with_mean
, ggplot
Examples
data("toy_data")
# Heatmap of counts by age and period
plot_counts_2D(toy_data, x = age, y = period)
# Heatmap of counts by age and period, stratified by education
plot_counts_2D(toy_data, x = period, y = age,
stratify_by = education)
# Heatmap of counts by age and period, stratified by education, for each sex
plot_counts_2D(toy_data, x = period, y = age,
stratify_by = education, for_each = sex)
Plot heatmap of observation counts with mean overlay, optionally stratified
Description
Computes counts of observations for each combination of two variables and displays them as a heatmap, with an overlaid line showing the mean of the second variable across the first. If a stratification variable is provided, observations are counted per stratum and stratum-specific heatmaps are displayed in individual panels. If an additional stratification variable is provided, separate plot windows are created for each level.
Usage
plot_counts_with_mean(
data,
x,
y,
stratify_by = NULL,
for_each = NULL,
title = NULL,
subtitle = NULL,
heatmap_legend = "Count",
mean_legend = "Mean",
viridis_color_option = "D",
mean_color = "coral"
)
Arguments
data |
Data frame containing all input variables. |
x |
Variable in |
y |
Variable in |
stratify_by |
(Optional) Stratification variable. If
supplied, counts and means are computed for each level of |
for_each |
(Optional) Additional stratification variable.
If supplied, separate plot windows are created per level of |
title |
(Optional) Plot title; defaults to |
subtitle |
(Optional) Plot subtitle; defaults to |
heatmap_legend |
(Optional) Label for the heatmap legend; defaults to "Count". |
mean_legend |
(Optional) Label for the overlay mean line legend; defaults to "Mean". |
viridis_color_option |
(Optional) Option for color gradient; defaults to "D". Options are "A", "B", "C", "D", "E", "F", "G", "H". See the viridis documentation for details. |
mean_color |
(Optional) Color for the overlay mean line; defaults to "coral". |
Value
If for_each
is not supplied, a ggplot
object showing a
heatmap of counts with a mean overlay line, optionally faceted by stratify_by
.
If for_each
is supplied, a named list of such plots.
See Also
plot_counts_1D
, plot_counts_2D
,
plot_binned_counts
, ggplot
Examples
data("toy_data")
# Heatmap of counts by age vs. period with mean age overlay
plot_counts_with_mean(toy_data, x = period, y = age)
# Heatmap of counts by age vs. period with mean age overlay, stratified by education
plot_counts_with_mean(toy_data, x = period, y = age,
stratify_by = education)
# Heatmap of counts by age vs. period with mean age overlay, stratified by education, for each sex
plot_counts_with_mean(toy_data, x = period, y = age,
stratify_by = education, for_each = sex)
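The two quantities behind this plot, the cell counts for the heatmap and the mean of the second variable at each value of the first, can be computed with base R. This is a sketch under assumptions: `df` is a hypothetical stand-in for `toy_data`, and the package may compute these summaries differently.

```r
# Hypothetical stand-in for toy_data.
set.seed(1)
df <- data.frame(
  age    = sample(20:59, 500, replace = TRUE),
  period = sample(1990:2019, 500, replace = TRUE)
)

# Heatmap layer: number of observations in each (period, age) cell.
counts <- as.data.frame(
  table(period = df$period, age = df$age),
  responseName = "n"
)

# Overlay layer: mean age (y) at each period (x).
mean_age <- aggregate(age ~ period, data = df, FUN = mean)
head(mean_age)
```

Plotting `mean_age` on top of the count heatmap shows whether the age distribution drifts across periods.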
Plot Linear Combinations of Age-Period-Cohort Effects by Strata
Description
Generates ggplot2 line plots of estimated linear combinations for age, period, and/or cohort effects from an INLA fit, stratified by a factor. Returns a named list of ggplot objects for each requested effect.
Usage
plot_lincombs(
inla_fit,
apc_model,
data,
strata_col,
reference_level,
family = NULL,
age_ind = "age",
period_ind = "period",
cohort_ind = "cohort",
age_title = NULL,
period_title = NULL,
cohort_title = NULL,
y_lab = NULL,
age_vals = NULL,
period_vals = NULL,
cohort_vals = NULL,
age_breaks = NULL,
age_limits = NULL,
period_breaks = NULL,
period_limits = NULL,
cohort_breaks = NULL,
cohort_limits = NULL,
PDF_export = FALSE
)
Arguments
inla_fit |
An object returned by the |
apc_model |
Character string indicating the configuration of shared vs. stratum-specific time effects in the model. |
data |
The data frame used to fit |
strata_col |
Character name of the factor column in |
reference_level |
Character value of |
family |
Optional character; if |
age_ind |
Character name of the age variable in |
period_ind |
Character name of the period variable in |
cohort_ind |
Character name of the cohort variable in |
age_title |
Optional plot title for the age effect. |
period_title |
Optional plot title for the period effect. |
cohort_title |
Optional plot title for the cohort effect. |
y_lab |
Optional y-axis label; if |
age_vals |
Optional numeric vector of x-values for age; defaults to
|
period_vals |
Optional numeric vector of x-values for period; defaults to
|
cohort_vals |
Optional numeric vector of x-values for cohort; defaults to
|
age_breaks |
Optional vector of breaks for the age plot x-axis. |
age_limits |
Optional numeric vector of length 2 giving x-axis limits for age. |
period_breaks |
Optional vector of breaks for the period plot x-axis. |
period_limits |
Optional numeric vector of length 2 giving x-axis limits for period. |
cohort_breaks |
Optional vector of breaks for the cohort plot x-axis. |
cohort_limits |
Optional numeric vector of length 2 giving x-axis limits for cohort. |
PDF_export |
Logical; if |
Value
A named list of ggplot
objects. Elements are
"age"
, "period"
, and/or "cohort"
depending on apc_model
.
Examples
if (requireNamespace("INLA", quietly = TRUE)) {
# Load toy dataset
data("toy_data")
# Filter away unobserved cohorts (see plot_missing_data() function):
library(dplyr)
toy_data.f <- toy_data %>% filter(sex == "female", cohort > 1931)
# Load precomputed 'mapc' object
apC_fit.f <- readRDS(system.file("extdata", "quickstart-apC_fit_f.rds", package = "MAPCtools"))
# Extract INLA object:
apC_fit.inla <- apC_fit.f$model_fit
apC_plots <- plot_lincombs(
inla_fit = apC_fit.inla,
apc_model = "apC",
data = toy_data.f,
strata_col = "education",
reference_level = "1",
family = "poisson"
)
# Display the age effect plot
print(apC_plots$age)
# Display the period effect plot
print(apC_plots$period)
}
Plot mean of a response variable across a single variable, optionally stratified
Description
Computes the mean of a specified response variable at each value of a specified x variable and displays a line plot using ggplot2. If a stratification variable is provided, means are calculated per stratum and plotted as separate colored lines. If an additional stratification variable is provided, separate plot windows are created for each level.
Usage
plot_mean_response_1D(
data,
response,
x,
stratify_by = NULL,
for_each = NULL,
title = NULL,
subtitle = NULL,
legend_title = NULL,
x_lab = NULL,
y_lab = NULL,
viridis_color_option = "D"
)
Arguments
data |
A |
response |
A numeric variable in |
x |
A variable in |
stratify_by |
(Optional) Stratification variable. If
supplied, means are computed for each combination of |
for_each |
(Optional) Additional stratification variable. If supplied,
separate plot windows are created per level of |
title |
(Optional) Plot title; defaults to |
subtitle |
(Optional) Plot subtitle; defaults to |
legend_title |
(Optional) Legend title; defaults to name of |
x_lab |
(Optional) Label for the x-axis; defaults to the name of |
y_lab |
(Optional) Label for the y-axis; defaults to the name of |
viridis_color_option |
(Optional) Option for color gradient; defaults to "D". Options are "A", "B", "C", "D", "E", "F", "G", "H". See the viridis documentation for details. |
Value
A ggplot
object displaying the mean of the response across the variable supplied in x
,
optionally stratified by stratify_by
. If for_each
is supplied, separate plots are created in separate windows for each level.
Visuals can be modified with ggplot2.
See Also
Examples
data("toy_data")
# Mean count by age
plot_mean_response_1D(toy_data, response = count, x = age)
# Mean count by age, stratified by education
plot_mean_response_1D(toy_data, response = count, x = age,
stratify_by = education)
# Mean count by age, stratified by education, for each sex
plot_mean_response_1D(toy_data, response = count, x = age,
stratify_by = education, for_each = sex)
Plot mean of a response variable across two dimensions, optionally stratified
Description
Computes the mean of a specified response variable for each combination of two variables and displays it as a heatmap using ggplot2. If a stratification variable is provided, means are calculated per stratum and stratum-specific heatmaps are displayed in individual panels. If an additional stratification variable is provided, separate plot windows are created for each level.
Usage
plot_mean_response_2D(
data,
response,
x,
y,
stratify_by = NULL,
for_each = NULL,
color_gradient = c("blue", "beige", "red"),
title = NULL,
subtitle = NULL,
x_lab = NULL,
y_lab = NULL
)
Arguments
data |
Data frame containing all input variables. |
response |
Numeric variable in |
x |
Variable in |
y |
Variable in |
stratify_by |
(Optional) Stratification variable. If
supplied, means are computed for each combination of |
for_each |
(Optional) Additional stratification variable.
If supplied, separate plot windows are created per level of |
color_gradient |
(Optional) Color gradient for the heatmap. Specified as a character vector of three colors, representing: c(<low_means>, <middle_means>, <high_means>).
Defaults to |
title |
(Optional) Plot title; defaults to |
subtitle |
(Optional) Plot subtitle; defaults to |
x_lab |
(Optional) Label for the x-axis; defaults to the name of |
y_lab |
(Optional) Label for the y-axis; defaults to the name of |
Value
A ggplot object showing the mean of response
across x
and y
, optionally faceted by stratify_by
.
See Also
Examples
data("toy_data")
# Mean count by age and period
plot_mean_response_2D(toy_data, response = count, x = period, y = age)
# Mean count by age and period, stratified by education level
plot_mean_response_2D(toy_data, response = count, x = period, y = age,
stratify_by = education)
# Mean count by age and period, stratified by education level, for each sex
plot_mean_response_2D(toy_data, response = count, x = period, y = age,
stratify_by = education, for_each = sex)
Plot Missing Group Combinations
Description
Creates a tile plot highlighting combinations of grouping variables that are expected but missing from the data. Allows for faceting.
Usage
plot_missing_data(
data,
x,
y,
stratify_by = NULL,
for_each = NULL,
facet_labeller = NULL,
title = "Missing data",
subtitle = NULL,
x_lab = NULL,
y_lab = NULL
)
Arguments
data |
Data frame. |
x |
Variable in |
y |
Variable in |
stratify_by |
(Optional) Stratification variable. If
supplied, missing data is examined separately for each level of |
for_each |
(Optional) Additional stratification variable. If supplied,
separate plot windows are created per level of |
facet_labeller |
A |
title |
Character string for the plot title. Defaults to "Missing data". |
subtitle |
Character string for the plot subtitle. Defaults to NULL. |
x_lab |
Character string for the x-axis label. Defaults to the name of |
y_lab |
Character string for the y-axis label. Defaults to the name of |
Value
A ggplot object, or NULL if no missing combinations found.
See Also
Examples
data("toy_data")
# Plot missing data across age and period, stratified by education, for each sex
plot_missing_data(data = toy_data,
x = period,
y = age,
stratify_by = education,
for_each = sex)
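The idea of "expected but missing" combinations can be sketched in base R by comparing the full grid of observed values against the combinations that actually occur. This is an illustration only: the small data frame is hypothetical, and defining the expected grid as the cross product of observed values is an assumption about what the function does.

```r
# Hypothetical data with two unobserved age-period combinations.
df <- data.frame(
  age    = c(20, 20, 21, 22),
  period = c(1990, 1991, 1990, 1991)
)

# All combinations that could occur, given the observed values.
expected <- expand.grid(
  age    = sort(unique(df$age)),
  period = sort(unique(df$period))
)

# Flag the grid cells that never occur in the data.
observed_keys <- paste(df$age, df$period)
expected$missing <- !(paste(expected$age, expected$period) %in% observed_keys)

missing_cells <- expected[expected$missing, c("age", "period")]
missing_cells
```

Here the grid has six cells and four of them are observed, so `missing_cells` holds the two remaining combinations; these are the tiles a missing-data plot would highlight.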
Make a plot of model fit scores
Description
Makes a plot summarizing the model fit scores of a set of models, and ranks the models based on their score.
Usage
plot_model_selection_criteria(scores, title = NULL, alphabetic = FALSE)
Arguments
scores |
Named list of model fit scores. Names correspond to the model. |
title |
(Optional) Plot title; defaults to the name of |
alphabetic |
(Optional) TRUE/FALSE, indicating if models are to be sorted alphabetically along the horizontal axis. Defaults to |
Details
The function assumes lower scores are preferable, so the model with the lowest score is ranked first.
Value
Plot summarizing the model fit scores, and ranking the models based on their scores.
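The ranking rule in Details (lower is better) amounts to sorting the scores in increasing order. A minimal sketch follows; the model names other than "apC" and all score values are hypothetical and chosen purely for illustration.

```r
# Hypothetical named list of model fit scores (lower is better).
scores <- list(apC = 1523.4, aPc = 1530.1, Apc = 1519.8)

# Flatten to a named numeric vector and sort in increasing order.
ranked <- sort(unlist(scores))

# The first element is the preferred model under this rule.
best_model <- names(ranked)[1]
ranked
```

With these values the order is Apc, apC, aPc, so `best_model` is "Apc".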
Plot time effects with uncertainty ribbons
Description
Internal plotting helper to visualize median differences over time (or other x-axis), including HPD interval ribbons and stratified lines.
Usage
plot_time_effect(
data,
x_lab,
y_lab,
family,
color_palette,
plot_theme,
legend_title = "Strata"
)
Arguments
data |
A data frame with columns: x, median_differences, hpd_lower, hpd_upper, Strata. |
x_lab |
Label for the x-axis. |
y_lab |
Label for the y-axis. |
family |
A string indicating model family ("binomial", "poisson", or other). |
color_palette |
A named vector of colors for strata. |
plot_theme |
A ggplot2 theme object. |
legend_title |
Title for the legend. |
Value
A ggplot object.
Helper for evaluating column names, strings, or self-contained vectors
Description
Accepts:
- Unquoted column names (tidy-eval style)
- Quoted column names (as strings)
- Self-contained vectors
Usage
resolve_column(arg)
Arguments
arg |
A column name (unquoted or string) or a vector. |
Value
A quosure, symbol, or literal vector depending on input.
Synthetic Age-Period-Cohort Dataset
Description
A toy dataset generated to illustrate modeling of age, period, and cohort effects, including interactions with education and sex. This data simulates count outcomes (e.g., disease incidence or event counts) as a function of demographic variables using a Poisson process.
Usage
data(toy_data)
Format
A data frame with 10000 rows and 7 variables:
- age
Age of individuals, sampled uniformly from 20 to 59.
- period
Calendar year of observation, sampled uniformly from 1990 to 2019.
- education
Factor for education level, with levels 1, 2 and 3.
- sex
Factor indicating biological sex, with levels: "male", "female".
- count
Simulated event count, generated from a Poisson distribution.
- known_rate
The true Poisson rate used to generate
count
, computed from the log-linear model.
- cohort
Derived variable indicating year of birth (period - age).
Details
The underlying event rate is modeled on the log scale as a linear combination of age, period, sex, education, and an age-education interaction. The count outcome is drawn from a Poisson distribution with this rate. This dataset is handy for testing APC models.
The true log-rate is computed (for observation n) as:
\begin{aligned}
\log(\lambda_n) = {} & \beta_0 + \beta_{\text{period}}\,\bigl(2020 - \text{period}_n\bigr) + \beta_{\text{sex}}\,I(\text{sex}_n = \text{female}) \\
& + \beta_{\text{edu}}\,(\text{edu level}_n) + \beta_{\text{edu-age}}\,(\text{age}_n - 20)\,(\text{edu level}_n - 1)\,I(\text{age}_n \le 40) \\
& + \beta_{\text{edu-age}}\,(60 - \text{age}_n)\,(\text{edu level}_n - 1)\,I(\text{age}_n > 40)
\end{aligned}
where the rate decreases over time (periods) and, for education levels above 1, increases with age up to age 40 and decreases thereafter. The coefficients used are:
- intercept = 1.0
- b_period = 0.02
- b_sex = 0.5 (female effect)
- b_education_base = 0.5
- b_education_age_interaction = 0.015
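The log-rate formula and coefficients above can be written out as a short R function. This is a sketch under assumptions rather than the script used to generate `toy_data`: the function name `log_rate` is hypothetical, and education level is treated as the numeric values 1, 2 and 3.

```r
# Coefficients as listed above.
intercept                   <- 1.0
b_period                    <- 0.02
b_sex                       <- 0.5
b_education_base            <- 0.5
b_education_age_interaction <- 0.015

# Hypothetical helper: log-rate for given covariates.
# Assumes edu is numeric (1, 2 or 3).
log_rate <- function(age, period, sex, edu) {
  # Piecewise age term: rises from age 20 to 40, falls from 40 to 60.
  age_term <- ifelse(age <= 40, age - 20, 60 - age)
  intercept +
    b_period * (2020 - period) +
    b_sex * (sex == "female") +
    b_education_base * edu +
    b_education_age_interaction * age_term * (edu - 1)
}

# A 40-year-old woman with education level 3, observed in 2019.
eta <- log_rate(age = 40, period = 2019, sex = "female", edu = 3)

# Draw one simulated count from a Poisson distribution with rate exp(eta).
set.seed(1)
count <- rpois(1, lambda = exp(eta))
```

For this example the terms are 1.0 + 0.02 + 0.5 + 1.5 + 0.6, giving a log-rate of 3.62. For education level 1 the factor `(edu - 1)` is zero, so the age term drops out entirely.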
Source
Simulated data, created using base R and tibble.
Validate lincomb terms against an INLA formula
Description
Checks that all variable names used in inla.make.lincomb()
expressions
(inside a list of lincombs) are present in the provided INLA model formula.
Usage
validate_lincombs_against_formula(lincombs, formula)
Arguments
lincombs |
A list of linear combinations (as generated by |
formula |
The INLA model formula object (e.g., from |
Value
Invisible TRUE if all terms match. Otherwise, stops with an informative error.
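The kind of check described here can be sketched with base R by comparing a set of term names against the variables that appear in a formula. This is an illustration only: `check_terms` is a hypothetical helper, the plain formula stands in for an INLA model formula, and extracting names from real lincomb objects is not shown.

```r
# Hypothetical helper: stop if any term name is absent from the formula.
check_terms <- function(term_names, formula) {
  missing_terms <- setdiff(term_names, all.vars(formula))
  if (length(missing_terms) > 0) {
    stop("Terms not found in formula: ",
         paste(missing_terms, collapse = ", "))
  }
  invisible(TRUE)
}

# A plain formula standing in for an INLA model formula.
form <- y ~ age + period + education

# Passes: both names appear in the formula.
ok <- check_terms(c("age", "period"), form)

# Fails: "cohort" is not in the formula, so an error is raised.
res <- try(check_terms(c("age", "cohort"), form), silent = TRUE)
```

The first call returns TRUE invisibly, while the second produces an error naming the offending term, mirroring the behavior described under Value.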