Title: Multivariate Age-Period-Cohort (MAPC) Modeling for Health Data
Version: 0.1.0
Description: Bayesian multivariate age-period-cohort (MAPC) models for analyzing health data, with support for model fitting, visualization, stratification, and model comparison. Inference focuses on identifiable cross-strata differences, as described by Riebler and Held (2010) <doi:10.1093/biostatistics/kxp037>. Methods for handling complex survey data via the 'survey' package are included, as described in Mercer et al. (2014) <doi:10.1016/j.spasta.2013.12.001>.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.2
VignetteBuilder: knitr
Imports: dplyr, tidyselect, fastDummies, stringr, rlang, tidyr, ggplot2, viridis, scales, purrr, grid, gridExtra, ggpubr, tibble, survey
URL: https://github.com/LarsVatten/MAPCtools
BugReports: https://github.com/LarsVatten/MAPCtools/issues
Suggests: testthat (>= 3.0.0), knitr, rmarkdown, INLA
Additional_repositories: https://inla.r-inla-download.org/R/stable/
Config/testthat/edition: 3
Depends: R (>= 3.5)
NeedsCompilation: no
Packaged: 2025-06-23 15:06:36 UTC; lavat
Author: Lars Vatten [aut, cre]
Maintainer: Lars Vatten <lavatt99@gmail.com>
Repository: CRAN
Date/Publication: 2025-06-25 15:40:02 UTC

Aggregate binomial data

Description

Aggregates binomial data using sufficient statistics for binomial samples. For a sample \boldsymbol{y} = \{y_1, \dots, y_n\} with y_i \sim \text{Bin}(n, p), the sample is aggregated into the sufficient statistic

s = \sum_{i=1}^n y_i.

Usage

abinomial(data)

Arguments

data

Binomial data vector.

Value

Aggregated binomial data.
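
As a minimal base R sketch of the sufficient statistic above, assuming the observations are coded as 0/1 outcomes (the vector y is made-up illustration data, and this is not necessarily how abinomial() is implemented internally):

```r
# Hypothetical 0/1 outcomes (illustration data only)
y <- c(1, 0, 1, 1, 0, 1)

# Sufficient statistic: the sum of the observations
s <- sum(y)

# Number of observations that were aggregated
n_obs <- length(y)

c(s = s, n_obs = n_obs)
```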


Add 1-indexed age, period and cohort indices via match()

Description

Add 1-indexed age, period and cohort indices via match()

Usage

add_APC_by_match(df, age_name, period_name, age_order, period_order, M = 1)

Arguments

df

Data frame

age_name

Name of the age (or age-group) column (string).

period_name

Name of the period (e.g. year) column (string).

age_order

Character vector giving the desired ordering of age levels

period_order

Vector (numeric or character) giving the desired ordering of periods

M

Grid factor, defined as the ratio of age interval width to period interval width.

Value

Data frame with new columns: age_index, period_index, cohort_index
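
The indexing idea can be sketched in base R with match(), which returns the position of each value in a reference ordering. The data frame and orderings below are hypothetical, and the sketch covers only the age and period indices, not the cohort index or the grid factor M:

```r
# Hypothetical data with categorical age groups and numeric periods
df <- data.frame(
  age    = c("20-24", "25-29", "20-24", "30-34"),
  period = c(2005, 2000, 2010, 2005)
)

# Desired orderings of the age and period levels
age_order    <- c("20-24", "25-29", "30-34")
period_order <- c(2000, 2005, 2010)

# match() gives the 1-based position of each value in the ordering
df$age_index    <- match(df$age, age_order)
df$period_index <- match(df$period, period_order)

df
```

Values that do not appear in the ordering vectors come back as NA from match(), which is a convenient way to spot unexpected levels.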

Add cohort column to data frame

Description

Adds a column for birth cohorts to a data frame, derived from specified age and period columns through the relation cohort = period - age.

Usage

add_cohort_column(data, age, period, cohort_name = "cohort")

Arguments

data

Data frame with age and period column.

age

Age column in data.

period

Period column in data.

cohort_name

Name of the cohort column to be created. Defaults to "cohort".

Value

Data frame with additional column for birth cohorts added.
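
A minimal base R sketch of the relation cohort = period - age, using a made-up data frame (this illustrates the arithmetic only and does not call add_cohort_column()):

```r
# Hypothetical data: age in years, period as calendar year
df <- data.frame(age = c(30, 35, 40), period = c(2000, 2000, 2005))

# Birth cohort derived as period minus age
cohort_name <- "cohort"
df[[cohort_name]] <- df$period - df$age

df
```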


Add cohort index column to data frame

Description

Adds a column for cohort indices to a data frame, derived from specified age and period index columns through the relationship cohort index = period index - age index + max(age index).

Usage

add_cohort_index(data, age_index, period_index, cohort_name = "cohort_index")

Arguments

data

Data frame with age and period columns.

age_index

Age index column in data.

period_index

Period index column in data.

cohort_name

Name of the cohort index column to be created. Defaults to "cohort_index".

Value

Data frame with additional column for cohort indices.
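
The index relation above can be checked on a small grid. The following base R sketch assumes 3 age groups and 4 periods of equal width (both numbers are arbitrary) and shows that cells on the same diagonal share a cohort index:

```r
n_age <- 3     # hypothetical number of age groups
n_period <- 4  # hypothetical number of periods

# All (age index, period index) combinations, age varying fastest
age_index    <- rep(seq_len(n_age), times = n_period)
period_index <- rep(seq_len(n_period), each = n_age)

# Relation from the description above
cohort_index <- period_index - age_index + max(age_index)

# Arrange as an age-by-period table
grid <- matrix(cohort_index, nrow = n_age,
               dimnames = list(age = seq_len(n_age), period = seq_len(n_period)))
grid
```

Under these assumptions the cohort indices run from 1 to n_age + n_period - 1, with the oldest age group in the first period receiving index 1.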


Aggregate Gaussian data

Description

Aggregates Gaussian data using sufficient statistics for Gaussian samples. For a sample \boldsymbol{y} = \{y_1, \dots, y_n\} with y_i \sim \mathcal{N}(\mu, (s_i \tau)^{-1}), i=1, \dots, n, the sample is aggregated into the sufficient statistic

(v, \frac{1}{2} \sum_{i=1}^n \log(s_i), m, n, \bar{y}),

with

m = \sum_{i=1}^n s_i, \quad \bar{y} = \frac{1}{m} \sum_{i=1}^n s_i y_i, \quad v = \frac{1}{m} \sum_{i=1}^n s_i y_i^2 - \bar{y}^2.

For a short derivation of the sufficient statistic, attach the INLA package (library(INLA)) and run inla.doc("agaussian").

Usage

agaussian(data, precision.scale = NULL)

Arguments

data

Gaussian data, must be a numeric vector.

precision.scale

Scales for the precision of each Gaussian observation.
Defaults to vector of 1s (no scaling).

Value

Aggregated Gaussian data, in an inla.mdata object, which is compatible with the agaussian family in INLA.
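
The components of the sufficient statistic can be computed directly in base R. This is a sketch of the formulas above with made-up observations y and precision scales s; it returns a plain named vector rather than an inla.mdata object:

```r
# Hypothetical observations and precision scales
y <- c(1.2, 0.7, 1.9, 1.4)
s <- c(1, 2, 1, 0.5)

n    <- length(y)                  # number of observations
m    <- sum(s)                     # total precision scale
ybar <- sum(s * y) / m             # scale-weighted mean
v    <- sum(s * y^2) / m - ybar^2  # scale-weighted variance
half_log_s <- 0.5 * sum(log(s))    # (1/2) * sum of log scales

stats <- c(v = v, half_log_s = half_log_s, m = m, n = n, ybar = ybar)
stats
```

With all scales equal to 1, ybar reduces to the ordinary sample mean and v to the (biased) sample variance.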


Aggregate data across an entire data frame using sufficient statistics

Description

Aggregates specified columns of a data frame into summarizing statistics, preserving the potentially complex structure returned by aggregator functions (like data frames or inla.mdata objects) within list-columns. Aggregation is performed according to sufficient statistics for the specified distribution of the columns. Possible distributions: Gaussian, binomial. This function aggregates the entire data frame into a single row result.

Usage

aggregate_df(
  data,
  gaussian = NULL,
  gaussian.precision.scales = NULL,
  binomial = NULL
)

Arguments

data

A data frame.

gaussian

Gaussian columns in data to be aggregated. The Gaussian observations are collapsed into an inla.mdata object compatible with the agaussian family, see the documentation for the agaussian family in INLA for details. Defaults to NULL (optional).

gaussian.precision.scales

Scales for the precision of Gaussian observations.
Must be one of:

  • NULL: Use default scales of 1 for all observations in all gaussian columns.

  • A single numeric vector: Applied only if exactly one column is specified in gaussian. Length must match nrow(data).

  • A named list: Where names(gaussian.precision.scales) are the names of the Gaussian columns (must match columns specified in gaussian). Each list element must be a numeric vector of scales for that column, with length matching nrow(data).
    Defaults to NULL (optional).

binomial

Binomial columns in data to be aggregated. Defaults to NULL (optional).

Value

A single-row data frame (tibble) containing:


Aggregate grouped data using aggregate_df

Description

Aggregates a grouped data frame into summarizing statistics within groups by applying the aggregate_df function to each group. Aggregation is performed according to sufficient statistics for the specified distribution of the columns to be aggregated.

Usage

aggregate_grouped_df(
  data,
  by,
  gaussian = NULL,
  gaussian.precision.scales = NULL,
  binomial = NULL
)

Arguments

data

Data frame to be grouped and aggregated.

by

Columns in data to group by.

gaussian

Gaussian columns in data to be aggregated.
Defaults to NULL (optional).

gaussian.precision.scales

Scales for the precision of Gaussian observations.
See aggregate_df documentation for format details, and agaussian in INLA for more details. Defaults to NULL.

binomial

Binomial columns in data to be aggregated.
Defaults to NULL (optional).

Value

Aggregated data frame (tibble), with one row per group, containing grouping variables, count n per group, and aggregated list-columns for specified variables as returned by aggregate_df.
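
The group-then-aggregate pattern can be sketched in base R. The example assumes a 0/1 binomial column y and a grouping column region (both hypothetical) and produces plain numeric columns instead of the list-columns returned by aggregate_grouped_df():

```r
# Hypothetical data: binary outcome y observed in two regions
df <- data.frame(
  region = c("north", "north", "south", "south", "south"),
  y      = c(1, 0, 1, 1, 0)
)

# Split the outcome by group, then reduce each group to (s, n)
groups <- split(df$y, df$region)
agg <- data.frame(
  region = names(groups),
  s = vapply(groups, sum, numeric(1)),     # sufficient statistic per group
  n = vapply(groups, length, integer(1)),  # group size
  row.names = NULL
)

agg
```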


Aggregate multinomial data. Used in aggregate_df.

Description

Aggregates multinomial data into sufficient statistics for multinomial samples.
Converts input data to character before processing. For a sample \boldsymbol{y} = \{y_1, \dots, y_n\} with y_i \in \{1, \dots, K\}, P(y_i = k) = p_k, k=1, \dots, K, the sample is aggregated into the sufficient statistic

\boldsymbol{s} = (s_1, \dots, s_{K-1})

where

s_k = \sum_{i=1}^n \mathbb{I}(y_i = k) for k = 1, \dots, K-1.

(The last category is omitted due to the sum-to-one constraint)

Usage

amultinomial(data, col_name, all_categories = NULL)

Arguments

data

A vector containing the multinomial observations (will be coerced to character).

col_name

A character string giving the name of the column (used primarily for context in error messages).

all_categories

A character vector with the names or levels of all possible categories in the multinomial distribution (must include all observed values after coercion to character).

Value

A one-row data frame containing counts for each of the first K - 1 categories.
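
A base R sketch of the counting step, assuming three hypothetical categories "a", "b" and "c" (the code illustrates the statistic and is not the package's own implementation):

```r
# Hypothetical multinomial observations, coerced to character
y <- as.character(c("b", "a", "c", "a", "b", "a"))
all_categories <- c("a", "b", "c")

# Count the observations in each category
counts <- vapply(all_categories, function(k) sum(y == k), integer(1))

# Drop the last category (implied by the other counts and the sample size)
s <- counts[-length(counts)]

# One-row data frame with one column per retained category
out <- as.data.frame(as.list(s))
out
```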


Create NA structure across age, period and cohort groups based on strata

Description

Creates a data frame where age, period, and cohort values are placed into columns specific to their stratum (defined by stratify_by), with other strata combinations marked as NA. This structure is often useful for specific modeling approaches, like certain Age-Period-Cohort (APC) models. Optionally includes unique indices for random effects.
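
The NA structure can be illustrated with a small base R sketch. The column naming scheme (age_f, age_m) and the stratification column sex are assumptions made for illustration; the names produced by the package may differ:

```r
# Hypothetical data: age index observed in two strata
df <- data.frame(
  age = c(1, 2, 1, 2),
  sex = c("f", "f", "m", "m")
)

# One age column per stratum: the value where the row belongs to
# that stratum, NA everywhere else
for (lev in unique(df$sex)) {
  df[[paste0("age_", lev)]] <- ifelse(df$sex == lev, df$age, NA)
}

df
```

The same construction would apply to the period and cohort columns.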

Usage

as.APC.NA.df(data, stratify_by, age, period, cohort, include.random = FALSE)

Arguments

data

Data frame with age, period, cohort, and stratification columns.

stratify_by

Stratification variable column. This column will be used to create the stratum-specific NA structure. It should ideally be a factor or character vector.

age

Age column in data (must be a numeric/integer column).

period

Name of the period column (must be a numeric/integer column).

cohort

Name of the cohort column (must be a numeric/integer column).

include.random

Logical. Whether to include a unique index ('random') for each combination of age, period, and stratum, potentially for use as random effect identifiers in models. Defaults to FALSE.

Value

A data frame containing the original age, period, cohort, and stratify_by columns, plus:


Add 1-indexed APC columns to data frame, handling numeric or categorical age/period

Description

Add 1-indexed APC columns to data frame, handling numeric or categorical age/period

Usage

as.APC.df(data, age, period, age_order = NULL, period_order = NULL, M = 1)

Arguments

data

Data frame with age and period columns.

age

Age column in data.

period

Period column in data.

age_order

(Optional) Character vector giving the desired order of age levels. If NULL and the age column is factor/character, uses unique(sort(data[[age]])).

period_order

(Optional) Vector (numeric or character) giving the desired order of periods. If NULL and period column is a factor/character, uses unique(sort(data[[period]])).

M

Grid factor, defined as the ratio of age interval width to period interval width. Defaults to 1 (i.e. assuming equal sized age and period increments).

Value

The data frame with new columns age_index, period_index, cohort_index, sorted by (age_index, period_index).
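
When age groups are stored as character labels, the default ordering is alphabetical, which can differ from the numeric ordering if the labels have different widths. The base R sketch below uses made-up labels to show why an explicit age_order can be useful in that situation:

```r
# Hypothetical age-group labels of unequal width
ages <- c("5-9", "10-14", "15-19", "5-9")

# Character sorting compares labels left to right,
# so "10-14" is placed before "5-9"
default_order <- sort(unique(ages))
default_order

# An explicit ordering gives the intended 1-based indices
age_order <- c("5-9", "10-14", "15-19")
age_index <- match(ages, age_order)
age_index
```

Labels of equal width, such as "20-24" and "25-29", do not have this problem.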

Aggregate binomial data

Description

Aggregates binomial data into sufficient statistics for binomial samples. Uses abinomial.

Usage

binomial_aggregator(data, col_name)

Arguments

data

Data frame with binomial data column.

col_name

Binomial data column.

Value

Aggregated binomial data column.


Check if a set of columns is missing from a data frame. For use in aggregate_df.

Description

Check if a set of columns is missing from a data frame. For use in aggregate_df.

Usage

check_cols_exist(cols, df)

Arguments

cols

String or character vector giving the names of the columns to check.

df

Data frame

Value

Nothing. Throws an error if any of the columns are missing.


Clamp a numeric value within bounds

Description

Internal helper to restrict values between a lower and upper bound.

Usage

clamp(x, lower, upper)

Arguments

x

A numeric vector.

lower

Lower bound.

upper

Upper bound.

Value

A numeric vector with values clamped.
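
Clamping is commonly written in base R with pmin() and pmax(). The function below is a sketch under the name clamp_sketch (a hypothetical name, to avoid suggesting it is the package's internal code):

```r
# Restrict each element of x to the interval [lower, upper]
clamp_sketch <- function(x, lower, upper) {
  pmin(pmax(x, lower), upper)
}

clamped <- clamp_sketch(c(-2, 0.5, 7), lower = 0, upper = 1)
clamped
```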


Compute dynamic pretty breaks for continuous x-axis

Description

Internal helper to compute breaks on a numeric scale, ensuring the number of breaks is in a desired range and aligned with data bounds.

Usage

dynamic_pretty_breaks(x, target_n = 8, max_breaks = 12, min_breaks = 3)

Arguments

x

A numeric vector.

target_n

Target number of breaks.

max_breaks

Maximum allowed number of breaks.

min_breaks

Minimum allowed number of breaks.

Value

A numeric vector of break points.
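
One way to obtain breaks of this kind in base R is to start from pretty() and keep only the breaks inside the data range. This is a sketch of the general idea with made-up values, not the helper's actual algorithm; in particular it does not enforce max_breaks or min_breaks:

```r
# Hypothetical numeric axis values
x <- c(1990, 1995, 2003, 2011, 2019)

# Ask for roughly target_n "round" breaks
target_n <- 8
breaks <- pretty(x, n = target_n)

# Keep only the breaks that fall within the data bounds
breaks <- breaks[breaks >= min(x) & breaks <= max(x)]
breaks
```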


Compute dynamic pretty breaks for discrete x-axis

Description

Internal helper to select a well-spaced subset of factor levels or unique strings for axis labeling on a discrete scale.

Usage

dynamic_pretty_discrete_breaks(
  x,
  target_n = 8,
  max_breaks = 12,
  min_breaks = 3
)

Arguments

x

A factor or character vector.

target_n

Target number of breaks.

max_breaks

Maximum allowed number of breaks.

min_breaks

Minimum allowed number of breaks.

Value

A character vector of selected breaks.
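
For a discrete axis, a well-spaced subset can be obtained by picking evenly spaced positions among the levels. The sketch below assumes 20 made-up age-group labels and a target of 8 breaks; the selection rule used by the helper itself may differ:

```r
# Hypothetical age-group labels "0-4", "5-9", ..., "95-99"
lev <- sprintf("%d-%d", seq(0, 95, 5), seq(4, 99, 5))

# Evenly spaced positions from the first to the last level
target_n <- 8
idx <- unique(round(seq(1, length(lev), length.out = target_n)))

selected <- lev[idx]
selected
```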


Find expected groups based on distinct values across a set of variables

Description

Given a data frame and a set of discrete (or factor) variables, returns all combinations of their observed levels and the list of levels.

Usage

expected_groups(data, vars)

Arguments

data

A data frame whose columns you want to examine.

vars

Character vector of column names in data to use.

Value

A named list with two elements:

grid

A data.frame where each row is one combination of the variable levels (equivalent to what expand.grid would produce).

levels

A named list; for each variable in vars it gives the sorted unique values (or factor levels) observed in data.
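
The two returned elements can be reproduced in base R with unique(), sort() and expand.grid(). The data frame below is hypothetical, and the sketch ignores factor levels that are declared but not observed:

```r
# Hypothetical data with two discrete variables
df <- data.frame(
  sex    = c("m", "f", "m"),
  region = c("north", "north", "south")
)
vars <- c("sex", "region")

# Sorted unique values per variable
levels_list <- lapply(df[vars], function(col) sort(unique(col)))

# All combinations of those values, including unobserved ones
grid <- expand.grid(levels_list, stringsAsFactors = FALSE)

list(grid = grid, levels = levels_list)
```

Here the combination ("f", "south") appears in the grid even though it is not present in the data.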


Fit a multivariate age-period-cohort model

Description

Fit a Bayesian multivariate age-period-cohort model, and obtain posteriors for identifiable cross-strata contrasts. The method is based on Riebler and Held (2010) doi:10.1093/biostatistics/kxp037. For handling complex survey data, we follow Mercer et al. (2014) doi:10.1016/j.spasta.2013.12.001, implemented using the survey package.

Usage

fit_MAPC(
  data,
  response,
  family,
  apc_format,
  stratify_by,
  reference_strata = NULL,
  age,
  period,
  grid.factor = 1,
  apc_prior = "rw1",
  extra.fixed = NULL,
  extra.random = NULL,
  extra.models = NULL,
  extra.hyper = NULL,
  include.random = FALSE,
  binomial.n = NULL,
  poisson.offset = NULL,
  inla_formula = NULL,
  lincombs = NULL,
  survey.design = NULL,
  apc_hyperprior = NULL,
  control.compute = list(dic = TRUE, waic = TRUE, cpo = TRUE),
  verbose = FALSE
)

Arguments

data

A data frame containing the age, period, response, and stratification variables. Age and period are assumed to be on the raw scale, not transformed to 1-indexed index columns. Factor/character columns are handled, as long as they are properly sorted by sort(unique(data$age/period)) (e.g. values of the form "20-25" for age groups are handled).

response

A string naming the response (outcome) variable in data.

family

A string indicating the likelihood family. The default is "gaussian" with identity link. See names(inla.models()$likelihood) for a list of possible alternatives and use inla.doc() for detailed docs for individual families.

apc_format

A specification of the APC structure, with options:

APc

Shared age and period effects, stratum-specific cohort effects.

ApC

Shared age and cohort effects, stratum-specific period effects.

aPC

Shared period and cohort effects, stratum-specific age effects.

Apc

Shared age effects, stratum-specific period and cohort effects.

aPc

Shared period effects, stratum-specific age and cohort effects.

apC

Shared cohort effects, stratum-specific age and period effects.

Note: It is also possible to specify models with only one or two time effects, by omitting the letters corresponding to the time effects to be excluded.

stratify_by

A string naming the column in data to use for stratification (e.g. region or sex).

reference_strata

Level of stratify_by to set as the reference level.

age

Name of the age variable in data.

period

Name of the period variable in data.

grid.factor

(Optional) Grid factor, defined as the ratio of age interval width to period interval width; defaults to 1.

apc_prior

(Optional) A string specifying the prior for the age, period, and cohort effects (e.g. "rw1", "rw2"). Defaults to "rw1".

extra.fixed

(Optional) If desired, the user can specify additional fixed effects to be added. This is passed as a character argument, specifying the name of the variable to be added. Multiple variables can be added by passing a character vector of names. Defaults to NULL.

extra.random

(Optional) If desired, the user can specify additional random effects to be added. This is passed as a character argument, specifying the name of the variable to be added. Multiple variables can be added by passing a character vector of names. Defaults to NULL.

extra.models

(Optional) If the user specifies one or more additional random effects to be added in extra.random, this argument can be used to specify the model to be used for the additional random effects. Either passed as a single string, in which case all extra random effects are assigned the same model, or a character vector matching the length of extra.random, mapping unique models to each variable in extra.random. If NULL and extra.random is non-empty, all extra random effects are assigned the "iid" model in inla(). Defaults to NULL.

extra.hyper

(Optional) If the user specifies one or more additional random effects to be added in extra.random, this argument can be used to specify the priors of the hyperparameters of the models used for the random effects. The hyperpriors are specified as strings that can be passed directly to the hyper=... argument in the formula passed to the inla()-function. See the argument apc_hyperprior below for a concrete example. Defaults to NULL, in which case the default INLA priors are used.

include.random

(Optional) Logical; if TRUE, include an overall random effect in the APC model, to capture unobserved heterogeneity. Defaults to FALSE.

binomial.n

(Optional) For the family=binomial likelihood. Either an integer giving the number of trials for the binomial response, or the variable in data containing the number of trials for each observation.

poisson.offset

(Optional) For the family=poisson likelihood. Either an integer giving the denominator for the Poisson count response, or the variable in data containing the denominator for each observation.

inla_formula

(Optional) If desired, the user can pass their own INLA-compatible formula to define the model. If not, a formula is generated automatically, with the models and priors defined.

lincombs

(Optional) If desired, the user can pass their own INLA-compatible linear combinations to be computed by the inla program. See the documentation of the inla() and f() functions in INLA for details.

survey.design

(Optional) In the case of complex survey data, explicit handling of unequal sampling probabilities can be required. The user can pass a survey.design object created with the svydesign function from the survey package. In this case, a Gaussian model is fit for the survey-adjusted estimates, based on the asymptotic normality of the Hájek estimator. The argument family should still indicate the underlying distribution of the response, and based on this, an appropriate transformation is applied to the adjusted mean estimates.

apc_hyperprior

(Optional) If the user wants non-default hyperpriors for the time effects, this can be achieved by passing the entire prior specification as a string. For example, if hyper = list(theta = list(prior="pc.prec", param=c(0.5,0.01))) is desired, pass the string 'list(theta = list(prior="pc.prec", param=c(0.5,0.01)))' to this argument (single quotes on the outside, so that the inner double quotes do not end the string).

control.compute

(Optional) A list of control variables passed to the inla()-function, that specifies what to be computed during model fitting. See options for control.compute in the INLA docs. Defaults to list(dic=TRUE, waic=TRUE, cpo=TRUE). If posterior sampling is desired, config=TRUE must be passed as a control option inside control.compute.

verbose

(Optional) This argument is passed along to the inla() function that estimates the MAPC model. If verbose=TRUE, the inla-program runs in verbose mode, which can provide more informative error messages.

Details

This function works as a wrapper around the inla()-function from the INLA package, which executes the model fitting procedures using Integrated Nested Laplace Approximations.

The returned object is of class mapc. S3 methods are available for:

Value

A named list containing the following elements:

model_fit

An object of class "inla", containing posterior densities, posterior summaries, measures of model fit etc. See documentation for the inla()-function for details.

plots

A named list of plots for each time effect. Extract them as plots$age, plots$period and plots$cohort.

References

Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using Integrated Nested Laplace Approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 319-392. doi:10.1111/j.1467-9868.2008.00700.x. See also https://www.r-inla.org for more information about the INLA method and software.

See Also

fit_all_MAPC for fitting multiple models at once, and the function inla() from the INLA package for the estimation machinery. For complex survey data, see svydesign for the creation of a survey design object which can be passed to survey.design.

Examples


data("toy_data")
fit <- fit_MAPC(
  data               = toy_data,
  response           = count,
  family             = "poisson",
  apc_format         = "ApC",
  stratify_by        = education,
  reference_strata   = 1,
  age                = age,
  period             = period
)

# Print concise summary of the MAPC fit and the estimation procedure
print(fit)

# Plot estimated cross-strata contrast trends
plot(fit)

# Optional: view full summary of the model (can be long)
# summary(fit)
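
The apc_hyperprior argument described above expects the prior specification as a string. The base R sketch below shows one way to write such a string with single quotes on the outside, and checks that it parses to the intended list. The parsing step is only a sanity check for illustration; how fit_MAPC() processes the string internally is not shown here:

```r
# Prior specification as a string: single quotes outside,
# double quotes inside
hyper_string <- 'list(theta = list(prior = "pc.prec", param = c(0.5, 0.01)))'

# Sanity check: the string should parse to a nested list
hyper <- eval(parse(text = hyper_string))
str(hyper)
```

The string can then be passed as apc_hyperprior = hyper_string.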




Fit all configurations of MAPC models using INLA

Description

Fits all configurations of shared vs. stratum-specific time effects:

APc

Shared age and period effects, stratum-specific cohort effects.

ApC

Shared age and cohort effects, stratum-specific period effects.

aPC

Shared period and cohort effects, stratum-specific age effects.

Apc

Shared age effects, stratum-specific period and cohort effects.

aPc

Shared period effects, stratum-specific age and cohort effects.

apC

Shared cohort effects, stratum-specific age and period effects.

Uses the fit_MAPC function. The multivariate APC model is based on Riebler and Held (2010) doi:10.1093/biostatistics/kxp037. For handling complex survey data, we follow Mercer et al. (2014) doi:10.1016/j.spasta.2013.12.001, implemented using the survey package.

Usage

fit_all_MAPC(
  data,
  response,
  family,
  stratify_by,
  reference_strata = NULL,
  age = "age",
  period = "period",
  grid.factor = 1,
  all_models = c("apC", "aPc", "Apc", "aPC", "ApC", "APc"),
  extra.fixed = NULL,
  extra.random = NULL,
  extra.models = NULL,
  extra.hyper = NULL,
  apc_prior = "rw1",
  include.random = FALSE,
  binomial.n = NULL,
  poisson.offset = NULL,
  apc_hyperprior = NULL,
  survey.design = NULL,
  control.compute = list(dic = TRUE, waic = TRUE, cpo = TRUE),
  track.progress = FALSE,
  verbose = FALSE
)

Arguments

data

A data frame containing the age, period, response, and stratification variables. Age and period are assumed to be on the raw scale, not transformed to 1-indexed index columns. Factor/character columns are handled, as long as they are properly sorted by sort(unique(data$age/period)) (e.g. values of the form "20-25" for age groups are handled).

response

A string naming the response (outcome) variable in data.

family

A string indicating the likelihood family. The default is "gaussian" with identity link.

stratify_by

The column in data to use for stratification.

reference_strata

Level of stratify_by to set as the reference level.

age

The age column in data.

period

The period column in data.

grid.factor

(Optional) Grid factor, defined as the ratio of age interval width to period interval width; defaults to 1.

all_models

(Optional) Character vector of valid APC-formats (e.g. c("ApC", "apC", "APc")), specifying the MAPC models to be estimated. Requirements for a valid APC-format (lowercase letter means stratum-specific, uppercase means shared):

  • Only one time effect: shared/stratum-specific both fine.

  • Two time effects: shared/stratum-specific both fine.

  • Three time effects: either one or two must be stratum-specific.

Defaults to c("apC", "aPc", "Apc", "aPC", "ApC", "APc").

extra.fixed

(Optional) If desired, the user can specify additional fixed effects to be added. This is passed as a character argument, specifying the name of the variable to be added. Multiple variables can be added by passing a character vector of names. Defaults to NULL.

extra.random

(Optional) If desired, the user can specify additional random effects to be added. This is passed as a character argument, specifying the name of the variable to be added. Multiple variables can be added by passing a character vector of names. Defaults to NULL.

extra.models

(Optional) If the user specifies one or more additional random effects to be added in extra.random, this argument can be used to specify the model to be used for the additional random effects. Either passed as a single string, in which case all extra random effects are assigned the same model, or a character vector matching the length of extra.random, mapping unique models to each variable in extra.random. If NULL and extra.random is non-empty, all extra random effects are assigned the "iid" model in inla(). Defaults to NULL.

extra.hyper

(Optional) If the user specifies one or more additional random effects to be added in extra.random, this argument can be used to specify the priors of the hyperparameters of the models used for the random effects. The hyperpriors are specified as strings that can be passed directly to the hyper=... argument in the formula passed to the inla()-function. See the argument apc_hyperprior below for a concrete example. Defaults to NULL, in which case the default INLA priors are used.

apc_prior

(Optional) A string specifying the prior for the age, period, and cohort effects (e.g. "rw1", "rw2"). Defaults to "rw1".

include.random

(Optional) Logical; if TRUE, include an overall random effect in the APC model. Defaults to FALSE.

binomial.n

(Optional) For the family=binomial likelihood. Either an integer giving the number of trials for the binomial response, or the name of the column containing the number of trials for each observation.

poisson.offset

(Optional) For the family=poisson likelihood. Either an integer giving the denominator for the Poisson count response, or the name of the column containing the denominator for each observation.

apc_hyperprior

(Optional) If the user wants non-default hyperpriors for the time effects, this can be achieved by passing the entire prior specification as a string. For example, if hyper = list(theta = list(prior="pc.prec", param=c(0.5,0.01))) is desired, pass the string 'list(theta = list(prior="pc.prec", param=c(0.5,0.01)))' to this argument (single quotes on the outside, so that the inner double quotes do not end the string).

survey.design

(Optional) In the case of complex survey data, explicit handling of unequal sampling probabilities can be required. The user can pass a survey.design object created with the svydesign function from the survey package. In this case, a Gaussian model is fit for the survey-adjusted estimates, based on the asymptotic normality of the Hájek estimator. The argument family should still indicate the underlying distribution of the response, and based on this, an appropriate transformation is applied to the adjusted mean estimates.

control.compute

(Optional) A list of control variables passed to the inla()-function, that specifies what to be computed during model fitting. See options for control.compute in the INLA docs. Defaults to list(dic=TRUE, waic=TRUE, cpo=TRUE). If posterior sampling is desired, config=TRUE must be passed as a control option inside control.compute.

track.progress

(Optional) Whether to report progress of the estimation of models in the console; defaults to FALSE.

verbose

(Optional) This argument is passed along to the inla() function that estimates the MAPC model. If verbose=TRUE, the inla-program runs in verbose mode, which can provide more informative error messages.

Details

The returned object is of class all_mapc, which is a container for multiple mapc model fits (each typically fitted with a different APC format). It also contains a model_selection element, which holds plots summarizing comparative fit metrics (DIC, WAIC and log-scores).

The following S3 methods are available:

These methods are intended to streamline multi-model workflows and allow quick comparison of results across model specifications.

Value

A named list of mapc objects, one for each configuration of shared vs. stratum-specific time effects: APc, ApC, aPC, Apc, aPc, apC.

References

Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using Integrated Nested Laplace Approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 319-392. doi:10.1111/j.1467-9868.2008.00700.x. See also https://www.r-inla.org for more information about the INLA method and software.

See Also

fit_MAPC for fitting a single model (more flexible; can pass your own formula and lincombs), and the function inla() from the INLA package for the estimation machinery. For complex survey data, see svydesign for the creation of a survey design object which can be passed to survey.design.

Examples


data("toy_data")
fits <- fit_all_MAPC(
  data               = toy_data,
  response           = count,
  family             = "poisson",
  stratify_by        = education,
  reference_strata   = 1,
  age                = age,
  period             = period,
  apc_prior          = "rw2",
  include.random     = TRUE
)

# Print concise summary of the models and estimation procedure
print(fits)

# Plot comparison plots, based on comparative fit metrics
plot(fits)

# Optional: view full summary of all models (can be long)
# summary(fits)
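
The requirements for a valid APC-format listed under all_models can be expressed as a small check. The function is_valid_apc_format below is a hypothetical helper written for illustration; it is not part of the package, and it does not check the order in which the letters appear:

```r
# Hypothetical validity check for an APC-format string
is_valid_apc_format <- function(fmt) {
  chars   <- strsplit(fmt, "")[[1]]
  effects <- tolower(chars)

  # Only the letters a/p/c (any case), each at most once
  if (length(chars) == 0 ||
      !all(effects %in% c("a", "p", "c")) ||
      anyDuplicated(effects) > 0) {
    return(FALSE)
  }

  # With one or two time effects, any shared/specific mix is fine
  if (length(chars) < 3) {
    return(TRUE)
  }

  # With three time effects, one or two must be stratum-specific
  n_specific <- sum(chars == effects)  # lowercase letters
  n_specific %in% c(1, 2)
}

formats <- c("ApC", "apc", "APC", "Ap", "c", "AA")
valid <- vapply(formats, is_valid_apc_format, logical(1))
valid
```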



Aggregate Gaussian data

Description

Aggregates a numerical data frame column using sufficient statistics for Gaussian samples, into an inla.mdata object compatible with the agaussian likelihood in INLA. Uses agaussian.

Usage

gaussian_aggregator(y, precision.scale = NULL)

Arguments

y

Gaussian column.

precision.scale

Scales for the precision of each Gaussian observation.
Defaults to vector of 1s (no scaling).

Value

Aggregated Gaussian column as an inla.mdata object.


Generate MAPC formula for INLA

Description

Based on APC-format, generate the proper formula to pass to INLA for fitting MAPC models.

Usage

generate_MAPC_formula(
  df,
  APC_format,
  response,
  stratify_var,
  age = "age",
  period = "period",
  cohort = "cohort",
  intercept = FALSE,
  apc_prior = "rw1",
  apc_hyper = NULL,
  random_term = TRUE,
  extra.fixed = NULL,
  extra.random = NULL,
  extra.models = NULL,
  extra.hyper = NULL
)

Arguments

df

Data frame for which MAPC models should be fit

APC_format

A string where lower-case letters indicate stratum-specific time effects and upper-case letters indicate shared time effects.

response

A string, name of the column in df that represents the response variable.

stratify_var

Stratification variable. At least one time effect should be stratum-specific, and at least one should be shared.

age

Name of age column

period

Name of period column

cohort

Name of cohort column

intercept

Logical, indicating whether an overall intercept should be included in the formula.
Defaults to FALSE (optional).

apc_prior

Which prior model to use for the time effects.
Defaults to "rw1" (optional).

apc_hyper

If the user wants non-default hyperpriors for the random time effects, this can be achieved by passing the entire prior specification as a string. For example, if hyper = list(theta = list(prior="pc.prec", param=c(0.5,0.01))) is desired, pass the string 'list(theta = list(prior="pc.prec", param=c(0.5,0.01)))' to this argument (single quotes on the outside, so that the inner double quotes do not end the string).

random_term

Logical, indicating whether a random term should be included in the model.
Defaults to TRUE (optional).

extra.fixed

Name of additional fixed effects.
Defaults to NULL (optional).

extra.random

Name of additional random effects.
Defaults to NULL (optional).

extra.models

Models for additional random effects. Supported INLA models include 'iid', 'rw1' and 'rw2'.
Defaults to NULL (optional).

extra.hyper

If the user wants non-default hyperpriors for the additional random effects, pass the entire prior specification as a string. For example, if hyper = list(theta = list(prior="pc.prec", param=c(0.5,0.01))) is desired, pass the string 'list(theta = list(prior="pc.prec", param=c(0.5,0.01)))' to this argument (single quotes on the outside, so that the inner double quotes do not end the string). Defaults to NULL (optional).
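
Because apc_hyper and extra.hyper take the prior specification as a string, quoting needs some care. The following sketch shows one way to write such a string in R and to check that it parses back into the intended nested list. How the package processes the string internally is not shown here.

```r
# Single quotes on the outside keep the inner double quotes intact.
hyper_string <- 'list(theta = list(prior="pc.prec", param=c(0.5,0.01)))'

# Parsing and evaluating the string recovers the nested list it encodes.
hyper <- eval(parse(text = hyper_string))
str(hyper)
```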

Value

A formula object that can be passed to INLA to fit the desired MAPC model.
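
The APC_format convention (lower-case letters for stratum-specific effects, upper-case letters for shared effects) can be illustrated with a small base R sketch. The helper split_apc_format is hypothetical and only demonstrates the naming convention; it is not the package's implementation.

```r
# Sketch only: hypothetical helper that splits an APC format string into
# shared (upper-case) and stratum-specific (lower-case) time effects.
split_apc_format <- function(APC_format) {
  letters_in <- strsplit(APC_format, "")[[1]]
  stopifnot(all(tolower(letters_in) %in% c("a", "p", "c")))
  effect_names <- c(a = "age", p = "period", c = "cohort")
  is_shared <- letters_in == toupper(letters_in)
  list(
    shared   = unname(effect_names[tolower(letters_in[is_shared])]),
    specific = unname(effect_names[letters_in[!is_shared]])
  )
}

fmt <- split_apc_format("apC")
fmt$shared    # "cohort"
fmt$specific  # "age" "period"
```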


Generate Age-Period-Cohort Linear Combinations for INLA

Description

Constructs a set of linear combinations (contrasts) for age, period, and/or cohort effects across different strata, relative to a specified reference stratum, suitable for use with inla.make.lincomb from the INLA package.

Usage

generate_apc_lincombs(
  apc_format,
  data,
  strata,
  reference_strata,
  age = "age",
  period = "period",
  cohort = "cohort"
)

Arguments

apc_format

Character string containing any combination of "a", "p", "c":

"a"

include age contrasts

"p"

include period contrasts

"c"

include cohort contrasts

e.g. "ap" to generate age and period contrasts only.

data

A data.frame containing the variables specified by age, period, cohort, and strata. The age, period, and cohort variables must be integer-valued (or coercible to integer).

strata

String giving the name of the factor column in data that defines strata.

reference_strata

String indicating which level of strata should be used as the reference.

age

String name of the column in data containing age indices (default "age").

period

String name of the column in data containing period indices (default "period").

cohort

String name of the column in data containing cohort indices (default "cohort").

Details

For each specified dimension (a, p, c), the function loops over all unique values of age, period, or cohort in the data, and over all strata levels except the reference. It then constructs a contrast that subtracts the effect in the reference stratum from the effect in the other strata at each index.

Value

A named list of linear combination objects as returned by inla.make.lincomb() (INLA function). Each element corresponds to one contrast, with names of the form “Age = x, Strata = y vs ref”, “Period = x, Strata = y vs ref”, or “Cohort = x, Strata = y vs ref”, depending on apc_format.
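
The loop described in Details can be pictured as a grid over time indices and non-reference strata. The sketch below builds only the contrast names for the age dimension and does not call INLA. The helper contrast_names is hypothetical, and substituting the reference level for "ref" in the name is an assumption.

```r
# Sketch only: hypothetical helper that enumerates contrast names.
# Assumption: "ref" in the documented name pattern stands for the reference level.
contrast_names <- function(values, strata_levels, reference, label = "Age") {
  others <- setdiff(strata_levels, reference)   # all strata except the reference
  grid <- expand.grid(value = sort(unique(values)), stratum = others,
                      stringsAsFactors = FALSE)
  sprintf("%s = %s, Strata = %s vs %s",
          label, grid$value, grid$stratum, reference)
}

nm <- contrast_names(values = 1:3,
                     strata_levels = c("1", "2", "3"),
                     reference = "1")
nm
```

With three age indices and two non-reference strata, this gives six contrasts, one per combination.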


Function for finding longest consecutive run of non-missing indices

Description

Function for finding longest consecutive run of non-missing indices

Usage

longest_run(x)

Arguments

x

A vector of indices for which to find the longest consecutive run of indices

Value

A named list, containing the elements:

from

The first index of the longest run.

to

The last index of the longest run.

step

The stepsize of the longest run.
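
A minimal base R sketch of the idea, assuming integer indices and a fixed step size of 1 (the package function may handle other step sizes). The name longest_run_sketch is hypothetical.

```r
# Sketch only: longest run of consecutive integers, assuming step size 1.
longest_run_sketch <- function(x) {
  x <- sort(unique(x[!is.na(x)]))
  if (length(x) == 0) return(NULL)
  # A new run starts whenever the gap to the previous index is not 1
  run_id <- cumsum(c(TRUE, diff(x) != 1))
  best   <- which.max(tabulate(run_id))   # first run of maximal length
  run    <- x[run_id == best]
  list(from = run[1], to = run[length(run)], step = 1)
}

res <- longest_run_sketch(c(1, 2, 3, 7, 8, 9, 10, 15))
res$from  # 7
res$to    # 10
```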


Make a table of model fit scores

Description

Generates a table summarizing the provided model fit scores.

Usage

model_selection_criteria_table(
  dic_scores = NULL,
  waic_scores = NULL,
  log_scores = NULL
)

Arguments

dic_scores

Named list of DIC scores. Names correspond to the model.

waic_scores

Named list of WAIC scores. Names correspond to the model.

log_scores

Named list of log-scores (derived from CPO scores). Names correspond to the model.

Value

A table summarizing the model fit scores, as a ggplot object.
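
The sketch below shows one common way to derive a log-score from CPO values (the negative mean of the log CPO values) and to collect scores in a named list of the kind this function takes. The CPO values and model names are made up for illustration, and the package may define the log-score differently.

```r
# Hypothetical CPO values for two models; in practice these come from model fits.
cpo <- list(apC = c(0.20, 0.35, 0.10),
            aPc = c(0.25, 0.30, 0.15))

# Logarithmic score: negative mean of the log CPO values (lower is better).
log_scores <- lapply(cpo, function(p) -mean(log(p)))

# A plain data frame view of the named list of scores.
score_table <- data.frame(model     = names(log_scores),
                          log_score = unlist(log_scores),
                          row.names = NULL)
score_table
```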


Aggregate multinomial data

Description

Aggregates multinomial data into sufficient statistics for multinomial samples. Uses amultinomial.

Usage

multinomial_aggregator(df, col_name, all_categories = NULL)

Arguments

df

Data frame with multinomial data column.

col_name

Name of column with multinomial data.

all_categories

Character vector of names of categories in multinomial distribution.

Value

Aggregated multinomial data column (a one-row data frame).
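
For a multinomial sample, the sufficient statistic is the vector of category counts. The sketch below computes these counts as a one-row data frame in base R. The helper multinomial_counts is hypothetical and does not reproduce the output format of amultinomial.

```r
# Sketch only: category counts as a one-row data frame.
multinomial_counts <- function(x, all_categories = NULL) {
  if (is.null(all_categories)) all_categories <- sort(unique(as.character(x)))
  # Fixing the factor levels keeps categories with zero observations
  counts <- as.vector(table(factor(x, levels = all_categories)))
  names(counts) <- all_categories
  as.data.frame(t(counts))
}

agg <- multinomial_counts(c("low", "high", "low", "mid", "low"),
                          all_categories = c("low", "mid", "high", "none"))
agg
```

Supplying all_categories matters when some categories are absent from the sample, since they would otherwise be dropped instead of counted as zero.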


Count number of groups across a set of variables in a data frame

Description

Counts number of groups across specified grouping and stratification variables in a data frame. At least one grouping or stratification variable must be provided.

Usage

number_of_groups(df, group_by)

Arguments

df

A data frame with grouping and/or stratification variables.

group_by

Variables in the data frame that define a grouping of the data.

Value

Number of distinct groups and strata in the data frame.
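
Counting groups amounts to counting the distinct combinations of the grouping variables. A base R sketch with a made-up data frame:

```r
# Made-up example data: two grouping variables.
df <- data.frame(sex       = c("f", "f", "m", "m", "m"),
                 education = c(1, 2, 1, 1, 2))

# Number of distinct (sex, education) combinations.
n_groups <- nrow(unique(df[, c("sex", "education"), drop = FALSE]))
n_groups  # 4
```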


Count number of strata across a set of variables in a data frame

Description

Counts number of strata across specified stratification variables in a data frame. At least one stratification variable must be provided.

Usage

number_of_strata(df, stratify_by)

Arguments

df

A data frame with grouping and/or stratification variables.

stratify_by

Variables in the data frame that define a stratification of the data.

Value

Number of distinct strata in the data frame.


Plot counts of observations across bins of a numeric variable, optionally stratified

Description

Bins a specified numeric variable into intervals, counts observations per value of a specified variable and bin groups, and plots lines for each bin group using ggplot2. If a stratification variable is provided, counts are calculated per strata and plotted as separate colored lines. If an additional stratification variable is provided, separate plot windows are created for each level.

Usage

plot_binned_counts(
  data,
  x,
  bin_by,
  stratify_by = NULL,
  for_each = NULL,
  n_bins = 8,
  bin_width = NULL,
  title = "Observation counts",
  subtitle = NULL,
  legend_title = NULL,
  x_lab = NULL,
  y_lab = NULL,
  viridis_color_option = "D"
)

Arguments

data

Data frame containing all input variables.

x

Variable in data whose values define the x-axis for counts.

bin_by

Numeric variable in data for which to bin observations.

stratify_by

(Optional) Stratification variable. If supplied, counts are computed for each combination of x, bin of bin_by and stratify_by, and separate panels are made for each level of stratify_by.

for_each

(Optional) Additional stratification variable. If supplied, separate plot windows are created per level of for_each.

n_bins

(Optional) Number of bins to create across bin_by; defaults to 8.

bin_width

(Optional) Width of the bins created across bin_by; defaults to NULL. Overrides n_bins if both are supplied.

title

(Optional) Plot title; defaults to "Observation counts".

subtitle

(Optional) Plot subtitle; defaults to NULL if for_each is NULL, and to <name of for_each>: <level of for_each> for each plot window if for_each is supplied.

legend_title

(Optional) Legend title; defaults to name of bin_by + "bin".

x_lab

(Optional) Label for the x-axis; defaults to the name of x.

y_lab

(Optional) Label for the y-axis; defaults to the name of y.

viridis_color_option

(Optional) Option for color gradient; defaults to "D". Options are "A", "B", "C", "D", "E", "F", "G", "H". See viridis for information, or experiment yourself.

Value

If for_each is not supplied, a ggplot object showing counts per x and bin groups, optionally faceted by stratify_by. If for_each is supplied, a named list of such plots.

See Also

plot_counts_1D, plot_counts_2D, plot_counts_with_mean, ggplot

Examples

data("toy_data")
# Counts by period, binned by age
plot_binned_counts(toy_data, x = period,
                   bin_by = age, n_bins = 4)
# Counts by period, binned by age, stratified by education levels
plot_binned_counts(toy_data, period,
                   bin_by = age, n_bins = 4,
                   stratify_by = education)
# Counts by period, binned by age, stratified by education levels, for each sex
plot_binned_counts(toy_data, period,
                   bin_by = age, n_bins = 4,
                   stratify_by = education, for_each = sex)
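
The counting step behind such a plot can be sketched in base R: bin the numeric variable, then tabulate observations per x value and bin. The data below are simulated, and using cut() for the binning is an assumption about the approach, not a description of the package internals.

```r
# Simulated data standing in for toy_data.
set.seed(1)
d <- data.frame(period = sample(1990:1994, 200, replace = TRUE),
                age    = sample(20:59,     200, replace = TRUE))

# Bin age into 4 equal-width intervals.
n_bins <- 4
d$age_bin <- cut(d$age, breaks = n_bins, include.lowest = TRUE)

# Count observations for each combination of period and age bin.
counts <- as.data.frame(table(period = d$period, age_bin = d$age_bin),
                        responseName = "n")
head(counts)
```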

Plot counts of observations across a single variable, optionally stratified

Description

Computes the number of observations at each value of a specified variable and creates a line plot of these counts using ggplot2. If a stratification variable is provided, counts are calculated per strata and plotted as separate colored lines. If an additional stratification variable is provided, separate plot windows are created for each level.

Usage

plot_counts_1D(
  data,
  x,
  stratify_by = NULL,
  for_each = NULL,
  title = "Observation counts",
  subtitle = NULL,
  legend_title = NULL,
  x_lab = NULL,
  y_lab = NULL,
  viridis_color_option = "D"
)

Arguments

data

Data frame containing all input variables.

x

Variable in data whose values define the x-axis for counts.

stratify_by

(Optional) Stratification variable. If supplied, counts are computed for each combination of x and stratify_by, and separate lines are drawn per level of stratify_by.

for_each

(Optional) Additional stratification variable. If supplied, separate plot windows are created per level of for_each.

title

(Optional) Plot title; defaults to "Observation counts".

subtitle

(Optional) Plot subtitle; defaults to NULL if for_each is NULL, and to <name of for_each>: <level of for_each> for each plot window if for_each is supplied.

legend_title

(Optional) Legend title; defaults to the name of stratify_by if it is supplied.

x_lab

(Optional) Label for the x-axis; defaults to the name of x.

y_lab

(Optional) Label for the y-axis; defaults to the name of y.

viridis_color_option

(Optional) Option for color gradient; defaults to "D". Options are "A", "B", "C", "D", "E", "F", "G", "H". See viridis for information, or experiment yourself.

Value

A ggplot object displaying counts across the variable supplied in x, optionally stratified by stratify_by. If for_each is supplied, separate plots are created in separate windows for each level. Visuals can be modified with ggplot2.

See Also

plot_counts_2D, plot_binned_counts, plot_counts_with_mean, ggplot

Examples

data("toy_data")
# Counts by age
plot_counts_1D(toy_data, x = age)
# Counts by age, stratified by education level
plot_counts_1D(toy_data, x = age,
               stratify_by = education)
# Count by age, stratified by education level, for each sex
plot_counts_1D(toy_data, x = age,
               stratify_by = education, for_each = sex)

Plot counts of observations across two dimensions, optionally stratified

Description

Computes the number of observations for each combination of two specified variables, and displays the result as a heatmap using ggplot2. If a stratification variable is provided, counts are calculated per strata and strata-specific heatmaps are displayed in individual panels. If an additional stratification variable is provided, separate plot windows are created for each level.

Usage

plot_counts_2D(
  data,
  x,
  y,
  stratify_by = NULL,
  for_each = NULL,
  color_gradient = c("blue", "beige", "red"),
  title = "Observation counts",
  legend_title = NULL,
  subtitle = NULL,
  x_lab = NULL,
  y_lab = NULL
)

Arguments

data

Data frame containing all input variables.

x

Variable in data whose values define the x-axis for counts.

y

Variable in data whose values define the y-axis for counts.

stratify_by

(Optional) Stratification variable. If supplied, counts are computed for each combination of x, y and stratify_by, and separate heatmaps are generated per level of stratify_by.

for_each

(Optional) Additional stratification variable. If supplied, separate plot windows are created per level of for_each.

color_gradient

(Optional) Color gradient for the heatmap. Specified as a character vector of three colors, representing: c(<low_counts>, <middle_counts>, <high_counts>). Defaults to c("blue", "beige", "red"). Colors must be recognized by ggplot.

title

(Optional) Plot title; defaults to "Observation counts".

legend_title

(Optional) Legend title for color gradient; defaults to "Count".

subtitle

(Optional) Plot subtitle; defaults to NULL if for_each is NULL, and to <name of for_each>: <level of for_each> for each plot window if for_each is supplied.

x_lab

(Optional) Label for the x-axis; defaults to the name of x.

y_lab

(Optional) Label for the y-axis; defaults to the name of y.

Value

If for_each is not supplied, a ggplot object showing a heatmap of counts for each x-y combination, optionally faceted by stratify_by. If for_each is supplied, a named list of such ggplot objects, one per unique value of for_each.

See Also

plot_counts_1D, plot_binned_counts, plot_counts_with_mean, ggplot

Examples

data("toy_data")
# Heatmap of counts by age and period
plot_counts_2D(toy_data, x = age, y = period)
# Heatmap of counts by age and period, stratified by education
plot_counts_2D(toy_data, x = period, y = age,
               stratify_by = education)
# Heatmap of counts by age and period, stratified by education, for each sex
plot_counts_2D(toy_data, x = period, y = age,
               stratify_by = education, for_each = sex)

Plot heatmap of observation counts with mean overlay, optionally stratified

Description

Computes counts of observations for each combination of two variables and displays them as a heatmap, with an overlaid line showing the mean of the second variable across the first. If a stratification variable is provided, observations are counted per strata and strata-specific heatmaps are displayed in individual panels. If an additional stratification variable is provided, separate plot windows are created for each level.

Usage

plot_counts_with_mean(
  data,
  x,
  y,
  stratify_by = NULL,
  for_each = NULL,
  title = NULL,
  subtitle = NULL,
  heatmap_legend = "Count",
  mean_legend = "Mean",
  viridis_color_option = "D",
  mean_color = "coral"
)

Arguments

data

Data frame containing all input variables.

x

Variable in data whose values define the x-axis for counts.

y

Variable in data whose values define the y-axis for counts, and for which the mean is computed for each value for x.

stratify_by

(Optional) Stratification variable. If supplied, counts and means are computed for each level of stratify_by, and separate heatmaps are generated per level of stratify_by.

for_each

(Optional) Additional stratification variable. If supplied, separate plot windows are created per level of for_each.

title

(Optional) Plot title; defaults to NULL, in which case a title of the form "<Y> distribution across <X>" is used.

subtitle

(Optional) Plot subtitle; defaults to NULL, or to "<for_each>: <level>" when for_each is supplied.

heatmap_legend

(Optional) Label for the heatmap legend; defaults to "Count".

mean_legend

(Optional) Label for the overlay mean line legend; defaults to "Mean".

viridis_color_option

(Optional) Option for color gradient; defaults to "D". Options are "A", "B", "C", "D", "E", "F", "G", "H". See viridis for information, or experiment yourself.

mean_color

(Optional) Color for the overlay mean line; defaults to "coral".

Value

If for_each is not supplied, a ggplot object showing a heatmap of counts with a mean overlay line, optionally faceted by stratify_by. If for_each is supplied, a named list of such plots.

See Also

plot_counts_1D, plot_counts_2D, plot_binned_counts, ggplot

Examples

data("toy_data")
# Heatmap of counts by age vs. period with mean age overlay
plot_counts_with_mean(toy_data, x = period, y = age)
# Heatmap of counts by age vs. period with mean age overlay, stratified by education
plot_counts_with_mean(toy_data, x = period, y = age,
                      stratify_by = education)
# Heatmap of counts by age vs. period with mean age overlay, stratified by education, for each sex
plot_counts_with_mean(toy_data, x = period, y = age,
                      stratify_by = education, for_each = sex)

Plot Linear Combinations of Age-Period-Cohort Effects by Strata

Description

Generates ggplot2 line plots of estimated linear combinations for age, period, and/or cohort effects from an INLA fit, stratified by a factor. Returns a named list of ggplot objects for each requested effect.

Usage

plot_lincombs(
  inla_fit,
  apc_model,
  data,
  strata_col,
  reference_level,
  family = NULL,
  age_ind = "age",
  period_ind = "period",
  cohort_ind = "cohort",
  age_title = NULL,
  period_title = NULL,
  cohort_title = NULL,
  y_lab = NULL,
  age_vals = NULL,
  period_vals = NULL,
  cohort_vals = NULL,
  age_breaks = NULL,
  age_limits = NULL,
  period_breaks = NULL,
  period_limits = NULL,
  cohort_breaks = NULL,
  cohort_limits = NULL,
  PDF_export = FALSE
)

Arguments

inla_fit

An object returned by the inla() function, containing the data frame summary.lincomb.derived, which holds the posterior summaries of the cross-strata contrasts from the MAPC model. This function assumes that the row names of the linear combinations follow the specific format produced by generate_apc_lincombs.

apc_model

Character string indicating the configuration of shared vs. stratum-specific time effects in the model.

data

The data frame used to fit inla_fit, containing columns for age, period, cohort, and the stratification variable.

strata_col

Character name of the factor column in data defining strata.

reference_level

Character value of strata_col to use as the reference.

family

Optional character string giving the model family, used to set the default y_lab: NULL or "gaussian" gives "Mean differences", "poisson" gives "Log mean ratio", and "binomial" gives "Log odds ratio".

age_ind

Character name of the age variable in data (default "age").

period_ind

Character name of the period variable in data (default "period").

cohort_ind

Character name of the cohort variable in data (default "cohort").

age_title

Optional plot title for the age effect.

period_title

Optional plot title for the period effect.

cohort_title

Optional plot title for the cohort effect.

y_lab

Optional y-axis label; if NULL, set according to family.

age_vals

Optional numeric vector of x-values for age; defaults to min(data$age):max(data$age).

period_vals

Optional numeric vector of x-values for period; defaults to min(data$period):max(data$period).

cohort_vals

Optional numeric vector of x-values for cohort; defaults to min(data$cohort):max(data$cohort).

age_breaks

Optional vector of breaks for the age plot x-axis.

age_limits

Optional numeric vector of length 2 giving x-axis limits for age.

period_breaks

Optional vector of breaks for the period plot x-axis.

period_limits

Optional numeric vector of length 2 giving x-axis limits for period.

cohort_breaks

Optional vector of breaks for the cohort plot x-axis.

cohort_limits

Optional numeric vector of length 2 giving x-axis limits for cohort.

PDF_export

Logical; if TRUE, use larger font sizes/layout for PDF output.

Value

A named list of ggplot objects. Elements are "age", "period", and/or "cohort" depending on apc_model.

Examples

if (requireNamespace("INLA", quietly = TRUE)) {
  # Load toy dataset
  data("toy_data")

  # Filter away unobserved cohorts (see plot_missing_data() function):
  require(dplyr)
  toy_data.f <- toy_data %>% filter(sex == "female") %>% subset(cohort > 1931)

  # Load precomputed 'mapc' object
  apC_fit.f <- readRDS(system.file("extdata", "quickstart-apC_fit_f.rds", package = "MAPCtools"))

  # Extract INLA object:
  apC_fit.inla <- apC_fit.f$model_fit
  apC_plots <- plot_lincombs(
    inla_fit    = apC_fit.inla,
    apc_model   = "apC",
    data        = toy_data.f,
    strata_col  = "education",
    reference_level = "1",
    family      = "poisson"
  )
  # Display the age effect plot
  print(apC_plots$age)
  # Display the period effect plot
  print(apC_plots$period)
}


Plot mean of a response variable across a single variable, optionally stratified

Description

Computes the mean of a specified response variable at each value of a specified x variable and displays a line plot using ggplot2. If a stratification variable is provided, means are calculated per strata and plotted as separate colored lines. If an additional stratification variable is provided, separate plot windows are created for each level.

Usage

plot_mean_response_1D(
  data,
  response,
  x,
  stratify_by = NULL,
  for_each = NULL,
  title = NULL,
  subtitle = NULL,
  legend_title = NULL,
  x_lab = NULL,
  y_lab = NULL,
  viridis_color_option = "D"
)

Arguments

data

A data.frame or tibble containing the dataset.

response

A numeric variable in data whose mean will be plotted.

x

A variable in data defining the x-axis for computing means.

stratify_by

(Optional) Stratification variable. If supplied, means are computed for each combination of x and stratify_by, and separate lines are drawn per level of stratify_by.

for_each

(Optional) Additional stratification variable. If supplied, separate plot windows are created per level of for_each.

title

(Optional) Plot title; defaults to NULL.

subtitle

(Optional) Plot subtitle; defaults to NULL if for_each is NULL, and to <name of for_each>: <level of for_each> for each plot window if for_each is supplied.

legend_title

(Optional) Legend title; defaults to the name of stratify_by if it is supplied.

x_lab

(Optional) Label for the x-axis; defaults to the name of x.

y_lab

(Optional) Label for the y-axis; defaults to paste("Mean", <response_name>).

viridis_color_option

(Optional) Option for color gradient; defaults to "D". Options are "A", "B", "C", "D", "E", "F", "G", "H". See viridis for information, or experiment yourself.

Value

A ggplot object displaying the mean of the response across the variable supplied in x, optionally stratified by stratify_by. If for_each is supplied, separate plots are created in separate windows for each level. Visuals can be modified with ggplot2.

See Also

plot_mean_response_2D, ggplot

Examples

data("toy_data")
# Mean count by age
plot_mean_response_1D(toy_data, response = count, x = age)
# Mean count by age, stratified by education
plot_mean_response_1D(toy_data, response = count, x = age,
                      stratify_by = education)
# Mean count by age, stratified by education, for each sex
plot_mean_response_1D(toy_data, response = count, x = age,
                      stratify_by = education, for_each = sex)

Plot mean of a response variable across two dimensions, optionally stratified

Description

Computes the mean of a specified response variable for each combination of two variables and displays it as a heatmap using ggplot2. If a stratification variable is provided, means are calculated per strata and strata-specific heatmaps are displayed in individual panels. If an additional stratification variable is provided, separate plot windows are created for each level.

Usage

plot_mean_response_2D(
  data,
  response,
  x,
  y,
  stratify_by = NULL,
  for_each = NULL,
  color_gradient = c("blue", "beige", "red"),
  title = NULL,
  subtitle = NULL,
  x_lab = NULL,
  y_lab = NULL
)

Arguments

data

Data frame containing all input variables.

response

Numeric variable in data whose mean to compute.

x

Variable in data for the horizontal axis.

y

Variable in data for the vertical axis.

stratify_by

(Optional) Stratification variable. If supplied, means are computed for each combination of x, y and stratify_by, and separate heatmaps are generated per level of stratify_by.

for_each

(Optional) Additional stratification variable. If supplied, separate plot windows are created per level of for_each.

color_gradient

(Optional) Color gradient for the heatmap. Specified as a character vector of three colors, representing: c(<low_mean>, <middle_mean>, <high_mean>). Defaults to c("blue", "beige", "red"). Colors must be recognized by ggplot.

title

Plot title; defaults to NULL, in which case a title of the form "Mean <response>" is used.

subtitle

(Optional) Plot subtitle; defaults to NULL if for_each is NULL, and to <name of for_each>: <level of for_each> for each plot window if for_each is supplied.

x_lab

(Optional) Label for the x-axis; defaults to the name of x.

y_lab

(Optional) Label for the y-axis; defaults to the name of y.

Value

A ggplot object showing the mean of response across x and y, optionally faceted by stratify_by.

See Also

plot_mean_response_1D, ggplot

Examples

data("toy_data")
# Mean count by age and period
plot_mean_response_2D(toy_data, response = count, x = period, y = age)
# Mean count by age and period, stratified by education level
plot_mean_response_2D(toy_data, response = count, x = period, y = age,
                      stratify_by = education)
# Mean count by age and period, stratified by education level, for each sex
plot_mean_response_2D(toy_data, response = count, x = period, y = age,
                      stratify_by = education, for_each = sex)

Plot Missing Group Combinations

Description

Creates a tile plot highlighting combinations of grouping variables that are expected but missing from the data. Allows for faceting.

Usage

plot_missing_data(
  data,
  x,
  y,
  stratify_by = NULL,
  for_each = NULL,
  facet_labeller = NULL,
  title = "Missing data",
  subtitle = NULL,
  x_lab = NULL,
  y_lab = NULL
)

Arguments

data

Data frame.

x

Variable in data whose values define the x-axis.

y

Variable in data whose values define the y-axis.

stratify_by

(Optional) Stratification variable. If supplied, missing data is examined separately for each level of stratify_by, and each level gets its own panel.

for_each

(Optional) Additional stratification variable. If supplied, separate plot windows are created per level of for_each.

facet_labeller

A labeller function (e.g. labeller), or a named list where names match facet variables and values are named vectors/lists mapping levels to labels (optional).

title

Character string for the plot title. Defaults to "Missing data".

subtitle

Character string for the plot subtitle. Defaults to NULL.

x_lab

Character string for the x-axis label. Defaults to the name of x.

y_lab

Character string for the y-axis label. Defaults to the name of y.

Value

A ggplot object, or NULL if no missing combinations found.

See Also

ggplot

Examples

data("toy_data")

# Plot missing data across age and period, stratified by education, for each sex
plot_missing_data(data        = toy_data,
                  x           = period,
                  y           = age,
                  stratify_by = education,
                  for_each    = sex)
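
Finding "expected but missing" combinations can be sketched as comparing the full grid of x and y values with the combinations actually observed. The data below are made up, and the approach is an illustration in base R rather than the package's implementation.

```r
# Made-up data: the combination (period = 1991, age = 21) is never observed.
d <- data.frame(period = c(1990, 1990, 1991),
                age    = c(20,   21,   20))

# Full grid of expected combinations versus observed combinations.
expected <- expand.grid(period = sort(unique(d$period)),
                        age    = sort(unique(d$age)))
observed <- unique(d[, c("period", "age")])

# Keep the expected combinations that do not occur in the data.
key <- function(z) paste(z$period, z$age, sep = "_")
missing_cells <- expected[!key(expected) %in% key(observed), ]
missing_cells
```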



Make a plot of model fit scores

Description

Makes a plot summarizing the model fit scores of a set of models, and ranks the models based on their score.

Usage

plot_model_selection_criteria(scores, title = NULL, alphabetic = FALSE)

Arguments

scores

Named list of model fit scores. Names correspond to the model.

title

(Optional) Plot title; defaults to the name of scores.

alphabetic

(Optional) TRUE/FALSE, indicating if models are to be sorted alphabetically along the horizontal axis. Defaults to FALSE.

Details

The function assumes lower scores are preferable, so the models with the lowest scores are ranked higher.

Value

Plot summarizing the model fit scores, and ranking the models based on their scores.
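
The ranking rule stated in Details (lower scores rank higher) can be sketched in base R. The model names and score values below are hypothetical.

```r
# Hypothetical scores for three models (e.g. DIC values).
scores <- list(apC = 1510.2, aPc = 1498.7, Apc = 1503.4)

score_vec <- unlist(scores)
ranking <- data.frame(model = names(score_vec),
                      score = unname(score_vec),
                      rank  = unname(rank(score_vec, ties.method = "min")))

# Lowest score first, i.e. the preferred model on top.
ranking <- ranking[order(ranking$rank), ]
ranking
```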


Plot time effects with uncertainty ribbons

Description

Internal plotting helper to visualize median differences over time (or other x-axis), including HPD interval ribbons and stratified lines.

Usage

plot_time_effect(
  data,
  x_lab,
  y_lab,
  family,
  color_palette,
  plot_theme,
  legend_title = "Strata"
)

Arguments

data

A data frame with columns: x, median_differences, hpd_lower, hpd_upper, Strata.

x_lab

Label for the x-axis.

y_lab

Label for the y-axis.

family

A string indicating model family ("binomial", "poisson", or other).

color_palette

A named vector of colors for strata.

plot_theme

A ggplot2 theme object.

legend_title

Title for the legend.

Value

A ggplot object.


Helper for evaluating column names, strings, or self-contained vectors

Description

Accepts a column name (unquoted or as a string) or a self-contained vector.

Usage

resolve_column(arg)

Arguments

arg

A column name (unquoted or string) or a vector.

Value

A quosure, symbol, or literal vector depending on input.


Synthetic Age-Period-Cohort Dataset

Description

A toy dataset generated to illustrate modeling of age, period, and cohort effects, including interactions with education and sex. This data simulates count outcomes (e.g., disease incidence or event counts) as a function of demographic variables using a Poisson process.

Usage

data(toy_data)

Format

A data frame with 10000 rows and 7 variables:

age

Age of individuals, sampled uniformly from 20 to 59.

period

Calendar year of observation, sampled uniformly from 1990 to 2019.

education

Factor for education level, with levels 1, 2 and 3.

sex

Factor indicating biological sex, with levels: "male", "female".

count

Simulated event count, generated from a Poisson distribution.

known_rate

The true Poisson rate used to generate count, computed from the log-linear model.

cohort

Derived variable indicating year of birth (period - age).

Details

The underlying event rate is modeled on the log scale as a linear combination of age, period, sex, education, and an age-education interaction. The count outcome is drawn from a Poisson distribution with this rate. This dataset is handy for testing APC models.

The true log-rate is computed (for observation n) as:

\log(\lambda_n) = \beta_0 + \beta_{\text{period}}\,\bigl(2020 - \text{period}_n\bigr) + \beta_{\text{sex}}\,I(\text{sex}_n = \text{female}) \\[6pt] \quad + \beta_{\text{edu}}\,(\text{edu level}_n) + \beta_{\text{edu-age}}\,(\text{age}_n - 20)\,(\text{edu level}_n - 1)\,I(\text{age}_n \le 40) \\[6pt] \quad + \beta_{\text{edu-age}}\,(60 - \text{age}_n)\,(\text{edu level}_n - 1)\,I(\text{age}_n > 40)

where the rate decreases over time (periods), increases with age up to age 40, and decreases after. The coefficients used are:

Source

Simulated data, created using base R and tibble.
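
The log-rate formula in Details can be written as a small R function. All coefficient values below are hypothetical placeholders chosen for illustration; they are not the values used to generate toy_data. The function also assumes that the education level is passed as a number (1, 2 or 3), not as a factor.

```r
# Sketch only: coefficient defaults are hypothetical placeholders.
log_rate <- function(age, period, sex, edu,
                     beta0 = -1, beta_period = 0.01, beta_sex = 0.2,
                     beta_edu = 0.1, beta_edu_age = 0.005) {
  beta0 +
    beta_period * (2020 - period) +                         # period term
    beta_sex * (sex == "female") +                          # sex indicator
    beta_edu * edu +                                        # education level
    beta_edu_age * (age - 20) * (edu - 1) * (age <= 40) +   # rising part
    beta_edu_age * (60 - age) * (edu - 1) * (age > 40)      # falling part
}

lr <- log_rate(age = 40, period = 2000, sex = "female", edu = 2)

# A count is then drawn from a Poisson distribution with rate exp(lr).
set.seed(1)
sim_count <- rpois(1, lambda = exp(lr))
```

Note that the two age terms agree at age 40 (both give 20 times the education offset), so the interaction is continuous there.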


Validate lincomb terms against an INLA formula

Description

Checks that all variable names used in inla.make.lincomb() expressions (inside a list of lincombs) are present in the provided INLA model formula.

Usage

validate_lincombs_against_formula(lincombs, formula)

Arguments

lincombs

A list of linear combinations (as generated by generate_apc_lincombs()).

formula

The INLA model formula object (e.g., from generate_MAPC_formula()).

Value

Invisible TRUE if all terms match. Otherwise, stops with an informative error.
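
The kind of check described here can be sketched with all.vars(), which extracts the variable names from a formula. The helper check_terms_in_formula is hypothetical and takes the term names as a plain character vector, which simplifies the lincomb objects the package function works with.

```r
# Sketch only: hypothetical helper comparing term names against a formula.
check_terms_in_formula <- function(term_names, formula) {
  missing_terms <- setdiff(term_names, all.vars(formula))
  if (length(missing_terms) > 0) {
    stop("Terms not found in formula: ",
         paste(missing_terms, collapse = ", "))
  }
  invisible(TRUE)
}

# f() is not evaluated here; the formula is only inspected for variable names.
fml <- y ~ f(age, model = "rw1") + f(period, model = "rw1")

ok  <- check_terms_in_formula(c("age", "period"), fml)
err <- tryCatch(check_terms_in_formula("cohort", fml),
                error = function(e) conditionMessage(e))
err
```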