Title: | Routines for Descriptive and Model-Based APC Analysis |
Version: | 1.0.8 |
Maintainer: | Alexander Bauer <baueralexander@posteo.de> |
Description: | Age-Period-Cohort (APC) analyses are used to differentiate relevant drivers for long-term developments. The 'APCtools' package offers visualization techniques and general routines to simplify the workflow of an APC analysis. Sophisticated functions are available both for descriptive and regression model-based analyses. For the former, we use density (or ridgeline) matrices and (hexagonally binned) heatmaps as innovative visualization techniques building on the concept of Lexis diagrams. Model-based analyses build on the separation of the temporal dimensions based on generalized additive models, where a tensor product interaction surface (usually between age and period) is utilized to represent the third dimension (usually cohort) on its diagonal. Such tensor product surfaces can also be estimated while accounting for further covariates in the regression model. See Weigert et al. (2021) <doi:10.1177/1354816620987198> for methodological details. |
License: | MIT + file LICENSE |
URL: | https://bauer-alex.github.io/APCtools/ |
BugReports: | https://github.com/bauer-alex/APCtools/issues |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Imports: | ggpubr, checkmate, knitr, ggplot2, colorspace, dplyr, mgcv, scales, tidyr, stringr |
Suggests: | testthat (≥ 3.0.0), rmarkdown, covr |
VignetteBuilder: | knitr |
Depends: | R (≥ 3.5) |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2025-06-18 19:08:36 UTC; alex |
Author: | Alexander Bauer |
Repository: | CRAN |
Date/Publication: | 2025-06-18 20:20:06 UTC |
Internal helper to calculate the (group-specific) density of a variable
Description
Internal helper function that is called in plot_density to calculate the density of a metric variable. If plot_density is called from within plot_densityMatrix (i.e., when some of the columns c("age_group","period_group","cohort_group") are part of the dataset), the density is computed individually for all respective APC groups.
Usage
calc_density(dat, y_var, weights_var = NULL, ...)
Arguments
dat |
Dataset with columns |
y_var |
Character name of the main variable to be plotted. |
weights_var |
Optional character name of a weights variable used to project the results in the sample to some population. |
... |
Additional arguments passed to |
Value
Dataset with the calculated densities.
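The idea of a group-specific, optionally weighted density can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy data, the column names y, age_group and w, and the fixed bandwidth rule are all assumptions made for illustration.

```r
# Toy data (made up): a metric variable, one APC group column and weights
set.seed(1)
dat <- data.frame(
  y         = c(rnorm(50, mean = 10), rnorm(50, mean = 20)),
  age_group = rep(c("20-29", "30-39"), each = 50),
  w         = runif(100, min = 0.5, max = 2)
)

# Estimate one weighted kernel density per group with stats::density()
dens_list <- lapply(split(dat, dat$age_group), function(d) {
  dens <- density(d$y,
                  bw      = bw.nrd0(d$y),      # bandwidth chosen explicitly
                  weights = d$w / sum(d$w))    # weights normalized to sum to 1
  data.frame(age_group = d$age_group[1], x = dens$x, density = dens$y)
})

# Stack the group-wise results into one dataset
dens_dat <- do.call(rbind, dens_list)
head(dens_dat)
```

Each group contributes the default 512 evaluation points of density(), so the stacked dataset has one density curve per APC group.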
Internal function to capitalize the first letter of a character
Description
Internal helper function to capitalize the first letter of a character value. The use case is to create a plot label like 'Age' from a variable name like 'age'.
Usage
capitalize_firstLetter(char)
Arguments
char |
Character value whose first letter should be capitalized |
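A minimal base R sketch of this kind of capitalization is shown below. The function name capitalize_first_sketch is hypothetical and the code is not taken from the package.

```r
# Hypothetical stand-in: capitalize the first letter of a single character value
capitalize_first_sketch <- function(char) {
  stopifnot(is.character(char), length(char) == 1)
  paste0(toupper(substr(char, 1, 1)), substr(char, 2, nchar(char)))
}

capitalize_first_sketch("age")
capitalize_first_sketch("cohort")
```

The two calls turn the variable names 'age' and 'cohort' into the plot labels 'Age' and 'Cohort'.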
Internal helper to compute marginal APC effects and their confidence intervals
Description
Internal helper function to add lower and upper confidence boundaries pointwise
Usage
compute_marginalAPCeffects(dat, model, variable, plot_CI = FALSE)
Arguments
dat |
Dataset containing predicted effects for a grid of all APC dimensions and covariates used in the model. |
model |
|
variable |
One of |
plot_CI |
Indicator if 95% confidence intervals for marginal APC effects should be computed. Defaults to FALSE. |
Details
If the model was estimated with a log or logit link, the function automatically performs an exponential transformation of the effect.
Internal helper to tilt the x-axis for the hexamap plot
Description
Internal helper function to be called in plot_APChexamap, to tilt the x-axis for the hexamap plot.
Usage
compute_xCoordinate(period_vec)
Arguments
period_vec |
Numeric vector of period values. |
Internal helper to tilt the x-axis for the hexamap plot
Description
Internal helper function to be called in plot_APChexamap, to tilt the x-axis for the hexamap plot.
Usage
compute_yCoordinate(period_vec, age_vec)
Arguments
period_vec |
Numeric vector of period values. |
age_vec |
Numeric vector of age values. |
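The effect of tilting the axis can be illustrated with a small sketch. It assumes a transform of the form x = period * sqrt(3) / 2 and y = age - period / 2, chosen so that one-unit steps in age, period and cohort all have the same length on the plot. The function names are hypothetical, and the package's internal helpers may differ in detail.

```r
# Hypothetical stand-ins for the internal helpers (assumed formulas)
x_coord_sketch <- function(period_vec) {
  period_vec * sqrt(3) / 2
}
y_coord_sketch <- function(period_vec, age_vec) {
  age_vec - period_vec / 2
}

# Length of a step on the plot between two (period, age) points
step_length <- function(p0, a0, p1, a1) {
  dx <- x_coord_sketch(p1) - x_coord_sketch(p0)
  dy <- y_coord_sketch(p1, a1) - y_coord_sketch(p0, a0)
  sqrt(dx^2 + dy^2)
}

len_age    <- step_length(2000, 30, 2000, 31)  # one year older, same period
len_period <- step_length(2000, 30, 2001, 30)  # one period later, same age
len_cohort <- step_length(2000, 30, 2001, 31)  # same cohort, one year later
c(age = len_age, period = len_period, cohort = len_cohort)
```

Under this assumed transform, all three step lengths equal 1 up to floating-point error, which is why no temporal dimension is visually underrepresented.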
Create a summary table for multiple estimated GAM models
Description
Create a table to summarize the overall effect strengths of the age, period and cohort effects for models fitted with gam or bam. The output format can be adjusted by passing arguments to kable via the ... argument.
Usage
create_APCsummary(
model_list,
dat,
digits = 2,
apc_range = NULL,
kable = TRUE,
...
)
Arguments
model_list |
A list of regression models estimated with
|
dat |
Dataset with columns |
digits |
Number of digits for numeric columns. Defaults to 2. |
apc_range |
Optional list with one or multiple elements with names |
kable |
Should the output be a table in kable style? Defaults to TRUE. |
... |
Optional additional arguments passed to |
Details
If the model was estimated with a log or logit link, the function automatically performs an exponential transformation of the effect.
Value
Table created with kable.
Author(s)
Alexander Bauer alexander.bauer@stat.uni-muenchen.de
Examples
library(APCtools)
library(mgcv)
data(travel)
# create the summary table for one model
model_pure <- gam(mainTrip_distance ~ te(age, period), data = travel)
create_APCsummary(list(model_pure), dat = travel)
# create the summary table for multiple models
model_cov <- gam(mainTrip_distance ~ te(age, period) + s(household_income),
data = travel)
model_list <- list("pure model" = model_pure,
"covariate model" = model_cov)
create_APCsummary(model_list, dat = travel)
Internal helper to create a group variable as base for a density matrix
Description
Internal helper function to create a group variable based on the categorization of either age, period or cohort. To be called from within plot_densityMatrix.
Usage
create_groupVariable(dat, APC_var, groups_list)
Arguments
dat |
Dataset with a column |
APC_var |
One of |
groups_list |
A list with each element specifying the borders of one
row or column in the density matrix. E.g., if the period should be visualized
in decade columns from 1980 to 2009, specify
|
Value
Vector for the grouping that can be added as additional column to the data.
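The grouping step can be sketched in base R as follows. The function name group_by_borders_sketch and the label format are hypothetical. The borders list follows the format of the age_groups and period_groups lists used in the plot_densityMatrix examples, and the package's own helper may handle labels and edge cases differently.

```r
# Hypothetical stand-in: assign each value to the group whose borders contain it
group_by_borders_sketch <- function(x, groups_list) {
  grp <- rep(NA_character_, length(x))
  for (borders in groups_list) {
    in_group <- !is.na(x) & x >= borders[1] & x <= borders[2]
    grp[in_group] <- paste(borders[1], "-", borders[2])
  }
  grp
}

# Decade groups from 1980 to 2009 (illustrative borders)
period_groups <- list(c(1980, 1989), c(1990, 1999), c(2000, 2009))
period_values <- c(1985, 1999, 2015, NA)

period_group <- group_by_borders_sketch(period_values, period_groups)
period_group
```

Values outside all specified borders (here 2015) and missing values stay NA, so they can be dropped before plotting.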
Internal helper to create a dataset for ggplot2 to highlight diagonals
Description
Internal helper function to create a dataset for ggplot2 that can be used to highlight specific diagonals in a density matrix.
Usage
create_highlightDiagonalData(dat, highlight_diagonals)
Arguments
dat |
Dataset with columns |
highlight_diagonals |
Optional internal parameter which is only
specified when |
Create model summary tables for multiple estimated GAM models
Description
Create publication-ready summary tables of all linear and nonlinear effects for models fitted with gam or bam. The output format of the tables can be adjusted by passing arguments to kable via the ... argument.
Usage
create_modelSummary(
model_list,
digits = 2,
method_expTransform = "simple",
...
)
Arguments
model_list |
list of APC models |
digits |
number of displayed digits |
method_expTransform |
One of |
... |
additional arguments to |
Details
If the model was estimated with a log or logit link, the function automatically performs an exponential transformation of the effects.
The table for linear coefficients includes the estimated coefficient (coef), the corresponding standard error (se), lower and upper limits of 95% confidence intervals (CI_lower, CI_upper) and the p-values for all coefficients apart from the intercept.
The table for nonlinear coefficients includes the estimated degrees of freedom (edf) and the p-value for each estimate.
Value
List of tables created with kable.
Author(s)
Alexander Bauer alexander.bauer@stat.uni-muenchen.de
Examples
library(APCtools)
library(mgcv)
data(travel)
model <- gam(mainTrip_distance ~ te(age, period) + residence_region +
household_size + s(household_income), data = travel)
create_modelSummary(list(model))
Internal helper to create a summary table for one estimated GAM model
Description
Internal helper function to be called in create_APCsummary. This function creates the summary table for one model estimated with gam or bam.
Usage
create_oneAPCsummaryTable(model, dat, apc_range = NULL)
Arguments
model |
Optional regression model estimated with |
dat |
Dataset with columns |
apc_range |
Optional list with one or multiple elements with names
|
Value
data.frame containing aggregated information on the individual effects.
Drug deaths of white men in the United States
Description
Dataset on the number of unintentional drug overdose deaths in the United States for each age group between 1999 and 2019, retrieved from the CDC WONDER Online Database. The data only cover white men.
Usage
data(drug_deaths)
Format
A dataframe containing
- period: Calendar year.
- age: Age group.
- deaths: Number of observed unintentional drug overdose deaths in the respective age group and calendar year.
- population: Number of white men in the respective age group and calendar year in the U.S. population.
- mortality_rate: Drug overdose mortality rate for the respective age group and calendar year, reported as the number of deaths per 100,000 people. Calculated as 100000 * deaths / population.
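The mortality rate formula can be checked on a toy example. The two rows below are made up for illustration and are not taken from the drug_deaths dataset.

```r
# Made-up counts for one age group in two calendar years
toy <- data.frame(
  period     = c(1999, 2019),
  age        = c(30, 30),
  deaths     = c(12, 45),
  population = c(150000, 180000)
)

# Deaths per 100,000 people, as defined for the mortality_rate column
toy$mortality_rate <- 100000 * toy$deaths / toy$population
toy
```

The resulting rates are 8 and 25 deaths per 100,000 people.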
Details
The data were exported from the CDC WONDER Online Database (see link in references down below), based on the following settings:
Group by Year and by Single-Year Ages
Demographics: Gender Male; Ethnicity White
Cause of death: Drug / Alcohol Induced Causes. Then select the more specific category Drug poisonings (overdose) Unintentional (X40-X44).
References
Jalal, H., & Burke, D. S. (2020). Hexamaps for Age-Period-Cohort Data Visualization and Implementation in R. Epidemiology (Cambridge, Mass.), 31(6), e47. doi:10.1097/EDE.0000000000001236.
Centers for Disease Control and Prevention, National Center for Health Statistics. Underlying Cause of Death 1999-2019 on CDC WONDER Online Database, released in 2020. Data are from the Multiple Cause of Death Files, 1999-2019, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program. Accessed at wonder.cdc.gov/ucd-icd10.html on 18 June 2025.
Internal helper for gg_addReferenceLines to keep diagonal lines in the plot range
Description
Internal helper function to be called from within gg_addReferenceLines. This function takes the dataset prepared for adding diagonal reference lines to the plot, checks whether any diagonals exceed the plot limits, cuts them accordingly if necessary, and returns the corrected dataset.
Usage
ensure_segmentsInPlotRange(dat_segments, plot_dat)
Arguments
dat_segments |
Dataset containing information on the diagonal reference lines. |
plot_dat |
Dataset used for creating the heatmap. |
Internal helper to extract summary of linear effects in a gam model
Description
Internal helper function to create a data.frame containing the linear effects summary of a model fitted with gam or bam.
Usage
extract_summary_linearEffects(model, method_expTransform = "simple")
Arguments
model |
|
method_expTransform |
One of |
Details
If the model was estimated with a log or logit link, the function automatically performs an exponential transformation of the effect, see argument method_expTransform.
Extract returned values of plot.gam() while suppressing creation of the plot
Description
Internal helper function to extract the values returned by plot.gam while suppressing creation of the plot.
Usage
get_plotGAMobject(model)
Arguments
model |
Internal helper to add reference lines in an APC heatmap
Description
Internal helper function to add reference lines in an APC heatmap (vertically, horizontally or diagonally). The function takes an existing list of ggplot objects, adds the specified reference lines in each plot and returns the edited ggplot list again. To be called from within plot_APCheatmap.
Usage
gg_addReferenceLines(
gg_list,
dimensions,
plot_dat,
markLines_list,
markLines_displayLabels
)
Arguments
gg_list |
Existing list of ggplot objects where the reference lines should be marked in each individual ggplot. |
dimensions |
Character vector specifying the two APC dimensions that
should be visualized along the x-axis and y-axis. Defaults to
|
plot_dat |
Dataset used for creating the heatmap. |
markLines_list |
Optional list that can be used to highlight the borders
of specific age groups, time intervals or cohorts. Each element must be a
numeric vector of values where horizontal, vertical or diagonal lines should
be drawn (depends on which APC dimension is displayed on which axis).
The list can maximally have three elements and must have names out of
|
markLines_displayLabels |
Optional character vector defining for which
dimensions the lines defined through |
Internal helper to add the diagonal highlighting to a ggplot
Description
Internal helper function to highlight diagonals in a density matrix. The function takes an existing ggplot object, adds the diagonal highlighting and returns the edited ggplot object again.
Usage
gg_highlightDiagonals(gg, dat, dat_highlightDiagonals)
Arguments
gg |
Existing ggplot object to which the diagonal highlighting should be added. |
dat |
Dataset with columns |
dat_highlightDiagonals |
Dataset created by
|
Plot 1D smooth effects for gam models
Description
Plots 1D smooth effects for a GAM model fitted with gam or bam.
Usage
plot_1Dsmooth(
model,
plot_ci = TRUE,
select,
alpha = 0.05,
ylim = NULL,
method_expTransform = "simple",
return_plotData = FALSE
)
Arguments
model |
|
plot_ci |
If |
select |
Index of smooth term to be plotted. |
alpha |
|
ylim |
Optional limits of the y-axis. |
method_expTransform |
One of |
return_plotData |
If TRUE, the dataset prepared for plotting is returned. Defaults to FALSE. |
Details
If the model was estimated with a log or logit link, the function automatically performs an exponential transformation of the effect, see argument method_expTransform.
Value
ggplot object
Author(s)
Alexander Bauer alexander.bauer@stat.uni-muenchen.de
Examples
library(APCtools)
library(mgcv)
data(travel)
model <- gam(mainTrip_distance ~ te(age, period) + residence_region +
household_size + s(household_income), data = travel)
plot_1Dsmooth(model, select = 2)
Heatmap of an APC surface
Description
Plot the heatmap of an APC structure. The function can be used in two ways: either to plot the observed mean structure of a metric variable, by specifying dat and the variable y_var, or to plot some mean structure represented by an estimated two-dimensional tensor product surface, by specifying dat and the model object. The model must be estimated with gam or bam.
Usage
plot_APCheatmap(
dat,
y_var = NULL,
model = NULL,
dimensions = c("period", "age"),
apc_range = NULL,
bin_heatmap = TRUE,
bin_heatmapGrid_list = NULL,
markLines_list = NULL,
markLines_displayLabels = c("age", "period", "cohort"),
y_var_logScale = FALSE,
plot_CI = TRUE,
method_expTransform = "simple",
legend_limits = NULL,
legend_title = NULL
)
Arguments
dat |
Dataset with columns |
y_var |
Optional character name of a metric variable to be plotted. |
model |
Optional regression model estimated with |
dimensions |
Character vector specifying the two APC dimensions that
should be visualized along the x-axis and y-axis. Defaults to c("period", "age"). |
apc_range |
Optional list with one or multiple elements with names
|
bin_heatmap , bin_heatmapGrid_list |
|
markLines_list |
Optional list that can be used to highlight the borders
of specific age groups, time intervals or cohorts. Each element must be a
numeric vector of values where horizontal, vertical or diagonal lines should
be drawn (depends on which APC dimension is displayed on which axis).
The list can maximally have three elements and must have names out of
|
markLines_displayLabels |
Optional character vector defining for which
dimensions the lines defined through |
y_var_logScale |
Indicator if |
plot_CI |
Indicator if the confidence intervals should be plotted.
Only used if |
method_expTransform |
One of |
legend_limits |
Optional numeric vector passed as argument |
legend_title |
Optional character legend title. |
Details
See also plot_APChexamap to plot a hexagonal heatmap with adapted axes.
If the plot is created based on the model object and the model was estimated with a log or logit link, the function automatically performs an exponential transformation of the effect.
Value
Plot grid created with ggarrange (if plot_CI is TRUE) or a ggplot2 object (if plot_CI is FALSE).
Author(s)
Alexander Bauer alexander.bauer@stat.uni-muenchen.de, Maximilian Weigert maximilian.weigert@stat.uni-muenchen.de
References
Weigert, M., Bauer, A., Gernert, J., Karl, M., Nalmpatian, A., Küchenhoff, H., and Schmude, J. (2021). Semiparametric APC analysis of destination choice patterns: Using generalized additive models to quantify the impact of age, period, and cohort on travel distances. Tourism Economics. doi:10.1177/1354816620987198.
See Also
plot_APChexamap
Examples
library(APCtools)
library(mgcv)
data(travel)
# variant A: plot observed mean structures
# observed heatmap
plot_APCheatmap(dat = travel, y_var = "mainTrip_distance",
bin_heatmap = FALSE, y_var_logScale = TRUE)
# with binning
plot_APCheatmap(dat = travel, y_var = "mainTrip_distance",
bin_heatmap = TRUE, y_var_logScale = TRUE)
# variant B: plot some smoothed, estimated mean structure
model <- gam(mainTrip_distance ~ te(age, period) + residence_region +
household_size + s(household_income), data = travel)
# plot the smooth tensor product surface
plot_APCheatmap(dat = travel, model = model, bin_heatmap = FALSE, plot_CI = FALSE)
# ... same plot including the confidence intervals
plot_APCheatmap(dat = travel, model = model, bin_heatmap = FALSE)
# the APC dimensions can be flexibly assigned to the x-axis and y-axis
plot_APCheatmap(dat = travel, model = model, dimensions = c("age","cohort"),
bin_heatmap = FALSE, plot_CI = FALSE)
# add some reference lines
plot_APCheatmap(dat = travel, model = model, bin_heatmap = FALSE, plot_CI = FALSE,
markLines_list = list(cohort = c(1910,1939,1955,1980)))
# default binning of the tensor product surface in 5-year-blocks
plot_APCheatmap(dat = travel, model = model, plot_CI = FALSE)
# manual binning
manual_binning <- list(period = seq(min(travel$period, na.rm = TRUE) - 1,
max(travel$period, na.rm = TRUE), by = 5),
cohort = seq(min(travel$period - travel$age, na.rm = TRUE) - 1,
max(travel$period - travel$age, na.rm = TRUE), by = 10))
plot_APCheatmap(dat = travel, model = model, plot_CI = FALSE,
bin_heatmapGrid_list = manual_binning)
Hexamap of an APC surface
Description
Plot the heatmap of an APC structure using a hexagon-based plot with adapted axes. In this way, the temporal dimension that is represented by the diagonal structure is not visually underrepresented compared to the other two dimensions on the x-axis and y-axis.
The function can be used in two ways: either to plot the observed mean structure of a metric variable, by specifying dat and the variable y_var, or to plot some mean structure represented by an estimated two-dimensional tensor product surface, by specifying dat and the model object. The model must be estimated with gam or bam.
Usage
plot_APChexamap(
dat,
y_var = NULL,
model = NULL,
apc_range = NULL,
y_var_logScale = FALSE,
obs_interval = 1,
iso_interval = 5,
color_vec = NULL,
color_range = NULL,
line_width = 0.5,
line_color = gray(0.5),
label_size = 0.5,
label_color = "black",
legend_title = NULL
)
Arguments
dat |
Dataset with columns |
y_var |
Optional character name of a metric variable to be plotted. |
model |
Optional regression model estimated with |
apc_range |
Optional list with one or multiple elements with names
|
y_var_logScale |
Indicator if |
obs_interval |
Numeric specifying the interval width based on which the
data is spaced. Only used if |
iso_interval |
Numeric specifying the interval width between the isolines along each axis. Defaults to 5. |
color_vec |
Optional character vector of color names, specifying the color continuum. |
color_range |
Optional numeric vector with two elements, specifying the ends of the color scale in the legend. |
line_width |
Line width of the isolines. Defaults to 0.5. |
line_color |
Character color name for the isolines. Defaults to gray. |
label_size |
Size of the labels along the axes. Defaults to 0.5. |
label_color |
Character color name for the labels along the axes. |
legend_title |
Optional character title for the legend. |
Details
See also plot_APCheatmap to plot a regular heatmap.
If the plot is created based on the model object and the model was estimated with a log or logit link, the function automatically performs an exponential transformation of the effect.
Value
Creates a plot with base R functions (not ggplot2).
Author(s)
Hawre Jalal hjalal@pitt.edu, Alexander Bauer alexander.bauer@stat.uni-muenchen.de
References
Jalal, H., Burke, D. (2020). Hexamaps for Age–Period–Cohort Data Visualization and Implementation in R. Epidemiology, 31 (6), e47-e49. doi: 10.1097/EDE.0000000000001236.
See Also
plot_APCheatmap
Examples
library(APCtools)
library(mgcv)
library(dplyr)
data(drug_deaths)
# restrict to data where the mortality rate is available
drug_deaths <- drug_deaths %>% filter(!is.na(mortality_rate))
# hexamap of an observed structure
plot_APChexamap(dat = drug_deaths,
y_var = "mortality_rate",
color_range = c(0,40))
# hexamap of a smoothed structure
model <- gam(mortality_rate ~ te(age, period, bs = "ps", k = c(8,8)),
data = drug_deaths)
plot_APChexamap(dat = drug_deaths, model = model)
Plot the density of one metric or categorical variable
Description
Create a density plot or a boxplot of one metric variable or a barplot of one categorical variable, based on a specific subset of the data.
Usage
plot_density(
dat,
y_var,
plot_type = "density",
apc_range = NULL,
highlight_diagonals = NULL,
y_var_cat_breaks = NULL,
y_var_cat_labels = NULL,
weights_var = NULL,
log_scale = FALSE,
xlab = NULL,
ylab = NULL,
legend_title = NULL,
...
)
Arguments
dat |
Dataset with columns |
y_var |
Character name of the main variable to be plotted. |
plot_type |
One of |
apc_range |
Optional list with one or multiple elements with names
|
highlight_diagonals |
Optional internal parameter which is only
specified when |
y_var_cat_breaks |
Optional numeric vector of breaks to categorize
|
y_var_cat_labels |
Optional character vector for the names of the
categories that were defined based on |
weights_var |
Optional character name of a weights variable used to project the results in the sample to some population. |
log_scale |
Indicator if the main variable should be log10 transformed.
Only used if the |
xlab , ylab , legend_title |
Optional plot annotations. |
... |
Additional arguments passed to |
Details
If plot_density is called internally from within plot_densityMatrix (i.e., if the dataset contains some of the columns c("age_group","period_group","cohort_group")), this function will calculate the metric densities individually for these groups.
Value
ggplot object
Author(s)
Alexander Bauer alexander.bauer@stat.uni-muenchen.de, Maximilian Weigert maximilian.weigert@stat.uni-muenchen.de
Examples
library(APCtools)
data(travel)
plot_density(dat = travel, y_var = "mainTrip_distance")
Create a matrix of density plots
Description
This function creates a matrix of individual density plots (i.e., a ridgeline matrix) or boxplots (for metric variables) or of individual barplots (for categorical variables). The age, period or cohort information can each either be plotted on the x-axis or the y-axis.
Usage
plot_densityMatrix(
dat,
y_var,
dimensions = c("period", "age"),
age_groups = NULL,
period_groups = NULL,
cohort_groups = NULL,
plot_type = "density",
highlight_diagonals = NULL,
y_var_cat_breaks = NULL,
y_var_cat_labels = NULL,
weights_var = NULL,
log_scale = FALSE,
legend_title = NULL,
...
)
Arguments
dat |
Dataset with columns |
y_var |
Character name of the main variable to be plotted. |
dimensions |
Character vector specifying the two APC dimensions that
should be visualized along the x-axis and y-axis. Defaults to c("period", "age"). |
age_groups , period_groups , cohort_groups |
Each a list. Either containing
purely scalar values or with each element specifying the two borders of one
row or column in the density matrix. E.g., if the period should be visualized
in decade columns from 1980 to 2009, specify
|
plot_type |
One of |
highlight_diagonals |
Optional list to define diagonals in the density that should be highlighted with different colors. Each list element should be a numeric vector stating the index of the diagonals (counted from the top left) that should be highlighted in the same color. If the list is named, the names are used as legend labels. |
y_var_cat_breaks |
Optional numeric vector of breaks to categorize
|
y_var_cat_labels |
Optional character vector for the names of the
categories that were defined based on |
weights_var |
Optional character name of a weights variable used to project the results in the sample to some population. |
log_scale |
Indicator if the main variable should be log10 transformed.
Only used if the |
legend_title |
Optional plot annotation. |
... |
Additional arguments passed to |
Value
ggplot object
Author(s)
Alexander Bauer alexander.bauer@stat.uni-muenchen.de, Maximilian Weigert maximilian.weigert@stat.uni-muenchen.de
References
Weigert, M., Bauer, A., Gernert, J., Karl, M., Nalmpatian, A., Küchenhoff, H., and Schmude, J. (2021). Semiparametric APC analysis of destination choice patterns: Using generalized additive models to quantify the impact of age, period, and cohort on travel distances. Tourism Economics. doi:10.1177/1354816620987198.
Examples
library(APCtools)
# define categorizations for the main trip distance
dist_cat_breaks <- c(1,500,1000,2000,6000,100000)
dist_cat_labels <- c("< 500 km","500 - 1,000 km", "1,000 - 2,000 km",
"2,000 - 6,000 km", "> 6,000 km")
age_groups <- list(c(80,89),c(70,79),c(60,69),c(50,59),c(40,49),c(30,39),c(20,29))
period_groups <- list(c(1970,1979),c(1980,1989),c(1990,1999),c(2000,2009),c(2010,2019))
cohort_groups <- list(c(1980,1989),c(1970,1979),c(1960,1969),c(1950,1959),c(1940,1949),
c(1930,1939),c(1920,1929))
plot_densityMatrix(dat = travel,
y_var = "mainTrip_distance",
age_groups = age_groups,
period_groups = period_groups,
log_scale = TRUE)
# highlight two cohorts
plot_densityMatrix(dat = travel,
y_var = "mainTrip_distance",
age_groups = age_groups,
period_groups = period_groups,
highlight_diagonals = list(8, 10),
log_scale = TRUE)
# also mark different distance categories
plot_densityMatrix(dat = travel,
y_var = "mainTrip_distance",
age_groups = age_groups,
period_groups = period_groups,
log_scale = TRUE,
y_var_cat_breaks = dist_cat_breaks,
y_var_cat_labels = dist_cat_labels,
highlight_diagonals = list(8, 10),
legend_title = "Distance category")
# flexibly assign the APC dimensions to the x-axis and y-axis
plot_densityMatrix(dat = travel,
y_var = "mainTrip_distance",
dimensions = c("period","cohort"),
period_groups = period_groups,
cohort_groups = cohort_groups,
log_scale = TRUE,
y_var_cat_breaks = dist_cat_breaks,
y_var_cat_labels = dist_cat_labels,
legend_title = "Distance category")
# use boxplots instead of densities
plot_densityMatrix(dat = travel,
y_var = "mainTrip_distance",
plot_type = "boxplot",
age_groups = age_groups,
period_groups = period_groups,
log_scale = TRUE,
highlight_diagonals = list(8, 10))
# plot categorical variables instead of metric ones
plot_densityMatrix(dat = travel,
y_var = "household_size",
age_groups = age_groups,
period_groups = period_groups,
highlight_diagonals = list(8, 10))
Internal helper to plot a categorical density
Description
Internal helper function to plot one categorical density, to be called from within plot_density.
Usage
plot_density_categorical(
dat,
y_var,
dat_highlightDiagonals = NULL,
weights_var = NULL,
xlab = NULL,
ylab = NULL
)
Arguments
dat |
Dataset with columns |
y_var |
Character name of the main variable to be plotted. |
dat_highlightDiagonals |
Optional dataset created by
|
weights_var |
Optional character name of a weights variable used to project the results in the sample to some population. |
xlab , ylab |
Optional plot annotations. |
Internal helper to plot a metric density
Description
Internal helper function to plot one metric density, to be called from within plot_density.
Usage
plot_density_metric(
dat,
y_var,
plot_type = "density",
dat_highlightDiagonals = NULL,
y_var_cat_breaks = NULL,
y_var_cat_labels = NULL,
weights_var = NULL,
log_scale = FALSE,
xlab = NULL,
ylab = NULL,
legend_title = NULL,
...
)
Arguments
dat |
Dataset with columns |
y_var |
Character name of the main variable to be plotted. |
plot_type |
One of |
dat_highlightDiagonals |
Optional dataset created by
|
y_var_cat_breaks |
Optional numeric vector of breaks to categorize
|
y_var_cat_labels |
Optional character vector for the names of the
categories that were defined based on |
weights_var |
Optional character name of a weights variable used to project the results in the sample to some population. |
log_scale |
Indicator if the main variable should be log10 transformed.
Only used if the |
xlab , ylab , legend_title |
Optional plot annotations. |
... |
Additional arguments passed to |
Joint plot to compare the marginal APC effects of multiple models
Description
This function creates a joint plot of the marginal APC effects of multiple estimated models. It creates a plot with one pane per age, period and cohort effect, each containing one line for each estimated model.
Usage
plot_jointMarginalAPCeffects(
model_list,
dat,
vlines_list = NULL,
ylab = NULL,
ylim = NULL,
plot_CI = FALSE
)
Arguments
model_list |
A list of regression models estimated with
|
dat |
Dataset with columns |
vlines_list |
Optional list that can be used to highlight the borders of
specific age groups, time intervals or cohorts. Each element must be a
numeric vector of values on the x-axis where vertical lines should be drawn.
The list can maximally have three elements and must have names out of
|
ylab , ylim |
Optional ggplot2 styling arguments. |
plot_CI |
Indicator if 95% confidence intervals should be plotted. Defaults to FALSE. |
Details
If the model was estimated with a log or logit link, the function automatically performs an exponential transformation of the effect.
Since the plot output created by the function is not a ggplot2 object, but an object created with ggpubr::ggarrange, the overall theme of the plot cannot be changed by adding the theme in the form of 'plot_jointMarginalAPCeffects(...) + theme_minimal(...)'. Instead, you can call theme_set(theme_minimal(...)) as an individual call before calling plot_jointMarginalAPCeffects(...). The latter function will then use this global plotting theme.
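The workaround described above could look as follows. This is a sketch that assumes the packages APCtools, mgcv and ggplot2 are installed; it reuses the model specification from the examples of this help page and restores the previous theme afterwards.

```r
library(APCtools)
library(mgcv)
library(ggplot2)

data(travel)
model_pure <- gam(mainTrip_distance ~ te(age, period), data = travel)

# Set the global ggplot2 theme first; theme_set() returns the previous theme
old_theme <- theme_set(theme_minimal())

# The joint plot is now drawn with the globally set theme
plot_jointMarginalAPCeffects(list(model_pure), dat = travel)

# Restore the previous global theme so later plots are unaffected
theme_set(old_theme)
```

Storing and restoring the old theme is optional, but it keeps the change local to this one plot.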
Value
Plot grid created with ggarrange.
Author(s)
Alexander Bauer alexander.bauer@stat.uni-muenchen.de, Maximilian Weigert maximilian.weigert@stat.uni-muenchen.de
Examples
library(APCtools)
library(mgcv)
data(travel)
# plot marginal effects of one model
model_pure <- gam(mainTrip_distance ~ te(age, period), data = travel)
plot_jointMarginalAPCeffects(list(model_pure), dat = travel)
# plot marginal effects of multiple models
model_cov <- gam(mainTrip_distance ~ te(age, period) + s(household_income),
data = travel)
model_list <- list("pure model" = model_pure,
"covariate model" = model_cov)
plot_jointMarginalAPCeffects(model_list, dat = travel)
# mark specific cohorts
plot_jointMarginalAPCeffects(model_list, dat = travel,
vlines_list = list("cohort" = c(1966.5,1982.5,1994.5)))
Plot linear effects of a gam in an effect plot
Description
Create an effect plot of linear effects of a model fitted with gam or bam.
Usage
plot_linearEffects(
model,
variables = NULL,
return_plotData = FALSE,
refCat = FALSE,
...
)
Arguments
model |
|
variables |
Optional character vector of variable names specifying which effects should be plotted. The order of the vector corresponds to the order in the effect plot. If the argument is not specified, all linear effects are plotted according to the order of their appearance in the model output. |
return_plotData |
If TRUE, the dataset prepared for plotting is returned. Defaults to FALSE. |
refCat |
If TRUE, reference categories are added to the output for categorical covariates. Defaults to FALSE. |
... |
Additional arguments passed to
|
Details
If the model was estimated with a log or logit link, the function automatically performs an exponential transformation of the effect.
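To illustrate what an exponential transformation of an effect means in general, the following sketch fits a log-link model to simulated data with base R. It is not the package's internal code; the simulated data and all object names are assumptions made for illustration only.

```r
# simulated count data with a true log-scale effect of 0.3 (illustrative values)
set.seed(1)
x <- rnorm(200)
y <- rpois(200, lambda = exp(0.5 + 0.3 * x))

# Poisson regression with log link
fit <- glm(y ~ x, family = poisson(link = "log"))

# effect on the scale of the linear predictor (additive on the log scale)
coef_link <- coef(fit)["x"]

# exponentiated effect: multiplicative change in the expected response
# per one-unit increase in x
coef_exp <- exp(coef_link)
```

On the exponentiated scale, a value of 1 corresponds to "no effect", whereas on the link scale this is 0.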
Value
ggplot object
Author(s)
Alexander Bauer alexander.bauer@stat.uni-muenchen.de
Examples
library(APCtools)
library(mgcv)
data(travel)
model <- gam(mainTrip_distance ~ te(age, period) + residence_region +
               household_size + s(household_income), data = travel)
plot_linearEffects(model)
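The arguments return_plotData and refCat documented above can be combined to inspect the estimates behind the plot. This is a minimal sketch; the structure of the returned dataset is not specified here, so it is only printed and no column names are assumed.

```r
library(APCtools)
library(mgcv)
data(travel)

model <- gam(mainTrip_distance ~ te(age, period) + residence_region +
               household_size + s(household_income), data = travel)

# return the data prepared for plotting instead of the plot itself,
# with reference categories added for the categorical covariates
plot_dat <- plot_linearEffects(model, return_plotData = TRUE, refCat = TRUE)
head(plot_dat)
```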
Plot of marginal APC effects based on an estimated GAM model
Description
Plot the marginal effect of age, period or cohort, based on an APC model estimated as a semiparametric additive regression model with gam or bam.
This function is a simple wrapper around plot_partialAPCeffects, called with argument hide_partialEffects = TRUE.
Usage
plot_marginalAPCeffects(
model,
dat,
variable = "age",
vlines_vec = NULL,
plot_CI = FALSE,
return_plotData = FALSE
)
Arguments
model |
Optional regression model estimated with |
dat |
Dataset with columns |
variable |
One of |
vlines_vec |
Optional numeric vector of values on the x-axis where vertical lines should be drawn. Can be used to highlight the borders of specific age groups, time intervals or cohorts. |
plot_CI |
Indicator if 95% confidence intervals should be plotted. Defaults to FALSE. |
return_plotData |
If TRUE, a list of the datasets prepared for plotting
is returned instead of the ggplot object. The list contains one dataset each
for the overall effect (= evaluations of the APC surface to plot the partial
effects) and for each marginal APC effect (no matter the specified value of
the argument |
Value
ggplot object
Author(s)
Alexander Bauer alexander.bauer@stat.uni-muenchen.de, Maximilian Weigert maximilian.weigert@stat.uni-muenchen.de
References
Weigert, M., Bauer, A., Gernert, J., Karl, M., Nalmpatian, A., Küchenhoff, H., and Schmude, J. (2021). Semiparametric APC analysis of destination choice patterns: Using generalized additive models to quantify the impact of age, period, and cohort on travel distances. Tourism Economics. doi:10.1177/1354816620987198.
Examples
library(APCtools)
library(mgcv)
data(travel)
model <- gam(mainTrip_distance ~ te(age, period), data = travel)
plot_marginalAPCeffects(model, dat = travel, variable = "age")
# mark specific cohorts
plot_marginalAPCeffects(model, dat = travel, variable = "cohort",
                        vlines_vec = c(1966.5, 1982.5, 1994.5))
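The half-year values passed to vlines_vec in the example above lie between two adjacent birth years, so they work as borders of cohort groups. The following base R sketch shows this using the usual APC identity cohort = period - age. The toy data and group labels are hypothetical and not part of the package.

```r
# toy data; the values are illustrative only
dat <- data.frame(period = c(1990, 2000, 2010, 2018),
                  age    = c(30, 25, 20, 40))

# birth cohort follows from the APC identity
dat$cohort <- dat$period - dat$age

# the same borders as used for the vertical lines
borders <- c(1966.5, 1982.5, 1994.5)

# assign each observation to the cohort group between two borders
dat$cohort_group <- cut(dat$cohort,
                        breaks = c(-Inf, borders, Inf),
                        labels = c("up to 1966", "1967-1982",
                                   "1983-1994", "1995 or later"))
dat
```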
Partial APC plots based on an estimated GAM model
Description
Create the partial APC plots based on an APC model estimated as a semiparametric additive regression model with gam or bam.
Usage
plot_partialAPCeffects(
model,
dat,
variable = "age",
hide_partialEffects = FALSE,
vlines_vec = NULL,
plot_CI = FALSE,
return_plotData = FALSE
)
Arguments
model |
Optional regression model estimated with |
dat |
Dataset with columns |
variable |
One of |
hide_partialEffects |
If TRUE, only the marginal effect will be plotted. Defaults to FALSE. |
vlines_vec |
Optional numeric vector of values on the x-axis where vertical lines should be drawn. Can be used to highlight the borders of specific age groups, time intervals or cohorts. |
plot_CI |
Indicator if 95% confidence intervals for marginal APC effects
should be plotted. Only used if |
return_plotData |
If TRUE, a list of the datasets prepared for plotting
is returned instead of the ggplot object. The list contains one dataset each
for the overall effect (= evaluations of the APC surface to plot the partial
effects) and for each marginal APC effect (no matter the specified value of
the argument |
Details
If the model was estimated with a log or logit link, the function automatically performs an exponential transformation of the effect.
Value
ggplot object (if hide_partialEffects is TRUE) or a plot grid created with ggarrange (if FALSE).
Author(s)
Alexander Bauer alexander.bauer@stat.uni-muenchen.de, Maximilian Weigert maximilian.weigert@stat.uni-muenchen.de
References
Weigert, M., Bauer, A., Gernert, J., Karl, M., Nalmpatian, A., Küchenhoff, H., and Schmude, J. (2021). Semiparametric APC analysis of destination choice patterns: Using generalized additive models to quantify the impact of age, period, and cohort on travel distances. Tourism Economics. doi:10.1177/1354816620987198.
Examples
library(APCtools)
library(mgcv)
data(travel)
model <- gam(mainTrip_distance ~ te(age, period), data = travel)
plot_partialAPCeffects(model, dat = travel, variable = "age")
# mark specific cohorts
plot_partialAPCeffects(model, dat = travel, variable = "cohort",
                       vlines_vec = c(1966.5, 1982.5, 1994.5))
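If the underlying numbers are of interest rather than the plot, the argument return_plotData can be used. This is a minimal sketch. It assumes that "period" is a valid value for variable, by analogy with "age" and "cohort" in the examples above. The element names of the returned list are not assumed, only printed.

```r
library(APCtools)
library(mgcv)
data(travel)

model <- gam(mainTrip_distance ~ te(age, period), data = travel)

# return the list of datasets prepared for plotting instead of the plot
plot_dat_list <- plot_partialAPCeffects(model, dat = travel, variable = "period",
                                        return_plotData = TRUE)

# show which datasets the list contains
names(plot_dat_list)
```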
Distribution plot of one variable against one APC dimension
Description
Plot the distribution of one variable in the data against age, period or cohort. Creates a bar plot for categorical variables (see argument geomBar_position) and boxplots or a line plot of median values for metric variables (see plot_type).
Usage
plot_variable(
dat,
y_var,
apc_dimension = "period",
log_scale = FALSE,
plot_type = "boxplot",
geomBar_position = "fill",
legend_title = NULL,
ylab = NULL,
ylim = NULL
)
Arguments
dat |
Dataset containing columns |
y_var |
Character name of the variable to plot. |
apc_dimension |
One of |
log_scale |
Indicator if the visualized variable should be log10 transformed. Only used if the variable is numeric. Defaults to FALSE. |
plot_type |
One of |
geomBar_position |
Value passed to |
legend_title |
Optional character title for the legend which is drawn for categorical variables. |
ylab , ylim |
Optional arguments for styling the ggplot. |
Value
ggplot object
Author(s)
Alexander Bauer alexander.bauer@stat.uni-muenchen.de
Examples
library(APCtools)
data(travel)
# plot a metric variable
plot_variable(dat = travel, y_var = "mainTrip_distance",
              apc_dimension = "period", log_scale = TRUE)
plot_variable(dat = travel, y_var = "mainTrip_distance",
              apc_dimension = "period", log_scale = TRUE, plot_type = "line")
# plot a categorical variable
plot_variable(dat = travel, y_var = "household_size", apc_dimension = "period")
plot_variable(dat = travel, y_var = "household_size", apc_dimension = "period",
              geomBar_position = "stack")
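Because plot_variable returns a ggplot object, further ggplot2 components can be added with `+` in the usual way. The sketch below illustrates this. The axis label and the title are arbitrary choices for illustration.

```r
library(APCtools)
library(ggplot2)
data(travel)

# base plot with a custom y-axis label
gg <- plot_variable(dat = travel, y_var = "mainTrip_distance",
                    apc_dimension = "period", log_scale = TRUE,
                    ylab = "Distance of the main trip [km]")

# add a theme and a title like for any other ggplot object
gg + theme_minimal() + ggtitle("Travel distance by period")
```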
Data from the German Reiseanalyse survey
Description
This dataset from the Reiseanalyse survey comprises travel information on German travelers between 1971 and 2018. Data were collected in a yearly repeated cross-sectional survey of German pleasure travels, based on a sample representative of (West) German citizens (until 2009) or of all German-speaking residents (starting 2010). Travelers from former East Germany have only been included since 1990. Note that the sample only contains trips lasting at least five days. For details see Weigert et al. (2021).
Usage
data(travel)
Format
A dataframe containing
- period
Year in which the respondent traveled.
- age
Age of the respondent.
- sampling_weight
Individual weight of each respondent, used to correct for the sample not being perfectly representative and to project the sample results to the population of German citizens (until 2009) or of German-speaking residents (starting 2010). Only available since 1974.
- german_citizenship
Indicator if the respondent is a German citizen or not. Only available since 2010. Until 2009, all respondents were German citizens.
- residence_region
Indicator if the respondent's main residence is in a federal state in the former area of West Germany or in the former area of East Germany.
- household_size
Categorized size of the respondent's household.
- household_income
Joint income (in €) of the respondent's household.
- mainTrip_duration
Categorized trip length of the respondent's main trip. The main trip is the trip which the respondent stated was his/her most important trip in the respective year.
- mainTrip_distance
Distance (in km) between the center of the respondent's federal state and the center of the country of destination, for the main trip. The main trip is the trip which the respondent stated was his/her most important trip in the respective year.
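A first look at the dataset could be taken as sketched below. The weighted mean is only an illustrative use of sampling_weight, not an analysis taken from the package. Rows without a weight are dropped first, since the weights are documented as unavailable before 1974.

```r
library(APCtools)
data(travel)

# inspect the structure and the covered survey years
str(travel)
range(travel$period)

# keep only rows with both a sampling weight and a travel distance
has_values <- !is.na(travel$sampling_weight) & !is.na(travel$mainTrip_distance)
dat_w <- travel[has_values, ]

# weighted mean travel distance (in km) over all remaining respondents
weighted_mean_distance <- weighted.mean(dat_w$mainTrip_distance,
                                        dat_w$sampling_weight)
weighted_mean_distance
```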
Details
The data are a 10% random sample of all respondents who undertook at least one trip in the respective year, between 1971 and 2018. We thank the Forschungsgemeinschaft Urlaub und Reisen e.V. for allowing us to publish this sample.
References
Weigert, M., Bauer, A., Gernert, J., Karl, M., Nalmpatian, A., Küchenhoff, H., and Schmude, J. (2021). Semiparametric APC analysis of destination choice patterns: Using generalized additive models to quantify the impact of age, period, and cohort on travel distances. Tourism Economics. doi:10.1177/1354816620987198.
Forschungsgemeinschaft Urlaub und Reisen e.V. (FUR) (2020b) Survey of tourist demand in Germany for holiday travel and short breaks. Available at: https://reiseanalyse.de/wp-content/uploads/2022/11/RA2020_First-results_EN.pdf (accessed 13 January 2023).