Type: | Package |
Title: | Excess Mortality |
Version: | 0.8.1 |
Depends: | R (≥ 3.5.0), |
Imports: | stats, datasets, dplyr, tidyr, rlang, lubridate, splines, ggplot2, scales |
Suggests: | tidycensus, knitr, rmarkdown |
VignetteBuilder: | knitr |
Description: | Implementation of method for estimating excess mortality and other health related outcomes from weekly or daily count data described in Acosta and Irizarry (2021) "A Flexible Statistical Framework for Estimating Excess Mortality". |
License: | Artistic-2.0 |
Encoding: | UTF-8 |
LazyData: | true |
LazyDataCompression: | xz |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-04-07 13:44:41 UTC; rafa |
Author: | Rafael A. Irizarry [aut, cre], Rolando Acosta [aut], Mónica Robles-Fontán [aut], Ferenci Tamás [ctb] |
Maintainer: | Rafael A. Irizarry <rafael_irizarry@dfci.harvard.edu> |
Repository: | CRAN |
Date/Publication: | 2025-04-07 14:10:06 UTC |
Interpolate demographic data
Description
Interpolate yearly population estimates so that a population estimate is provided for each day of the year. The function 'approx' is used with 'rule = 2' for extrapolation.
Usage
approx_demographics(
demo,
first_day,
last_day,
by,
extrapolation.type = c("linear", "constant", "none"),
...
)
Arguments
demo |
A data frame with the yearly population estimates. Must have a numeric column named year and a numeric column named population. |
first_day |
First day to interpolate. If missing the first day of the first year in demo is used. |
last_day |
Last day to interpolate. If missing the last day of the last year in demo is used. |
by |
Vector of column names to group by, for example different demographic strata. If missing it will extrapolate within each strata. To collapse all strata, define as by = NULL. |
extrapolation.type |
Type of extrapolation. Either linear, constant, or none. If none is selected the NAs are returned. |
... |
additional parameters sent to the function 'approx'. |
Value
A data frame with dates and population estimates.
Weekly death counts for each USA state
Description
The Center for Disease Control (CDC) provides weekly estimated death counts for each state in the USA. This data frame includes these estimates for each state along with population sizes.
state Name of USA state
date Corresponding date of observation
outcome Estimated number of deaths
population Population estimate (from the Census)
Usage
data(cdc_state_counts)
Format
An object of class "data.frame"
Callapse age groups into broader ones
Description
Collapse a count or demographic data frame into a broader age strata.
Usage
collapse_age_dist(demo, breaks)
collapse_counts_by_age(counts, breaks)
Arguments
demo |
A data frame with population sizes for different groups that will be collapsed. |
breaks |
The new age breaks for the new, broader, age strata. |
counts |
A data frame with counts and population sizes for different groups that will be collapsed |
Value
A age groups represented in 'demo' or 'counts' are grouped using the new age breaks defined by breaks but containing the populations and counts, if applicable, for age groups defined by 'breaks'.
Examples
library(lubridate)
data(cook_records)
## define smaller subset for example
cook_demographics_subset <- cook_demographics[year(cook_demographics$date)==2021, ]
demo <- collapse_age_dist(cook_demographics_subset,
breaks = c(0, 20, 40, 60, 80, Inf))
Compute counts
Description
Compute counts for groups from individual records
Usage
compute_counts(
dat,
by = NULL,
demo = NULL,
date = "date",
age = "age",
agegroup = "agegroup",
breaks = NULL
)
Arguments
dat |
The data frame with the individual records |
by |
A character vector with the column names the define the groups for which we will compute counts |
demo |
A data frame with population sizes for each time point for each of the groups for which we will compute counts |
date |
The column name of the column that contains dates |
age |
The column name of the column that contains age |
agegroup |
The column name of the column that contains agegroup |
breaks |
The ages that define the agegroups |
Details
This is helper function that helps convert individual records data, in which each death is a row, to a count data frame where each row is a date. It is particulalry helpful for defining agegroups. If you provide the 'breaks' argument it will automaticall divided the data and provide the counts for each age strata. You can also select variables to group by using the 'by' argument. One can provide a data frame with demogrpahic inform through the 'demo' argument. This tabe must have the population size for each group for each data.
Value
A data frame with counts for each group for each date with population sizes, if demo was provided.
Examples
library(lubridate)
data("cook_records")
the_breaks <- c(0, 20, 40, 60, 75, Inf)
## take subset for example
cook_records_subset <- cook_records[year(cook_records$date)==2021, ]
cook_demographics_subset <- cook_demographics[year(cook_demographics$date)==2021, ]
cook_counts <- compute_counts(cook_records_subset,
demo = cook_demographics_subset,
by = c("agegroup", "race", "sex"),
breaks = the_breaks)
Compute expected counts for each day
Description
Compute the expected death count for each unit of time. We assume counts are over-dispersed Poisson distributed with a trend that accounts for slow, year-to-year, changes in death rate across time and a seasonal effect. The function takes a data frame with dates and counts and returns the data frame with the expected counts as a new column. It also returns a logical column that is 'TRUE' if that entry was used in the estimation procedure.
Usage
compute_expected(
counts,
exclude = NULL,
include.trend = TRUE,
trend.knots.per.year = 1/7,
extrapolate = TRUE,
harmonics = 2,
frequency = NULL,
weekday.effect = FALSE,
keep.components = TRUE,
verbose = TRUE
)
Arguments
counts |
A data frame with dates, counts, and population size. |
exclude |
A list of dates to exclude when fitting the model. This is typically the period for which you will later estimate excess counts. |
include.trend |
Logical that determines if a slow trend is included in the model. |
trend.knots.per.year |
Number of knots per year used for the time trend |
extrapolate |
Logical that determines if the slow trend is extrapolated past the range of data used to fit. This |
harmonics |
Number of harmonics to include in the seasonal effect |
frequency |
Number of data points per year. If not provided, the function attempts to estimate it |
weekday.effect |
A logical that determines if a day of the week effect is included in the model |
keep.components |
A logical that if 'TRUE' forces the function to return the estimated trend, seasonal model, and weekday effect, if included in the model. |
verbose |
A logical that if 'TRUE' makes function prints out updates on the estimation procedure |
Details
Periods for which excess deaths will be estimated should be excluded when estimating expected counts. These can be supplied via the 'exclude' argument. Note that If 'extrapolate' is 'TRUE', the default, the time trend will be extrapolated following the estimated trend. If 'extrapolate' is 'FALSE' the trend is assumed to be a constant equal to the estimate on the last day before extrapolation. If the period for which excess deaths are estimated is long, extrapolation should be used with caution. We highly recommend exploring the estimated expected counts with the 'expected_diagnostics' function.
Value
The 'counts' data.frame with two columns added: 'expected' and 'excluded'. The 'expected' column is the estimated expected value of the counts for that date. The 'excluded' column is a logical vector denoting if that date was excluded when estimating the expected value.
If the argument 'keep.components' is 'TRUE' a list is returned with 'counts' data.frame in the first component, the estimated trend in the second, the estimated seasonal effect in the third and the estimated weekday effects in the fourth.
Examples
data(new_jersey_counts)
exclude_dates <- as.Date("2012-10-29") + 0:180
counts <- compute_expected(new_jersey_counts, exclude = exclude_dates, weekday.effect = TRUE)
library(ggplot2)
expected_plot(counts)
Cook County Medical Examiner Records
Description
A data frame with each row represening a death. The data includes death date and demographic information. When you load this dataset you also load 'cook_demographics' which includes the population size for each demographice group by date.
Usage
data("cook_records")
Format
A data frame with these columns
- sex
Sex of the deceased
- age
Age of the deceased
- race
Race of the deceased
- residenceplace
Residence placed of the deceased
- date
Date of the death
- cause_1
Reported cause of death
- type_of_death
Type of death
Compute cumulative excess deaths
Description
This function takes the output of the 'excess_model' function, a start date, and end date and calculates excess mortality and standard errors.
Usage
excess_cumulative(fit, start, end)
Arguments
fit |
The output of 'excess_model' |
start |
The start date |
end |
The end date |
Value
A data frame with the following columns
- date
The date
- observed
The observed excess mortality,which is the sum of observed minus expected until that date
- sd
The standard deviation for excess mortality for that date if year is typical
- fitted
The estimated of excess mortality based on the estimated smooth event effect curve
- se
The standard error for 'fitted'
Examples
data(new_jersey_counts)
exclude_dates <- as.Date("2012-10-29") + 0:180
control_dates <- seq(min(new_jersey_counts$date), min(exclude_dates) - 1, by = "day")
f <- excess_model(new_jersey_counts,
start = as.Date("2012-09-01"),
end = as.Date("2013-09-01"),
exclude = exclude_dates,
model = "correlated",
weekday.effect = TRUE,
control.dates = control_dates)
excess_cumulative(f,
start = as.Date("2017-12-15"),
end = as.Date("2017-12-21") )
Fit excess count model
Description
This function estimates the increase in the rate for a count time series relative to the rate for a typical year. Two options are available: 1 - model the rate increase as a smooth function and estimate this function or 2 - estimate the total excess in intervals. For 1 an 'event' date can be provided and a discontinuity included in the model. You can do either 1 or 2 or both.
Usage
excess_model(
counts,
start = NULL,
end = NULL,
knots.per.year = 12,
event = NULL,
intervals = NULL,
discontinuity = TRUE,
model = c("quasipoisson", "poisson", "correlated"),
exclude = NULL,
include.trend = TRUE,
trend.knots.per.year = 1/7,
harmonics = 2,
frequency = NULL,
weekday.effect = FALSE,
control.dates = NULL,
max.control = 5000,
order.max = 14,
aic = TRUE,
maxit = 25,
epsilon = 1e-08,
alpha = 0.05,
min.rate = 1e-04,
keep.counts = FALSE,
keep.components = TRUE,
verbose = TRUE
)
Arguments
counts |
A data frame with date, count and population columns. |
start |
First day of interval to which model will be fit |
end |
Last day of interval to which model will be fit |
knots.per.year |
Number of knots per year used for the fitted smooth function |
event |
If modeling a discontinuity is desired, this is the day in which it happens |
intervals |
Instead of 'start' and 'end' a list of time intervals can be provided and excess is computed in each one |
discontinuity |
Logical that determines if discontinuity is allowed at 'event' |
model |
Which version of the model to fit |
exclude |
Dates to exclude when computing expected counts |
include.trend |
Logical that determines if a slow trend is included in the model. |
trend.knots.per.year |
Number of knots per year used by 'compute_expected' to estimate the trend for the expected counts |
harmonics |
Number of harmonics used by 'compute_expected' to estimate seasonal trend |
frequency |
Number of observations per year. If not provided an attempt is made to calculate it |
weekday.effect |
Logical that determins if a day of the week effects is included in the model. Should be 'FALSE' for weekly or monthly data. |
control.dates |
When 'model' is set to 'correlated', these dates are used to estimate the covariance matrix. The larger this is the slower the function runs. |
max.control |
If the length of 'control.dates' is larger than 'max.control' the function stops. |
order.max |
Larges order for the Autoregressive process used to model the covariance structure |
aic |
A logical that determines if the AIC criterion is used to selected the order of the AR process |
maxit |
Maxium number of iterations for the IRLS algorithm used when 'model' is 'correlated' |
epsilon |
Difference in deviance requried to declare covergenace of IRLS |
alpha |
Percentile used to define what is outside the normal range |
min.rate |
The estimated expected rate is not permited to go below this value |
keep.counts |
A logical that if 'TRUE' forces the function to include the data used to fit the expected count model. |
keep.components |
A logical that if 'TRUE' forces the function to return the estimated trend, seasonal model, and weekday effect, if included in the model. Ignored if expected counts already provided or 'keep.counts' is 'FALSE'. |
verbose |
Logical that determines if messages are displayed |
Details
Three versions of the model are available: 1 - Assume counts are Poisson distributed, 2 - assume counts are overdispersed Poisson, or 3 - assume a mixed model with correlated errors. The second is the default and recommended for weekly count data. For daily counts we often find evidence of correlation and recommend the third along with setting 'weekday.effect = TRUE'.
If the 'counts' object includes a 'expected' column produced by 'compute_expected' these are used as the expected counts. If not, then these are computed.
Value
If only 'intervals' are provided a data frame with excess estimates described below for 'excess'. if 'start' and 'end' are provided the a list with the following components are included:
- date
The dates for which the estimate was computed
- observed
The observed counts
- expected
The expected counts
- fitted
The fitted curve for excess counts
- se
The point-wise standard error for the fitted curve
- population
The population size
- sd
The standard deviation for observed counts on a typical year
- cov
The estimated covariance matrix for the observed counts
- x
The design matrix used for the fit
- betacov
The covariance matrix for the estimated coefficients
- dispersion
The estimated overdispersion parameter
- detected_intervals
Time intervals for which the 1 - 'alpha' confidence interval does not include 0
- ar
The estimated coefficients for the autoregressive process
- excess
A data frame with information for the time intervals provided in 'itervals'. This includes start, end, observed death rate (per 1,000 per year), expected death rate, standard deviation for the death rate, observed counts, expected counts, excess counts, standard deviation
Examples
data(cdc_state_counts)
counts <- cdc_state_counts[cdc_state_counts$state == "Massachusetts", ]
exclude_dates <- c(seq(as.Date("2017-12-16"), as.Date("2018-01-16"), by = "day"),
seq(as.Date("2020-01-01"), max(cdc_state_counts$date), by = "day"))
f <- excess_model(counts,
exclude = exclude_dates,
start = min(counts$date),
end = max(counts$date),
knots.per.year = 12)
data(new_jersey_counts)
exclude_dates <- as.Date("2012-10-29") + 0:180
control_dates <- seq(min(new_jersey_counts$date), min(exclude_dates) - 1, by = "day")
f <- excess_model(new_jersey_counts,
start = as.Date("2012-09-01"),
end = as.Date("2013-09-01"),
exclude = exclude_dates,
model = "correlated",
weekday.effect = TRUE,
control.dates = control_dates)
Plot results from fitted excess count model
Description
Plot results from fitted excess count model
Usage
excess_plot(fit, title = "", ylim = NULL, show.data = TRUE, alpha = 0.05)
Arguments
fit |
The output from 'excess_model' |
title |
A title to add to plot |
ylim |
A vector with two numbers that determines the kimits for the y-axis |
show.data |
A logical that determines if the observed percent changes are shown |
alpha |
1 - 'alpha' confidence intervals are shown |
Value
An ggplot object containing the plot.
Examples
data(new_jersey_counts)
exclude_dates <- as.Date("2012-10-29") + 0:180
control_dates <- seq(min(new_jersey_counts$date), min(exclude_dates) - 1, by = "day")
f <- excess_model(new_jersey_counts,
start = as.Date("2012-09-01"),
end = as.Date("2013-09-01"),
exclude = exclude_dates,
model = "correlated",
weekday.effect = TRUE,
control.dates = control_dates)
library(ggplot2)
excess_plot(f)
Excess counts in an interval
Description
Helper function to compute excess deaths statistics for a
Usage
excess_stats(
start,
end,
obs,
mu,
cov,
pop,
frequency,
fhat = NULL,
X = NULL,
betacov = NULL
)
Arguments
start |
First day of interval |
end |
Last day of interval |
obs |
Observed counts |
mu |
Expected counts |
cov |
Covariance matrix for percent change |
pop |
Population size |
frequency |
Observations per year |
fhat |
Estimated percent increase |
X |
Design matrix used to estimate fhat |
betacov |
Covariance matrix for parameter estimates used to estimate fhat |
Diagnostic Plots for Model Fit
Description
Check mean model fit via diagnostic figures of the model components
Usage
expected_diagnostic(
expected,
start = NULL,
end = NULL,
color = "#D22B2B",
alpha = 0.5
)
Arguments
expected |
The output from 'compute_expected' with 'keep.components = TRUE' |
start |
First day to show |
end |
Last day to show |
color |
Color for the expected curve |
alpha |
alpha blending for points |
Value
A list with six ggplot objects: 'population' is a time series plot of the population. 'seasonal' is a plot showing the estimated seasonal effect. 'trend' is a time series plot showing the estimated trend. 'weekday' is a plot of the estimated weekday effects if they were estimated. 'expected'is a time series plot of the expected values. 'residual' is a time series plot of the residuals.
Examples
library(dplyr)
library(lubridate)
library(ggplot2)
flu_season <- seq(make_date(2017, 12, 16), make_date(2018, 1, 16), by = "day")
exclude_dates <- c(flu_season, seq(make_date(2020, 1, 1), today(), by = "day"))
res <- cdc_state_counts %>%
filter(state == "Massachusetts") %>%
compute_expected(exclude = exclude_dates,
keep.components = TRUE)
p <- expected_diagnostic(expected = res, alpha = 0.50)
p$population
p$seasonal
p$trend
p$expected
p$residual
Plot Expected Counts
Description
Check if expected counts fit data
Usage
expected_plot(
expected,
title = "",
start = NULL,
end = NULL,
ylim = NULL,
weekly = FALSE,
color = "#3366FF",
alpha = 0.5
)
Arguments
expected |
The output from 'compute_expected' |
title |
A title to add to plot |
start |
First day to show |
end |
Last day to show |
ylim |
A vector with two numbers that determines the kimits for the y-axis |
weekly |
Logical that determines if data should be summarized into weekly counts |
color |
Color for the expected curve |
alpha |
alpha blending for points |
Value
A ggplot object containing a plot of the original counts and the estimated expected values.
Examples
data(new_jersey_counts)
exclude_dates <- as.Date("2012-10-29") + 0:180
e <- compute_expected(new_jersey_counts, exclude = exclude_dates, weekday.effect = TRUE)
library(ggplot2)
expected_plot(e, start = as.Date("2012-09-01"), end = as.Date("2013-09-01"))
Fit an ar model to residuals from expected counts
Description
Helper function to estimate autoregressive mode
Usage
fit_ar(counts, control.dates = NULL, order.max = 5, aic = FALSE, plot = FALSE)
Arguments
counts |
Output from 'compute_excpected' |
control.dates |
Dates to use to estimate covariance |
order.max |
Maximum order of autoregressive process |
aic |
Logical that determines if AIC is used |
plot |
logical that determines if an autocorrelation plot is generated for exploration purposes |
Get demographic data from Census
Description
Get demographic data from Census
Usage
get_demographics(
geography = "state",
state,
county = NULL,
years = 2018,
vars = c("SEX", "AGEGROUP", "RACE", "HISP")
)
Arguments
geography |
state or county |
state |
name of the state |
county |
name of the county |
years |
vector of years for which we obtain data |
vars |
names of variables that define strat of which we want population estimates |
Assign age to group
Description
Assign age to group
Usage
group_age(age, breaks)
Arguments
age |
Vector of ages |
breaks |
Ages that define strata |
Value
A factor that groups the ages into the age groups defined by 'breaks'.
Louisiana daily mortality
Description
A data frame with Louisiana daily mortality counts from 2003-01-01 to 2016-12-31, which includes the day Katrina made landfall on 2015-08-29. The outcome column includes the number of deaths for that day.
Usage
data("louisiana_counts")
Format
An object of class tbl_df
(inherits from tbl
, data.frame
) with 1461 rows and 3 columns.
New Jersey daily mortality
Description
A data frame with Louisiana daily mortality counts from 2007-01-01 to 2015-12-31 which includes the day Sandy made landfall on 2012-10-29. The outcome column includes the number of deaths for that day.
Usage
data("new_jersey_counts")
Format
An object of class tbl_df
(inherits from tbl
, data.frame
) with 3287 rows and 3 columns.
Compute year of the day ingoring Feb 29
Description
Compute year of the day ingoring Feb 29
Usage
noleap_yday(x)
Arguments
x |
date |
Puerto Rico daily mortality
Description
A data frame with Puerto Rico daily mortality counts, stratified by agegroup, from 1985 to 2020. which includes the days hurricanes Hugo, Georges, and Maria made landfall on 1989-09-18, 1998-09-21, and 2017-09-20, respectively. The outcome column includes the number of deaths for that day. This data frame also includes population counts and estimates for the period. Population counts for 1985 to 2000 are interpolations of decennial census products since 1980 to 2000. For 2000 onward, we use interpolated population estimates (PEP).
Usage
data("puerto_rico_counts")
Format
An object of class data.table
(inherits from data.frame
) with 499644 rows and 5 columns.
Puerto Rico daily mortality by cause of death
Description
A data frame with Puerto Rico daily mortality counts, stratified by cause of death from 1999 to 2020. which includes the day hurricanes Maria made landfall on 2017-09-20. The outcome column includes the number of deaths for that day for that ICD-10 code. The object 'icd' is included to show the description of each ICD-10 code.
Usage
data("puerto_rico_icd")
Format
An object of class tbl_df
(inherits from tbl
, data.frame
) with 262980 rows and 4 columns.
Weekly death counts for several countries
Description
A data frame with weekly death counts from multiple countries that includes the year 2020. Each country has a different range of data and most countries have at least 5 years of pre-2020 data. The original data was collated by the Financial Times .
Usage
data(world_counts)
Format
An object of class "data.frame"
Details
country Name of the country
date Corresponding date of observation
outcome Estimated number of deaths
population Population estimate (from the Census)