The ces package provides easy access to Canadian
Election Study (CES) data, simplifying the process of downloading,
cleaning, and analyzing these datasets in R. The CES has been conducted
during federal elections since 1965, providing valuable insight into
Canadian political behavior, attitudes, and preferences.
The package covers CES datasets from 1965 to 2021, including separate survey variants where available:
# List all available datasets with variants
datasets <- list_ces_datasets()
print(datasets)
#> # A tibble: 16 × 2
#>    year  variants
#>    <chr> <chr>
#>  1 1965  single_survey
#>  2 1968  single_survey
#>  3 1972  jnjl, sep, nov
#>  4 1974  single_survey, 1974_1980
#>  5 1984  single_survey
#>  6 1988  single_survey
#>  7 1993  single_survey
#>  8 1997  single_survey
#>  9 2000  single_survey
#> 10 2004  single_survey
#> 11 2006  single_survey
#> 12 2008  single_survey
#> 13 2011  single_survey
#> 14 2015  web, phone, combo
#> 15 2019  web, phone
#> 16 2021  web
# View unique years available
unique(datasets$year)
#> [1] "1965" "1968" "1972" "1974" "1984" "1988" "1993" "1997" "2000" "2004"
#> [11] "2006" "2008" "2011" "2015" "2019" "2021"
Some years have multiple survey variants available:
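One way to pick out those years is to filter the listing for rows whose variants column names more than one variant. The sketch below is a minimal base R illustration on a hand-copied subset of the listing above (an assumption, so that it runs without the package or a network connection). With the package loaded, the same filter should apply to the result of list_ces_datasets(), provided the variants column is a comma-separated string as printed.

```r
# Hand-copied subset of the listing shown above (not a live call).
datasets <- data.frame(
  year = c("1968", "1972", "1974", "2011", "2015", "2019", "2021"),
  variants = c(
    "single_survey", "jnjl, sep, nov", "single_survey, 1974_1980",
    "single_survey", "web, phone, combo", "web, phone", "web"
  ),
  stringsAsFactors = FALSE
)

# A comma in `variants` means more than one variant is listed for that year.
multi_variant <- datasets[grepl(",", datasets$variants, fixed = TRUE), ]
print(multi_variant)
```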
The primary function for accessing CES data is get_ces(), which downloads and processes the dataset for a specified year:
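A minimal, defensive sketch of the basic call follows. It assumes the ces package is installed and that a network connection is available. The requireNamespace() guard and the tryCatch() wrapper are additions for illustration, not something the package requires.

```r
# Stays NULL if the package is missing or the download fails.
ces_2019 <- NULL

if (requireNamespace("ces", quietly = TRUE)) {
  ces_2019 <- tryCatch(
    ces::get_ces("2019"),  # downloads and processes the 2019 dataset
    error = function(e) {
      message("Could not retrieve the 2019 CES data: ", conditionMessage(e))
      NULL
    }
  )
}

# Inspect the result only if the download succeeded.
if (!is.null(ces_2019)) {
  print(dim(ces_2019))
}
```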
For years with multiple survey modes, you can specify which variant to download:
# Get the 2019 web survey (default)
ces_2019_web <- get_ces("2019", variant = "web")
# Get the 2019 phone survey
ces_2019_phone <- get_ces("2019", variant = "phone")
# Get the 2015 combined web and phone survey
ces_2015_combo <- get_ces("2015", variant = "combo")
# Get the 1972 September survey
ces_1972_sep <- get_ces("1972", variant = "sep")
# See which variants are available for a given year
all_datasets <- list_ces_datasets()
all_datasets[all_datasets$year == "2015", ]
Default behavior:
- For 2015 and 2019: defaults to the “web” variant
- For all other years: uses the available variant (most have only one)
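The default rule above can be written out as a small lookup. The helper default_variant() below is hypothetical and only restates that rule on a hand-copied subset of the listing; it is not the package's implementation. Falling back to the first listed variant for other multi-variant years (such as 1972) is an assumption, since the text does not say which one is used.

```r
# Hand-copied subset of the listing (not a live call to the package).
listing <- data.frame(
  year = c("1965", "1972", "2015", "2019", "2021"),
  variants = c(
    "single_survey", "jnjl, sep, nov", "web, phone, combo", "web, phone", "web"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: restates the documented default rule.
default_variant <- function(year, listing) {
  row <- listing[listing$year == year, ]
  if (nrow(row) == 0) {
    stop("No CES dataset listed for year ", year)
  }
  available <- strsplit(row$variants, ", ", fixed = TRUE)[[1]]
  if (year %in% c("2015", "2019")) {
    return("web")  # documented default for 2015 and 2019
  }
  # Assumption: otherwise take the first listed variant.
  available[1]
}

print(default_variant("2019", listing))
print(default_variant("1965", listing))
```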
The get_ces() function offers several options for customizing how data is retrieved and processed:
# Get raw (uncleaned) data
ces_raw <- get_ces("2019", clean = FALSE)
# Get data as a data.frame instead of a tibble
ces_df <- get_ces("2019", format = "data.frame")
# Bypass cache and download fresh data
ces_fresh <- get_ces("2019", use_cache = FALSE)
# Disable metadata preservation if needed (not recommended)
ces_without_metadata <- get_ces("2019", preserve_metadata = FALSE)
# Silent mode - no progress messages
ces_silent <- get_ces("2019", verbose = FALSE)
CES datasets contain rich metadata, including question text and value labels. The package preserves this metadata, which you can access through standard R attributes:
# All metadata is preserved by default
ces_data <- get_ces("2019")
# Access variable label (question text)
attr(ces_data$vote_choice, "label")
# Access value labels
attr(ces_data$vote_choice, "labels")
# See all attributes of a variable
attributes(ces_data$vote_choice)
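To make the two attributes concrete, the sketch below builds a small labelled vector by hand in base R and reads it back the same way. The question text, codes, and party names are invented for illustration and are not actual CES content; the real columns may carry additional classes or attributes.

```r
# Toy stand-in for a survey column; values and labels are invented.
vote_choice <- structure(
  c(1, 2, 2, 3, 1),
  label = "Which party did you vote for?",
  labels = c("Party A" = 1, "Party B" = 2, "Party C" = 3)
)

# Variable label (question text)
question <- attr(vote_choice, "label")

# Value labels: a named vector mapping label text to numeric codes
value_labels <- attr(vote_choice, "labels")

# Translate the stored codes into their label text
decoded <- names(value_labels)[match(vote_choice, value_labels)]
print(decoded)
```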
You can also examine metadata across the entire dataset with the examine_metadata() function.
For many analyses, you may only need a subset of variables. The get_ces_subset() function allows you to select specific variables:
# Get a subset of variables by name from web survey (default)
variables <- c("vote_choice", "age", "gender", "province", "education")
ces_subset <- get_ces_subset("2019", variables = variables)
# Get subset from phone survey
ces_subset_phone <- get_ces_subset("2019", variant = "phone", variables = variables)
# Get all variables containing "vote" in their name (using regex)
vote_vars <- get_ces_subset("2019", variables = "vote", regex = TRUE)
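The regex option can be pictured as pattern matching against column names. The base R sketch below shows that idea on a toy data frame with invented column names; whether get_ces_subset() matches in exactly this way internally is an assumption.

```r
# Toy data frame; column names and values are invented.
toy <- data.frame(
  vote_choice = c(1, 2),
  vote_intention = c(2, 2),
  age = c(34, 61),
  province = c("QC", "BC"),
  stringsAsFactors = FALSE
)

# Keep every column whose name matches the regular expression "vote".
matched <- grep("vote", names(toy), value = TRUE)
vote_cols <- toy[, matched, drop = FALSE]
print(names(vote_cols))
```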
CES datasets contain many variables with complex coding schemes. The create_codebook() function helps you understand these variables:
# Get 2019 data
ces_2019 <- get_ces("2019")
# Create a codebook
codebook <- create_codebook(ces_2019)
# View the first few entries
head(codebook)
# Find variables about a specific topic
library(dplyr)
voting_vars <- codebook %>%
  filter(grepl("vote|voted", question, ignore.case = TRUE)) %>%
  pull(variable)
# Use these variables in your analysis
voting_data <- get_ces_subset("2019", variables = voting_vars)
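As a rough picture of what a codebook holds, the sketch below assembles one by hand from label attributes on a toy data frame. The column names variable and question follow the filter above. The toy data, the helper name build_toy_codebook(), and the construction itself are illustrative assumptions rather than the package's implementation.

```r
# Toy data with invented question text attached as "label" attributes.
toy <- data.frame(vote_choice = c(1, 2), age = c(34, 61), turnout = c(1, 0))
attr(toy$vote_choice, "label") <- "Which party did you vote for?"
attr(toy$age, "label") <- "Respondent age in years"
attr(toy$turnout, "label") <- "Did you vote in the election?"

# Hypothetical helper: one row per variable, with its question text.
build_toy_codebook <- function(data) {
  questions <- vapply(
    data,
    function(column) {
      label <- attr(column, "label")
      if (is.null(label)) NA_character_ else label
    },
    character(1)
  )
  data.frame(
    variable = names(data),
    question = unname(questions),
    stringsAsFactors = FALSE
  )
}

codebook <- build_toy_codebook(toy)

# Find variables whose question text mentions voting.
voting_vars <- codebook$variable[grepl("vote", codebook$question, ignore.case = TRUE)]
print(voting_vars)
```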
You can export the codebook to a CSV or Excel file:
# Export to CSV
export_codebook(codebook, "ces_2019_codebook.csv")
# Export to Excel (requires openxlsx package)
export_codebook(codebook, "ces_2019_codebook.xlsx")
You can also download the official PDF codebook documents:
# Download the official CES codebook PDF
download_pdf_codebook("2019")
# Download to a specific folder
download_pdf_codebook("2015", path = "~/Documents/CES_codebooks")
The package also allows downloading the raw data files directly:
# Download a single CES dataset (defaults to web for 2015/2019)
download_ces_dataset("2019", path = "~/Documents/CES_datasets")
# Download a specific variant
download_ces_dataset("2019", variant = "phone", path = "~/Documents/CES_datasets")
# Download all available CES datasets to a folder (all variants)
download_all_ces_datasets(path = "~/Documents/CES_datasets")
# Download only specific years (all variants for those years)
download_all_ces_datasets(years = c("2015", "2019", "2021"))
# Download only web surveys for specific years
download_all_ces_datasets(years = c("2015", "2019"), variants = "web")
Here’s a simple example of how to use the package to analyze voting patterns:
# Get 2019 web survey data (default)
ces_2019_web <- get_ces("2019")
# Get 2019 phone survey data
ces_2019_phone <- get_ces("2019", variant = "phone")
# Compare sample sizes between survey modes
cat("Web survey sample size:", nrow(ces_2019_web), "\n")
cat("Phone survey sample size:", nrow(ces_2019_phone), "\n")
# Table of vote choice by province (web survey)
# pivot_wider() comes from tidyr, so both packages are required
if (requireNamespace("dplyr", quietly = TRUE) &&
    requireNamespace("tidyr", quietly = TRUE)) {
  library(dplyr)
  library(tidyr)
  # Count respondents by province and vote choice, then spread to wide format
  vote_by_province <- ces_2019_web %>%
    group_by(province, vote_choice) %>%
    summarize(count = n(), .groups = "drop") %>%
    pivot_wider(names_from = vote_choice, values_from = count, values_fill = 0)
  print(vote_by_province)
}
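The shape of the resulting table is easiest to see on a small example. The base R sketch below cross-tabulates invented responses with table(), which produces the same kind of province-by-party counts as the pipeline above without needing dplyr or tidyr. The data are made up for illustration.

```r
# Invented responses; not CES data.
toy <- data.frame(
  province = c("QC", "QC", "ON", "ON", "ON", "BC"),
  vote_choice = c("Party A", "Party B", "Party A", "Party A", "Party B", "Party B"),
  stringsAsFactors = FALSE
)

# Rows are provinces, columns are vote choices; absent combinations count as 0.
vote_by_province <- table(toy$province, toy$vote_choice)
print(vote_by_province)
```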
The ces package aims to make working with Canadian Election Study data more accessible to R users. By handling the downloading, storage, and initial processing of these datasets, it lets researchers focus on analysis rather than data wrangling.
For more information about the Canadian Election Study, visit the official website or refer to the dataset documentation.
This package accesses Canadian Election Study data from multiple sources including the Borealis Data repository (the primary institutional repository for most CES datasets) and the Canadian Election Study website (the official CES project website hosting additional datasets). We gratefully acknowledge both Borealis Data and the Canadian Election Study team for maintaining and providing access to these valuable datasets.
Important Disclaimer: This package is not officially affiliated with the Canadian Election Study, Borealis Data, or the University of British Columbia. Users of this package should properly cite the original Canadian Election Study data in their research publications according to the citation guidelines provided by the CES.
The package was developed with assistance from Claude Sonnet 3.7, an AI assistant by Anthropic, demonstrating how these tools can be used to create helpful resources for the research community.