Title: | Who are You? Bayesian Prediction of Racial Category Using Surname, First Name, Middle Name, and Geolocation |
Version: | 3.0.3 |
Date: | 2024-05-24 |
Description: | Predicts individual race/ethnicity using surname, first name, middle name, geolocation, and other attributes, such as gender and age. The method utilizes Bayes' Rule (with optional measurement error correction) to compute the posterior probability of each racial category for any given individual. The package implements methods described in Imai and Khanna (2016) "Improving Ecological Inference by Predicting Individual Ethnicity from Voter Registration Records" Political Analysis <doi:10.1093/pan/mpw001> and Imai, Olivella, and Rosenman (2022) "Addressing census data problems in race imputation via fully Bayesian Improved Surname Geocoding and name supplements" <doi:10.1126/sciadv.adc9824>. The package also incorporates the data described in Rosenman, Olivella, and Imai (2023) "Race and ethnicity data for first, middle, and surnames" <doi:10.1038/s41597-023-02202-2>. |
License: | GPL (≥ 3) |
URL: | https://github.com/kosukeimai/wru |
BugReports: | https://github.com/kosukeimai/wru/issues |
Depends: | R (≥ 4.1.0), utils |
Imports: | cli, dplyr, tidyr, furrr, future, piggyback (≥ 0.1.4), PL94171, purrr, Rcpp, rlang |
Suggests: | covr, testthat (≥ 3.0.0), tidycensus |
LinkingTo: | Rcpp, RcppArmadillo |
Config/testthat/edition: | 3 |
Encoding: | UTF-8 |
LazyData: | yes |
LazyDataCompression: | xz |
LazyLoad: | yes |
RoxygenNote: | 7.3.1 |
NeedsCompilation: | yes |
Packaged: | 2024-05-24 16:06:47 UTC; beb |
Author: | Kabir Khanna [aut], Brandon Bertelsen [aut, cre], Santiago Olivella [aut], Evan Rosenman [aut], Alexander Rossell Hayes [aut], Kosuke Imai [aut] |
Maintainer: | Brandon Bertelsen <brandon@bertelsen.ca> |
Repository: | CRAN |
Date/Publication: | 2024-05-24 18:00:02 UTC |
Pre-process vector of names to match census style. Internal function
Description
Pre-process vector of names to match census style. Internal function
Usage
.name_preproc(voter_names, target_names)
Arguments
voter_names |
Character vector to be pre-processed. |
target_names |
Character vector of census names to be matched. |
Value
A character vector of pre-processed names.
Convert between state names, postal abbreviations, and FIPS codes
Description
Convert between state names, postal abbreviations, and FIPS codes
Usage
as_fips_code(x)
as_state_abbreviation(x)
Arguments
x |
A numeric or character vector of state names, postal abbreviations, or FIPS codes. Matches for state names and abbreviations are not case sensitive. FIPS codes may be matched from numeric or character vectors, with or without leading zeroes. |
Value
- as_fips_code()
A character vector of two-digit FIPS codes. One-digit FIPS codes are prefixed with a leading zero, e.g., "06" for California.
- as_state_abbreviation()
A character vector of two-letter postal abbreviations, e.g., "CA" for California.
Examples
as_fips_code("california")
as_state_abbreviation("california")
# Character vector matches ignore case
as_fips_code(c("DC", "Md", "va"))
as_state_abbreviation(c("district of columbia", "Maryland", "VIRGINIA"))
# Note that `3` and `7` are standardized to `NA`,
# because no state is assigned those FIPS codes
as_fips_code(1:10)
as_state_abbreviation(1:10)
# You can even mix methods in the same vector
as_fips_code(c("utah", "NM", 8, "04"))
as_state_abbreviation(c("utah", "NM", 8, "04"))
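The matching rules described for these functions (case-insensitive names and abbreviations, FIPS codes with or without leading zeroes) can be illustrated with a minimal base-R sketch. The three-row lookup table and the helper name `to_fips_sketch` are hypothetical stand-ins for the package's `state_fips` data; this is not the package's implementation.

```r
# Hypothetical three-row lookup table standing in for the state_fips data.
lookup <- data.frame(
  state      = c("CA", "NM", "UT"),
  state_code = c("06", "35", "49"),
  state_name = c("California", "New Mexico", "Utah"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: normalise mixed input to two-digit FIPS codes.
to_fips_sketch <- function(x) {
  x <- toupper(trimws(as.character(x)))
  # Pad purely numeric input to two digits, e.g. "6" becomes "06".
  is_num <- grepl("^[0-9]+$", x)
  x[is_num] <- sprintf("%02d", as.integer(x[is_num]))
  # Try FIPS codes, then abbreviations, then full names; unmatched stay NA.
  out <- rep(NA_character_, length(x))
  for (col in c("state_code", "state", "state_name")) {
    idx <- match(x, toupper(lookup[[col]]))
    hit <- is.na(out) & !is.na(idx)
    out[hit] <- lookup$state_code[idx[hit]]
  }
  out
}

to_fips_sketch(c("utah", "NM", 6, "06", "3"))
```

In this sketch, values that match no row of the lookup table (such as `"3"`) come back as `NA`, mirroring the behaviour noted in the examples.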
Preflight census data
Description
Preflight census data
Usage
census_data_preflight(census.data, census.geo, year)
Arguments
census.data |
A list indexed by two-letter state abbreviations,
which contains pre-saved Census geographic data.
Can be generated using |
census.geo |
An optional character vector specifying what level of
geography to use to merge in U.S. Census geographic data. Currently
|
year |
An optional character vector specifying the year of U.S. Census geographic
data to be downloaded. Use |
Census Data download function.
Description
census_geo_api retrieves U.S. Census geographic data for a given state.
Usage
census_geo_api(
key = Sys.getenv("CENSUS_API_KEY"),
state,
geo = c("tract", "block", "block_group", "county", "place", "zcta"),
age = FALSE,
sex = FALSE,
year = c("2020", "2010"),
retry = 3,
save_temp = NULL,
counties = NULL
)
Arguments
key |
A character string containing a valid Census API key, which can be requested from the U.S. Census API key signup page. By default, attempts to find a census key stored in an
environment variable named |
state |
A required character object specifying which state to extract Census data for,
e.g., |
geo |
A character object specifying what aggregation level to use.
Use |
age |
A |
sex |
A |
year |
A character object specifying the year of U.S. Census data to be downloaded.
Use |
retry |
The number of retries at the census website if network interruption occurs. |
save_temp |
File indicating where to save the temporary outputs. Defaults to NULL. If specified, the function will look for an .RData file with the same format as the expected output. |
counties |
A vector of counties contained in your data. If |
Details
This function allows users to download U.S. Census geographic data (2010 or 2020) for a particular state at the aggregation level given by geo: tract, block, block group, county, place, or ZCTA.
Value
Output will be an object of class list, indexed by state names. It will consist of the original user-input data with additional columns of Census geographic data.
References
Relies on get_census_api(), get_census_api_2(), and vec_to_chunk() functions authored by Nicholas Nagle, available here.
Examples
## Not run:
census_geo_api(state = "NJ", geo = "block")
census_geo_api(state = "FL", geo = "tract", age = TRUE, sex = TRUE)
census_geo_api(
  state = "MA", geo = "place", age = FALSE, sex = FALSE,
  year = "2020"
)
## End(Not run)
Census geo API helper functions
Description
Census geo API helper functions
Usage
census_geo_api_names(
year = c("2020", "2010", "2000"),
age = FALSE,
sex = FALSE
)
census_geo_api_url(year = c("2020", "2010", "2000"))
Arguments
year |
A character object specifying the year of U.S. Census data to be downloaded.
Use |
age |
A |
sex |
A |
Value
- census_geo_api_names()
A named list of character vectors whose values correspond to columns of a Census API table and whose names represent the new columns they are used to calculate in census_geo_api().
- census_geo_api_url()
A character string containing the base of the URL to a Census API table.
Census helper function.
Description
census_helper links user-input dataset with Census geographic data.
Usage
census_helper(
key = Sys.getenv("CENSUS_API_KEY"),
voter.file,
states = "all",
geo = "tract",
age = FALSE,
sex = FALSE,
year = "2020",
census.data = NULL,
retry = 3,
use.counties = FALSE
)
Arguments
key |
A character string containing a valid Census API key, which can be requested from the U.S. Census API key signup page. By default, attempts to find a census key stored in an
environment variable named |
voter.file |
An object of class |
states |
A character vector specifying which states to extract
Census data for, e.g. |
geo |
A character object specifying what aggregation level to use.
Use |
age |
A |
sex |
A |
year |
A character object specifying the year of U.S. Census data to be downloaded.
Use |
census.data |
An optional census object of class |
retry |
The number of retries at the census website if network interruption occurs. |
use.counties |
A logical, defaulting to FALSE. Should census data be filtered by counties available in census.data? |
Details
This function allows users to link their geocoded dataset (e.g., voter file) with U.S. Census data (2010 or 2020). The function extracts Census Summary File data at the county, tract, block, or place level. Census data calculated are Pr(Geolocation | Race) where geolocation is county, tract, block, or place.
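As a rough illustration of what Pr(Geolocation | Race) means, the sketch below normalises a small table of population counts so that each racial group's column sums to one across geographic units. The counts and the tract labels are made up for illustration, and the code is not taken from the package.

```r
# Hypothetical population counts: rows are tracts, columns are racial groups.
counts <- matrix(
  c(80, 10,  5,
    20, 30, 15),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("tract_A", "tract_B"), c("whi", "bla", "his"))
)

# Pr(Geolocation | Race): divide each column by that group's total population.
p_geo_given_race <- sweep(counts, 2, colSums(counts), "/")

p_geo_given_race
colSums(p_geo_given_race)  # each column sums to 1
```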
Value
Output will be an object of class data.frame. It will consist of the original user-input data with additional columns of Census data.
Examples
## Not run:
census_helper(voter.file = voters, states = "nj", geo = "block")
## End(Not run)
## Not run:
census_helper(
voter.file = voters, states = "all", geo = "tract",
age = TRUE, sex = TRUE
)
## End(Not run)
## Not run:
census_helper(
voter.file = voters, states = "all", geo = "county",
age = FALSE, sex = FALSE, year = "2020"
)
## End(Not run)
Census helper function.
Description
census_helper_new links user-input dataset with Census geographic data.
Usage
census_helper_new(
key = Sys.getenv("CENSUS_API_KEY"),
voter.file,
states = "all",
geo = c("tract", "block", "block_group", "county", "place", "zcta"),
age = FALSE,
sex = FALSE,
year = "2020",
census.data = NULL,
retry = 3,
use.counties = FALSE,
skip_bad_geos = FALSE
)
Arguments
key |
A character string containing a valid Census API key, which can be requested from the U.S. Census API key signup page. By default, attempts to find a census key stored in an
environment variable named |
voter.file |
An object of class |
states |
A character vector specifying which states to extract
Census data for, e.g. |
geo |
A character object specifying what aggregation level to use.
Use |
age |
A |
sex |
A |
year |
A character object specifying the year of U.S. Census data to be downloaded.
Use |
census.data |
An optional census object of class |
retry |
The number of retries at the census website if network interruption occurs. |
use.counties |
A logical, defaulting to FALSE. Should census data be filtered by counties available in census.data? |
skip_bad_geos |
Logical. Option to have the function skip any geolocations that are not present
in the census data, returning a partial data set. Default is set to |
Details
This function allows users to link their geocoded dataset (e.g., voter file) with U.S. Census data (2010 or 2020). The function extracts Census Summary File data at the county, tract, block, or place level. Census data calculated are Pr(Geolocation | Race) where geolocation is county, tract, block, or place.
Value
Output will be an object of class data.frame. It will consist of the original user-input data with additional columns of Census data.
Examples
## Not run:
census_helper_new(voter.file = voters, states = "nj", geo = "block")
census_helper_new(voter.file = voters, states = "all", geo = "tract")
census_helper_new(
  voter.file = voters, states = "all", geo = "place",
  year = "2020"
)
## End(Not run)
Legacy data formatting function.
Description
format_legacy_data formats legacy data from the U.S. census to allow for Bayesian name geocoding.
Usage
format_legacy_data(legacyFilePath, state, outFile = NULL)
Arguments
legacyFilePath |
A character vector giving the location of a legacy census data folder, sourced from https://www2.census.gov/programs-surveys/decennial/2020/data/01-Redistricting_File--PL_94-171/. These file names should end in ".pl". |
state |
The two letter state postal code. |
outFile |
Optional character vector determining whether the formatted RData object should be saved. The filepath should end in ".RData". |
Details
This function allows users to construct datasets for analysis using the census legacy data format. These data are available for the 2020 census at https://www2.census.gov/programs-surveys/decennial/2020/data/01-Redistricting_File--PL_94-171/. This function returns data structured analogously to data from the Census API, which is not yet available for the 2020 Census as of September 2021.
Examples
## Not run:
gaCensusData <- format_legacy_data(PL94171::pl_url("GA", 2020), state = "GA")
predict_race_new(
  ga.voter.file,
  names.to.use = "surname, first, middle", census.geo = "block",
  census.data = gaCensusData
)
## End(Not run)
Census API function.
Description
get_census_api obtains U.S. Census data via the public API.
Usage
get_census_api(
data_url,
key = Sys.getenv("CENSUS_API_KEY"),
var.names,
region,
retry = 0
)
Arguments
data_url |
URL root of the API,
e.g., |
key |
A character string containing a valid Census API key, which can be requested from the U.S. Census API key signup page. By default, attempts to find a census key stored in an
environment variable named |
var.names |
A character vector of variables to get,
e.g., |
region |
Character object specifying which region to obtain data for.
Must contain "for" and possibly "in",
e.g., |
retry |
The number of retries at the census website if network interruption occurs. |
Details
This function obtains U.S. Census data via the public API. Users can specify the variables and region(s) for which to obtain data.
Value
If successful, output will be an object of class data.frame.
If unsuccessful, function prints the URL query that caused the error.
References
Based on code authored by Nicholas Nagle, which is available here.
Examples
## Not run:
get_census_api(
data_url = "https://api.census.gov/data/2020/dec/pl",
var.names = c("P2_005N", "P2_006N", "P2_007N", "P2_008N"), region = "for=county:*&in=state:34"
)
## End(Not run)
Census API URL assembler.
Description
get_census_api_2 assembles URL components for get_census_api.
Usage
get_census_api_2(
data_url,
key = Sys.getenv("CENSUS_API_KEY"),
get,
region,
retry = 3
)
Arguments
data_url |
URL root of the API,
e.g., |
key |
A character string containing a valid Census API key, which can be requested from the U.S. Census API key signup page. By default, attempts to find a census key stored in an
environment variable named |
get |
A character vector of variables to get,
e.g., |
region |
Character object specifying which region to obtain data for.
Must contain "for" and possibly "in",
e.g., |
retry |
The number of retries at the census website if network interruption occurs. |
Details
This function assembles the URL components and sends the request to the Census server.
It is used by the get_census_api function. The user should not need to call this function directly.
Value
If successful, output will be an object of class data.frame.
If unsuccessful, function prints the URL query that was constructed.
References
Based on code authored by Nicholas Nagle, which is available here.
Examples
## Not run:
try(get_census_api_2(
  data_url = "https://api.census.gov/data/2020/dec/pl",
  get = c("P2_005N", "P2_006N", "P2_007N", "P2_008N"),
  region = "for=county:*&in=state:34"
))
## End(Not run)
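To make the idea of assembling URL components concrete, here is a minimal sketch of how a Census API query string can be built from the same four pieces (`data_url`, `key`, `get`, `region`). The helper name `build_census_url`, the parameter order within the query string, and the placeholder key are assumptions made for illustration; the package's internal code may differ.

```r
# Hypothetical helper showing how a Census API query string can be assembled.
build_census_url <- function(data_url, key, get, region) {
  paste0(
    data_url,
    "?key=", key,
    "&get=", paste(get, collapse = ","),  # variables are comma-separated
    "&", region                           # e.g. "for=...&in=..."
  )
}

build_census_url(
  data_url = "https://api.census.gov/data/2020/dec/pl",
  key = "YOUR_KEY",  # placeholder, not a real key
  get = c("P2_005N", "P2_006N"),
  region = "for=county:*&in=state:34"
)
```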
Multilevel Census data download function.
Description
get_census_data returns county-, tract-, and block-level Census data for specified state(s). Using this function to download Census data in advance can save considerable time when running predict_race and census_helper.
Usage
get_census_data(
key = Sys.getenv("CENSUS_API_KEY"),
states,
age = FALSE,
sex = FALSE,
year = "2020",
census.geo = c("tract", "block", "block_group", "county", "place", "zcta"),
retry = 3,
county.list = NULL
)
Arguments
key |
A character string containing a valid Census API key, which can be requested from the U.S. Census API key signup page. By default, attempts to find a census key stored in an
environment variable named |
states |
which states to extract Census data for, e.g., |
age |
A |
sex |
A |
year |
A character object specifying the year of U.S. Census data to be downloaded.
Use |
census.geo |
An optional character vector specifying what level of
geography to use to merge in U.S. Census geographic data. Currently
|
retry |
The number of retries at the census website if network interruption occurs. |
county.list |
A named list of character vectors of counties present in your voter.file, per state. |
Value
Output will be an object of class list indexed by state.
Output will contain a subset of the following elements: state, age, sex, county, tract, block_group, block, and place.
Examples
## Not run:
get_census_data(states = c("NJ", "NY"), age = TRUE, sex = FALSE)
get_census_data(states = "MN", age = FALSE, sex = FALSE, year = "2020")
## End(Not run)
Surname probability merging function.
Description
merge_names merges names in a user-input dataset with corresponding race/ethnicity probabilities derived from both the U.S. Census Surname List and Spanish Surname List and voter files from states in the Southern U.S.
Usage
merge_names(
voter.file,
namesToUse,
census.surname,
table.surnames = NULL,
table.first = NULL,
table.middle = NULL,
clean.names = TRUE,
impute.missing = FALSE,
model = "BISG"
)
Arguments
voter.file |
An object of class |
namesToUse |
A character vector identifying which names to use for the prediction.
The default value is |
census.surname |
A |
table.surnames |
An object of class |
table.first |
See |
table.middle |
See |
clean.names |
A |
impute.missing |
See |
model |
See |
Details
This function allows users to match names in their dataset with database entries estimating P(name | ethnicity) for each of the five major racial groups for each name. The database probabilities are derived from both the U.S. Census Surname List and Spanish Surname List and voter files from states in the Southern U.S.
By default, the function matches names as follows:
1. Search raw surnames in the database;
2. Remove any punctuation and search again;
3. Remove any spaces and search again;
4. Remove suffixes (e.g., "Jr") and search again (last names only);
5. Split double-barreled names into two parts and search first part of name;
6. Split double-barreled names into two parts and search second part of name.
Each step only applies to names not matched in a previous step.
Steps 2 through 6 are not applied if clean.names is FALSE.
Note: Any name appearing only on the Spanish Surname List is assigned a probability of 1 for Hispanics/Latinos and 0 for all other racial groups.
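The cascade of matching steps described in this section can be sketched in base R as follows. The mini-dictionary, the helper name `match_cascade`, the regular expressions, and the list of suffixes are all hypothetical choices made for illustration; they are not the package's actual cleaning rules.

```r
# Hypothetical mini-dictionary of known surnames (upper case).
dictionary <- c("SMITH", "GARCIA", "OCONNOR", "DELACRUZ", "LOPEZ")

# Hypothetical helper: try progressively more aggressive cleaning steps,
# applying each step only to names that are still unmatched.
match_cascade <- function(names, dictionary) {
  raw <- toupper(names)
  strip_punct <- function(x) gsub("[^A-Z -]", "", x)
  steps <- list(
    raw         = function(x) x,
    punctuation = function(x) strip_punct(x),
    spaces      = function(x) gsub(" ", "", strip_punct(x)),
    suffix      = function(x) sub(" (JR|SR|II|III|IV)$", "", strip_punct(x)),
    first_part  = function(x) sub("-.*$", "", x),
    second_part = function(x) sub("^.*-", "", x)
  )
  matched <- rep(NA_character_, length(raw))
  for (step in steps) {
    candidate <- step(raw)
    hit <- is.na(matched) & candidate %in% dictionary
    matched[hit] <- candidate[hit]
  }
  matched
}

match_cascade(
  c("Smith", "O'Connor", "De La Cruz", "Garcia Jr.", "Lopez-Nguyen", "Xyzzy"),
  dictionary
)
```

Names that survive every step unmatched remain `NA` in this sketch; how such names are handled in practice depends on the imputation settings.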
Value
Output will be an object of class data.frame. It will consist of the original user-input data with additional columns that specify the part of the name matched with Census data (surname.match), and the probabilities Pr(Race | Surname) for each racial group (p_whi for White, p_bla for Black, p_his for Hispanic/Latino, p_asi for Asian and Pacific Islander, and p_oth for Other/Mixed).
Examples
data(voters)
## Not run: try(merge_names(voters, namesToUse = "surname", census.surname = TRUE))
Surname probability merging function.
Description
merge_surnames merges surnames in user-input dataset with corresponding race/ethnicity probabilities from U.S. Census Surname List and Spanish Surname List.
Usage
merge_surnames(
voter.file,
surname.year = 2020,
name.data,
clean.surname = TRUE,
impute.missing = TRUE
)
Arguments
voter.file |
An object of class |
surname.year |
An object of class |
name.data |
An object of class |
clean.surname |
A |
impute.missing |
A |
Details
This function allows users to match surnames in their dataset with the U.S. Census Surname List (from 2000 or 2010) and Spanish Surname List to obtain Pr(Race | Surname) for each of the five major racial groups.
By default, the function matches surnames to the Census list as follows:
1. Search raw surnames in Census surname list;
2. Remove any punctuation and search again;
3. Remove any spaces and search again;
4. Remove suffixes (e.g., Jr) and search again;
5. Split double-barreled surnames into two parts and search first part of name;
6. Split double-barreled surnames into two parts and search second part of name;
7. For any remaining names, impute probabilities using distribution for all names not appearing on Census list.
Each step only applies to surnames not matched in a previous step.
Steps 2 through 7 are not applied if clean.surname is FALSE.
Note: Any name appearing only on the Spanish Surname List is assigned a probability of 1 for Hispanics/Latinos and 0 for all other racial groups.
Value
Output will be an object of class data.frame. It will consist of the original user-input data with additional columns that specify the part of the name matched with Census data (surname.match), and the probabilities Pr(Race | Surname) for each racial group (p_whi for White, p_bla for Black, p_his for Hispanic/Latino, p_asi for Asian and Pacific Islander, and p_oth for Other/Mixed).
Examples
data(voters)
## Not run: try(merge_surnames(voters))
Internal model fitting functions
Description
These functions are intended for internal use only. Users should use the predict_race() interface rather than any of these functions directly.
Usage
.predict_race_old(
voter.file,
census.surname = TRUE,
surname.only = FALSE,
surname.year = 2020,
name.dictionaries = NULL,
census.geo,
census.key = Sys.getenv("CENSUS_API_KEY"),
census.data = NULL,
age = FALSE,
sex = FALSE,
year = "2020",
party,
retry = 3,
impute.missing = TRUE,
use.counties = FALSE
)
predict_race_new(
voter.file,
names.to.use,
year = "2020",
age = FALSE,
sex = FALSE,
census.geo = c("tract", "block", "block_group", "county", "place", "zcta"),
census.key = Sys.getenv("CENSUS_API_KEY"),
name.dictionaries,
surname.only = FALSE,
census.data = NULL,
retry = 0,
impute.missing = TRUE,
skip_bad_geos = FALSE,
census.surname = FALSE,
use.counties = FALSE
)
predict_race_me(
voter.file,
names.to.use,
year = "2020",
age = FALSE,
sex = FALSE,
census.geo = c("tract", "block", "block_group", "county", "place", "zcta"),
census.key = Sys.getenv("CENSUS_API_KEY"),
name.dictionaries,
surname.only = FALSE,
census.data = NULL,
retry = 0,
impute.missing = TRUE,
census.surname = FALSE,
use.counties = FALSE,
race.init,
ctrl
)
Arguments
voter.file |
See documentation in |
census.surname |
See documentation in |
surname.only |
See documentation in |
surname.year |
See documentation in |
name.dictionaries |
See documentation in |
census.geo |
See documentation in |
census.key |
A character object specifying user's Census API key.
Required if |
census.data |
See documentation in |
age |
See documentation in |
sex |
See documentation in |
year |
See documentation in |
party |
See documentation in |
retry |
See documentation in |
impute.missing |
See documentation in |
use.counties |
A logical, defaulting to FALSE. Should census data be filtered by counties available in census.data? |
names.to.use |
See documentation in |
skip_bad_geos |
See documentation in |
race.init |
See documentation in |
ctrl |
See |
Details
These functions fit different versions of WRU. .predict_race_old fits the original WRU model, also known as BISG with census-based surname dictionary. .predict_race_new fits a new version of BISG which uses a new, augmented surname dictionary, and can also accommodate the use of first and middle name information. Finally, .predict_race_me fits a fully Bayesian Improved Surname Geocoding model (fBISG), which fits a model with measurement-error correction of erroneous zeros in census tables, in addition to also accommodating the augmented surname dictionary, and the first and middle name dictionaries when making predictions.
Value
Output will be an object of class data.frame. It will consist of the original user-input voter.file with additional columns with predicted probabilities for each of the five major racial categories: pred.whi for White, pred.bla for Black, pred.his for Hispanic/Latino, pred.asi for Asian/Pacific Islander, and pred.oth for Other/Mixed.
.predict_race_old
Original WRU race prediction function, implementing classical BISG with census-based surname dictionary.
.predict_race_new
New race prediction function, implementing classical BISG with augmented surname dictionary, as well as first and middle name information.
.predict_race_me
New race prediction function, implementing fBISG (i.e. measurement error correction, fully Bayesian model) with augmented surname dictionary, as well as first and middle name information.
Race prediction function.
Description
predict_race makes probabilistic estimates of individual-level race/ethnicity.
Usage
predict_race(
voter.file,
census.surname = TRUE,
surname.only = FALSE,
census.geo = c("tract", "block", "block_group", "county", "place", "zcta"),
census.key = Sys.getenv("CENSUS_API_KEY"),
census.data = NULL,
age = FALSE,
sex = FALSE,
year = "2020",
party = NULL,
retry = 3,
impute.missing = TRUE,
skip_bad_geos = FALSE,
use.counties = FALSE,
model = "BISG",
race.init = NULL,
name.dictionaries = NULL,
names.to.use = "surname",
control = NULL
)
Arguments
voter.file |
An object of class |
census.surname |
A |
surname.only |
A |
census.geo |
An optional character vector specifying what level of
geography to use to merge in U.S. Census geographic data. Currently
|
census.key |
A character object specifying user's Census API key.
Required if |
census.data |
A list indexed by two-letter state abbreviations,
which contains pre-saved Census geographic data.
Can be generated using |
age |
An optional |
sex |
An optional |
year |
An optional character vector specifying the year of U.S. Census geographic
data to be downloaded. Use |
party |
An optional character object specifying party registration field
in |
retry |
The number of retries at the census website if network interruption occurs. |
impute.missing |
Logical, defaults to TRUE. Should missing be imputed? |
skip_bad_geos |
Logical. Option to have the function skip any geolocations that are not present
in the census data, returning a partial data set. Default is set to |
use.counties |
A logical, defaulting to FALSE. Should census data be filtered by counties available in census.data? |
model |
Character string, either "BISG" (default) or "fBISG" (for error-correction, fully-Bayesian model). |
race.init |
Vector of initial race for each observation in voter.file.
Must be an integer vector, with 1=white, 2=black, 3=hispanic, 4=asian, and
5=other. Defaults to values obtained using |
name.dictionaries |
Optional named list of |
names.to.use |
One of 'surname', 'surname, first', or 'surname, first, middle'. Defaults to 'surname'. |
control |
List of control arguments only used when
|
Details
This function implements the Bayesian race prediction methods outlined in Imai and Khanna (2016). The function produces probabilistic estimates of individual-level race/ethnicity, based on surname, geolocation, and party.
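The core calculation behind this kind of prediction is an application of Bayes' rule. Assuming that geolocation and surname are independent given race, the posterior is Pr(Race | Surname, Geolocation) ∝ Pr(Race | Surname) × Pr(Geolocation | Race). The sketch below works through that arithmetic for a single hypothetical voter; every number is made up, and the code is an illustration of the formula rather than the package's implementation.

```r
# Hypothetical inputs for one voter; all numbers are made up for illustration.
races <- c("whi", "bla", "his", "asi", "oth")
p_race_given_surname <- setNames(c(0.05, 0.01, 0.90, 0.02, 0.02), races)
p_geo_given_race     <- setNames(c(0.0002, 0.0010, 0.0005, 0.0001, 0.0003), races)

# Bayes' rule, assuming geolocation and surname are independent given race:
# the posterior is proportional to Pr(race | surname) * Pr(geo | race).
unnormalised <- p_race_given_surname * p_geo_given_race
posterior <- unnormalised / sum(unnormalised)

round(posterior, 3)
```

The normalisation step guarantees that the five posterior probabilities sum to one, which is the property the pred.* output columns are documented to represent.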
Value
Output will be an object of class data.frame. It will consist of the original user-input voter.file with additional columns with predicted probabilities for each of the five major racial categories: pred.whi for White, pred.bla for Black, pred.his for Hispanic/Latino, pred.asi for Asian/Pacific Islander, and pred.oth for Other/Mixed.
Examples
data(voters)
try(predict_race(voter.file = voters, surname.only = TRUE))
## Not run:
try(predict_race(voter.file = voters, census.geo = "tract"))
## End(Not run)
## Not run:
try(predict_race(
voter.file = voters, census.geo = "place", year = "2020"))
## End(Not run)
## Not run:
CensusObj <- try(get_census_data(states = c("NY", "DC", "NJ")))
try(predict_race(
  voter.file = voters, census.geo = "tract", census.data = CensusObj, party = "PID"
))
## End(Not run)
## Not run:
CensusObj2 <- try(get_census_data(states = c("NY", "DC", "NJ"), age = TRUE, sex = TRUE))
try(predict_race(
  voter.file = voters, census.geo = "tract", census.data = CensusObj2,
  age = TRUE, sex = TRUE
))
## End(Not run)
## Not run:
CensusObj3 <- try(get_census_data(states = c("NY", "DC", "NJ"), census.geo = "place"))
try(predict_race(voter.file = voters, census.geo = "place", census.data = CensusObj3))
## End(Not run)
Collapsed Gibbs sampler for hWRU. Internal function
Description
Collapsed Gibbs sampler for hWRU. Internal function
Usage
sample_me(
last_name,
first_name,
mid_name,
geo,
N_rg,
pi_s,
pi_f,
pi_m,
pi_nr,
which_names,
samples,
burnin,
race_init,
verbose
)
Arguments
last_name |
Integer vector of last name identifiers for each record (zero indexed, as are all that follow). Must match column numbers in M_rs. |
first_name |
See last_name |
mid_name |
See last_name |
geo |
Integer vector of geographic units for each record. Must match column number in N_rg |
N_rg |
Integer matrix of race | geography counts in census (geographies in columns). |
pi_s |
Numeric matrix of race | surname prior probabilities. |
pi_f |
Same as |
pi_m |
Same as |
pi_nr |
Matrix of marginal probability distribution over missing names; non-keyword names default to this distribution. |
which_names |
Integer; 0=surname only. 1=surname + first name. 2= surname, first, and middle names. |
samples |
Integer number of samples to take after (in total) |
burnin |
Integer number of samples to discard as burn-in of Markov chain |
race_init |
Integer vector of initial race assignments |
verbose |
Boolean; should informative messages be printed? |
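Because the identifier arguments above are documented as zero indexed, while R's own indexing is one based, a conversion step is needed when preparing inputs by hand. The following sketch shows one way to derive such identifiers with `match()`; the example surnames and dictionary are hypothetical, and the package normally prepares these inputs itself.

```r
# Hypothetical records and a hypothetical surname dictionary.
surnames   <- c("LOPEZ", "SMITH", "LOPEZ", "KIM")
dictionary <- c("KIM", "LOPEZ", "SMITH")

# match() returns one-based positions; subtract 1 for zero-based identifiers.
last_name_id <- match(surnames, dictionary) - 1L
last_name_id
```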
Dataset with FIPS codes for US states
Description
Dataset including FIPS codes and postal abbreviations for each U.S. state, district, and territory.
Usage
state_fips
Format
A tibble with 57 rows and 3 columns:
state
Two-letter postal abbreviation
state_code
Two-digit FIPS code
state_name
English name
Source
Derived from tidycensus::fips_codes()
Census Surname List (2000).
Description
Census Surname List from 2000 with race/ethnicity probabilities by surname.
Usage
surnames2000
Format
A data frame with 157,728 rows and 6 variables:
- surname
Surname
- p_whi
Pr(White | Surname)
- p_bla
Pr(Black | Surname)
- p_his
Pr(Hispanic/Latino | Surname)
- p_asi
Pr(Asian/Pacific Islander | Surname)
- p_oth
Pr(Other | Surname)
Examples
data(surnames2000)
Census Surname List (2010).
Description
Census Surname List from 2010 with race/ethnicity probabilities by surname.
Usage
surnames2010
Format
A data frame with 167,613 rows and 6 variables:
- surname
Surname
- p_whi
Pr(White | Surname)
- p_bla
Pr(Black | Surname)
- p_his
Pr(Hispanic/Latino | Surname)
- p_asi
Pr(Asian/Pacific Islander | Surname)
- p_oth
Pr(Other | Surname)
Examples
data(surnames2010)
Variable vector into chunks.
Description
vec_to_chunk takes a list of variables and collects them into 50-variable chunks.
Usage
vec_to_chunk(x)
Arguments
x |
Character vector of variable names. |
Details
This function takes a list of variable names and collects them into chunks with no more than 50 variables each. This helps to get around requests with more than 50 variables, because the API only allows queries of 50 variables at a time. The user should not need to call this function directly.
Value
Object of class list.
References
Based on code authored by Nicholas Nagle, which is available here.
Examples
## Not run:
vec_to_chunk(x = c(paste("P012F0", seq(10, 49), sep = ""),
paste("P012I0", seq(10, 49), sep = "")))
## End(Not run)
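The chunking idea itself is easy to reproduce in base R. The sketch below splits a vector into pieces of at most 50 elements; the helper name `chunk_sketch` and the made-up variable names are assumptions for illustration, and the code is not the package's implementation of vec_to_chunk.

```r
# Hypothetical helper: split a vector into chunks of at most `size` elements.
chunk_sketch <- function(x, size = 50) {
  unname(split(x, ceiling(seq_along(x) / size)))
}

vars <- paste0("V", seq_len(120))  # 120 made-up variable names
chunks <- chunk_sketch(vars)
lengths(chunks)  # sizes of the resulting chunks
```

Each chunk could then be sent as a separate API request and the results combined afterwards.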
Example voter file.
Description
An example dataset containing voter file information.
Usage
voters
Format
A data frame with 10 rows and 12 variables:
- VoterID
Voter identifier (numeric)
- surname
Surname
- state
State of residence
- CD
Congressional district
- county
Census county (three-digit code)
- first
First name
- last
Last name or surname
- tract
Census tract (six-digit code)
- block
Census block (four-digit code)
- precinct
Voting precinct
- place
Voting place
- age
Age in years
- sex
0=male, 1=female
- party
Party registration (character)
- PID
Party registration (numeric)
Examples
data(voters)
Preflight for name data
Description
Checks whether the name data is available in the current working directory and, if not, downloads it from GitHub using piggyback. By default, wru will download the data to a temporary directory that lasts as long as your session does. However, you may wish to set the wru_data_wd option to save the downloaded data to your current working directory for more permanence.
Usage
wru_data_preflight()
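A minimal sketch of setting the wru_data_wd option mentioned in the description is shown below. The description names the option but not its expected value, so the use of `TRUE` here is an assumption; check the package source if in doubt.

```r
# Ask wru to keep downloaded name data in the working directory.
# The value TRUE is an assumption, not confirmed by this documentation.
options(wru_data_wd = TRUE)

# Read the option back, falling back to FALSE when it is unset.
getOption("wru_data_wd", default = FALSE)
```

Options set this way last only for the current R session unless they are placed in a startup file such as .Rprofile.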