Title: | Working with United States ZIP Code and ZIP Code Tabulation Area Data |
Version: | 0.1.2 |
Description: | Provides a set of functions for working with American postal codes, which are known as ZIP Codes. These include accessing ZIP Code to ZIP Code Tabulation Area (ZCTA) crosswalks, retrieving demographic data for ZCTAs, and tabulating demographic data for three-digit ZCTAs. |
Depends: | R (≥ 3.5) |
License: | Apache License (≥ 2) |
URL: | https://github.com/pfizer-opensource/zippeR |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Imports: | cli, datasets, dplyr, httr, jsonlite, purrr, readr, sf, spatstat.univar, stats, stringr, tibble, tidycensus, tidyr, tigris |
Suggests: | knitr, rmarkdown, testthat |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2025-04-25 20:15:49 UTC; prenec |
Author: | Christopher Prener |
Maintainer: | Christopher Prener <Christopher.Prener@pfizer.com> |
Repository: | CRAN |
Date/Publication: | 2025-04-25 20:50:02 UTC |
Aggregate ZCTAs to Three-digit ZCTAs
Description
This function takes input ZCTA data and aggregates it to three-digit areas, which are considerably larger. These regions are sometimes used in American health care contexts for publishing geographic identifiers.
Usage
zi_aggregate(.data, year, extensive = NULL, intensive = NULL,
intensive_method = "mean", survey, output = "tidy", zcta = NULL,
key = NULL)
Arguments
.data |
A tidy set of demographic data containing one or more variables that should be aggregated to three-digit ZCTAs. This data frame or tibble should contain all five-digit ZCTAs within the three-digit ZCTAs that you plan to use for aggregating data. See Details below for formatting requirements. |
year |
A four-digit numeric scalar for year. |
extensive |
A character scalar or vector listing all extensive (i.e. count data) variables you wish to aggregate. These will be summed. For American Community Survey data, the margin of error will be calculated by taking the square root of the summed, squared margins of error for each five-digit ZCTA within a given three-digit ZCTA. |
intensive |
A character scalar or vector listing all intensive (i.e.
ratio, percent, or median data) variables you wish to aggregate. These
will be combined using the approach listed for |
intensive_method |
A character scalar; either |
survey |
A character scalar representing the Census product. It can
be either a Decennial Census product (either |
output |
A character scalar; one of |
zcta |
An optional vector of ZCTAs that demographic data are requested
for. If this is |
key |
A Census API key, which can be obtained at
https://api.census.gov/data/key_signup.html. This can be omitted if
|
Value
A tibble containing all aggregated data requested in either "tidy" or "wide" format.
Examples
# load sample demographic data
mo22_demos <- zi_mo_pop
# the above data can be replicated with the following code:
# zi_get_demographics(year = 2022, variables = c("B01003_001", "B19013_001"),
# survey = "acs5")
# load sample geometric data
mo22_zcta3 <- zi_mo_zcta3
# the above data can be replicated with the following code:
# zi_get_geometry(year = 2022, style = "zcta3", state = "MO",
# method = "intersect")
# aggregate a single variable
zi_aggregate(mo22_demos, year = 2020, extensive = "B01003_001", survey = "acs5",
zcta = mo22_zcta3$ZCTA3)
# aggregate multiple variables, outputting wide data
zi_aggregate(mo22_demos, year = 2020,
extensive = "B01003_001", intensive = "B19013_001", survey = "acs5",
zcta = mo22_zcta3$ZCTA3, output = "wide")
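The handling of extensive variables described under the extensive argument (estimates are summed, and the margin of error is the square root of the summed, squared margins of error) can be sketched in base R. This is only an illustration of the arithmetic and not the package's internal code; the data values and the object names zcta5, est, moe, and zcta3 are made up for the example.

```r
# Toy ACS-style input: one row per five-digit ZCTA (all values are made up)
zcta5 <- data.frame(
  GEOID = c("63005", "63011", "63139", "63143"),
  estimate = c(100, 200, 300, 400),
  moe = c(3, 4, 6, 8),
  stringsAsFactors = FALSE
)

# Derive the three-digit ZCTA from the first three characters of the GEOID
zcta5$ZCTA3 <- substr(zcta5$GEOID, 1, 3)

# Extensive (count) estimates are summed within each three-digit ZCTA
est <- aggregate(estimate ~ ZCTA3, data = zcta5, FUN = sum)

# Margin of error: square root of the summed, squared margins of error
moe <- aggregate(moe ~ ZCTA3, data = zcta5,
                 FUN = function(m) sqrt(sum(m^2)))

# Combine both results into one row per three-digit ZCTA
zcta3 <- merge(est, moe, by = "ZCTA3")
zcta3
```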
Convert Five-digit ZIP Codes to Three-digit ZIP Codes
Description
This function converts five-digit ZIP Codes to three-digit ZIP Codes. The first three digits of a ZIP Code are known as the ZIP3 Code, and correspond to the sectional center facility (SCF) that processes mail for a region.
Usage
zi_convert(.data, input_var, output_var)
Arguments
.data |
A data frame containing a column of five-digit ZIP Codes. |
input_var |
A character scalar specifying the column name with the five-digit ZIP Codes in the data frame. |
output_var |
Optional; A character scalar specifying the column name to store the three-digit ZIP Codes in the data frame. |
Value
A tibble containing the original data frame with a new column of three-digit ZIP Codes.
Examples
# add new column
## create sample data
df <- data.frame(id = c(1:3), zip5 = c("63005", "63139", "63636"))
## convert ZIP Codes to ZIP3, creating a new column
zi_convert(.data = df, input_var = zip5, output_var = zip3)
# overwrite existing column
## create sample data
df <- data.frame(id = c(1:3), zip = c("63005", "63139", "63636"))
## convert ZIP Codes to ZIP3, overwriting the existing column
zi_convert(.data = df, input_var = zip)
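Conceptually, the conversion keeps the first three characters of each five-digit ZIP Code. A minimal base R sketch of that idea follows; it is not the package's implementation, and the object names are hypothetical.

```r
# Five-digit ZIP Codes stored as character data
zip5 <- c("63005", "63139", "63636")

# The ZIP3 Code is the first three characters of the ZIP Code
zip3 <- substr(zip5, 1, 3)
zip3
```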
Crosswalk ZIP Codes with UDS, HUD, or a Custom Dictionary
Description
This function compares input data containing ZIP Codes with a crosswalk file that will append ZCTAs. This is an important step because not all ZIP Codes have the same five digits as their enclosing ZCTA.
Usage
zi_crosswalk(.data, input_var, zip_source = "UDS", source_var,
source_result, year = NULL, qtr = NULL, target = NULL, query = NULL,
by = NULL, return_max = NULL, key = NULL, return = "id")
Arguments
.data |
An "input object" that is a data.frame or tibble containing ZIP Codes to be crosswalked. |
input_var |
The column in the input data that contains five-digit ZIP Codes. If the input is numeric, it will be transformed to character data and leading zeros will be added. |
zip_source |
Required character scalar or data frame; specifies the
source of ZIP Code crosswalk data. This can be one of either |
source_var |
Character scalar, required when |
source_result |
Character scalar, required when |
year |
Optional four-digit numeric scalar for year; varies based on source.
For |
qtr |
Numeric scalar, required when |
target |
Character scalar, required when |
query |
Scalar or vector, required when |
by |
Character scalar, required when |
return_max |
Logical scalar, required when |
key |
Optional when |
return |
Character scalar, specifies the type of output to return. Can be
one of |
Value
A tibble with crosswalk values (or optionally, the full crosswalk file) appended based on the return argument.
Examples
# create sample data
df <- data.frame(id = c(1:3), zip5 = c("63005", "63139", "63636"))
# UDS crosswalk
zi_crosswalk(df, input_var = zip5, zip_source = "UDS", year = 2022)
# HUD crosswalk
# you will need to replace INSERT_HUD_KEY with your own key
## Not run:
zi_crosswalk(df, input_var = zip5, zip_source = "HUD", year = 2023,
qtr = 1, target = "COUNTY", query = "MO", by = "residential",
return_max = TRUE, key = INSERT_HUD_KEY)
## End(Not run)
# custom dictionary
## load sample crosswalk data to simulate custom dictionary
mo_xwalk <- zi_mo_hud
# prep crosswalk
# when a ZIP Code crosses county boundaries, the portion with the largest
# number of residential addresses will be returned
mo_xwalk <- zi_prep_hud(mo_xwalk, by = "residential", return_max = TRUE)
## crosswalk
zi_crosswalk(df, input_var = zip5, zip_source = mo_xwalk, source_var = zip5,
source_result = geoid)
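The two ideas behind a custom-dictionary crosswalk, padding numeric ZIP Codes with leading zeros and then looking each ZIP Code up in a dictionary, can be illustrated with base R. This is a sketch under stated assumptions rather than the code zi_crosswalk() runs: the dictionary xwalk, its column names, and its values are illustrative only.

```r
# Input data with ZIP Codes stored as numbers, so the leading zero is lost
df <- data.frame(id = 1:3, zip5 = c(63005, 63139, 2134))

# Convert to character data and restore leading zeros
df$zip5 <- sprintf("%05d", as.integer(df$zip5))

# Hypothetical custom dictionary mapping ZIP Codes to a target geography
xwalk <- data.frame(
  zip5 = c("63005", "63139", "02134"),
  geoid = c("29189", "29510", "25025"),
  stringsAsFactors = FALSE
)

# Append the matching dictionary value; unmatched ZIP Codes would become NA
df$geoid <- xwalk$geoid[match(df$zip5, xwalk$zip5)]
df
```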
Download Demographic Data for Five-digit ZCTAs
Description
This function returns demographic data for five-digit ZIP Code Tabulation Areas (ZCTAs), which are rough approximations of many (but not all) USPS ZIP codes.
Usage
zi_get_demographics(year, variables = NULL, table = NULL,
survey, output = "tidy", zcta = NULL, key = NULL)
Arguments
year |
A four-digit numeric scalar for year. |
variables |
A character scalar or vector of variable IDs. |
table |
A character scalar of a table ID (only one table may be requested per call). |
survey |
A character scalar representing the Census product. It can
be either a Decennial Census product (either |
output |
A character scalar; one of |
zcta |
An optional vector of ZCTAs that demographic data are requested
for. If this is |
key |
A Census API key, which can be obtained at
https://api.census.gov/data/key_signup.html. This can be omitted if
|
Value
A tibble containing all demographic data requested in either "tidy" or "wide" format.
Examples
# download all ZCTAs
zi_get_demographics(year = 2012, variables = "B01003_001", survey = "acs5")
# limit output to subset of ZCTAs
## download all ZCTAs in Missouri, intersects method
mo20 <- zi_get_geometry(year = 2020, state = "MO", method = "intersect")
## download demographic data
zi_get_demographics(year = 2012, variables = "B01003_001", survey = "acs5",
zcta = mo20$GEOID)
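The difference between "tidy" and "wide" output can be pictured with base R's reshape(). The sketch below uses made-up values in the GEOID, variable, estimate, and moe layout documented for zi_mo_pop; the wide column names that reshape() produces here are an artifact of this sketch, and the names zippeR itself returns may differ.

```r
# Tidy layout: one row per ZCTA and variable (all values are made up)
tidy <- data.frame(
  GEOID = c("63005", "63005", "63139", "63139"),
  variable = c("B01003_001", "B19013_001", "B01003_001", "B19013_001"),
  estimate = c(100, 50000, 200, 40000),
  moe = c(10, 3000, 20, 2500),
  stringsAsFactors = FALSE
)

# Wide layout: one row per ZCTA, with estimate and moe columns per variable
wide <- reshape(tidy, idvar = "GEOID", timevar = "variable",
                direction = "wide")
wide
```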
Download and Optionally Geoprocess ZCTAs
Description
This function returns geometric data for ZIP Code Tabulation
Areas (ZCTAs), which are rough approximations of many (but not all)
USPS ZIP codes. Downloading and processing these data will be heavily
affected by your internet connection, your choice for the cb
argument, and the processing power of your computer (if you select
specific counties).
Usage
zi_get_geometry(year, style = "zcta5", return = "id", class = "sf",
state = NULL, county = NULL, territory = NULL, cb = FALSE,
starts_with = NULL, includes = NULL, excludes = NULL, method,
shift_geo = FALSE)
Arguments
year |
A four-digit numeric scalar for year. |
style |
A character scalar - either |
return |
A character scalar; if |
class |
A character scalar; if |
state |
A character scalar or vector with character state abbreviations
(e.g. |
county |
A character scalar or vector with character GEOIDs (e.g.
|
territory |
A character scalar or vector with character territory abbreviations
(e.g. |
cb |
A logical scalar; if This argument does not apply to |
starts_with |
A character scalar or vector containing the first two
digits of a GEOID or ZCTA3 value to return. It defaults to |
includes |
A character scalar or vector containing GEOIDs or ZCTA3 values
to include when finalizing output. This may be necessary depending on what
is identified with the |
excludes |
A character scalar or vector containing GEOIDs or ZCTA3 values
to exclude when finalizing output. This may be necessary depending on what
is identified with the |
method |
A character scalar - either |
shift_geo |
A logical scalar; if |
Details
This function contains options for both the type of ZCTA and, optionally, for how state and county data are identified. For type, either five-digit or three-digit ZCTA geometries are available. The three-digit ZCTAs were created by geoprocessing the five-digit boundaries for each year, and then applying a modest amount of simplification (with sf::st_simplify()) to reduce file size. The source files are available on GitHub at https://github.com/chris-prener/zcta3.
Since ZCTAs cross state lines, two methods are used to create these geometry data for years 2012 and beyond for states and all years for counties. The "intersect" method will return ZCTAs that border the states or counties selected. In most cases, this will result in more ZCTAs being returned than are actually within the states or counties selected. Conversely, the "centroid" method will return only ZCTAs whose centroids (geographical centers) lie within the states or counties named. In most cases, this will return fewer ZCTAs than actually lie within the states or counties selected. Users will need to review their data carefully and will likely need to use the includes and excludes arguments to finalize the geographies returned.
For state-level data in 2010 and 2011, the Census Bureau published individual state files that will be utilized automatically by zippeR. If county-level data are requested for these years, the state-specific file will be used as a base before identifying ZCTAs within counties using either the "intersect" or "centroid" method described above.
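The contrast between the "intersect" and "centroid" methods can be shown with a deliberately simplified base R sketch in which a hypothetical state and four hypothetical ZCTAs are axis-aligned rectangles. Real ZCTA and state boundaries are irregular polygons and the package works with sf geometries, so this toy only illustrates why the two methods return different sets; every name and coordinate below is made up.

```r
# Hypothetical state boundary as a rectangle
state <- c(xmin = 0, ymin = 0, xmax = 10, ymax = 10)

# Hypothetical ZCTAs: A is inside, B and C straddle the border, D is outside
zctas <- data.frame(
  id = c("A", "B", "C", "D"),
  xmin = c(2, 8, 9.5, 12),
  ymin = c(2, 4, 4, 4),
  xmax = c(4, 11, 14, 14),
  ymax = c(4, 6, 6, 6),
  stringsAsFactors = FALSE
)

# "intersect" idea: keep any ZCTA whose rectangle overlaps the state
overlaps <- zctas$xmin <= state[["xmax"]] & zctas$xmax >= state[["xmin"]] &
  zctas$ymin <= state[["ymax"]] & zctas$ymax >= state[["ymin"]]

# "centroid" idea: keep only ZCTAs whose center point falls inside the state
cx <- (zctas$xmin + zctas$xmax) / 2
cy <- (zctas$ymin + zctas$ymax) / 2
center_in <- cx >= state[["xmin"]] & cx <= state[["xmax"]] &
  cy >= state[["ymin"]] & cy <= state[["ymax"]]

intersect_ids <- zctas$id[overlaps]
centroid_ids <- zctas$id[center_in]
intersect_ids
centroid_ids
```

In this toy, ZCTA C is returned by the intersect approach but not by the centroid approach, which is the kind of border case that calls for manual review.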
Value
An sf object with ZCTAs matching the parameters specified above: either a nationwide file, a specific state or states, or a specific county or counties.
Examples
# five-digit ZCTAs
## download all ZCTAs for 2020 including territories
zi_get_geometry(year = 2020, territory = c("AS", "GU", "MP", "PR", "VI"),
shift_geo = TRUE)
## download all ZCTAs for 2020 excluding territories
zi_get_geometry(year = 2020, shift_geo = TRUE)
## download all ZCTAs in a selection of states, intersects method
zi_get_geometry(year = 2020, state = c("IA", "IL", "MO"), method = "intersect")
## download all ZCTAs in a single county - St. Louis City, MO
zi_get_geometry(year = 2020, state = "MO", county = "29510",
method = "intersect")
# three-digit ZCTAs
## download all three-digit ZCTAs for 2018 including territories
zi_get_geometry(year = 2018, style = "zcta3",
territory = c("AS", "GU", "MP", "PR", "VI"), shift_geo = TRUE)
Label ZIP Codes with Contextual Data
Description
This function appends information about the city (for five-digit ZIP Codes) or area (for three-digit ZIP Codes) to a data frame containing these values. State is returned for both types of ZIP Codes. The function also optionally returns data on Sectional Center Facilities (SCFs) for three-digit ZIP Codes.
Usage
zi_label(.data, input_var, label_source = "UDS", source_var,
type = "zip5", include_scf = FALSE, vintage = 2022)
Arguments
.data |
An "input object" that is a data.frame or tibble containing ZIP Codes to be labeled. |
input_var |
The column in the input data that contains five-digit ZIP Codes. If the input is numeric, it will be transformed to character data and leading zeros will be added. |
label_source |
Required character scalar or data frame; specifies the
source of the label data. This could be either |
source_var |
Character scalar, required when |
type |
Character scalar, required when |
include_scf |
A logical scalar required when |
vintage |
Character or numeric scalar, required when |
Details
Labels are approximations of the actual location of a ZIP Code. For five-digit ZIP Codes, the city and state may or may not correspond to an individual's mailing address city (since multiple cities may be accepted as valid by USPS for a particular ZIP Code) or state (since ZIP Codes may cross state lines).
For three-digit ZIP Codes, the area and state may or may not correspond to an individual's mailing address state (since SCFs cover multiple states). For example, the three-digit ZIP Code 010 covers Western Massachusetts in practice, but is assigned to the state of Connecticut.
Value
A tibble containing the original data with additional columns from the selected label data set appended.
Examples
# create sample data
df <- data.frame(
id = c(1:3),
zip5 = c("63005", "63139", "63636"),
zip3 = c("630", "631", "636")
)
# UDS labels
zi_label(df, input_var = zip5, label_source = "UDS", vintage = 2022)
# USPS labels
zi_label(df, input_var = zip3, label_source = "USPS", type = "zip3",
vintage = 202408)
# custom dictionary
## load sample ZIP3 label data to simulate custom dictionary
mo_label <- zi_mo_usps
## label
zi_label(df, input_var = zip3, label_source = mo_label, source_var = zip3,
type = "zip3")
List ZCTA GEOIDs for States
Description
This function returns a vector of GEOIDs that represent ZCTAs in and around states, depending on the method selected. The two available methods are described in Details below.
Usage
zi_list_zctas(year, state, method)
Arguments
year |
A four-digit numeric scalar for year. |
state |
A scalar or vector with state abbreviations (e.g. |
method |
A character scalar - either |
Details
Since ZCTAs cross state lines, two methods are used to create these vectors. The "intersect" method will return ZCTAs that border the state selected. In most cases, this will result in more ZCTAs being returned than are actually within the state(s) named in the state argument. Conversely, the "centroid" method will return only ZCTAs whose centroids (geographical centers) lie within the states named. In most cases, this will return fewer ZCTAs than actually lie within the state selected. Users will need to review their data carefully and, when using other zippeR functions, will likely need to use the includes and excludes arguments to finalize the geographies returned.
Value
A vector of GEOIDs representing ZCTAs in and around the state selected.
Examples
# Missouri ZCTAs, intersect method
## return list
mo_zctas <- zi_list_zctas(year = 2021, state = "MO", method = "intersect")
## preview ZCTAs
mo_zctas[1:10]
# Missouri ZCTAs, centroid method
## return list
mo_zctas <- zi_list_zctas(year = 2021, state = "MO", method = "centroid")
## preview ZCTAs
mo_zctas[1:10]
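Finalizing a list of ZCTAs with starts_with, includes, and excludes values amounts to filtering by prefix and then adding and removing specific GEOIDs. The base R sketch below shows one plausible way to do this by hand on a vector of GEOIDs. The GEOIDs are arbitrary examples, and the order of operations is an assumption of this sketch, not a description of how zippeR applies these arguments.

```r
# A hypothetical vector of five-digit ZCTA GEOIDs returned for a state
zctas <- c("63005", "63139", "62002", "66101")

# Hypothetical review decisions
starts_with <- c("63", "62")  # keep GEOIDs beginning with these two digits
includes <- "65203"           # add a ZCTA the method missed
excludes <- "62002"           # drop a ZCTA that should not be kept

# Filter on the first two digits, then add and remove specific GEOIDs
keep <- zctas[substr(zctas, 1, 2) %in% starts_with]
final <- setdiff(union(keep, includes), excludes)
final
```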
Load Crosswalk Files
Description
Spatial data on USPS ZIP Codes are not published by the U.S. Postal Service or the U.S. Census Bureau. Instead, ZIP Codes can be converted to a variety of Census Bureau geographies using crosswalk files. This function reads in ZIP Code to ZIP Code Tabulation Area (ZCTA) crosswalk files from the former UDS Mapper project, which was sunset by the American Academy of Family Physicians in early 2024. It also provides access to the U.S. Department of Housing and Urban Development's ZIP Code crosswalk files, which provide similar functionality for converting ZIP Codes to a variety of geographies including counties.
Usage
zi_load_crosswalk(zip_source = "UDS", year, qtr = NULL, target = NULL,
query = NULL, key = NULL)
Arguments
zip_source |
Required character scalar; specifies the source of ZIP Code
crosswalk data. This can be one of either |
year |
Required four-digit numeric scalar for year; varies based on source.
For |
qtr |
Numeric scalar, required when |
target |
Character scalar, required when |
query |
Scalar or vector, required when |
key |
Optional when |
Value
A tibble containing the crosswalk file.
Examples
# former UDS mapper crosswalks
zi_load_crosswalk(zip_source = "UDS", year = 2020)
## Not run:
# HUD crosswalks
# you will need to replace INSERT_HUD_KEY with your own key
## ZIP Code to CBSA crosswalk for all ZIP Codes
zi_load_crosswalk(zip_source = "HUD", year = 2023, qtr = 1, target = "CBSA",
query = "all", key = INSERT_HUD_KEY)
## ZIP Code to County crosswalk for all ZIP Codes in Missouri
zi_load_crosswalk(zip_source = "HUD", year = 2023, qtr = 1, target = "COUNTY",
query = "MO", key = INSERT_HUD_KEY)
## ZIP Code to Tract crosswalk for ZIP Code 63139 in St. Louis City
zi_load_crosswalk(zip_source = "HUD", year = 2023, qtr = 1, target = "TRACT",
query = 63139, key = INSERT_HUD_KEY)
## End(Not run)
Load Label Data
Description
This function loads a specific label data set that can be used to label five- or three-digit ZIP Codes in a data frame.
Usage
zi_load_labels(source = "UDS", type = "zip5", include_scf = FALSE,
vintage = 2022)
Arguments
source |
A required character scalar; specifies the source of the label
data. The only supported sources are |
type |
A required character scalar; one of either |
include_scf |
A logical scalar required when |
vintage |
A required character or numeric scalar; specifying the date
for |
Details
Labels are approximations of the actual location of a ZIP Code. For five-digit ZIP Codes, the city and state may or may not correspond to an individual's mailing address city (since multiple cities may be accepted as valid by USPS for a particular ZIP Code) or state (since ZIP Codes may cross state lines).
For three-digit ZIP Codes, the area and state may or may not correspond to an individual's mailing address state (since SCFs cover multiple states). For example, the three-digit ZIP Code 010 covers Western Massachusetts in practice, but is assigned to the state of Connecticut.
Value
A tibble with the specified label data for either five- or three-digit ZIP Codes.
Examples
# zip5 labels via UDS
zi_load_labels(source = "UDS", type = "zip5", vintage = 2022)
# zip3 labels via USPS
zi_load_labels(source = "USPS", type = "zip3", vintage = 202408)
Load List of Available Label Data Sets
Description
This function loads a list of available label data sets that can be used to label ZIP Codes. Currently, only three-digit ZIP Codes are supported.
Usage
zi_load_labels_list(type = "zip3")
Arguments
type |
A character scalar specifying the type of label data to load. The
only supported type is |
Value
A tibble containing date values that can be used with zi_load_labels().
Examples
zi_load_labels_list(type = "zip3")
Missouri HUD ZIP Code to County Crosswalk, 2023
Description
A tibble containing the HUD ZIP Code to County Crosswalk file for Missouri's ZIP Codes in 2023's first quarter.
Usage
data(zi_mo_hud)
Format
A data frame with 1749 rows and 8 variables:
- ZIP
five-digit United States Postal Service ZIP Code
- GEOID
five-digit county FIPS code
- RES_RATIO
for ZIP Codes that cross county boundaries, the proportion of the ZIP Code's residential customers in the given county
- BUS_RATIO
for ZIP Codes that cross county boundaries, the proportion of the ZIP Code's commercial customers in the given county
- OTH_RATIO
for ZIP Codes that cross county boundaries, the proportion of the ZIP Code's other customers in the given county
- TOT_RATIO
for ZIP Codes that cross county boundaries, the proportion of the ZIP Code's total customers in the given county
- CITY
United States Postal Service city name
- STATE
United States Postal Service state abbreviation
Details
The data included in zi_mo_hud can be replicated with the following code: zi_load_crosswalk(zip_source = "HUD", year = 2023, qtr = 1, target = "COUNTY", query = "MO"). This assumes your HUD API key is stored in your .Rprofile file as hud_key.
Source
U.S. Department of Housing and Urban Development's ZIP Code crosswalk files
Examples
utils::str(zi_mo_hud)
utils::head(zi_mo_hud)
Total Population and Median Household Income, Missouri ZCTAs 2022
Description
A tibble containing the total population and median household income estimates from the 2018-2022 5-year U.S. Census Bureau American Community Survey estimates for Missouri five-digit ZIP Code Tabulation Areas (ZCTAs).
Usage
data(zi_mo_pop)
Format
A data frame with 2664 rows and 4 variables:
- GEOID
full GEOID string
- variable
variable, either B01003_001 (total population) or B19013_001 (median household income)
- estimate
value for associated variable
- moe
margin of error for associated variable
Details
The data included in zi_mo_pop can be replicated with the following code: zi_get_demographics(year = 2022, variables = c("B01003_001", "B19013_001"), survey = "acs5").
Source
U.S. Census Bureau American Community Survey
Examples
utils::str(zi_mo_pop)
utils::head(zi_mo_pop)
Missouri USPS Three-digit ZIP Code Labels, August 2024
Description
A tibble containing the USPS Three-digit ZIP Code labels for August 2024.
Usage
data(zi_mo_usps)
Format
A data frame with 37 rows and 3 variables:
- zip3
three-digit United States Postal Service ZIP Code
- label_area
area associated with the three-digit ZIP Code
- label_state
state associated with the three-digit ZIP Code
Details
The data included in zi_mo_usps can be replicated with the following code: zi_load_labels(type = "zip3", source = "USPS", vintage = 202408). After downloading the data, subset to label_state == "MO".
Source
U.S. Postal Service Facility Access and Shipment Tracking (FAST) Database
Examples
utils::str(zi_mo_usps)
utils::head(zi_mo_usps)
Missouri Three-digit ZCTAs, 2022
Description
A simple features data set containing the geometric data for Missouri's three-digit ZIP Code Tabulation Areas (ZCTAs) for 2022, derived from the U.S. Census Bureau's 2022 TIGER/Line shapefiles.
Usage
data(zi_mo_zcta3)
Format
A data frame with 31 rows and 2 variables:
- ZCTA3
three-digit ZCTA value
- geometry
simple features geometry
Details
The data included in zi_mo_zcta3 can be replicated with the following code: zi_get_geometry(year = 2022, style = "zcta3", state = "MO", method = "intersect").
Source
U.S. Census Bureau's TIGER/Line database
Examples
utils::str(zi_mo_zcta3)
utils::head(zi_mo_zcta3)
Convert HUD Crosswalk Data to Finalized Crosswalk
Description
The output from zi_load_crosswalk() for HUD data requires additional processing to be used in the zi_crosswalk() function. This function prepares the HUD data for use in joins.
Usage
zi_prep_hud(.data, by, return_max = TRUE)
Arguments
.data |
The output from |
by |
Character scalar; the column name to use for identifying the best
match for a given ZIP Code. This could be either |
return_max |
Logical scalar; if |
Value
A tibble that has been further prepared for use as a crosswalk.
Examples
# load sample crosswalk data
mo_xwalk <- zi_mo_hud
# the above data can be replicated with the following code:
# zi_load_crosswalk(zip_source = "HUD", year = 2023, qtr = 1,
# target = "COUNTY", query = "MO")
# prep crosswalk
# when a ZIP Code crosses county boundaries, the portion with the largest
# number of residential addresses will be returned
zi_prep_hud(mo_xwalk, by = "residential", return_max = TRUE)
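The idea of returning only the best match for each ZIP Code, as described in the comment above, can be sketched in base R: when a ZIP Code appears in more than one county, keep the row with the largest residential ratio. The sketch uses made-up rows with the ZIP, GEOID, and RES_RATIO columns documented for zi_mo_hud. It is an illustration only, not the code inside zi_prep_hud(), and resolving ties by keeping the first row is an assumption of the sketch.

```r
# Toy crosswalk rows; ZIP Code 63025 is split across two counties
xwalk <- data.frame(
  ZIP = c("63005", "63025", "63025", "63139"),
  GEOID = c("29189", "29189", "29071", "29510"),
  RES_RATIO = c(1, 0.7, 0.3, 1),
  stringsAsFactors = FALSE
)

# Sort so the largest residential ratio comes first within each ZIP Code
xwalk <- xwalk[order(xwalk$ZIP, -xwalk$RES_RATIO), ]

# Keep the first (largest-ratio) row for each ZIP Code
best <- xwalk[!duplicated(xwalk$ZIP), ]
best
```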
Repair ZIP Code or ZCTA Vector
Description
This function repairs two of the four conditions identified in the validation checks with zi_validate(). For the other two conditions, values are converted to NA. See Details below for the specific changes made.
Usage
zi_repair(x, style = "zcta5")
Arguments
x |
A vector containing ZIP or ZCTA values to be repaired. |
style |
A character scalar - either |
Details
The zi_repair() function addresses four conditions:
- If the input vector is numeric, it will be converted to character data.
- If there are values less than five characters (if style = "zcta5", the default), or three characters (if style = "zcta3"), they will be padded with leading zeros.
- If there are input values over five characters (if style = "zcta5", the default), or three characters (if style = "zcta3"), they will be converted to NA.
- If there are input values that have non-numeric characters, they will be converted to NA.
Since two of the four steps will result in NA values, it is strongly recommended to attempt to manually fix these issues first.
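The four conditions above can be approximated with a short base R function. The function name repair_sketch() is hypothetical, and the code is a sketch of the documented behavior under the assumption that the conditions can be applied independently; it is not the source of zi_repair().

```r
# Hypothetical helper mirroring the four documented repair conditions
repair_sketch <- function(x, style = "zcta5") {
  width <- if (style == "zcta5") 5L else 3L

  # Numeric input is converted to character data
  x <- as.character(x)

  # Values with non-numeric characters become NA
  x[!is.na(x) & grepl("[^0-9]", x)] <- NA_character_

  # Values longer than the expected width become NA
  x[!is.na(x) & nchar(x) > width] <- NA_character_

  # Values shorter than the expected width are padded with leading zeros
  short <- !is.na(x) & nchar(x) < width
  x[short] <- paste0(strrep("0", width - nchar(x[short])), x[short])

  x
}

repair_sketch(c(63088, 2134, 123456))
repair_sketch(c("63088", "63108", "zip"))
```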
Value
A repaired vector of ZIP or ZCTA values.
Examples
# sample five-digit ZIPs with character
zips <- c("63088", "63108", "zip")
# failed validation
zi_validate(zips)
# repair
zips <- zi_repair(zips)
# successful validation
zi_validate(zips)
Validate ZIP Code or ZCTA Vector
Description
This function validates vectors of ZIP Code or ZCTA values. It is used internally throughout zippeR for data validation, but is exported to facilitate troubleshooting.
Usage
zi_validate(x, style = "zcta5", verbose = FALSE)
Arguments
x |
A vector containing ZIP or ZCTA values to be validated. |
style |
A character scalar - either |
verbose |
A logical scalar; if |
Details
The zi_validate() function checks for four conditions:
- Is the input vector character data? This is important because of USPS's use of leading zeros in ZIP codes and ZCTAs.
- Are all values five characters (if style = "zcta5", the default), or three characters (if style = "zcta3")?
- Are any input values over five characters (if style = "zcta5", the default), or three characters (if style = "zcta3")?
- Do any input values have non-numeric characters?
The questions provide a basis for repairing issues identified with zi_repair().
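The four checks can be expressed as a small base R function that returns one logical result per check. The function name validate_sketch() and the names of its results are hypothetical, and dropping missing values before checking is a choice made for this sketch; the tibble returned by zi_validate() with verbose = TRUE may be organized differently.

```r
# Hypothetical helper mirroring the four documented validation checks
validate_sketch <- function(x, style = "zcta5") {
  width <- if (style == "zcta5") 5L else 3L

  # Work on a character copy and ignore missing values (sketch assumption)
  chr <- as.character(x)
  chr <- chr[!is.na(chr)]

  c(
    is_character = is.character(x),                # stored as character data?
    correct_width = all(nchar(chr) == width),      # all values expected width?
    none_too_long = !any(nchar(chr) > width),      # no values over the width?
    all_digits = !any(grepl("[^0-9]", chr))        # only numeric characters?
  )
}

validate_sketch(c("63088", "63108", "63139"))
validate_sketch(c("63088", "63108", "zip"))
```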
Value
Either a logical value (if verbose = FALSE) or a tibble containing validation criteria and results.
Examples
# sample five-digit ZIPs
zips <- c("63088", "63108", "63139")
# successful validation
zi_validate(zips)
# sample five-digit ZIPs in data frame
zips <- data.frame(id = c(1:3), ZIP = c("63139", "63108", "00501"), stringsAsFactors = FALSE)
# successful validation
zi_validate(zips$ZIP)
# sample five-digit ZIPs with character
zips <- c("63088", "63108", "zip")
# failed validation
zi_validate(zips)
zi_validate(zips, verbose = TRUE)