% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/duplicate-detect.R
\name{duplicate_detect}
\alias{duplicate_detect}
\title{Detect duplicate values}
\usage{
duplicate_detect(x, ignore = NULL, colname_end = "dup", numeric_only)
}
\arguments{
\item{x}{Vector or data frame.}

\item{ignore}{Optionally, a vector of values that should not be checked. In
the test result columns, they will be marked \code{NA}.}

\item{colname_end}{String. Name ending of the Boolean test result columns.
Default is \code{"dup"}.}

\item{numeric_only}{[\link{Deprecated}] No longer used: All values are coerced to
character.}
}
\value{
A tibble (data frame). It has all the columns from \code{x}, and to the right
of each of these columns, the corresponding test result column.

The tibble has the \code{scr_dup_detect} class, which is recognized by the
\code{audit()} generic.
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#superseded}{\figure{lifecycle-superseded.svg}{options: alt='[Superseded]'}}}{\strong{[Superseded]}}

\code{duplicate_detect()} is superseded because it's less informative than
\code{duplicate_tally()} and \code{duplicate_count()}. Use these functions instead.

For every value in a vector or data frame, \code{duplicate_detect()} tests
whether there is at least one identical value. Test results are presented
next to every value.

This function is a blunt tool designed for initial data checking. Don't put
too much weight on its results.

For summary statistics, call \code{audit()} on the results.
}
\details{
This function is not very informative when the input has many values with
only a few characters each: many of them may have duplicates just by
chance. For example, in R's built-in \code{iris} data set, 99\% of values have
duplicates.

In general, the fewer values and the more characters per value, the more
significant the results.
}
\section{Summaries with \code{audit()}}{
There is an S3 method for the \code{audit()}
generic, so you can call \code{audit()} following \code{duplicate_detect()}. It
returns a tibble with these columns:
\itemize{
\item \code{term}: The original data frame's variables.
\item \code{dup_count}: Number of "duplicated" values of that \code{term} variable: those
which have at least one duplicate anywhere in the data frame.
\item \code{total}: Number of all non-\code{NA} values of that \code{term} variable.
\item \code{dup_rate}: Rate of "duplicated" values of that \code{term} variable.
}

The final row, \code{.total}, summarizes across all other rows: It adds up the
\code{dup_count} and \code{total} columns, and calculates the mean of the
\code{dup_rate} column.
}

\examples{
# Find duplicate values in a data frame...
duplicate_detect(x = pigs4)

# ...or in a single vector:
duplicate_detect(x = pigs4$snout)

# Summary statistics with `audit()`:
pigs4 \%>\%
  duplicate_detect() \%>\%
  audit()

# Any values can be ignored:
pigs4 \%>\%
  duplicate_detect(ignore = c(8.131, 7.574))
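
# A sketch, assuming `colname_end` behaves as documented above:
# give the test result columns a different name ending.
pigs4 \%>\%
  duplicate_detect(colname_end = "is_dup")

# A quick look at the caveat from the Details section, assuming
# `duplicate_detect()` accepts `iris` as is (all values are coerced
# to character): short values often have duplicates just by chance.
iris \%>\%
  duplicate_detect() \%>\%
  audit()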
}
\seealso{
\itemize{
\item \code{duplicate_tally()} to count instances of a value instead of just stating
whether it is duplicated.
\item \code{duplicate_count()} for a frequency table.
\item \code{duplicate_count_colpair()} to check each combination of columns for
duplicates.
\item \code{janitor::get_dupes()} to search for duplicate rows.
}
}
