% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/EAdet.R
\name{EAdet}
\alias{EAdet}
\title{Epidemic Algorithm for detection of multivariate outliers in incomplete survey data}
\usage{
EAdet(data, weights, reach = "max", transmission.function = "root",
  power = ncol(data), distance.type = "euclidean", maxl = 5,
  plotting = TRUE, monitor = FALSE, prob.quantile = 0.9,
  random.start = FALSE, fix.start, threshold = FALSE,
  deterministic = TRUE, rm.missobs = FALSE, verbose = FALSE)
}
\arguments{
\item{data}{a data frame or matrix with data.}

\item{weights}{a vector of positive sampling weights.}

\item{reach}{if \code{reach = "max"} the maximal nearest neighbor distance is
used as the basis for the transmission function, otherwise the weighted
\eqn{(1 - (p + 1) / n)} quantile of the nearest neighbor distances is used.}

\item{transmission.function}{form of the transmission function of the distance \code{d}:
\code{"step"} is a Heaviside function which jumps to \code{1} at \code{d0},
\code{"linear"} is linear between \code{0} and \code{d0}, \code{"power"} is
\code{(beta*d+1)^(-p)} with \code{p = ncol(data)} by default, and \code{"root"} is the
function \code{1-(1-d/d0)^(1/maxl)}.}

\item{power}{sets \code{p = power}.}

\item{distance.type}{distance type in function \code{dist()}.}

\item{maxl}{maximum number of steps without infection.}

\item{plotting}{if \code{TRUE}, the cdf of infection times is plotted.}

\item{monitor}{if \code{TRUE}, verbose output on the epidemic is produced.}

\item{prob.quantile}{if the mads fail, i.e. if one of the mads is 0, this quantile
of the absolute deviations is used instead.}

\item{random.start}{if \code{TRUE}, take a starting point at random instead of the
spatial median.}

\item{fix.start}{force epidemic to start at a specific observation.}

\item{threshold}{if \code{TRUE}, infect all remaining points whose infection
probability is above the threshold \code{1-0.5^(1/maxl)}.}

\item{deterministic}{if \code{TRUE}, the number of infections is the expected
number and the infected observations are the ones with largest infection probabilities.}

\item{rm.missobs}{set \code{rm.missobs=TRUE} if completely missing observations
should be discarded. This has to be done actively as a safeguard to avoid mismatches
when imputing.}

\item{verbose}{if \code{TRUE}, more output is produced.}
}
\value{
\code{EAdet} returns a list whose first component \code{output} is a sub-list
with the following components:
\describe{
  \item{\code{sample.size}}{Number of observations}
  \item{\code{discarded.observations}}{Indices of discarded observations}
  \item{\code{missing.observations}}{Indices of completely missing observations}
  \item{\code{number.of.variables}}{Number of variables}
  \item{\code{n.complete.records}}{Number of records without missing values}
  \item{\code{n.usable.records}}{Number of records with fewer than half of the
  values missing (unusable observations are discarded)}
  \item{\code{medians}}{Component-wise medians}
  \item{\code{mads}}{Component-wise mads}
  \item{\code{prob.quantile}}{Quantile used if the mads fail, i.e. if one of the mads is 0}
  \item{\code{quantile.deviations}}{Quantile of absolute deviations}
  \item{\code{start}}{Starting observation}
  \item{\code{transmission.function}}{Input parameter}
  \item{\code{power}}{Input parameter}
  \item{\code{maxl}}{Maximum number of steps without infection}
  \item{\code{min.nn.dist}}{Maximal nearest neighbor distance}
  \item{\code{transmission.distance}}{\code{d0}}
  \item{\code{threshold}}{Input parameter}
  \item{\code{distance.type}}{Input parameter}
  \item{\code{deterministic}}{Input parameter}
  \item{\code{number.infected}}{Number of infected observations}
  \item{\code{cutpoint}}{Cutpoint of infection times for outlier definition}
  \item{\code{number.outliers}}{Number of outliers}
  \item{\code{outliers}}{Indices of outliers}
  \item{\code{duration}}{Duration of epidemic}
  \item{\code{computation.time}}{Elapsed computation time}
  \item{\code{initialisation.computation.time}}{Elapsed computation time for
  standardisation and calculation of distance matrix}
}
The further components returned by \code{EAdet} are:
\describe{
  \item{\code{infected}}{Indicator of infection}
  \item{\code{infection.time}}{Time of infection}
  \item{\code{outind}}{Indicator of outliers}
}
}
\description{
In \code{EAdet} an epidemic is started at a center of the data. The epidemic
spreads out and infects neighbouring points (probabilistically or deterministically).
The last points infected are outliers. After running \code{EAdet} an imputation
with \code{EAimp} may be run.
}
\details{
The form and parameters of the transmission function should be chosen such that the
infection times span a range of at least 10. The default cutpoint to decide on
outliers is the median infection time plus three times the mad of the infection times.
A better cutpoint may be chosen by visual inspection of the cdf of infection times.
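
As an illustration only, the default cutpoint rule described above can be sketched as
follows. The vector \code{inf.time} is hypothetical, and the use of the unweighted
\code{median()} and \code{mad()} from \pkg{stats} is an assumption; \code{EAdet} may
compute these quantities differently, for example by using the sampling weights.
\preformatted{
# hypothetical infection times, not produced by EAdet
inf.time <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 12)
# median plus three times the mad (unweighted, stats::mad defaults)
cutpoint <- median(inf.time) + 3 * mad(inf.time)
# observations infected later than the cutpoint are candidate outliers
which(inf.time > cutpoint)
}
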
\code{EAdet} calls the function \code{EA.dist}, which passes the counterprobabilities
of infection (a vector of length \eqn{n (n - 1) / 2}!) and three parameters (the index
of the sample spatial median, the maximal distance to the nearest neighbor and the
transmission distance = reach) as arguments to \code{EAdet}. The distances vector may be
too large to be passed as an argument. In that case the memory size must be increased.
Former versions of the code used a global variable to store the distances in order to
save memory.
}
\examples{
data(bushfirem, bushfire.weights)
det.res <- EAdet(bushfirem, bushfire.weights)
print(det.res$output)
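
# A hedged sketch of inspecting the result further. It uses only the
# components documented in the Value section; the alternative cutpoint
# below is a hypothetical value chosen for illustration.
det.res$output$outliers              # indices of the outliers found
inf.time <- det.res$infection.time   # infection time of each observation
plot(ecdf(inf.time), main = "cdf of infection times")
my.cutpoint <- 6                     # hypothetical cutpoint read off the plot
which(inf.time > my.cutpoint)        # observations infected after the cutpoint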
}
\references{
Béguin, C. and Hulliger, B. (2004). Multivariate outlier detection in
incomplete survey data: the epidemic algorithm and transformed rank correlations.
\emph{Journal of the Royal Statistical Society, Series A}, 167(2), 275-294.
}
\seealso{
\code{\link{EAimp}} for imputation with the Epidemic Algorithm.
}
\author{
Beat Hulliger
}
