% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/score-permutations.R
\name{score_cp}
\alias{score_cp}
\title{Predict Class Probability (Sample Membership)}
\usage{
score_cp(x_train, x_test, n_trees = 500L, response_name = "label")
}
\arguments{
\item{x_train}{Training (reference) sample.}

\item{x_test}{Test sample.}

\item{n_trees}{Number of trees in the random forest.}

\item{response_name}{The column name of the categorical outcome to predict.}
}
\value{
A named list or object of class \code{outlier.test} containing:
\itemize{
   \item \code{train}: vector of scores in the training set
   \item \code{test}: vector of scores in the test set
}
}
\description{
Predict class probability using a random forest from the \pkg{ranger}
package. The prefix \emph{cp} stands for class probability, which reflects
sample membership in the training versus the test set. This function is
useful for testing for dataset shift via classifier performance, mimicking
tests of equal distribution.
}
\details{
\code{score_cp} fits a classifier to discriminate between training and test
sets. It uses out-of-bag predictions, namely class probabilities, to
estimate sample memberships. As a result, estimating the p-value via
permutations does not require refitting the algorithm for every permutation.
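
The idea can be sketched as follows. This is an illustration under stated
assumptions, not necessarily how \code{score_cp} is implemented: the object
names (\code{pooled}, \code{oob_test_prob}, \code{scores}) are hypothetical,
and the \code{iris} subsets merely stand in for a training and a test sample.
Only \code{ranger()} with its \code{num.trees} and \code{probability}
arguments is taken from the \pkg{ranger} package.

\preformatted{
library(ranger)
set.seed(12345)

x_train <- iris[1:50, 1:4]    # hypothetical training (reference) sample
x_test <- iris[51:100, 1:4]   # hypothetical test sample

# Pool both samples and label each row by its sample of origin
pooled <- rbind(x_train, x_test)
pooled$label <- factor(
  rep(c("train", "test"), times = c(nrow(x_train), nrow(x_test))),
  levels = c("train", "test")
)

# Probability forest: fit$predictions holds the out-of-bag class
# probabilities for the pooled rows, one column per class
fit <- ranger(label ~ ., data = pooled, num.trees = 500L, probability = TRUE)

# Out-of-bag probability of belonging to the test set, split by sample
oob_test_prob <- fit$predictions[, "test"]
scores <- split(oob_test_prob, pooled$label)
str(scores)
}

Because each out-of-bag prediction comes only from trees that did not see the
row in question, the scores behave like held-out predictions from a single fit.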
}
\section{Notes}{

Kim et al. (2021) describe how a classifier can serve as a proxy for
two-sample comparison. As in Hediger et al. (2022), we use a random forest as
the underlying classifier. The probability of belonging to the test set, as
opposed to the training set, is the outlier score. That is, the binary
classifier treats the training and test sets as the two classes.
}

\examples{
\donttest{
library(dsos)
set.seed(12345)
data(iris)
setosa <- iris[1:50, 1:4] # Training sample: Species == 'setosa'
versicolor <- iris[51:100, 1:4] # Test sample: Species == 'versicolor'
outlier_scores <- score_cp(setosa, versicolor, response_name = "label")
str(outlier_scores)
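# Illustrative follow-up, assuming the result exposes the train and test
# score vectors described under Value. When the two samples differ, scores
# in the test set should tend to be higher than scores in the training set.
summary(outlier_scores$train)
summary(outlier_scores$test)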
}

}
\references{
Hediger, S., Michel, L., & Näf, J. (2022).
\emph{On the use of random forest for two-sample testing}.
Computational Statistics & Data Analysis, 170, 107435.

Kim, I., Ramdas, A., Singh, A., & Wasserman, L. (2021).
\emph{Classification accuracy as a proxy for two-sample testing}.
The Annals of Statistics, 49(1), 411-434.
}
\seealso{
Other scoring: 
\code{\link{score_od}()},
\code{\link{score_rd}()},
\code{\link{score_rue}()}
}
\concept{scoring}
