% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/FateID_functions.R
\name{fateBias}
\alias{fateBias}
\title{Computation of fate bias}
\usage{
fateBias(x, y, tar, z = NULL, minnr = 5, minnrh = 10, nbfactor = 5,
  use.dist = FALSE, seed = NULL, nbtree = NULL, ...)
}
\arguments{
\item{x}{expression data frame with genes as rows and cells as columns. Gene IDs should be given as row names and cell IDs should be given as column names. This can be a reduced expression table only including the features (genes) to be used in the analysis.}

\item{y}{clustering partition. A vector with an integer cluster number for each cell. The order of the cells has to be the same as for the columns of x.}

\item{tar}{vector of integers representing target cluster numbers. Each element of \code{tar} corresponds to a cluster of cells committed towards a particular mature state. One cluster per different cell lineage has to be given and is used as a starting point for learning the differentiation trajectory.}

\item{z}{Matrix containing cell-to-cell distances to be used in the fate bias computation. Default is \code{NULL}. In this case, a correlation-based distance is computed from \code{x} as \code{1 - cor(x)}.}

\item{minnr}{integer number of cells per target cluster to be selected for classification (test set) in each round of training. For each target cluster, the \code{minnr} cells with the highest similarity to a cell in the training set are selected for classification. If \code{z} is not \code{NULL}, it is used as the similarity matrix for this step. Otherwise, \code{1 - cor(x)} is used. Default value is 5.}

\item{minnrh}{integer number of cells from the training set used for classification. From each training set, the \code{minnrh} cells with the highest similarity to the training set are selected. If \code{z} is not \code{NULL} it is used as the similarity matrix for this step. Default value is 10.}

\item{nbfactor}{positive integer number. Determines the number of trees grown for each random forest. The number of trees is given by the number of columns of the training set multiplied by \code{nbfactor}. Default value is 5.}

\item{use.dist}{logical value. If \code{TRUE}, the distance matrix is used as the feature matrix (i.e., \code{z} if not equal to \code{NULL} and \code{1 - cor(x)} otherwise). If \code{FALSE}, gene expression values in \code{x} are used. Default is \code{FALSE}.}

\item{seed}{integer seed for initialization. If equal to \code{NULL}, each run will yield slightly different results due to the randomness of the random forest algorithm. Default is \code{NULL}.}

\item{nbtree}{integer value. If given, it explicitly specifies the number of trees for each random forest. Default is \code{NULL}.}

\item{...}{additional arguments to be passed to the low level function \code{randomForest}.}
}
\value{
A list with the following five components:
  \item{probs}{a data frame with the fraction of random forest votes for each cell. Columns represent the target clusters. Column names are given by a concatenation of \code{t} and target cluster number.}
  \item{votes}{a data frame with the number of random forest votes for each cell. Columns represent the target clusters. Column names are given by a concatenation of \code{t} and target cluster number.}
  \item{tr}{list of vectors. Each component contains the IDs of all cells on the trajectory to a given target cluster. Component names are given by a concatenation of \code{t} and target cluster number.}
  \item{rfl}{list of randomForest objects for each iteration of the classification.}
  \item{trall}{vector of cell IDs ordered by the random forest iteration in which they have been classified into one of the target clusters.}
}
\description{
This function computes fate biases for single cells based on expression data from a single-cell sequencing experiment. It requires a clustering partition and, for each trajectory, a target cluster representing a committed state.
}
\details{
The bias is computed as the ratio of the number of random forest votes for a trajectory and the number of votes for the trajectory with the second largest number of votes. As a result, only the trajectory with the largest number of votes receives a bias >1. The significance is computed based on counting statistics on the difference in the number of votes. A significant bias requires a p-value < 0.05. Cells are assigned to a trajectory if they exhibit a significant bias >1 for this trajectory.
}
\examples{
x <- intestine$x
y <- intestine$y
tar <- c(6,9,13)
fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL)
head(fb$probs)
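
# Illustrative sketch only, not the package's internal code: the bias ratio
# described in Details, computed on a small hypothetical vote table.
# The names toy_votes, second_largest and toy_bias are assumptions made for
# this example; the column names mimic the "t" + target cluster convention.
toy_votes <- data.frame(
  t6  = c(40, 10, 25),
  t9  = c(5, 30, 25),
  t13 = c(5, 10, 0),
  row.names = c("cellA", "cellB", "cellC")
)
# For each cell (row), take the second largest number of votes.
second_largest <- apply(toy_votes, 1, function(v) sort(v, decreasing = TRUE)[2])
# Divide every vote count in a row by that row's second largest count.
toy_bias <- sweep(as.matrix(toy_votes), 1, second_largest, "/")
toy_bias
# At most one trajectory per cell has a ratio above 1; a cell whose top two
# trajectories are tied (cellC here) has none. The significance test
# mentioned in Details is not reproduced in this sketch.
rowSums(toy_bias > 1)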
}
