% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/bayesian.R
\name{inferGenotypeBayesian}
\alias{inferGenotypeBayesian}
\title{Infer a subject-specific genotype using a Bayesian approach}
\usage{
inferGenotypeBayesian(data, germline_db = NA, novel = NA,
  v_call = "V_CALL", find_unmutated = TRUE, priors = c(0.6, 0.4, 0.4,
  0.35, 0.25, 0.25, 0.25, 0.25, 0.25))
}
\arguments{
\item{data}{a \code{data.frame} containing V allele
calls from a single subject. If \code{find_unmutated} 
is \code{TRUE}, then the sample IMGT-gapped V(D)J sequence 
should be provided in a column named \code{"SEQUENCE_IMGT"}.}

\item{germline_db}{named vector of sequences containing the
germline sequences named in the V allele calls of \code{data}. 
Only required if \code{find_unmutated} is \code{TRUE}.}

\item{novel}{an optional \code{data.frame} of the type
returned by \link{findNovelAlleles}, containing novel
germline sequences that will be used if
\code{find_unmutated} is \code{TRUE}. See Details.}

\item{v_call}{column in \code{data} with V allele calls.
Default is \code{"V_CALL"}.}

\item{find_unmutated}{if \code{TRUE}, use \code{germline_db} to
find which sample sequences are unmutated. Not needed
if the allele calls only represent
unmutated sequences.}

\item{priors}{a numeric vector of priors for the multinomial distribution. 
The \code{priors} vector must contain nine values that define
the priors for the heterozygous (two allele), 
trizygous (three allele), and quadrozygous (four allele) 
distributions. The first two values of \code{priors} define 
the prior for the heterozygous case, the next three values are for
the trizygous case, and the final four values are for the 
quadrozygous case. Each set of priors should sum to one. 
Note that each distribution prior is defined internally 
by a set of four numbers, with the unspecified final values 
set to \code{0}; e.g., the heterozygous case is 
\code{c(priors[1], priors[2], 0, 0)}. The prior for the 
homozygous distribution is fixed at \code{c(1, 0, 0, 0)}.}
}
\value{
A \code{data.frame} of alleles denoting the genotype of the subject with the log10
of the likelihood of each model and the log10 of the Bayes factor. The output 
contains the following columns:

\itemize{
  \item \code{GENE}: The gene name without allele.
  \item \code{ALLELES}: Comma separated list of alleles for the given \code{GENE}.
  \item \code{COUNTS}: Comma separated list of observed sequences for each 
        corresponding allele in the \code{ALLELES} list.
  \item \code{TOTAL}: The total count of observed sequences for the given \code{GENE}.
  \item \code{NOTE}: Any comments on the inference.
  \item \code{KH}: log10 likelihood that the \code{GENE} is homozygous.
  \item \code{KD}: log10 likelihood that the \code{GENE} is heterozygous.
  \item \code{KT}: log10 likelihood that the \code{GENE} is trizygous.
  \item \code{KQ}: log10 likelihood that the \code{GENE} is quadrozygous.
  \item \code{K_DIFF}: log10 ratio of the highest to second-highest zygosity likelihoods.
}
}
\description{
\code{inferGenotypeBayesian} infers a subject's genotype by applying a Bayesian framework 
with a Dirichlet prior for the multinomial distribution. Up to four distinct alleles are 
allowed in an individual's genotype. Four likelihood distributions were generated by 
empirically fitting three high-coverage genotypes from three individuals 
(Laserson and Vigneault et al., 2014). A posterior probability is calculated for the 
four most common alleles. The certainty of the highest-probability model is 
calculated using a Bayes factor (the likelihood of the most likely model divided by 
that of the second-most likely model). 
The larger the Bayes factor (K), the greater the certainty in the model.
}
\details{
Allele calls representing cases where multiple alleles have been
assigned to a single sample sequence are rare among unmutated
sequences but may result if nucleotides for certain positions are
not available. Calls containing multiple alleles are treated as
belonging to all groups. If \code{novel} is provided, all
sequences that are assigned to the same starting allele as any
novel germline allele will have the novel germline allele appended
to their assignment prior to searching for unmutated sequences.
}
\note{
This method works best with data derived from blood, where a large
portion of sequences are expected to be unmutated. Ideally, there
should be hundreds of allele calls per gene in the input.
}
\examples{
# Infer IGHV genotype, using only unmutated sequences, including novel alleles
inferGenotypeBayesian(SampleDb, germline_db=GermlineIGHV, novel=SampleNovel, 
                      find_unmutated=TRUE)
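
# A minimal sketch (base R only, not the package's internal code) of how the
# nine-value priors vector is split across the zygosity models, following the
# description of the priors argument. The prior_* names are illustrative
# assumptions and are not objects exported by the package.
priors <- c(0.6, 0.4, 0.4, 0.35, 0.25, 0.25, 0.25, 0.25, 0.25)
prior_hom  <- c(1, 0, 0, 0)         # homozygous prior is fixed
prior_het  <- c(priors[1:2], 0, 0)  # two alleles, padded with zeros
prior_tri  <- c(priors[3:5], 0)     # three alleles, padded with a zero
prior_quad <- priors[6:9]           # four alleles, no padding needed
# Each set of priors should sum to one (compared with a numeric tolerance)
stopifnot(isTRUE(all.equal(sum(prior_het), 1)),
          isTRUE(all.equal(sum(prior_tri), 1)),
          isTRUE(all.equal(sum(prior_quad), 1)))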

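# A sketch of how K_DIFF relates to the KH, KD, KT and KQ columns, assuming it
# is the difference between the two largest log10 likelihoods (that is, the
# log10 of their ratio). The values below are made up for illustration and are
# not output from the function.
k <- c(KH = -40, KD = -10, KT = -12, KQ = -15)
k_sorted <- sort(k, decreasing = TRUE)
best_model <- names(k_sorted)[1]             # column with the highest likelihood
k_diff <- unname(k_sorted[1] - k_sorted[2])  # log10 of the likelihood ratio
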
}
\references{
\enumerate{
  \item  Laserson U and Vigneault F, et al. High-resolution antibody dynamics of 
         vaccine-induced immune responses. PNAS. 2014 111(13):4928-33.
}
}
\seealso{
\link{plotGenotype} for a colorful visualization and
         \link{genotypeFasta} to convert the genotype to nucleotide sequences.
         See \link{inferGenotype} to infer a subject-specific genotype using 
         a frequency method.
}
