% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/sim_StudySeqFunctions.R
\name{sim_RVstudy}
\alias{sim_RVstudy}
\title{Simulate sequence data for a sample of pedigrees}
\usage{
sim_RVstudy(ped_files, haplos, SNV_map, affected_only = TRUE,
  remove_wild = TRUE, pos_in_bp = TRUE, gamma_params = c(2.63, 2.63/0.5),
  burn_in = 1000)
}
\arguments{
\item{ped_files}{Data frame. A data frame of pedigrees for which to simulate sequence data, see details.}

\item{haplos}{sparseMatrix. A sparse matrix of haplotype data, which contains the haplotypes for unrelated individuals representing the founder population.  Rows are assumed to be haplotypes, while columns represent SNVs.  If the \code{\link{read_slim}} function was used to import SLiM data to \code{R}, users may supply the sparse matrix \code{Haplotypes} returned by \code{read_slim}.}

\item{SNV_map}{Data frame. A data frame that catalogs the SNVs in \code{haplos}.  If the \code{\link{read_slim}} function was used to import SLiM data to \code{R}, the data frame \code{Mutations} is of the proper format for \code{SNV_map}.  However, users must add the variable \code{is_CRV} to this data frame, see details.}

\item{affected_only}{Logical. When \code{affected_only = TRUE}, we only simulate SNV data for the disease-affected individuals and the family members that connect them along a line of descent.  When \code{affected_only = FALSE}, SNV data is simulated for the entire study. By default, \code{affected_only = TRUE}.}

\item{remove_wild}{Logical.  When \code{remove_wild = TRUE} the data is reduced by removing SNVs which are not observed in any of the study participants; otherwise if \code{remove_wild = FALSE} no data reduction occurs.  By default, \code{remove_wild = TRUE}.}

\item{pos_in_bp}{Logical. This argument indicates if the positions in \code{SNV_map} are listed in base pairs.  By default, \code{pos_in_bp = TRUE}. If the positions in \code{SNV_map} are listed in centiMorgans, please set \code{pos_in_bp = FALSE} instead.}

\item{gamma_params}{Numeric vector of length 2. The respective shape and rate parameters of the gamma distribution used to simulate the distance between chiasmata.  By default, \code{gamma_params = c(2.63, 2.63/0.5)}, as discussed in Voorrips and Maliepaard (2012).}

\item{burn_in}{Numeric. The "burn-in" distance in centiMorgans, as defined by Voorrips and Maliepaard (2012), which is required before simulating the location of the first chiasma with interference. By default, \code{burn_in = 1000}.}
}
\value{
An object of class \code{famStudy}.  Objects of class \code{famStudy} are lists that include the following named items:

\item{\code{ped_files}}{A data frame containing the sample of pedigrees for which sequence data was simulated.}

\item{\code{ped_haplos}}{A sparse matrix that contains the simulated haplotypes for each pedigree member in \code{ped_files}.}

\item{\code{haplo_map}}{A data frame that maps the haplotypes (i.e. rows) in \code{ped_haplos} to the individuals in \code{ped_files}.}

\item{\code{SNV_map}}{A data frame cataloging the SNVs in \code{ped_haplos}.}

Objects of class \code{famStudy} are discussed in detail in section 5.2 of the vignette.
}
\description{
Simulate single-nucleotide variant (SNV) data for a sample of pedigrees.
}
\details{
The \code{sim_RVstudy} function is used to simulate single-nucleotide variant (SNV) data for a sample of pedigrees.  Please note: this function is NOT appropriate for users who wish to simulate genotype conditional on phenotype.  Instead, \code{sim_RVstudy} employs the following algorithm.

\enumerate{
\item For each pedigree, we sample a single \strong{causal rare variant (cRV)} from a pool of SNVs specified by the user.
\item Upon identifying the familial cRV, we sample founder haplotypes from the haplotype data, conditional on each founder's cRV status at the familial cRV locus.
\item Proceeding forward in time, from founders to more recent generations, for each parent/offspring pair we:
\enumerate{
\item simulate recombination and formation of gametes, according to the model proposed by Voorrips and Maliepaard (2012), and then
\item perform a conditional gene drop to model inheritance of the cRV.
}}

Because \code{sim_RVstudy} uses a forward-in-time algorithm, \strong{certain types of inbreeding and/or loops cannot be accommodated}. Please see examples.

For a detailed description of the model employed by \code{sim_RVstudy}, please refer to section 6 of the vignette.

The data frame of pedigrees, \code{ped_files}, supplied to \code{sim_RVstudy} must contain the variables:
\tabular{lll}{
\strong{name} \tab \strong{type} \tab \strong{description} \cr
\code{FamID} \tab numeric \tab family identification number\cr
\code{ID} \tab numeric \tab individual identification number\cr
\code{sex} \tab numeric \tab sex identification variable: \code{sex = 0} for males, and \code{sex = 1} for females. \cr
\code{dadID} \tab numeric \tab identification number of father \cr
\code{momID} \tab numeric \tab identification number of mother \cr
\code{affected} \tab logical \tab disease-affection status: \code{affected = TRUE} if individual has developed disease, and \code{FALSE} otherwise. \cr
\code{DA1} \tab numeric \tab paternally inherited allele at the cRV locus: \code{DA1 = 1} if the cRV is inherited, and \code{0} otherwise. \cr
\code{DA2} \tab numeric \tab maternally inherited allele at the cRV locus: \code{DA2 = 1} if the cRV is inherited, and \code{0} otherwise.\cr
}

If \code{ped_files} does not contain the variables \code{DA1} and \code{DA2}, the pedigrees are assumed to be fully sporadic.  Hence, the supplied pedigrees will not segregate any of the SNVs in the user-specified pool of cRVs.

Pedigrees simulated by the \code{\link{sim_RVped}} and \code{\link{sim_ped}} functions of the \code{SimRVPedigree} package are properly formatted for the \code{sim_RVstudy} function.  That is, the pedigrees generated by these functions contain all of the variables required for \code{ped_files} (including \code{DA1} and \code{DA2}).

The data frame \code{SNV_map} catalogs the SNVs in \code{haplos}. The variables in \code{SNV_map} must be formatted as follows:
\tabular{lll}{
\strong{name} \tab \strong{type} \tab \strong{description} \cr
\code{colID} \tab numeric \tab associates the rows in \code{SNV_map} to the columns of \code{haplos}\cr
\code{chrom} \tab numeric \tab the chromosome that the SNV resides on\cr
\code{position} \tab numeric \tab the position of the SNV, in base pairs when \code{pos_in_bp = TRUE} or in centiMorgans when \code{pos_in_bp = FALSE}\cr
\code{marker} \tab character \tab (Optional) a unique character identifier for the SNV. If missing this variable will be created from \code{chrom} and \code{position}. \cr
\code{pathwaySNV} \tab logical \tab (Optional) identifies SNVs located within the pathway of interest as \code{TRUE} \cr
\code{is_CRV} \tab logical \tab identifies causal rare variants (cRVs) as \code{TRUE}.  Note that familial cRVs are sampled, with replacement, from the SNVs for which \code{is_CRV = TRUE}. \cr
}

Please note that when the variable \code{is_CRV} is missing from \code{SNV_map}, we sample a single SNV to be the causal rare variant for all pedigrees in the study, which is identified in the returned \code{famStudy} object.
}
\examples{
library(SimRVSequences)

#load pedigree, haplotype, and mutation data
data(study_peds)
data(EXmuts)
data(EXhaps)

# create variable 'is_CRV' in EXmuts.  This variable identifies the pool of
# causal rare variants from which to sample familial cRVs.
EXmuts$is_CRV = FALSE
EXmuts$is_CRV[c(26, 139, 223, 228, 472)] = TRUE
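
# Optional sanity check (a minimal sketch, not part of the package): confirm
# that the inputs carry the variables listed in the Details section before
# calling sim_RVstudy.  The names 'ped_vars' and 'snv_vars' are illustrative
# helpers introduced here only for this check.
ped_vars <- c("FamID", "ID", "sex", "dadID", "momID", "affected")
snv_vars <- c("colID", "chrom", "position", "is_CRV")

# each call returns character(0) when no required variable is missing
setdiff(ped_vars, names(study_peds))
setdiff(snv_vars, names(EXmuts))

# SNV_map catalogs the columns of haplos, so we would expect one row of
# EXmuts per column of EXhaps
ncol(EXhaps) == nrow(EXmuts)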

#supply required inputs to the sim_RVstudy function
seqDat = sim_RVstudy(ped_files = study_peds,
                     SNV_map = EXmuts,
                     haplos = EXhaps)
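
# Inspect the returned famStudy object.  As described in the Value section,
# it is a list whose named items include ped_files, ped_haplos, haplo_map,
# and SNV_map.
class(seqDat)
names(seqDat)

# ped_haplos is a sparse matrix: rows are simulated haplotypes, columns are
# the SNVs cataloged in the returned SNV_map
dim(seqDat$ped_haplos)

# haplo_map links each row of ped_haplos to an individual in ped_files
head(seqDat$haplo_map)

# SNV_map catalogs the SNVs (columns) in ped_haplos
head(seqDat$SNV_map)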


# Inbreeding examples
# Due to the forward-in-time model used by sim_RVstudy certain types of
# inbreeding and/or loops *may* cause fatal errors when using sim_RVstudy.
# The following examples demonstrate: (1) inbreeding that can be accommodated
# under this model, and (2) when this limitation is problematic.

# Create inbreeding in the family with FamID 3 in study_peds
imb_ped1 <- study_peds[study_peds$FamID == 3, ]
imb_ped1[imb_ped1$ID == 18, c("momID")] = 7
plot(imb_ped1)

# Notice that this instance of inbreeding can be accommodated by our model.
seqDat = sim_RVstudy(ped_files = imb_ped1,
                     SNV_map = EXmuts,
                     haplos = EXhaps)

# Create a different type of inbreeding in the family with FamID 3 in study_peds
imb_ped2 <- study_peds[study_peds$FamID == 3, ]
imb_ped2[imb_ped2$ID == 8, c("momID")] = 18
plot(imb_ped2)

# Notice that inbreeding in imb_ped2 will cause a fatal
# error when the sim_RVstudy function is executed
\dontrun{
seqDat = sim_RVstudy(ped_files = imb_ped2,
                     SNV_map = EXmuts,
                     haplos = EXhaps)
}
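
# Illustration of the default gamma_params (a sketch using base R only; it
# does not call any SimRVSequences internals).  The first element is the
# shape and the second is the rate of the gamma distribution for the
# distance between successive chiasmata.  The unit of these distances is
# not stated on this page; we assume Morgans here, following
# Voorrips and Maliepaard (2012).
shape <- 2.63
rate  <- 2.63 / 0.5

# the mean of a gamma distribution is shape / rate
shape / rate

# draw some inter-chiasma distances and compare the sample mean to the
# theoretical mean computed above
set.seed(1)
chiasma_gaps <- rgamma(10000, shape = shape, rate = rate)
mean(chiasma_gaps)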

}
\references{
Roeland E. Voorrips and Chris A. Maliepaard. (2012). \emph{The simulation of meiosis in diploid and tetraploid organisms using various genetic models}. BMC Bioinformatics, 13:248.
}
\seealso{
\code{\link{sim_RVped}}, \code{\link{read_slim}}, \code{\link{summary.famStudy}}
}
