\name{seqCompare}
\alias{seqCompare}
\alias{seqLRT}
\alias{seqBIC}

\title{BIC and Likelihood Ratio Test for Comparing Two Groups of Sequences}

\description{The function \code{seqCompare} computes the likelihood ratio test (LRT) and the Bayesian Information Criterion (BIC) for comparing two groups within each of a series of sets. The functions \code{seqBIC} and \code{seqLRT} are aliases that return only the BIC or only the LRT, respectively.
}

\usage{
seqCompare(seqdata, seqdata2=NULL, group=NULL, set=NULL,
    s=100, seed=36963, stat="all", squared="LRTonly",
    weighted=TRUE, opt=NULL, BFopt=NULL, method, ...)

seqLRT(seqdata, seqdata2=NULL, group=NULL, set=NULL, s=100,
    seed=36963, squared="LRTonly", weighted=TRUE, opt=NULL,
    BFopt=NULL, method, ...)

seqBIC(seqdata, seqdata2=NULL, group=NULL, set=NULL, s=100,
    seed=36963, squared="LRTonly", weighted=TRUE, opt=NULL,
    BFopt=NULL, method, ...)
}

\arguments{
\item{seqdata}{Either a state sequence object (\code{stslist} created with \code{\link[TraMineR]{seqdef}}) or a list of state sequence objects, e.g., \code{list(cohort1.seq,cohort2.seq,cohort3.seq)}.
}

\item{seqdata2}{Either a state sequence object (\code{stslist}) or a list of state sequence objects. Must be \code{NULL} when \code{group} is not \code{NULL}. If not \code{NULL}, must be of the same type as \code{seqdata}. See details.
}
  \item{group}{A dichotomous grouping variable: a vector of length equal to the number of sequences in \code{seqdata}. See details.
  }
  \item{set}{Vector of length equal to the number of sequences in \code{seqdata} that defines the sets within which the two groups are compared. See details.
  }

\item{s}{Integer. Default 100. The size of random samples of sequences. When 0, no sampling is done.
}

\item{seed}{Integer. Default 36963. Seed of the random number generator used for sampling. Using the same seed guarantees reproducible results.
}
  \item{stat}{String. The requested statistics. One of \code{"LRT"}, \code{"BIC"}, or \code{"all"} (default).
  }
  \item{squared}{Logical. Should squared distances be used? Can also be \code{"LRTonly"} (default), in which case the distances to the center are computed from the non-squared pairwise distances and are then squared when computing the LRT.
  }
  \item{weighted}{Logical or String. Should weights be taken into account when available? Can also be \code{"by.group"}, in which case weights are used and normalized to respect group sizes.
  }
  \item{opt}{Integer or \code{NULL}. Either 1 or 2. Computation option. When 1, the distance matrix is computed successively for each pair of samples of size \code{s}. When 2, the distances are computed only once for each pair of sets of observed sequences and the distances for the samples are extracted from that matrix. When \code{NULL} (default), 1 is chosen when the sum of the sizes of the two groups is larger than \code{2*s}, and 2 otherwise.
  }
  \item{BFopt}{Integer or \code{NULL}. Either 1 or 2. Applies only when the BIC is computed on multiple samples. When 1, the displayed Bayes factor (BF) is the average of the BFs obtained for the samples. When 2, the displayed BF is obtained from the averaged BIC. When \code{NULL} (default), both BFs are displayed.
  }
\item{method}{String. Method for computing sequence distances. See documentation for \code{\link[TraMineR]{seqdist}}. Additional arguments may be required depending on the method chosen.
}
\item{...}{Additional arguments passed to \code{\link[TraMineR]{seqdist}}.
}

}

\details{
The \code{group} and \code{set} arguments can only be used when \code{seqdata} is an \code{stslist} object (a state sequence object).

When \code{seqdata} and \code{seqdata2} are both provided, the LRT and BIC statistics are computed for comparing these two sets. In that case both \code{group} and \code{set} should be left at their default \code{NULL} value.

When \code{seqdata} is a list of \code{stslist} objects, \code{seqdata2} must be a list of the same number of \code{stslist} objects.

The default option \code{squared="LRTonly"} corresponds to the original proposal of Liao and Fasang (2020). With that option, the distances to the virtual center are obtained from the non-squared pairwise dissimilarities, and these distances to the virtual center are then squared when computing the LRT (which is in turn used to compute the BIC). With \code{squared=FALSE}, non-squared distances are used in both steps, and with \code{squared=TRUE}, squared distances are used in both steps.

The computation is based on the pairwise distances between the sequences. The \code{opt} argument allows choosing between two strategies. With \code{opt=1}, the distance matrix is computed successively for each pair of samples of size \code{s}. With \code{opt=2}, the distance matrix is computed once for the observed sequences and the distances for the samples are extracted from that matrix. Option 2 is often more efficient, especially for distances based on spells. It may, however, be slower for methods such as OM or LCS when the number of observed sequences becomes large.
}

\value{The function \code{seqLRT} (and \code{seqCompare} with \code{stat="LRT"}) outputs two variables, \var{LRT} and \var{p.LRT}.

  \item{LRT}{This is the likelihood ratio test statistic for comparing the two groups.
  }
  \item{p.LRT}{This is the upper tail probability associated with the LRT.
  }
The function \code{seqBIC} (and \code{seqCompare} with \code{stat="BIC"}) outputs two variables, \var{BIC} and \var{BF}.

  \item{BIC}{This is the difference between two BICs for comparing the two groups.
  }
  \item{BF}{This is the Bayes factor associated with the BIC difference.
  }

\code{seqCompare} with \code{stat="all"} outputs all four indicators.
}

\examples{
## biofam data set
data(biofam)
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
                "Child", "Left+Child", "Left+Marr+Child", "Divorced")
alph <- seqstatl(biofam[10:25])
## To illustrate, we use only a sample of 150 cases
set.seed(10)
biofam <- biofam[sample(nrow(biofam),150),]
biofam.seq <- seqdef(biofam, 10:25, alphabet=alph, labels=biofam.lab)

## Defining the grouping variable
lang <- as.vector(biofam[["plingu02"]])
lang[is.na(lang)] <- "unknown"
lang <- factor(lang)

## Chronogram by language group
seqdplot(biofam.seq, group=lang)

## Extracting the sequence subsets by language
lev <- levels(lang)
l <- length(lev)
seq.list <- list()
for (i in 1:l){
  seq.list[[i]] <- biofam.seq[lang==lev[i],]
}

seqCompare(list(seq.list[[1]]),list(seq.list[[2]]), stat="all", method="OM", sm="CONSTANT")
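
## Sketch: the same language comparison, passing the two stslist
## objects directly as seqdata and seqdata2 (not wrapped in lists)
## and using squared distances both for the distances to the center
## and for the LRT (squared=TRUE) instead of the default "LRTonly"
cmp.sq <- seqCompare(seq.list[[1]], seq.list[[2]], stat="all",
    squared=TRUE, method="OM", sm="CONSTANT")
cmp.sq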
seqBIC(biofam.seq, group=biofam$sex, method="HAM")
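
## Sketch of a weighted comparison: the sequences are redefined with
## survey weights (assumed here to be the wp00tbgs variable of biofam);
## weighted="by.group" normalizes the weights to respect group sizes,
## and BFopt=2 requests the BF obtained from the averaged BIC when
## several samples are drawn
biofam.wseq <- seqdef(biofam, 10:25, alphabet=alph, labels=biofam.lab,
    weights=biofam$wp00tbgs)
bic.w <- seqBIC(biofam.wseq, group=biofam$sex, method="HAM",
    weighted="by.group", BFopt=2)
bic.w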
seqLRT(biofam.seq, group=biofam$sex, set=lang, s=80, method="HAM")
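
## Sketch: the same LRT with opt=2, i.e., the distances are computed
## once for the observed sequences of each set and the distances for
## the samples are extracted from that matrix
lrt.opt2 <- seqLRT(biofam.seq, group=biofam$sex, set=lang, s=80,
    opt=2, method="HAM")
lrt.opt2
## With s=0, no sampling is done and all sequences are used
lrt.all <- seqLRT(biofam.seq, group=biofam$sex, s=0, method="HAM")
lrt.all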

}

\references{
Tim F. Liao & Anette E. Fasang. Forthcoming. "Comparing Groups of Life Course Sequences Using the Bayesian Information Criterion and the Likelihood Ratio Test." \emph{Sociological Methodology} \doi{10.1177/0081175020959401}.
}
