% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Functions.R
\name{MoE_stepwise}
\alias{MoE_stepwise}
\title{Stepwise model/variable selection for MoEClust models}
\usage{
MoE_stepwise(data,
             network.data = NULL,
             gating = NULL,
             expert = NULL,
             modelNames = NULL,
             noise = FALSE,
             criterion = c("bic", "icl", "aic"),
             equalPro = c("both", "yes", "no"),
             noise.gate = c("both", "yes", "no"),
             verbose = interactive(),
             ...)
}
\arguments{
\item{data}{A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.}

\item{network.data}{An optional matrix or data frame in which to look for the covariates specified in the \code{gating} and/or \code{expert} networks, if any. Must include column names. Columns in \code{network.data} corresponding to columns in \code{data} will be automatically removed. While a single covariate can be supplied as a vector (provided the '\code{$}' operator is not used), it is safer to supply a named 1-column matrix or data frame in this instance.}

\item{gating}{A vector giving the names of columns in \code{network.data} used to define the scope of the gating network. The initial model will contain no covariates; thereafter, all variables in \code{gating} will be considered for inclusion where appropriate.

If \code{gating} is not supplied, \emph{all} variables in \code{network.data} will be considered for the gating network. \code{gating} can also be supplied as \code{NA}, in which case \emph{no} gating network covariates will ever be considered. Supplying both \code{gating} and \code{expert} can be used to ensure different subsets of covariates enter different parts of the model.}

\item{expert}{A vector giving the names of columns in \code{network.data} used to define the scope of the expert network. The initial model will contain no covariates; thereafter, all variables in \code{expert} will be considered for inclusion where appropriate.

If \code{expert} is not supplied, \emph{all} variables in \code{network.data} will be considered for the expert network. \code{expert} can also be supplied as \code{NA}, in which case \emph{no} expert network covariates will ever be considered. Supplying both \code{expert} and \code{gating} can be used to ensure different subsets of covariates enter different parts of the model.}

\item{modelNames}{A character vector of valid model names, to be used to restrict the size of the search space, if desired. By default, \emph{all} valid model types are explored. Rather than treating a change of model type as an additional step, every step is evaluated over all entries in \code{modelNames}. See \code{\link{MoE_clust}} for more details.}

\item{noise}{A logical indicating whether to assume all models contain an additional noise component (\code{TRUE}) or not (\code{FALSE}, the default). When \code{TRUE}, the search starts from a \code{G=0} noise-only model; otherwise, the search starts from a \code{G=1} model with no covariates. See \code{\link{MoE_control}} for more details.}

\item{criterion}{The model selection criterion used to determine the optimal action at each step. Defaults to \code{"bic"}.}

\item{equalPro}{A character string indicating whether models with equal mixing proportions should be considered. \code{"both"} (the default) means models with both equal and unequal mixing proportions will be considered, \code{"yes"} means only models with equal mixing proportions will be considered, and \code{"no"} means only models with unequal mixing proportions will be considered.

Considering \code{"both"} equal and unequal mixing proportion models increases the search space and the computational burden, but this argument becomes irrelevant after a model, if any, with gating network covariate(s) is considered optimal for a given step. See \code{\link{MoE_control}} for more details.}

\item{noise.gate}{A character string indicating whether models where the gating network for the noise component depends on covariates are considered. \code{"yes"} means only models where this is the case will be considered, \code{"no"} means only models for which the noise component's mixing proportion is constant will be considered, and \code{"both"} (the default) means both of these scenarios will be considered.

Considering \code{"both"} increases the search space and the computational burden, but this argument is only relevant when \code{noise=TRUE} and \code{gating} covariates are being considered. See \code{\link{MoE_control}} for more details.}

\item{verbose}{A logical indicating whether to print progress messages to the screen during fitting. Defaults to \code{TRUE} if the session is interactive, and \code{FALSE} otherwise. If \code{FALSE}, warnings and error messages will still be printed to the screen, but everything else will be suppressed.}

\item{...}{Additional arguments to \code{\link{MoE_control}}. Note that these arguments will be supplied to \emph{all} candidate models for every step.}
}
\value{
An object of class \code{"MoECompare"} containing information on all visited models and the optimal model (accessible via \code{x$optimal}).
}
\description{
Conducts a greedy forward stepwise search to identify the optimal \code{MoEClust} model according to some \code{criterion}. Components and/or \code{gating} covariates and/or \code{expert} covariates are added to new \code{\link{MoE_clust}} fits at each step, while each step is evaluated for all valid \code{modelNames}.
}
\details{
The arguments \code{modelNames}, \code{equalPro}, and \code{noise.gate} are provided for computational convenience. They can be used to reduce the number of models under consideration at each stage. 

The same is true of the arguments \code{gating} and \code{expert}, which can each separately be made to consider all variables in \code{network.data}, or a subset, or none at all. 

Without any prior information, it is best to accept the defaults at the expense of a longer run-time.
}
\note{
It is advisable to run this function twice, once with \code{noise=FALSE} and once with \code{noise=TRUE}, and then choose the optimal model across both sets of results.
}
\examples{
\donttest{# data(CO2data)
# Search over all models where the single covariate can enter either network
# (mod1 <- MoE_stepwise(CO2data$CO2, CO2data[,"GNP", drop=FALSE]))
#
# data(ais)
# Only look for EVE & EEE models with at most one expert network covariate
# Do not consider any gating covariates
# (mod2 <- MoE_stepwise(ais[,3:7], ais, gating=NA, expert="sex", modelNames=c("EVE", "EEE")))
#
# Look for models with a noise component, unequal mixing proportions,
# and only consider models with a constant mixing proportion for the noise component
# (mod3 <- MoE_stepwise(ais[,3:7], ais, noise=TRUE, gating=c("SSF", "Ht"), expert="sex",
#                       equalPro="no", noise.gate="no", modelNames="EEE"))
#
# Compare both sets of results (with & without a noise component) for the ais data
# (comp <- MoE_compare(mod2, mod3, optimal.only=TRUE))
# comp$optimal}
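#
# A minimal sketch of the advice in the Note: run the same (deliberately small)
# search with and without a noise component, then compare the optimal models.
# The object names hema, modA, modB, and comp2 are illustrative only, and the
# restriction to EEE models with one candidate covariate per network is an
# assumption made purely to limit the run-time; it is not a recommended default.
\donttest{data(ais)
hema  <- ais[,3:7]
# Search without a noise component (starts from a G=1 model with no covariates)
modA  <- MoE_stepwise(hema, ais, gating="BMI", expert="sex",
                      modelNames="EEE", noise=FALSE, verbose=FALSE)
# Same search scope, but every candidate model includes a noise component
modB  <- MoE_stepwise(hema, ais, gating="BMI", expert="sex",
                      modelNames="EEE", noise=TRUE, verbose=FALSE)
# Choose the optimal model across both sets of results
(comp2 <- MoE_compare(modA, modB, optimal.only=TRUE))
comp2$optimal}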
}
\references{
K. Murphy and T. B. Murphy (2019). Gaussian parsimonious clustering models with covariates and a noise component. \emph{Advances in Data Analysis and Classification}, 1-33. <\href{https://doi.org/10.1007/s11634-019-00373-8}{doi:10.1007/s11634-019-00373-8}>.
}
\seealso{
\code{\link{MoE_clust}}, \code{\link{MoE_compare}}, \code{\link{MoE_control}}
}
\author{
Keefe Murphy - <\email{keefe.murphy@ucd.ie}>
}
\keyword{clustering}
\keyword{main}
