% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/VarSelLCM.R
\docType{package}
\name{VarSelLCM-package}
\alias{VarSelLCM-package}
\alias{VarSelLCM}
\title{Variable Selection for Model-Based Clustering of Mixed-Type Data Sets with Missing Values}
\description{
Model-based clustering with variable selection and estimation of the number of clusters. The data to be analyzed can be continuous, categorical, integer or mixed. Moreover, missing values are allowed and require no pre-processing. A Shiny application makes the results easy to interpret.
}
\details{
\tabular{ll}{
  Package: \tab VarSelLCM\cr 
  Type: \tab Package\cr 
  Version: \tab 2.1.0\cr
  Date: \tab 2018-03-14\cr 
  License: \tab GPL-3\cr  
  LazyLoad: \tab yes\cr
  URL:  \tab \url{http://varsellcm.r-forge.r-project.org/}\cr
}

The main function is \link{VarSelCluster}, which carries out model selection (according to AIC, BIC or MICL) and maximum likelihood estimation.

Function \link{VarSelShiny} launches a Shiny application that makes the clustering results easy to interpret.

Function \link{VarSelImputation} imputes missing values using the parameters of the fitted model.

The methods \link{summary}, \link{print} and \link{plot} are also available to facilitate interpretation.
}
\examples{
\dontrun{
# Package loading
library(VarSelLCM)

# Data loading:
# x contains the observed variables
# z contains the known status (1: absence and 2: presence of heart disease)
data(heart)
z <- heart[,"Class"]
x <- heart[,-13]

# Cluster analysis without variable selection
res_without <- VarSelCluster(x, 2, vbleSelec = FALSE)

# Cluster analysis with variable selection (with parallelisation)
res_with <- VarSelCluster(x, 2, nbcores = 2, initModel=40)

# Confusion matrices and ARI: variable selection decreases the misclassification error rate
print(table(z, res_without@partitions@zMAP))
print(table(z, res_with@partitions@zMAP))
ARI(z, res_without@partitions@zMAP)
ARI(z, res_with@partitions@zMAP)

# Summary of the best model
summary(res_with)

# Parameters of the best model
print(res_with)

# Open the Shiny application to explore the results interactively
VarSelShiny(res_with)

# Discriminative power of the variables (here, the most discriminative variable is MaxHeartRate)
plot(res_with, type="bar")

# Boxplot of a continuous (or integer) variable
plot(res_with, y="MaxHeartRate", type="boxplot")

# Empirical and theoretical distributions (to check that the model fits the data)
plot(res_with, y="MaxHeartRate", type="cdf")

# Summary of a categorical variable
plot(res_with, y="Sex")

# Summary of the probabilities of misclassification
plot(res_with, type="probs-class")

# Imputation by posterior mean for the first observation
not.imputed <- heart[1,-13]
imputed <- VarSelImputation(res_with)[1,]
rbind(not.imputed, imputed)

}

}
\references{
Marbac, M. and Sedki, M. (2017). Variable selection for model-based clustering using the integrated completed-data likelihood. Statistics and Computing, 27 (4), 1049-1063.

Marbac, M., Patin, E. and Sedki, M. (2018). Variable selection for mixed data clustering: Application in human population genomics. arXiv:1703.02293.
}
\author{
Matthieu Marbac and Mohammed Sedki. Maintainer: Mohammed Sedki <mohammed.sedki@u-psud.fr>
}
\keyword{package}
