Type: | Package |
Title: | Coarsened Mixtures of Hierarchical Skew Kernels |
Version: | 1.0.0 |
Description: | Bayesian fit of a Dirichlet Process Mixture with hierarchical multivariate skew normal kernels and coarsened posteriors. For more information, see Gorsky, Chan and Ma (2020) <doi:10.48550/arXiv.2001.06451>. |
License: | CC0 |
Encoding: | UTF-8 |
Imports: | Rcpp (≥ 0.12.18), dplyr, ggplot2, stringr, coda, tidyr, rlang, methods |
Suggests: | sn, R.rsp |
LinkingTo: | Rcpp, RcppArmadillo, RcppEigen, RcppNumerical |
RoxygenNote: | 7.2.1 |
VignetteBuilder: | R.rsp |
NeedsCompilation: | yes |
Packaged: | 2022-11-23 15:51:17 UTC; hhh |
Author: | S. Gorsky [aut, cre], C. Chan [ctb], L. Ma [ctb] |
Maintainer: | S. Gorsky <sgorsky@umass.edu> |
Repository: | CRAN |
Date/Publication: | 2022-11-23 16:20:02 UTC |
The function computes (and by default plots) estimates of the autocovariance or autocorrelation function for the different parameters of the model. This is a wrapper for coda::acf.
Description
The function computes (and by default plots) estimates of the autocovariance or autocorrelation function for the different parameters of the model. This is a wrapper for coda::acf.
Usage
acfParams(
res,
params = c("w", "xi", "xi0", "psi", "G", "E", "eta"),
only_non_trivial_clusters = TRUE,
lag.max = NULL,
type = c("correlation", "covariance", "partial"),
plot = TRUE,
...
)
Arguments
res |
An object of class |
params |
A character vector naming the parameters to compute and plot the autocorrelation plots for. |
only_non_trivial_clusters |
Logical, if |
lag.max |
maximum lag at which to calculate the autocorrelation. See more details at ?acf. |
type |
Character string giving the type of autocorrelation to be computed. See more details at ?acf. |
plot |
Logical. If |
... |
Other arguments passed to |
Value
An acfParamsCOMIX
object which is a named list,
with a named element for each requested parameter. Each element is
an object of class acf
(from the coda
package).
#' @examples
library(COMIX)
# Number of observations for each sample (row) and cluster (column):
njk <-
matrix(
c(
150, 300,
250, 200
),
nrow = 2,
byrow = TRUE
)
# Dimension of data: p <- 3
# Scale and skew parameters for first cluster: Sigma1 <- matrix(0.5, nrow = p, ncol = p) + diag(0.5, nrow = p) alpha1 <- rep(0, p) alpha1[1] <- -5 # location parameter for first cluster in first sample: xi11 <- rep(0, p) # location parameter for first cluster in second sample (aligned with first): xi21 <- rep(0, p)
# Scale and skew parameters for second cluster: Sigma2 <- matrix(-1/3, nrow = p, ncol = p) + diag(1 + 1/3, nrow = p) alpha2 <- rep(0, p) alpha2[2] <- 5 # location parameter for second cluster in first sample: xi12 <- rep(3, p) # location parameter for second cluster in second sample (misaligned with first): xi22 <- rep(4, p)
# Sample data: set.seed(1) Y <- rbind( sn::rmsn(njk[1, 1], xi = xi11, Omega = Sigma1, alpha = alpha1), sn::rmsn(njk[1, 2], xi = xi12, Omega = Sigma2, alpha = alpha2), sn::rmsn(njk[2, 1], xi = xi21, Omega = Sigma1, alpha = alpha1), sn::rmsn(njk[2, 2], xi = xi22, Omega = Sigma2, alpha = alpha2) )
C <- c(rep(1, rowSums(njk)[1]), rep(2, rowSums(njk)[2]))
prior <- list(zeta = 1, K = 10) pmc <- list(naprt = 5, nburn = 200, nsave = 200) # Reasonable usage pmc <- list(naprt = 5, nburn = 2, nsave = 5) # Minimal usage for documentation # Fit the model: res <- comix(Y, C, pmc = pmc, prior = prior)
# Relabel to resolve potential label switching issues: res_relab <- relabelChain(res) effssz <- effectiveSampleSize(res_relab, "w") # Or: tidy_chain <- tidyChain(res_relab, "w") acf_w <- acfParams(tidy_chain, "w")
# (see vignette for a more detailed example)
This function aligns multiple samples so that their location parameters are equal.
Description
This function aligns multiple samples so that their location parameters are equal.
Usage
calibrate(x, reference.group = NULL)
Arguments
x |
An object of class COMIX. |
reference.group |
An integer between 1 and the number of groups in the data
( |
Value
A named list of 3:
-
Y_cal
: anrow(x$data$Y)
\times
ncol(x$data$Y)
matrix, a calibrated version of the original data. -
calibration_distribution
: anx$pmc$nsave
\times
ncol(x$data$Y)
\times
nrow(x$data$Y)
array storing the difference between the estimated sample-specific location parameter and the group location parameter for each saved step of the chain. -
calibration_median
: anrow(x$data$Y)
\times
ncol(x$data$Y)
matrix storing the median difference between the estimated sample-specific location parameter and the group location parameter for each saved step of the chain. This matrix is equal to the difference between the uncalibrated data (x$data$Y
) and the calibrated data (Y_cal
).
Examples
library(COMIX)
# Number of observations for each sample (row) and cluster (column):
njk <-
matrix(
c(
150, 300,
250, 200
),
nrow = 2,
byrow = TRUE
)
# Dimension of data:
p <- 3
# Scale and skew parameters for first cluster:
Sigma1 <- matrix(0.5, nrow = p, ncol = p) + diag(0.5, nrow = p)
alpha1 <- rep(0, p)
alpha1[1] <- -5
# location parameter for first cluster in first sample:
xi11 <- rep(0, p)
# location parameter for first cluster in second sample (aligned with first):
xi21 <- rep(0, p)
# Scale and skew parameters for second cluster:
Sigma2 <- matrix(-1/3, nrow = p, ncol = p) + diag(1 + 1/3, nrow = p)
alpha2 <- rep(0, p)
alpha2[2] <- 5
# location parameter for second cluster in first sample:
xi12 <- rep(3, p)
# location parameter for second cluster in second sample (misaligned with first):
xi22 <- rep(4, p)
# Sample data:
set.seed(1)
Y <-
rbind(
sn::rmsn(njk[1, 1], xi = xi11, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[1, 2], xi = xi12, Omega = Sigma2, alpha = alpha2),
sn::rmsn(njk[2, 1], xi = xi21, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[2, 2], xi = xi22, Omega = Sigma2, alpha = alpha2)
)
C <- c(rep(1, rowSums(njk)[1]), rep(2, rowSums(njk)[2]))
prior <- list(zeta = 1, K = 10)
pmc <- list(naprt = 5, nburn = 200, nsave = 200) # Reasonable usage
pmc <- list(naprt = 5, nburn = 2, nsave = 5) # Minimal usage for documentation
# Fit the model:
res <- comix(Y, C, pmc = pmc, prior = prior)
# Relabel to resolve potential label switching issues:
res_relab <- relabelChain(res)
# Generate calibrated data:
cal <- calibrateNoDist(res_relab)
# Compare raw and calibrated data: (see plot in vignette)
# par(mfrow=c(1, 2))
# plot(Y, col = C, xlim = range(Y[,1]), ylim = range(Y[,2]) )
# Get posterior estimates for the model parameters:
res_summary <- summarizeChain(res_relab)
# Check for instance, the cluster assignment labels:
table(res_summary$t)
# Indeed the same as
colSums(njk)
# Or examine the skewness parameter for the non-trivial clusters:
res_summary$alpha[ , unique(res_summary$t)]
# And compare those to
cbind(alpha1, alpha2)
# (see vignette for a more detailed example)
This function aligns multiple samples so that their location parameters are equal.
Description
This function aligns multiple samples so that their location parameters are equal.
Usage
calibrateNoDist(x, reference.group = NULL)
Arguments
x |
An object of class COMIX. |
reference.group |
An integer between 1 and the number of groups in the data
( |
Value
A named list of 2:
-
Y_cal
: anrow(x$data$Y)
\times
ncol(x$data$Y)
matrix, a calibrated version of the original data. -
calibration_median
: anrow(x$data$Y)
\times
ncol(x$data$Y)
matrix storing the median difference between the estimated sample-specific location parameter and the group location parameter for each saved step of the chain. This matrix is equal to the difference between the uncalibrated data (x$data$Y
) and the calibrated data (Y_cal
).
Examples
library(COMIX)
# Number of observations for each sample (row) and cluster (column):
njk <-
matrix(
c(
150, 300,
250, 200
),
nrow = 2,
byrow = TRUE
)
# Dimension of data:
p <- 3
# Scale and skew parameters for first cluster:
Sigma1 <- matrix(0.5, nrow = p, ncol = p) + diag(0.5, nrow = p)
alpha1 <- rep(0, p)
alpha1[1] <- -5
# location parameter for first cluster in first sample:
xi11 <- rep(0, p)
# location parameter for first cluster in second sample (aligned with first):
xi21 <- rep(0, p)
# Scale and skew parameters for second cluster:
Sigma2 <- matrix(-1/3, nrow = p, ncol = p) + diag(1 + 1/3, nrow = p)
alpha2 <- rep(0, p)
alpha2[2] <- 5
# location parameter for second cluster in first sample:
xi12 <- rep(3, p)
# location parameter for second cluster in second sample (misaligned with first):
xi22 <- rep(4, p)
# Sample data:
set.seed(1)
Y <-
rbind(
sn::rmsn(njk[1, 1], xi = xi11, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[1, 2], xi = xi12, Omega = Sigma2, alpha = alpha2),
sn::rmsn(njk[2, 1], xi = xi21, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[2, 2], xi = xi22, Omega = Sigma2, alpha = alpha2)
)
C <- c(rep(1, rowSums(njk)[1]), rep(2, rowSums(njk)[2]))
prior <- list(zeta = 1, K = 10)
pmc <- list(naprt = 5, nburn = 200, nsave = 200) # Reasonable usage
pmc <- list(naprt = 5, nburn = 2, nsave = 5) # Minimal usage for documentation
# Fit the model:
res <- comix(Y, C, pmc = pmc, prior = prior)
# Relabel to resolve potential label switching issues:
res_relab <- relabelChain(res)
# Generate calibrated data:
cal <- calibrateNoDist(res_relab)
# Compare raw and calibrated data: (see plot in vignette)
# par(mfrow=c(1, 2))
# plot(Y, col = C, xlim = range(Y[,1]), ylim = range(Y[,2]) )
# Get posterior estimates for the model parameters:
res_summary <- summarizeChain(res_relab)
# Check for instance, the cluster assignment labels:
table(res_summary$t)
# Indeed the same as
colSums(njk)
# Or examine the skewness parameter for the non-trivial clusters:
res_summary$alpha[ , unique(res_summary$t)]
# And compare those to
cbind(alpha1, alpha2)
# (see vignette for a more detailed example)
This function generates a sample from the posterior of COMIX.
Description
This function generates a sample from the posterior of COMIX.
Usage
comix(Y, C, prior = NULL, pmc = NULL, state = NULL, ncores = 2)
Arguments
Y |
Matrix of the data. Each row represents an observation. |
C |
Vector of the group label of each observation. Labels must be integers starting from 1. |
prior |
A list giving the prior information. If unspecified, a default prior is used. The list includes the following parameters:
|
pmc |
A list giving the Population Monte Carlo (PMC) parameters:
|
state |
A list giving the initial cluster labels:
|
ncores |
The number of CPU cores to utilize in parallel. Defaults to 2. |
Value
An object of class COMIX, a list of 4:
chain
, a named list:
-
t
: annsave
\times
nrow(Y)
matrix with estimated cluster labels for each saved step of the chain and each observation in the dataY
. -
z
: ansave
\times
nrow(Y)
matrix with estimated values of the latentz_{i,j}
variable for the parameterization of the multivariate skew normal distribution used in the sampler for each saved step of the chain and each observation in the dataY
. -
W
: anlength(unique(C))
\times
K
\times
-
nsave
: array storing the estimated sample- and cluster-specific weights for each saved step of the chain. -
xi
: anlength(unique(C))
\times
(ncol(Y) x K)
\times
nsave
array storing the estimated sample- and cluster-specific multivariate skew normal location parameters of the kernel for each saved step of the chain. -
xi0
: anncol(Y)
\times
K
\times
-
nsave
: array storing the estimated cluster-specific group location parameters for each saved step of the chain. -
psi
: anncol(Y)
\times
K
\times
nsave
array storing the estimated cluster-specific skew parameters of the kernels in the parameterization of the multivariate skew normal distribution used in the sampler for each saved step of the chain. -
G
: anncol(Y)
\times
(ncol(Y) x K)
\times
nsave
array storing the estimated cluster-specific multivariate skew normal scale matrix (in row format) of the kernel used in the sampler for each saved step of the chain. -
E
: anncol(Y)
\times
(ncol(Y) x K)
\times
nsave
array storing the estimated covariance matrix (in row format) of the multivariate normal distribution from which the sample- and cluster-specific location parameters are drawn for each saved step of the chain. -
eta
: ansave
\times
1
matrix storing the estimated Dirichlet Process Mixture concentration parameter for each saved step of the chain. -
Sigma
: anncol(Y)
\times
(ncol(Y) x K)
\times
nsave
array storing the estimated cluster-specific multivariate skew normal scale matrix (in row format) of the kernel for each saved step of the chain. -
alpha
: anncol(Y)
\times
K
\times
nsave
array storing the estimated cluster-specific skew parameters of the kernel's multivariate skew normal distribution for each saved step of the chain.
-
data
, a named list that includes the matrix of the dataY
andC
the vector of the group label of each observation. -
prior
andpmc
, the lists, as above, that were provided as inputs to the function.
Examples
library(COMIX)
# Number of observations for each sample (row) and cluster (column):
njk <-
matrix(
c(
150, 300,
250, 200
),
nrow = 2,
byrow = TRUE
)
# Dimension of data:
p <- 3
# Scale and skew parameters for first cluster:
Sigma1 <- matrix(0.5, nrow = p, ncol = p) + diag(0.5, nrow = p)
alpha1 <- rep(0, p)
alpha1[1] <- -5
# location parameter for first cluster in first sample:
xi11 <- rep(0, p)
# location parameter for first cluster in second sample (aligned with first):
xi21 <- rep(0, p)
# Scale and skew parameters for second cluster:
Sigma2 <- matrix(-1/3, nrow = p, ncol = p) + diag(1 + 1/3, nrow = p)
alpha2 <- rep(0, p)
alpha2[2] <- 5
# location parameter for second cluster in first sample:
xi12 <- rep(3, p)
# location parameter for second cluster in second sample (misaligned with first):
xi22 <- rep(4, p)
# Sample data:
set.seed(1)
Y <-
rbind(
sn::rmsn(njk[1, 1], xi = xi11, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[1, 2], xi = xi12, Omega = Sigma2, alpha = alpha2),
sn::rmsn(njk[2, 1], xi = xi21, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[2, 2], xi = xi22, Omega = Sigma2, alpha = alpha2)
)
C <- c(rep(1, rowSums(njk)[1]), rep(2, rowSums(njk)[2]))
prior <- list(zeta = 1, K = 10)
pmc <- list(naprt = 5, nburn = 200, nsave = 200) # Reasonable usage
pmc <- list(naprt = 5, nburn = 2, nsave = 5) # Minimal usage for documentation
# Fit the model:
res <- comix(Y, C, pmc = pmc, prior = prior)
# Relabel to resolve potential label switching issues:
res_relab <- relabelChain(res)
# Generate calibrated data:
cal <- calibrateNoDist(res_relab)
# Compare raw and calibrated data: (see plot in vignette)
# par(mfrow=c(1, 2))
# plot(Y, col = C, xlim = range(Y[,1]), ylim = range(Y[,2]) )
# Get posterior estimates for the model parameters:
res_summary <- summarizeChain(res_relab)
# Check for instance, the cluster assignment labels:
table(res_summary$t)
# Indeed the same as
colSums(njk)
# Or examine the skewness parameter for the non-trivial clusters:
res_summary$alpha[ , unique(res_summary$t)]
# And compare those to
cbind(alpha1, alpha2)
# (see vignette for a more detailed example)
This function creates an object that summarizes the effective sample size for the parameters of the model.
Description
This function creates an object that summarizes the effective sample size for the parameters of the model.
Usage
effectiveSampleSize(res, params = c("w", "xi", "xi0", "psi", "G", "E", "eta"))
Arguments
res |
An object of class |
params |
A character vector naming the parameters to compute the effective sample size for. |
Value
An effectiveSampleSizeCOMIX
object which is a named list,
with a named element for each requested parameter. Each element is a data
frame that includes the effective sample size for the parameter.
Examples
library(COMIX)
# Number of observations for each sample (row) and cluster (column):
njk <-
matrix(
c(
150, 300,
250, 200
),
nrow = 2,
byrow = TRUE
)
# Dimension of data:
p <- 3
# Scale and skew parameters for first cluster:
Sigma1 <- matrix(0.5, nrow = p, ncol = p) + diag(0.5, nrow = p)
alpha1 <- rep(0, p)
alpha1[1] <- -5
# location parameter for first cluster in first sample:
xi11 <- rep(0, p)
# location parameter for first cluster in second sample (aligned with first):
xi21 <- rep(0, p)
# Scale and skew parameters for second cluster:
Sigma2 <- matrix(-1/3, nrow = p, ncol = p) + diag(1 + 1/3, nrow = p)
alpha2 <- rep(0, p)
alpha2[2] <- 5
# location parameter for second cluster in first sample:
xi12 <- rep(3, p)
# location parameter for second cluster in second sample (misaligned with first):
xi22 <- rep(4, p)
# Sample data:
set.seed(1)
Y <-
rbind(
sn::rmsn(njk[1, 1], xi = xi11, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[1, 2], xi = xi12, Omega = Sigma2, alpha = alpha2),
sn::rmsn(njk[2, 1], xi = xi21, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[2, 2], xi = xi22, Omega = Sigma2, alpha = alpha2)
)
C <- c(rep(1, rowSums(njk)[1]), rep(2, rowSums(njk)[2]))
prior <- list(zeta = 1, K = 10)
pmc <- list(naprt = 5, nburn = 200, nsave = 200) # Reasonable usage
pmc <- list(naprt = 5, nburn = 2, nsave = 5) # Minimal usage for documentation
# Fit the model:
res <- comix(Y, C, pmc = pmc, prior = prior)
# Relabel to resolve potential label switching issues:
res_relab <- relabelChain(res)
effssz <- effectiveSampleSize(res_relab, "w")
# Or:
tidy_chain <- tidyChain(res_relab, "w")
effssz <- effectiveSampleSize(tidy_chain, "w")
# (see vignette for a more detailed example)
This function creates an object that summarizes the Geweke convergence diagnostic.
Description
This function creates an object that summarizes the Geweke convergence diagnostic.
Usage
gewekeParams(
res,
params = c("w", "xi", "xi0", "psi", "G", "E", "eta"),
frac1 = 0.1,
frac2 = 0.5,
probs = c(0.025, 0.975)
)
Arguments
res |
An object of class |
params |
A character vector naming the parameters to compute the Geweke diagnostic for. |
frac1 |
Double, fraction to use from beginning of chain. |
frac2 |
Double, fraction to use from end of chain. |
probs |
A vector of 2 doubles, probabilities denoting the limits of a confidence interval for the convergence diagnostic. |
Value
An gewekeParamsCOMIX
object which is a named list,
with a named element for each requested parameter. Each element is a data
frame that includes the Geweke diagnostic and result of a stationarity test
for the parameter.
Examples
library(COMIX)
# Number of observations for each sample (row) and cluster (column):
njk <-
matrix(
c(
150, 300,
250, 200
),
nrow = 2,
byrow = TRUE
)
# Dimension of data:
p <- 3
# Scale and skew parameters for first cluster:
Sigma1 <- matrix(0.5, nrow = p, ncol = p) + diag(0.5, nrow = p)
alpha1 <- rep(0, p)
alpha1[1] <- -5
# location parameter for first cluster in first sample:
xi11 <- rep(0, p)
# location parameter for first cluster in second sample (aligned with first):
xi21 <- rep(0, p)
# Scale and skew parameters for second cluster:
Sigma2 <- matrix(-1/3, nrow = p, ncol = p) + diag(1 + 1/3, nrow = p)
alpha2 <- rep(0, p)
alpha2[2] <- 5
# location parameter for second cluster in first sample:
xi12 <- rep(3, p)
# location parameter for second cluster in second sample (misaligned with first):
xi22 <- rep(4, p)
# Sample data:
set.seed(1)
Y <-
rbind(
sn::rmsn(njk[1, 1], xi = xi11, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[1, 2], xi = xi12, Omega = Sigma2, alpha = alpha2),
sn::rmsn(njk[2, 1], xi = xi21, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[2, 2], xi = xi22, Omega = Sigma2, alpha = alpha2)
)
C <- c(rep(1, rowSums(njk)[1]), rep(2, rowSums(njk)[2]))
prior <- list(zeta = 1, K = 10)
pmc <- list(naprt = 5, nburn = 200, nsave = 200) # Reasonable usage
pmc <- list(naprt = 5, nburn = 2, nsave = 5) # Minimal usage for documentation
# Fit the model:
res <- comix(Y, C, pmc = pmc, prior = prior)
# Relabel to resolve potential label switching issues:
res_relab <- relabelChain(res)
effssz <- effectiveSampleSize(res_relab, "w")
# Or:
tidy_chain <- tidyChain(res_relab, "w")
gwk <- gewekeParams(tidy_chain, "w")
# (see vignette for a more detailed example)
This function creates an object that summarizes the Heidelberg-Welch convergence diagnostic.
Description
This function creates an object that summarizes the Heidelberg-Welch convergence diagnostic.
Usage
heidelParams(
res,
params = c("w", "xi", "xi0", "psi", "G", "E", "eta"),
eps = 0.1,
pvalue = 0.05
)
Arguments
res |
An object of class |
params |
A character vector naming the parameters to compute the Heidelberg-Welch diagnostic for. |
eps |
Target value for ratio of halfwidth to sample mean. |
pvalue |
Significance level to use. |
Value
An heidelParamsCOMIX
object which is a named list,
with a named element for each requested parameter. Each element is a data
frame that includes the Heidelberg-Welch diagnostic and results of a
stationarity test for the parameter.
Examples
library(COMIX)
# Number of observations for each sample (row) and cluster (column):
njk <-
matrix(
c(
150, 300,
250, 200
),
nrow = 2,
byrow = TRUE
)
# Dimension of data:
p <- 3
# Scale and skew parameters for first cluster:
Sigma1 <- matrix(0.5, nrow = p, ncol = p) + diag(0.5, nrow = p)
alpha1 <- rep(0, p)
alpha1[1] <- -5
# location parameter for first cluster in first sample:
xi11 <- rep(0, p)
# location parameter for first cluster in second sample (aligned with first):
xi21 <- rep(0, p)
# Scale and skew parameters for second cluster:
Sigma2 <- matrix(-1/3, nrow = p, ncol = p) + diag(1 + 1/3, nrow = p)
alpha2 <- rep(0, p)
alpha2[2] <- 5
# location parameter for second cluster in first sample:
xi12 <- rep(3, p)
# location parameter for second cluster in second sample (misaligned with first):
xi22 <- rep(4, p)
# Sample data:
set.seed(1)
Y <-
rbind(
sn::rmsn(njk[1, 1], xi = xi11, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[1, 2], xi = xi12, Omega = Sigma2, alpha = alpha2),
sn::rmsn(njk[2, 1], xi = xi21, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[2, 2], xi = xi22, Omega = Sigma2, alpha = alpha2)
)
C <- c(rep(1, rowSums(njk)[1]), rep(2, rowSums(njk)[2]))
prior <- list(zeta = 1, K = 10)
pmc <- list(naprt = 5, nburn = 200, nsave = 200) # Reasonable usage
pmc <- list(naprt = 5, nburn = 2, nsave = 5) # Minimal usage for documentation
# Fit the model:
res <- comix(Y, C, pmc = pmc, prior = prior)
# Relabel to resolve potential label switching issues:
res_relab <- relabelChain(res)
effssz <- effectiveSampleSize(res_relab, "w")
# Or:
tidy_chain <- tidyChain(res_relab, "w")
hd <- heidelParams(tidy_chain, "w")
# (see vignette for a more detailed example)
This function creates plots for the effective sample size for the parameters of the model.
Description
This function creates plots for the effective sample size for the parameters of the model.
Usage
plotEffectiveSampleSize(effssz, param)
Arguments
effssz |
An object of class |
param |
Character, naming the parameter to create a plot of effective sample sizes. |
Value
A ggplot2
plot containing the effective sample size plot.
Examples
library(COMIX)
# Number of observations for each sample (row) and cluster (column):
njk <-
matrix(
c(
150, 300,
250, 200
),
nrow = 2,
byrow = TRUE
)
# Dimension of data:
p <- 3
# Scale and skew parameters for first cluster:
Sigma1 <- matrix(0.5, nrow = p, ncol = p) + diag(0.5, nrow = p)
alpha1 <- rep(0, p)
alpha1[1] <- -5
# location parameter for first cluster in first sample:
xi11 <- rep(0, p)
# location parameter for first cluster in second sample (aligned with first):
xi21 <- rep(0, p)
# Scale and skew parameters for second cluster:
Sigma2 <- matrix(-1/3, nrow = p, ncol = p) + diag(1 + 1/3, nrow = p)
alpha2 <- rep(0, p)
alpha2[2] <- 5
# location parameter for second cluster in first sample:
xi12 <- rep(3, p)
# location parameter for second cluster in second sample (misaligned with first):
xi22 <- rep(4, p)
# Sample data:
set.seed(1)
Y <-
rbind(
sn::rmsn(njk[1, 1], xi = xi11, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[1, 2], xi = xi12, Omega = Sigma2, alpha = alpha2),
sn::rmsn(njk[2, 1], xi = xi21, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[2, 2], xi = xi22, Omega = Sigma2, alpha = alpha2)
)
C <- c(rep(1, rowSums(njk)[1]), rep(2, rowSums(njk)[2]))
prior <- list(zeta = 1, K = 10)
pmc <- list(naprt = 5, nburn = 200, nsave = 200) # Reasonable usage
pmc <- list(naprt = 5, nburn = 2, nsave = 5) # Minimal usage for documentation
# Fit the model:
res <- comix(Y, C, pmc = pmc, prior = prior)
# Relabel to resolve potential label switching issues:
res_relab <- relabelChain(res)
effssz <- effectiveSampleSize(res_relab, "w")
# Or:
tidy_chain <- tidyChain(res_relab, "w")
effssz <- effectiveSampleSize(tidy_chain, "w")
plotEffectiveSampleSize(effssz, "w")
# (see vignette for a more detailed example)
This function creates plots for the Geweke diagnostic and results of test of stationarity for the parameters of the model.
Description
This function creates plots for the Geweke diagnostic and results of test of stationarity for the parameters of the model.
Usage
plotGewekeParams(gwk, param)
Arguments
gwk |
An object of class |
param |
Character, naming the parameter to create a plot of the Geweke diagnostic for. |
Value
A ggplot2
plot containing the Geweke diagnostic plot.
Examples
library(COMIX)
# Number of observations for each sample (row) and cluster (column):
njk <-
matrix(
c(
150, 300,
250, 200
),
nrow = 2,
byrow = TRUE
)
# Dimension of data:
p <- 3
# Scale and skew parameters for first cluster:
Sigma1 <- matrix(0.5, nrow = p, ncol = p) + diag(0.5, nrow = p)
alpha1 <- rep(0, p)
alpha1[1] <- -5
# location parameter for first cluster in first sample:
xi11 <- rep(0, p)
# location parameter for first cluster in second sample (aligned with first):
xi21 <- rep(0, p)
# Scale and skew parameters for second cluster:
Sigma2 <- matrix(-1/3, nrow = p, ncol = p) + diag(1 + 1/3, nrow = p)
alpha2 <- rep(0, p)
alpha2[2] <- 5
# location parameter for second cluster in first sample:
xi12 <- rep(3, p)
# location parameter for second cluster in second sample (misaligned with first):
xi22 <- rep(4, p)
# Sample data:
set.seed(1)
Y <-
rbind(
sn::rmsn(njk[1, 1], xi = xi11, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[1, 2], xi = xi12, Omega = Sigma2, alpha = alpha2),
sn::rmsn(njk[2, 1], xi = xi21, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[2, 2], xi = xi22, Omega = Sigma2, alpha = alpha2)
)
C <- c(rep(1, rowSums(njk)[1]), rep(2, rowSums(njk)[2]))
prior <- list(zeta = 1, K = 10)
pmc <- list(naprt = 5, nburn = 200, nsave = 200) # Reasonable usage
pmc <- list(naprt = 5, nburn = 2, nsave = 5) # Minimal usage for documentation
# Fit the model:
res <- comix(Y, C, pmc = pmc, prior = prior)
# Relabel to resolve potential label switching issues:
res_relab <- relabelChain(res)
effssz <- effectiveSampleSize(res_relab, "w")
# Or:
tidy_chain <- tidyChain(res_relab, "w")
gwk <- gewekeParams(tidy_chain, "w")
plotGewekeParams(gwk, "w")
# (see vignette for a more detailed example)
This function creates plots for the Heidelberg-Welch diagnostic and results of test of stationarity for the parameters of the model.
Description
This function creates plots for the Heidelberg-Welch diagnostic and results of test of stationarity for the parameters of the model.
Usage
plotHeidelParams(hd, param)
Arguments
hd |
An object of class |
param |
Character, naming the parameter to create a plot of the Heidelberg-Welch diagnostic for. |
Value
A ggplot2
plot containing the Heidelberg-Welch diagnostic plot.
Examples
library(COMIX)
# Number of observations for each sample (row) and cluster (column):
njk <-
matrix(
c(
150, 300,
250, 200
),
nrow = 2,
byrow = TRUE
)
# Dimension of data:
p <- 3
# Scale and skew parameters for first cluster:
Sigma1 <- matrix(0.5, nrow = p, ncol = p) + diag(0.5, nrow = p)
alpha1 <- rep(0, p)
alpha1[1] <- -5
# location parameter for first cluster in first sample:
xi11 <- rep(0, p)
# location parameter for first cluster in second sample (aligned with first):
xi21 <- rep(0, p)
# Scale and skew parameters for second cluster:
Sigma2 <- matrix(-1/3, nrow = p, ncol = p) + diag(1 + 1/3, nrow = p)
alpha2 <- rep(0, p)
alpha2[2] <- 5
# location parameter for second cluster in first sample:
xi12 <- rep(3, p)
# location parameter for second cluster in second sample (misaligned with first):
xi22 <- rep(4, p)
# Sample data:
set.seed(1)
Y <-
rbind(
sn::rmsn(njk[1, 1], xi = xi11, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[1, 2], xi = xi12, Omega = Sigma2, alpha = alpha2),
sn::rmsn(njk[2, 1], xi = xi21, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[2, 2], xi = xi22, Omega = Sigma2, alpha = alpha2)
)
C <- c(rep(1, rowSums(njk)[1]), rep(2, rowSums(njk)[2]))
prior <- list(zeta = 1, K = 10)
pmc <- list(naprt = 5, nburn = 200, nsave = 200) # Reasonable usage
pmc <- list(naprt = 5, nburn = 2, nsave = 5) # Minimal usage for documentation
# Fit the model:
res <- comix(Y, C, pmc = pmc, prior = prior)
# Relabel to resolve potential label switching issues:
res_relab <- relabelChain(res)
effssz <- effectiveSampleSize(res_relab, "w")
# Or:
tidy_chain <- tidyChain(res_relab, "w")
hd <- heidelParams(tidy_chain, "w")
plotHeidelParams(hd, "w")
# (see vignette for a more detailed example)
This function creates trace plots for different parameters of the MCMC chain.
Description
This function creates trace plots for different parameters of the MCMC chain.
Usage
plotTracePlots(res, param)
Arguments
res |
An object of class |
param |
Character, naming the parameter to create a trace plot for. |
Value
A ggplot2
plot containing the trace plot.
Examples
library(COMIX)
# Number of observations for each sample (row) and cluster (column):
njk <-
matrix(
c(
150, 300,
250, 200
),
nrow = 2,
byrow = TRUE
)
# Dimension of data:
p <- 3
# Scale and skew parameters for first cluster:
Sigma1 <- matrix(0.5, nrow = p, ncol = p) + diag(0.5, nrow = p)
alpha1 <- rep(0, p)
alpha1[1] <- -5
# location parameter for first cluster in first sample:
xi11 <- rep(0, p)
# location parameter for first cluster in second sample (aligned with first):
xi21 <- rep(0, p)
# Scale and skew parameters for second cluster:
Sigma2 <- matrix(-1/3, nrow = p, ncol = p) + diag(1 + 1/3, nrow = p)
alpha2 <- rep(0, p)
alpha2[2] <- 5
# location parameter for second cluster in first sample:
xi12 <- rep(3, p)
# location parameter for second cluster in second sample (misaligned with first):
xi22 <- rep(4, p)
# Sample data:
set.seed(1)
Y <-
rbind(
sn::rmsn(njk[1, 1], xi = xi11, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[1, 2], xi = xi12, Omega = Sigma2, alpha = alpha2),
sn::rmsn(njk[2, 1], xi = xi21, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[2, 2], xi = xi22, Omega = Sigma2, alpha = alpha2)
)
C <- c(rep(1, rowSums(njk)[1]), rep(2, rowSums(njk)[2]))
prior <- list(zeta = 1, K = 10)
pmc <- list(naprt = 5, nburn = 200, nsave = 200) # Reasonable usage
pmc <- list(naprt = 5, nburn = 2, nsave = 5) # Minimal usage for documentation
# Fit the model:
res <- comix(Y, C, pmc = pmc, prior = prior)
# Relabel to resolve potential label switching issues:
res_relab <- relabelChain(res)
plotTracePlots(res_relab, "w")
# Or:
tidy_chain <- tidyChain(res_relab, "w")
plotTracePlots(tidy_chain, "w")
# (see vignette for a more detailed example)
This function relabels the chain to avoid label switching issues.
Description
This function relabels the chain to avoid label switching issues.
Usage
relabelChain(res)
Arguments
res |
An object of class COMIX. |
Value
An object of class COMIX where res$chain$t
is replaced with the
new labels.
Examples
library(COMIX)
# Number of observations for each sample (row) and cluster (column):
njk <-
matrix(
c(
150, 300,
250, 200
),
nrow = 2,
byrow = TRUE
)
# Dimension of data:
p <- 3
# Scale and skew parameters for first cluster:
Sigma1 <- matrix(0.5, nrow = p, ncol = p) + diag(0.5, nrow = p)
alpha1 <- rep(0, p)
alpha1[1] <- -5
# location parameter for first cluster in first sample:
xi11 <- rep(0, p)
# location parameter for first cluster in second sample (aligned with first):
xi21 <- rep(0, p)
# Scale and skew parameters for second cluster:
Sigma2 <- matrix(-1/3, nrow = p, ncol = p) + diag(1 + 1/3, nrow = p)
alpha2 <- rep(0, p)
alpha2[2] <- 5
# location parameter for second cluster in first sample:
xi12 <- rep(3, p)
# location parameter for second cluster in second sample (misaligned with first):
xi22 <- rep(4, p)
# Sample data:
set.seed(1)
Y <-
rbind(
sn::rmsn(njk[1, 1], xi = xi11, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[1, 2], xi = xi12, Omega = Sigma2, alpha = alpha2),
sn::rmsn(njk[2, 1], xi = xi21, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[2, 2], xi = xi22, Omega = Sigma2, alpha = alpha2)
)
C <- c(rep(1, rowSums(njk)[1]), rep(2, rowSums(njk)[2]))
prior <- list(zeta = 1, K = 10)
pmc <- list(naprt = 5, nburn = 200, nsave = 200) # Reasonable usage
pmc <- list(naprt = 5, nburn = 2, nsave = 5) # Minimal usage for documentation
# Fit the model:
res <- comix(Y, C, pmc = pmc, prior = prior)
# Relabel to resolve potential label switching issues:
res_relab <- relabelChain(res)
# Generate calibrated data:
cal <- calibrateNoDist(res_relab)
# Compare raw and calibrated data: (see plot in vignette)
# par(mfrow=c(1, 2))
# plot(Y, col = C, xlim = range(Y[,1]), ylim = range(Y[,2]) )
# Get posterior estimates for the model parameters:
res_summary <- summarizeChain(res_relab)
# Check for instance, the cluster assignment labels:
table(res_summary$t)
# Indeed the same as
colSums(njk)
# Or examine the skewness parameter for the non-trivial clusters:
res_summary$alpha[ , unique(res_summary$t)]
# And compare those to
cbind(alpha1, alpha2)
# (see vignette for a more detailed example)
This function provides post-hoc estimates of the model parameters.
Description
This function provides post-hoc estimates of the model parameters.
Usage
summarizeChain(res)
Arguments
res |
An object of class COMIX. |
Value
A named list:
-
xi0
: ancol(res$data$Y)
\times
res$prior$K
matrix storing the posterior mean of the group location parameter. -
psi
: ancol(res$data$Y)
\times
res$prior$K
matrix storing the posterior mean of the multivariate skew normal kernels skewness parameter (in the parameterization used in the sampler). -
alpha
: ancol(res$data$Y)
\times
res$prior$K
matrix storing the posterior mean of the multivariate skew normal kernels skewness parameter. -
W
: alength(unique(res$data$C))
\times
res$prior$K
matrix storing the posterior mean of the mixture weights for each sample and cluster. -
xi
: anlength(unique(res$data$C))
\times
ncol(res$data$Y)
\times
res$prior$K
array storing the the posterior mean of the multivariate skew normal kernels location parameter for each sample and cluster. -
Sigma
: anncol(res$data$Y)
\times
ncol(res$data$Y)
\times
res$prior$K
array storing the the posterior mean of the scaling matrix of the multivariate skew normal kernels for each cluster. -
G
: anncol(res$data$Y)
\times
ncol(res$data$Y)
\times
res$prior$K
array storing the the posterior mean of the scaling matrix of the multivariate skew normal kernels for each cluster (in the parameterization used in the sampler). -
E
: anncol(res$data$Y)
\times
ncol(res$data$Y)
\times
res$prior$K
array storing the the posterior mean of the covariance matrix of the multivariate normal distributions for each cluster form which the sample specific location parameters are drawn. -
meanvec
: anlength(unique(res$data$C))
\times
ncol(res$data$Y)
\times
res$prior$K
array storing the the posterior mean of the multivariate skew normal kernels mean parameter for each sample and cluster. -
meanvec0
: ancol(res$data$Y)
\times
res$prior$K
matrix storing the posterior mean of the group mean parameter. -
t
: Vector of lengthnrow(x$data$Y)
. Each element is the mode of the posterior distribution of cluster labels. -
eta
: scalar, the mean of the posterior distribution of the estimated Dirichlet Process Mixture concentration parameter.
Examples
library(COMIX)
# Number of observations for each sample (row) and cluster (column):
njk <-
matrix(
c(
150, 300,
250, 200
),
nrow = 2,
byrow = TRUE
)
# Dimension of data:
p <- 3
# Scale and skew parameters for first cluster:
Sigma1 <- matrix(0.5, nrow = p, ncol = p) + diag(0.5, nrow = p)
alpha1 <- rep(0, p)
alpha1[1] <- -5
# location parameter for first cluster in first sample:
xi11 <- rep(0, p)
# location parameter for first cluster in second sample (aligned with first):
xi21 <- rep(0, p)
# Scale and skew parameters for second cluster:
Sigma2 <- matrix(-1/3, nrow = p, ncol = p) + diag(1 + 1/3, nrow = p)
alpha2 <- rep(0, p)
alpha2[2] <- 5
# location parameter for second cluster in first sample:
xi12 <- rep(3, p)
# location parameter for second cluster in second sample (misaligned with first):
xi22 <- rep(4, p)
# Sample data:
set.seed(1)
Y <-
rbind(
sn::rmsn(njk[1, 1], xi = xi11, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[1, 2], xi = xi12, Omega = Sigma2, alpha = alpha2),
sn::rmsn(njk[2, 1], xi = xi21, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[2, 2], xi = xi22, Omega = Sigma2, alpha = alpha2)
)
C <- c(rep(1, rowSums(njk)[1]), rep(2, rowSums(njk)[2]))
prior <- list(zeta = 1, K = 10)
pmc <- list(naprt = 5, nburn = 200, nsave = 200) # Reasonable usage
pmc <- list(naprt = 5, nburn = 2, nsave = 5) # Minimal usage for documentation
# Fit the model:
res <- comix(Y, C, pmc = pmc, prior = prior)
# Relabel to resolve potential label switching issues:
res_relab <- relabelChain(res)
# Generate calibrated data:
cal <- calibrateNoDist(res_relab)
# Compare raw and calibrated data: (see plot in vignette)
# par(mfrow=c(1, 2))
# plot(Y, col = C, xlim = range(Y[,1]), ylim = range(Y[,2]) )
# Get posterior estimates for the model parameters:
res_summary <- summarizeChain(res_relab)
# Check for instance, the cluster assignment labels:
table(res_summary$t)
# Indeed the same as
colSums(njk)
# Or examine the skewness parameter for the non-trivial clusters:
res_summary$alpha[ , unique(res_summary$t)]
# And compare those to
cbind(alpha1, alpha2)
# (see vignette for a more detailed example)
This function creates tidy versions of the stored chain. This object can then be used as input for the other diagnostic functions in this package.
Description
This function creates tidy versions of the stored chain. This object can then be used as input for the other diagnostic functions in this package.
Usage
tidyChain(
res,
params = c("t", "w", "xi", "xi0", "psi", "G", "E", "eta", "Sigma", "alpha")
)
Arguments
res |
An object of class COMIX. |
params |
A character vector naming the parameters to tidy. |
Value
A tidyChainCOMIX
object: a named list of class whose length is the length
of params
. Each element of the list contains a tibble with a tidy version of the samples
from the MCMC chain.
Examples
library(COMIX)
# Number of observations for each sample (row) and cluster (column):
njk <-
matrix(
c(
150, 300,
250, 200
),
nrow = 2,
byrow = TRUE
)
# Dimension of data:
p <- 3
# Scale and skew parameters for first cluster:
Sigma1 <- matrix(0.5, nrow = p, ncol = p) + diag(0.5, nrow = p)
alpha1 <- rep(0, p)
alpha1[1] <- -5
# location parameter for first cluster in first sample:
xi11 <- rep(0, p)
# location parameter for first cluster in second sample (aligned with first):
xi21 <- rep(0, p)
# Scale and skew parameters for second cluster:
Sigma2 <- matrix(-1/3, nrow = p, ncol = p) + diag(1 + 1/3, nrow = p)
alpha2 <- rep(0, p)
alpha2[2] <- 5
# location parameter for second cluster in first sample:
xi12 <- rep(3, p)
# location parameter for second cluster in second sample (misaligned with first):
xi22 <- rep(4, p)
# Sample data:
set.seed(1)
Y <-
rbind(
sn::rmsn(njk[1, 1], xi = xi11, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[1, 2], xi = xi12, Omega = Sigma2, alpha = alpha2),
sn::rmsn(njk[2, 1], xi = xi21, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[2, 2], xi = xi22, Omega = Sigma2, alpha = alpha2)
)
C <- c(rep(1, rowSums(njk)[1]), rep(2, rowSums(njk)[2]))
prior <- list(zeta = 1, K = 10)
pmc <- list(naprt = 5, nburn = 200, nsave = 200) # Reasonable usage
pmc <- list(naprt = 5, nburn = 2, nsave = 5) # Minimal usage for documentation
# Fit the model:
res <- comix(Y, C, pmc = pmc, prior = prior)
# Relabel to resolve potential label switching issues:
res_relab <- relabelChain(res)
tidy_chain <- tidyChain(res_relab)
# (see vignette for a more detailed example)
Convert between parameterizations of the multivariate skew normal distribution.
Description
Convert between parameterizations of the multivariate skew normal distribution.
Usage
transform_params(Sigma, alpha)
Arguments
Sigma |
A scale matrix. |
alpha |
A vector for the skew parameter. |
Value
A list:
-
delta
: a reparameterized skewness vector, a transformed version ofalpha
. -
omega
: a diagonal matrix of the same dimensions asSigma
, the diagonal elements are the square roots of the diagonal elements ofSigma
. -
psi
: another reparameterized skewness vector, utilized in the sampler. -
G
: a reparameterized version ofSigma
, utilized in the sampler.
Examples
library(COMIX)
# Scale and skew parameters:
Sigma <- matrix(0.5, nrow = 4, ncol = 4) + diag(0.5, nrow = 4)
alpha <- c(0, 0, 0, 5)
transformed_parameters <- transform_params(Sigma, alpha)