% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/gather_draws.R, R/spread_draws.R
\name{gather_draws}
\alias{gather_draws}
\alias{spread_draws}
\title{Extract draws of variables in a Bayesian model fit into a tidy data format}
\usage{
gather_draws(
  model,
  ...,
  regex = FALSE,
  sep = "[, ]",
  ndraws = NULL,
  seed = NULL,
  n
)

spread_draws(
  model,
  ...,
  regex = FALSE,
  sep = "[, ]",
  ndraws = NULL,
  seed = NULL,
  n
)
}
\arguments{
\item{model}{A supported Bayesian model fit. Tidybayes supports a variety of model objects;
for a full list of supported models, see \link{tidybayes-models}.}

\item{...}{Expressions in the form of
\code{variable_name[dimension_1, dimension_2, ...] | wide_dimension}. See \emph{Details}.}

\item{regex}{If \code{TRUE}, variable names are treated as regular expressions and all columns matching the
regular expression and number of dimensions are included in the output. Default \code{FALSE}.}

\item{sep}{Regular expression matching the separator between dimension indices in variable names (e.g. the \code{","} in \code{"b[1,2]"}).}

\item{ndraws}{The number of draws to return, or \code{NULL} to return all draws.}

\item{seed}{A seed to use when subsampling draws (i.e. when \code{ndraws} is not \code{NULL}).}

\item{n}{(Deprecated). Use \code{ndraws}.}
}
\value{
A data frame of draws; see \emph{Details} for its columns and grouping.
}
\description{
Extract draws from a Bayesian model for one or more variables (possibly with named
dimensions) into one of two types of long-format data frames.
}
\details{
Imagine a JAGS or Stan fit named \code{model}. The model may contain a variable named
\code{b[i,v]} (in the JAGS or Stan language) with dimension \code{i} in \code{1:100} and
dimension \code{v} in \code{1:3}. However, the default format for draws returned from
JAGS or Stan in R does not reflect this indexing structure; instead, the draws are stored
in multiple columns with names like \code{"b[1,1]"}, \code{"b[2,1]"}, etc.
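
The index-recovery step can be pictured with a small base-R sketch. It assumes column names of the form \code{"name[i,v]"} with exactly two indices, split on the default \code{sep} pattern \code{"[, ]"}. The names in \code{cols} and the helper \code{parse_index_names()} are hypothetical, and this is not the tidybayes implementation.

\preformatted{# Hypothetical column names as they might come out of a JAGS or Stan fit
cols <- c("b[1,1]", "b[2,1]", "b[1,2]", "b[2,2]")

# Hypothetical helper: split "name[i,v]" into its base name and two indices
parse_index_names <- function(cols, sep = "[, ]") {
  base <- sub("[[].*$", "", cols)                         # text before "["
  inside <- sub("[]]$", "", sub("^[^[]*[[]", "", cols))   # text between "[" and "]"
  idx <- do.call(rbind, strsplit(inside, sep))            # one row per column name
  data.frame(variable = base, i = as.integer(idx[, 1]), v = as.integer(idx[, 2]))
}

parsed <- parse_index_names(cols)
parsed
}

Each row of \code{parsed} pairs a base name with its indices, which is the information needed to place every value in an \code{i}- and \code{v}-indexed layout.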

\code{spread_draws} and \code{gather_draws} provide a straightforward
syntax for translating these columns back into properly indexed variables in two different
tidy data frame formats, optionally recovering dimension types (e.g. factor levels) as they do so.

\code{spread_draws} and \code{gather_draws} return data frames already grouped by
all dimensions used on the variables you specify.

The difference between \code{spread_draws} and \code{gather_draws} is that \code{spread_draws}
spreads the names of variables in the model across the data frame as column names, whereas
\code{gather_draws} gathers variables into a single column named \code{".variable"} and places
their values in a column named \code{".value"}. To use naming schemes from other packages (such as \code{broom}), consider passing
results through functions like \code{\link[=to_broom_names]{to_broom_names()}} or \code{\link[=to_ggmcmc_names]{to_ggmcmc_names()}}.
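
To make the two layouts concrete, here is a base-R sketch using two hypothetical scalar variables \code{x} and \code{y} with made-up values (not drawn from any model). The first data frame has the spread layout, and \code{stats::reshape()} turns it into the gathered layout with \code{".variable"} and \code{".value"} columns. It only illustrates the shapes; it is not how tidybayes builds them.

\preformatted{# Spread layout (made-up values): one row per draw, one column per variable
spread <- data.frame(.draw = 1:3, x = c(0.1, 0.2, 0.3), y = c(1.5, 1.7, 1.6))

# Gathered layout: one row per (draw, variable) pair
gathered <- reshape(
  spread,
  direction = "long",
  varying = c("x", "y"),
  v.names = ".value",
  timevar = ".variable",
  times = c("x", "y"),
  idvar = ".draw"
)
gathered <- gathered[order(gathered$.draw), c(".draw", ".variable", ".value")]
rownames(gathered) <- NULL
gathered
}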

For example, \code{spread_draws(model, a[i], b[i,v])} might return a grouped
data frame (grouped by \code{i} and \code{v}), with:
\itemize{
\item column \code{".chain"}: the chain number. \code{NA} if not applicable to the model
type; this is typically only applicable to MCMC algorithms.
\item column \code{".iteration"}: the iteration number. Guaranteed to be unique within-chain only.
\code{NA} if not applicable to the model type; this is typically only applicable to MCMC algorithms.
\item column \code{".draw"}: a unique number for each draw from the posterior. Order is not
guaranteed to be meaningful.
\item column \code{"i"}: value in \code{1:100}
\item column \code{"v"}: value in \code{1:3}
\item column \code{"a"}: value of \code{"a[i]"} for draw \code{".draw"}
\item column \code{"b"}: value of \code{"b[i,v]"} for draw \code{".draw"}
}

\code{gather_draws(model, a[i], b[i,v])} on the same model would return a grouped
data frame (grouped by \code{i} and \code{v}), with:
\itemize{
\item column \code{".chain"}: the chain number
\item column \code{".iteration"}: the iteration number
\item column \code{".draw"}: the draw number
\item column \code{"i"}: value in \code{1:100}
\item column \code{"v"}: value in \code{1:3}, or \code{NA}
if \code{".variable"} is \code{"a"}.
\item column \code{".variable"}: value in \code{c("a", "b")}.
\item column \code{".value"}: value of \code{"a[i]"} (when \code{".variable"} is \code{"a"})
or \code{"b[i,v]"} (when \code{".variable"} is \code{"b"}) for draw \code{".draw"}
}

\code{spread_draws} and \code{gather_draws} can use type information
applied to the \code{model} object by \code{\link[=recover_types]{recover_types()}} to convert columns
back into their original types. This is particularly helpful if some of the dimensions in
your model were originally factors. For example, if the \code{v} dimension
in the original data frame \code{data} was a factor with levels \code{c("a","b","c")},
then we could use \code{recover_types} before \code{spread_draws}:

\preformatted{model \%>\%
  recover_types(data) \%>\%
  spread_draws(b[i,v])
}

This would return the same data frame as above, except that the \code{"v"} column
would contain values in \code{c("a","b","c")} instead of \code{1:3}.
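
Conceptually, recovering a factor dimension amounts to using each integer index to look up the corresponding level. The base-R sketch below shows that lookup for a hypothetical factor \code{v_levels} with levels \code{c("a","b","c")}; it illustrates the idea and is not the implementation of \code{recover_types()}.

\preformatted{# Hypothetical original factor from the data used to fit the model
v_levels <- factor(c("a", "b", "c"))

# Made-up integer indices as they appear in the raw draws (e.g. from "b[i,2]")
v_index <- c(1L, 3L, 2L, 2L)

# Map each index back to its level, keeping the original level order
v_recovered <- factor(levels(v_levels)[v_index], levels = levels(v_levels))
v_recovered
}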

For variables that do not share the same subscripts (or share
some but not all subscripts), we can supply their specifications separately.
For example, if we have a variable \code{d[i]} with the same \code{i} subscript
as \code{b[i,v]}, and a variable \code{x} with no subscripts, we could do this:

\preformatted{spread_draws(model, x, d[i], b[i,v])}

This is roughly equivalent to:

\preformatted{spread_draws(model, x) \%>\%
 inner_join(spread_draws(model, d[i])) \%>\%
 inner_join(spread_draws(model, b[i,v])) \%>\%
 group_by(i,v)
}
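
The join step can be mimicked in base R with \code{merge()}: rows are matched on the draw identifier and on shared dimensions, so each value of \code{d[i]} is repeated for every \code{v} of \code{b[i,v]} within the same draw. The small data frames below are made-up stand-ins for \code{spread_draws()} output, reduced to a single draw for brevity.

\preformatted{# Made-up spread-format draws for d[i] (one draw, i in 1:2)
d_draws <- data.frame(.draw = 1L, i = 1:2, d = c(10, 20))

# Made-up spread-format draws for b[i,v] (one draw, i in 1:2, v in 1:2)
b_draws <- data.frame(.draw = 1L, i = rep(1:2, times = 2), v = rep(1:2, each = 2),
                      b = c(0.1, 0.2, 0.3, 0.4))

# Inner join on the shared columns (.draw and i)
joined <- merge(d_draws, b_draws, by = c(".draw", "i"))
joined <- joined[order(joined$i, joined$v), ]
joined
}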

Similarly, this:

\preformatted{gather_draws(model, x, d[i], b[i,v])}

is roughly equivalent to:

\preformatted{bind_rows(
 gather_draws(model, x),
 gather_draws(model, d[i]),
 gather_draws(model, b[i,v])
)}
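
Because the gathered pieces have different dimensions, stacking them leaves \code{NA} in any dimension column that a variable does not use (just as \code{v} is \code{NA} for \code{a} in the earlier example). A base-R sketch of that stacking, using made-up single-draw pieces and a hypothetical helper \code{bind_fill()}:

\preformatted{# Made-up gathered pieces: x has no dimensions, d has i, b has i and v
x_part <- data.frame(.draw = 1L, .variable = "x", .value = 0.5)
d_part <- data.frame(.draw = 1L, i = 1:2, .variable = "d", .value = c(10, 20))
b_part <- data.frame(.draw = 1L, i = 1L, v = 1:2, .variable = "b", .value = c(0.1, 0.3))

# Hypothetical helper: stack data frames, adding any missing columns as NA
bind_fill <- function(...) {
  parts <- list(...)
  all_cols <- unique(unlist(lapply(parts, names)))
  filled <- lapply(parts, function(p) {
    for (col in setdiff(all_cols, names(p))) p[[col]] <- NA
    p[all_cols]
  })
  do.call(rbind, filled)
}

stacked <- bind_fill(x_part, d_part, b_part)
stacked
}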

The \code{c} and \code{cbind} functions can be used to combine multiple variable names that have
the same dimensions. For example, if we have several variables with the same
subscripts \code{i} and \code{v}, we could do either of these:

\preformatted{spread_draws(model, c(w, x, y, z)[i,v])}
\preformatted{spread_draws(model, cbind(w, x, y, z)[i,v])  # equivalent}

Both are roughly equivalent to:

\preformatted{spread_draws(model, w[i,v], x[i,v], y[i,v], z[i,v])}

Besides being more compact, the \code{c()}-style syntax is currently also
faster (though that may change).

Dimensions can be omitted from the resulting data frame by leaving their names
blank; e.g. \code{spread_draws(model, b[,v])} will omit the first dimension of
\code{b} from the output. This is useful if a dimension is known to contain only a
single value in a given model.

The shorthand \code{..} can be used to specify one dimension that should be spread
into a wide format; the resulting column names are the base variable name, plus a dot
("."), plus the value of the dimension at \code{..}. For example:

\code{spread_draws(model, b[i,..])} would return a grouped data frame
(grouped by \code{i}), with:
\itemize{
\item column \code{".chain"}: the chain number
\item column \code{".iteration"}: the iteration number
\item column \code{".draw"}: the draw number
\item column \code{"i"}: value in \code{1:100}
\item column \code{"b.1"}: value of \code{"b[i,1]"} for draw \code{".draw"}
\item column \code{"b.2"}: value of \code{"b[i,2]"} for draw \code{".draw"}
\item column \code{"b.3"}: value of \code{"b[i,3]"} for draw \code{".draw"}
}
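
The column naming used by \code{..} (base name, a dot, then the dimension value) matches the convention of base R's \code{reshape()} with its default \code{sep = "."}. A sketch with made-up long-format draws of \code{b[i,v]}, assuming a single draw and \code{i} in \code{1:2} for brevity:

\preformatted{# Made-up long-format draws of b[i,v]: one draw, i in 1:2, v in 1:3
long <- data.frame(
  .draw = 1L,
  i = rep(1:2, times = 3),
  v = rep(1:3, each = 2),
  b = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
)

# Spread v into columns named "b.1", "b.2", "b.3"
wide <- reshape(long, direction = "wide", idvar = c(".draw", "i"),
                timevar = "v", v.names = "b", sep = ".")
rownames(wide) <- NULL
wide
}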

An optional clause in the form \verb{| wide_dimension} can also be used to put
the data frame into a wide format based on \code{wide_dimension}. For example, this:

\preformatted{spread_draws(model, b[i,v] | v)}

is roughly equivalent to this:

\preformatted{spread_draws(model, b[i,v]) \%>\% spread(v,b)}

The main difference between the \code{|} syntax and the
\code{..} syntax is that the \code{|} syntax respects prototypes applied to
dimensions with \code{\link[=recover_types]{recover_types()}}, and thus can be used to get
columns with nicer names. For example:

\preformatted{model \%>\% recover_types(data) \%>\% spread_draws(b[i,v] | v)
}

would return a grouped data frame
(grouped by \code{i}), with:
\itemize{
\item column \code{".chain"}: the chain number
\item column \code{".iteration"}: the iteration number
\item column \code{".draw"}: the draw number
\item column \code{"i"}: value in \code{1:100}
\item column \code{"a"}: value of \code{"b[i,1]"} for draw \code{".draw"}
\item column \code{"b"}: value of \code{"b[i,2]"} for draw \code{".draw"}
\item column \code{"c"}: value of \code{"b[i,3]"} for draw \code{".draw"}
}

The shorthand \code{.} can be used to specify columns that should be nested
into vectors, matrices, or n-dimensional arrays (depending on how many dimensions
are specified with \code{.}).

For example, \code{spread_draws(model, a[.], b[.,.])} might return a
data frame, with:
\itemize{
\item column \code{".chain"}: the chain number.
\item column \code{".iteration"}: the iteration number.
\item column \code{".draw"}: a unique number for each draw from the posterior.
\item column \code{"a"}: a list column of vectors.
\item column \code{"b"}: a list column of matrices.
}

Ragged arrays are turned into non-ragged arrays with
missing entries given the value \code{NA}.
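
Nesting with \code{.} can be pictured as collecting, for each draw, the values of \code{b[i,j]} into a matrix indexed by \code{i} and \code{j}, with \code{NA} filling any cells absent from a ragged array. A base-R sketch for a single draw, using made-up values and a hypothetical ragged \code{b} that has no \code{b[2,2]} entry:

\preformatted{# Made-up values for one draw of a ragged b[i,j]: there is no b[2,2]
one_draw <- data.frame(i = c(1L, 1L, 2L), j = c(1L, 2L, 1L), b = c(0.1, 0.2, 0.3))

# Allocate a full matrix of NA, then fill in the cells that exist
b_mat <- matrix(NA_real_, nrow = max(one_draw$i), ncol = max(one_draw$j))
b_mat[cbind(one_draw$i, one_draw$j)] <- one_draw$b

# A list column holds one such matrix per draw
nested <- data.frame(.draw = 1L)
nested$b <- list(b_mat)
nested$b[[1]]
}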

Finally, variable names can be regular expressions by setting \code{regex = TRUE}; e.g.:

\preformatted{spread_draws(model, `b_.*`[i], regex = TRUE)}

This would return a tidy data frame containing the variables whose names start with \code{b_} and that have one dimension.
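
The selection made with \code{regex = TRUE} can be approximated in base R by matching the pattern against base variable names and keeping only columns with the requested number of indices. In this sketch the column names are hypothetical and the pattern is anchored with \code{^} so it applies to the start of each base name; it is an illustration, not the tidybayes matching code.

\preformatted{# Hypothetical column names from a model fit
cols <- c("b_age[1]", "b_age[2]", "b_sex[1]", "b_int", "b_xy[1,1]", "sigma[1]")

base <- sub("[[].*$", "", cols)                          # base variable name
has_index <- grepl("[[]", cols)                          # does the name have "[...]"?
inside <- sub("[]]$", "", sub("^[^[]*[[]", "", cols))    # text between "[" and "]"
n_dims <- ifelse(has_index, lengths(strsplit(inside, "[, ]")), 0L)

# Keep names matching the regular expression and having exactly one dimension
selected <- cols[grepl("^b_.*", base) & n_dims == 1L]
selected
}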
}
\examples{

library(dplyr)
library(ggplot2)

data(RankCorr, package = "ggdist")

RankCorr \%>\%
  spread_draws(b[i, j])

RankCorr \%>\%
  spread_draws(b[i, j], tau[i], u_tau[i])


RankCorr \%>\%
  gather_draws(b[i, j], tau[i], u_tau[i])

RankCorr \%>\%
  gather_draws(tau[i], typical_r) \%>\%
  median_qi()

}
\seealso{
\code{\link[=spread_rvars]{spread_rvars()}}, \code{\link[=recover_types]{recover_types()}}, \code{\link[=compose_data]{compose_data()}}.
}
\author{
Matthew Kay
}
\keyword{manip}
