\name{h2o.randomForest}
\alias{h2o.randomForest}
\alias{h2o.randomForest.VA}
\alias{h2o.randomForest.FV}
\title{
H2O: Random Forest
}
\description{
Builds a random forest classification or regression model on a data set.
}
\usage{
## Default method:
h2o.randomForest(x, y, data, classification = TRUE, ntree = 50, depth = 20, 
  sample.rate = 2/3, classwt = NULL, nbins = 100, seed = -1, importance = FALSE, 
  validation, nodesize = 1, balance.classes = FALSE, max.after.balance.size = 5,
  use_non_local = TRUE, version = 2)

## ValueArray implementation (version = 1):
h2o.randomForest.VA(x, y, data, ntree = 50, depth = 20, sample.rate = 2/3, 
  classwt = NULL, nbins = 100, seed = -1, use_non_local = TRUE)

## FluidVecs implementation (version = 2):
h2o.randomForest.FV(x, y, data, classification = TRUE, ntree = 50, depth = 20, 
  sample.rate = 2/3, nbins = 100, seed = -1, importance = FALSE, validation, 
  nodesize = 1, balance.classes = FALSE, max.after.balance.size = 5)
}
\arguments{
  \item{x}{
A vector containing the names or indices of the predictor variables to use in building the random forest model.
}
  \item{y}{
The name or index of the response variable. If the data does not contain a header, this is the column index, designated by increasing numbers from left to right. (The response must be either an integer or a categorical variable).
}
  \item{data}{
An \code{\linkS4class{H2OParsedDataVA}} (\code{version = 1}) or \code{\linkS4class{H2OParsedData}} (\code{version = 2}) object containing the variables in the model.
}
  \item{classification}{
(Optional) A logical value indicating whether a classification model should be built (as opposed to regression).
  }
  \item{ntree}{
(Optional) Number of trees to grow. (Must be a nonnegative integer).
}
  \item{depth}{
  (Optional) Maximum depth to which each tree is grown.
  }
  \item{sample.rate}{
  (Optional) Sampling rate for constructing data from which individual trees are grown.
  }
  \item{classwt}{
  (Optional) Numeric vector of class weights for a categorical response.
  }
  \item{nbins}{
  (Optional) Number of histogram bins to build; each split is then made at the best point.
  }
  \item{seed}{
  (Optional) Seed for building the random forest. If \code{seed = -1}, one will automatically be generated by H2O.
  }
  \item{importance}{
  (Optional) A logical value indicating whether to calculate variable importance. Set to \code{FALSE} to speed up computations.
  }
  \item{validation}{
  (Optional) An \code{\linkS4class{H2OParsedDataVA}} (\code{version = 1}) or \code{\linkS4class{H2OParsedData}} (\code{version = 2}) object containing the validation dataset used to construct the confusion matrix. If omitted, this defaults to the training data.}
  \item{nodesize}{
  (Optional) Minimum number of observations in a terminal (leaf) node.
  }
  \item{balance.classes}{(Optional) A logical value indicating whether to balance the class counts of the training data via over- or under-sampling (for imbalanced data).}
  \item{max.after.balance.size}{(Optional) Maximum relative size of the training data after balancing class counts (can be less than 1.0).}
  \item{use_non_local}{
  (Optional) A logical value indicating whether to use non-local data in building the random forest model.
  }
  \item{version}{
  (Optional) The version of random forest to run. \code{version = 1} runs the single-node ValueArray implementation, while \code{version = 2} runs the distributed FluidVecs implementation, which is still in beta.
  }
}

\details{
IMPORTANT: Currently, to run random forest with \code{version = 1}, you must import data to a ValueArray object using \code{\link{h2o.importFile.VA}}, \code{\link{h2o.importFolder.VA}} or one of their variants. To run with \code{version = 2}, you must import data to a FluidVecs object using \code{\link{h2o.importFile.FV}}, \code{\link{h2o.importFolder.FV}} or one of their variants.
}

\value{
An object of class \code{\linkS4class{H2ORFModelVA}} (\code{version = 1}) or \code{\linkS4class{H2ODRFModel}} (\code{version = 2}) with slots key, data, and model, where the last is a list of the following components:
\item{ntree }{Number of trees grown.}
\item{mse }{Mean-squared error for each tree.}
\item{forest }{A matrix giving the minimum, mean, and maximum of the tree depth and number of leaves.}
\item{confusion }{Confusion matrix of the prediction.}
}

\examples{
\dontrun{
# Run an RF model on iris data
library(h2o)
localH2O = h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)
irisPath = system.file("extdata", "iris.csv", package = "h2o")
iris.hex = h2o.importFile(localH2O, path = irisPath, key = "iris.hex")
h2o.randomForest(y = 5, x = c(2,3,4), data = iris.hex, ntree = 50, depth = 100)
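
# A further sketch, not an official recipe. Assumptions: the H2O instance
# started above is still running, and columns 1-4 of iris.hex are the numeric
# predictors. The object names (iris.rf, iris.va, iris.rf.va), the key
# "iris.va" and the seed value are illustrative only.
# Keep the fitted model so the components listed under Value can be inspected.
iris.rf = h2o.randomForest(y = 5, x = c(1,2,3,4), data = iris.hex, ntree = 50,
  depth = 20, seed = 1234)
iris.rf@model$ntree
iris.rf@model$confusion

# To run with version = 1, import the data with a .VA function first (see Details).
iris.va = h2o.importFile.VA(localH2O, path = irisPath, key = "iris.va")
iris.rf.va = h2o.randomForest(y = 5, x = c(1,2,3,4), data = iris.va, ntree = 50,
  version = 1)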
}
}
