% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/PipeOpTaskPreprocTorch.R
\name{mlr_pipeops_preproc_torch}
\alias{mlr_pipeops_preproc_torch}
\alias{PipeOpTaskPreprocTorch}
\title{Base Class for Lazy Tensor Preprocessing}
\description{
This \code{PipeOp} can be used to preprocess (one or more) \code{\link{lazy_tensor}} columns contained in an \code{\link[mlr3:Task]{mlr3::Task}}.
The preprocessing function is specified as the construction argument \code{fn}, and additional arguments to this
function can be defined through the \code{PipeOp}'s parameter set.
The preprocessing is done per column, i.e. the number of lazy tensor output columns is equal
to the number of lazy tensor input columns.

To create custom preprocessing \code{PipeOp}s you can use \code{\link{pipeop_preproc_torch}}.
}
\section{Inheriting}{

In addition to specifying the construction arguments, you can override the private \code{.shapes_out()} method.
If you don't override it, the output shapes are assumed to be unknown (\code{NULL}).
\itemize{
\item \code{.shapes_out(shapes_in, param_vals, task)}\cr
(\code{list()}, \code{list()}, \code{Task} or \code{NULL}) -> \code{list()}\cr
This private method calculates the output shapes of the lazy tensor columns that are created from applying the preprocessing function with the provided parameter values (\code{param_vals}).
The \code{task} is very rarely needed, but if it is used, it should be checked that it is not \code{NULL}.

This private method is only responsible for calculating the output shape of a single input column, i.e. the
input \code{shapes_in} can be assumed to contain exactly one shape vector, for which it must calculate the output shape
and return it as a \code{list()} of length 1.
It can also be assumed that this shape is not \code{NULL} (i.e. unknown).
However, its first dimension can be \code{NA}, i.e. unknown (as is typically the case for the batch dimension).
}
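
As an illustration of this contract, the following sketch shows what a \code{.shapes_out()} method could look like for a hypothetical preprocessing function that flattens all non-batch dimensions into one.
It is written as a standalone function (the name \code{flatten_shapes_out} is made up for this example) so that it can be called directly, and it ignores \code{param_vals} and \code{task}.
\if{html}{\out{<div class="r">}}\preformatted{# Hypothetical .shapes_out() for a function that flattens all
# non-batch dimensions (names are made up for this sketch)
flatten_shapes_out = function(shapes_in, param_vals, task) {
  # exactly one shape vector, which is known (not NULL)
  shape = shapes_in[[1L]]
  stopifnot(length(shape) >= 2L)
  # keep the (possibly NA) batch dimension, multiply the others
  list(c(shape[1L], as.integer(prod(shape[-1L]))))
}

flatten_shapes_out(list(c(NA, 3L, 4L)), param_vals = list(), task = NULL)
# returns list(c(NA, 12L))
}\if{html}{\out{</div>}}
Returning \code{shape[1L]} unchanged means that an unknown (\code{NA}) batch dimension stays unknown in the output shape.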
}

\section{Input and Output Channels}{

See \code{\link[mlr3pipelines:PipeOpTaskPreproc]{PipeOpTaskPreproc}}.
}

\section{State}{

In addition to state elements from \code{\link[mlr3pipelines:PipeOpTaskPreprocSimple]{PipeOpTaskPreprocSimple}},
the state also contains the \verb{$param_vals} that were set during training.
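
The effect of storing \code{param_vals} in the state can be sketched with a small closure in base R.
This is a toy model, assuming that only the stored values matter at prediction time; it is not the \code{PipeOp}'s actual implementation, and \code{make_op()} and its methods are hypothetical.
\if{html}{\out{<div class="r">}}\preformatted{# Toy model (hypothetical, not the PipeOp implementation): train() stores
# the current parameter values in the state, predict() reads them from there
make_op = function() {
  params = list(a = 1)
  state = NULL
  list(
    set = function(...) params <<- utils::modifyList(params, list(...)),
    train = function(x) {
      state <<- list(param_vals = params)
      x + state$param_vals$a
    },
    predict = function(x) x + state$param_vals$a
  )
}

op = make_op()
op$set(a = 100)
op$train(1)    # 101
op$set(a = 0)  # changed after training
op$predict(1)  # still 101, because the value stored during training is used
}\if{html}{\out{</div>}}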
}

\section{Parameters}{

In addition to the parameters inherited from \code{\link[mlr3pipelines:PipeOpTaskPreproc]{PipeOpTaskPreproc}}, as well as those specified during construction
via the argument \code{param_set}, there are the following parameters:
\itemize{
\item \code{stages} :: \code{character(1)}\cr
The stages during which to apply the preprocessing.
Can be one of \code{"train"}, \code{"predict"} or \code{"both"}.
The initial value of this parameter is set to \code{"train"} when the \code{PipeOp}'s id starts with \code{"augment_"} and
to \code{"both"} otherwise.
Note that the preprocessing that is applied during \verb{$predict()} uses the parameters that were set during
\verb{$train()} and not those that are set when performing the prediction.
}
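
The documented rules for \code{stages} can be restated in a few lines of base R.
The helpers \code{default_stages()} and \code{applies_at()} are hypothetical and only summarize the behavior described above; the id \code{"augment_example"} is made up.
\if{html}{\out{<div class="r">}}\preformatted{# Hypothetical helpers restating the documented rules for `stages`
default_stages = function(id) {
  if (startsWith(id, "augment_")) "train" else "both"
}
# Is the preprocessing applied at the given stage ("train" or "predict")?
applies_at = function(stages, stage) {
  stages == "both" || stages == stage
}

default_stages("augment_example")  # "train"
default_stages("preproc_torch")    # "both"
applies_at("train", "predict")     # FALSE
}\if{html}{\out{</div>}}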
}

\section{Internals}{

During \verb{$train()} / \verb{$predict()}, a \code{\link{PipeOpModule}} with one input and one output channel is created.
This module applies the function \code{fn} to the input tensor while additionally
passing the parameter values (minus \code{stages} and \code{affect_columns}) to \code{fn}.
The preprocessing graph of the lazy tensor columns is shallowly cloned and the \code{PipeOpModule} is added to it.
This is done to avoid modifying user input, and it means that identical \code{PipeOpModule}s can be part of different
preprocessing graphs. This is only possible because the created \code{PipeOpModule} is stateless.

At a later point in the graph, preprocessing graphs will be merged if possible to avoid unnecessary computation.
This is best illustrated by an example:
One lazy tensor column's preprocessing graph is \code{A -> B}.
Then, two branches \code{B -> C} and \code{B -> D} are created, resulting in the two preprocessing graphs
\code{A -> B -> C} and \code{A -> B -> D}. When loading the data, we want to run the shared preprocessing only once, i.e. we don't
want to run the \code{A -> B} part twice. For this reason, \code{\link[=task_dataset]{task_dataset()}} tries to merge graphs and cache
results from graphs. However, currently only graphs that use the same dataset can be merged.

Also, the shapes created during \verb{$train()} and \verb{$predict()} might differ.
To avoid the creation of graphs where the predict shapes are incompatible with the train shapes,
the hypothetical predict shapes are already calculated during \verb{$train()} (this is why the parameters that are set
during train are also used during predict) and the \code{\link{PipeOpTorchModel}} will check the train and predict shapes for
compatibility before starting the training.

Otherwise, this mechanism is very similar to the \code{\link{ModelDescriptor}} construct.
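
The benefit of merging can be illustrated with a toy cache in base R.
The step functions \code{A} to \code{D}, the cache keyed by graph prefix and \code{run_chain()} are all hypothetical and only mimic the idea; they do not reflect how \code{task_dataset()} is implemented.
\if{html}{\out{<div class="r">}}\preformatted{# Toy illustration (hypothetical): results of shared prefixes are cached,
# so the common part A -> B is evaluated only once
n_evals = 0L
steps = list(
  A = function(x) { n_evals <<- n_evals + 1L; x + 1 },
  B = function(x) { n_evals <<- n_evals + 1L; x * 2 },
  C = function(x) { n_evals <<- n_evals + 1L; x - 3 },
  D = function(x) { n_evals <<- n_evals + 1L; x / 4 }
)
cache = new.env()

# the cache key is the prefix only, because the input is fixed in this toy
run_chain = function(chain, x) {
  out = x
  for (i in seq_along(chain)) {
    key = paste(chain[seq_len(i)], collapse = " -> ")
    if (exists(key, envir = cache, inherits = FALSE)) {
      out = get(key, envir = cache)
    } else {
      out = steps[[chain[i]]](out)
      assign(key, out, envir = cache)
    }
  }
  out
}

run_chain(c("A", "B", "C"), 1)  # (1 + 1) * 2 - 3 = 1
run_chain(c("A", "B", "D"), 1)  # A -> B is taken from the cache: 4 / 4 = 1
n_evals                         # 4 step evaluations instead of 6
}\if{html}{\out{</div>}}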
}

\examples{
\dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
# Creating a simple task
d = data.table(
  x1 = as_lazy_tensor(rnorm(10)),
  x2 = as_lazy_tensor(rnorm(10)),
  x3 = as_lazy_tensor(as.double(1:10)),
  y = rnorm(10)
)

taskin = as_task_regr(d, target = "y")

# Creating a simple preprocessing pipeop
po_simple = po("preproc_torch",
  # crate() gives fn a minimal environment, so no unnecessary objects are captured
  fn = mlr3misc::crate(function(x, a) x + a),
  param_set = paradox::ps(a = paradox::p_int(tags = c("train", "required")))
)

po_simple$param_set$set_values(
  a = 100,
  affect_columns = selector_name(c("x1", "x2")),
  stages = "both" # use during train and predict
)

taskout_train = po_simple$train(list(taskin))[[1L]]
materialize(taskout_train$data(cols = c("x1", "x2")), rbind = TRUE)

taskout_predict_noaug = po_simple$predict(list(taskin))[[1L]]
materialize(taskout_predict_noaug$data(cols = c("x1", "x2")), rbind = TRUE)

po_simple$param_set$set_values(
  stages = "train"
)

# with stages = "train", the transformation is not applied during predict
taskout_predict_aug = po_simple$predict(list(taskin))[[1L]]
materialize(taskout_predict_aug$data(cols = c("x1", "x2")), rbind = TRUE)

# Creating a more complex preprocessing PipeOp
PipeOpPreprocTorchPoly = R6::R6Class("PipeOpPreprocTorchPoly",
 inherit = PipeOpTaskPreprocTorch,
 public = list(
   initialize = function(id = "preproc_poly", param_vals = list()) {
     param_set = paradox::ps(
       n_degree = paradox::p_int(lower = 1L, tags = c("train", "required"))
     )
     param_set$set_values(
       n_degree = 1L
     )
     fn = mlr3misc::crate(function(x, n_degree) {
       torch::torch_cat(
         lapply(seq_len(n_degree), function(d) torch::torch_pow(x, d)),
         dim = 2L
       )
     })

     super$initialize(
       fn = fn,
       id = id,
       packages = character(0),
       param_vals = param_vals,
       param_set = param_set,
       stages_init = "both"
     )
   }
 ),
 private = list(
   .shapes_out = function(shapes_in, param_vals, task) {
     # shapes_in is a list of length 1 containing the shapes
     checkmate::assert_true(length(shapes_in[[1L]]) == 2L)
     if (shapes_in[[1L]][2L] != 1L) {
       stop("Input shape must be (NA, 1)")
     }
     list(c(NA, param_vals$n_degree))
   }
 )
)

po_poly = PipeOpPreprocTorchPoly$new(
  param_vals = list(n_degree = 3L, affect_columns = selector_name("x3"))
)

po_poly$shapes_out(list(c(NA, 1L)), stage = "train")

taskout = po_poly$train(list(taskin))[[1L]]
materialize(taskout$data(cols = "x3"), rbind = TRUE)
\dontshow{\}) # examplesIf}
}
\section{Super classes}{
\code{\link[mlr3pipelines:PipeOp]{mlr3pipelines::PipeOp}} -> \code{\link[mlr3pipelines:PipeOpTaskPreproc]{mlr3pipelines::PipeOpTaskPreproc}} -> \code{PipeOpTaskPreprocTorch}
}
\section{Active bindings}{
\if{html}{\out{<div class="r6-active-bindings">}}
\describe{
\item{\code{fn}}{The preprocessing function.}

\item{\code{rowwise}}{Whether the preprocessing is applied rowwise.}
}
\if{html}{\out{</div>}}
}
\section{Methods}{
\subsection{Public methods}{
\itemize{
\item \href{#method-PipeOpTaskPreprocTorch-new}{\code{PipeOpTaskPreprocTorch$new()}}
\item \href{#method-PipeOpTaskPreprocTorch-shapes_out}{\code{PipeOpTaskPreprocTorch$shapes_out()}}
\item \href{#method-PipeOpTaskPreprocTorch-clone}{\code{PipeOpTaskPreprocTorch$clone()}}
}
}
\if{html}{\out{
<details open><summary>Inherited methods</summary>
<ul>
<li><span class="pkg-link" data-pkg="mlr3pipelines" data-topic="PipeOp" data-id="help"><a href='../../mlr3pipelines/html/PipeOp.html#method-PipeOp-help'><code>mlr3pipelines::PipeOp$help()</code></a></span></li>
<li><span class="pkg-link" data-pkg="mlr3pipelines" data-topic="PipeOp" data-id="predict"><a href='../../mlr3pipelines/html/PipeOp.html#method-PipeOp-predict'><code>mlr3pipelines::PipeOp$predict()</code></a></span></li>
<li><span class="pkg-link" data-pkg="mlr3pipelines" data-topic="PipeOp" data-id="print"><a href='../../mlr3pipelines/html/PipeOp.html#method-PipeOp-print'><code>mlr3pipelines::PipeOp$print()</code></a></span></li>
<li><span class="pkg-link" data-pkg="mlr3pipelines" data-topic="PipeOp" data-id="train"><a href='../../mlr3pipelines/html/PipeOp.html#method-PipeOp-train'><code>mlr3pipelines::PipeOp$train()</code></a></span></li>
</ul>
</details>
}}
\if{html}{\out{<hr>}}
\if{html}{\out{<a id="method-PipeOpTaskPreprocTorch-new"></a>}}
\if{latex}{\out{\hypertarget{method-PipeOpTaskPreprocTorch-new}{}}}
\subsection{Method \code{new()}}{
Creates a new instance of this \code{\link[R6:R6Class]{R6}} class.
\subsection{Usage}{
\if{html}{\out{<div class="r">}}\preformatted{PipeOpTaskPreprocTorch$new(
  fn,
  id = "preproc_torch",
  param_vals = list(),
  param_set = ps(),
  packages = character(0),
  rowwise = FALSE,
  stages_init = NULL,
  tags = NULL
)}\if{html}{\out{</div>}}
}

\subsection{Arguments}{
\if{html}{\out{<div class="arguments">}}
\describe{
\item{\code{fn}}{(\code{function} or \code{character(2)})\cr
The preprocessing function. Must not modify its input in-place.
If it is a \code{character(2)}, the first element should be the namespace and the second element the name.
When the preprocessing function is applied to the tensor, the tensor will be passed by position as the first argument.
If the \code{param_set} is inferred (left as \code{NULL}), it is assumed that the first argument is the \code{torch_tensor}.}

\item{\code{id}}{(\code{character(1)})\cr
The id of the new object.}

\item{\code{param_vals}}{(named \code{list()})\cr
Parameter values to be set after construction.}

\item{\code{param_set}}{(\code{\link[paradox:ParamSet]{ParamSet}})\cr
In case the function \code{fn} takes additional parameters besides a \code{\link[torch:torch_tensor]{torch_tensor}}, they can be
specified here. None of the parameters can have the \code{"predict"} tag.
The tags of all parameters should include \code{"train"}.}

\item{\code{packages}}{(\code{character()})\cr
The packages the preprocessing function depends on.}

\item{\code{rowwise}}{(\code{logical(1)})\cr
Whether the preprocessing function is applied rowwise (with the results then concatenated by row) or directly to the whole
tensor. In the first case, the function receives tensors without a batch dimension.}

\item{\code{stages_init}}{(\code{character(1)})\cr
Initial value for the \code{stages} parameter.}

\item{\code{tags}}{(\code{character()})\cr
Tags for the pipeop.}
}
\if{html}{\out{</div>}}
}
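
The difference between \code{rowwise = TRUE} and \code{rowwise = FALSE} can be mimicked in base R, using a matrix in place of a \code{torch_tensor}.
The sketch also shows one plausible way to resolve a \code{character(2)} value of \code{fn} via \code{getExportedValue()}; this lookup is an assumption, and all object names are hypothetical.
\if{html}{\out{<div class="r">}}\preformatted{X = matrix(1:6, nrow = 2)  # a "batch" of 2 rows

# rowwise = TRUE: fn sees one row at a time, without a batch dimension,
# and the results are concatenated by row
fn_row = function(x) x / sum(x)
res_rowwise = do.call(rbind, lapply(seq_len(nrow(X)), function(i) fn_row(X[i, ])))

# rowwise = FALSE: fn sees the whole batch at once
fn_batch = function(x) x / rowSums(x)
res_batch = fn_batch(X)

all.equal(res_rowwise, res_batch)  # TRUE

# fn = c(namespace, name), resolved here with getExportedValue()
fn_spec = c("stats", "median")
fn = getExportedValue(fn_spec[1L], fn_spec[2L])
fn(c(1, 3, 10))  # 3
}\if{html}{\out{</div>}}
Writing the function for a single row is often simpler, while applying it to the whole tensor avoids one call per row.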
}
\if{html}{\out{<hr>}}
\if{html}{\out{<a id="method-PipeOpTaskPreprocTorch-shapes_out"></a>}}
\if{latex}{\out{\hypertarget{method-PipeOpTaskPreprocTorch-shapes_out}{}}}
\subsection{Method \code{shapes_out()}}{
Calculates the output shapes that would result from applying the preprocessing to one or more
lazy tensor columns with the provided shapes.
Names are ignored and only order matters.
It uses the parameter values that are currently set.
\subsection{Usage}{
\if{html}{\out{<div class="r">}}\preformatted{PipeOpTaskPreprocTorch$shapes_out(shapes_in, stage = NULL, task = NULL)}\if{html}{\out{</div>}}
}

\subsection{Arguments}{
\if{html}{\out{<div class="arguments">}}
\describe{
\item{\code{shapes_in}}{(\code{list()} of (\code{integer()} or \code{NULL}))\cr
The input shapes of the lazy tensors.
\code{NULL} indicates that the shape is unknown.
First dimension must be \code{NA} (if it is not \code{NULL}).}

\item{\code{stage}}{(\code{character(1)})\cr
The stage: either \code{"train"} or \code{"predict"}.}

\item{\code{task}}{(\code{\link[mlr3:Task]{Task}} or \code{NULL})\cr
The task, which is very rarely needed.}
}
\if{html}{\out{</div>}}
}
\subsection{Returns}{
\code{list()} of (\code{integer()} or \code{NULL})
}
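
The relation between this public method and the private \code{.shapes_out()} described in the section on inheriting can be sketched as follows.
This is an assumption based on the contract stated there (unknown shapes are \code{NULL}, and the private method handles one known shape at a time), not the actual implementation; \code{shapes_out_sketch()} and \code{keep_shape()} are hypothetical names.
\if{html}{\out{<div class="r">}}\preformatted{# Hypothetical sketch: apply a per-column .shapes_out() to every known shape
# and pass unknown (NULL) shapes through; names are ignored
shapes_out_sketch = function(shapes_in, private_shapes_out,
                             param_vals = list(), task = NULL) {
  lapply(unname(shapes_in), function(shape) {
    if (is.null(shape)) return(NULL)
    stopifnot(is.na(shape[1L]))  # first dimension must be NA
    private_shapes_out(list(shape), param_vals, task)[[1L]]
  })
}

# a preprocessing step that keeps the shape unchanged
keep_shape = function(shapes_in, param_vals, task) shapes_in

shapes_out_sketch(list(a = c(NA, 3L), b = NULL), keep_shape)
# returns list(c(NA, 3L), NULL)
}\if{html}{\out{</div>}}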
}
\if{html}{\out{<hr>}}
\if{html}{\out{<a id="method-PipeOpTaskPreprocTorch-clone"></a>}}
\if{latex}{\out{\hypertarget{method-PipeOpTaskPreprocTorch-clone}{}}}
\subsection{Method \code{clone()}}{
The objects of this class are cloneable with this method.
\subsection{Usage}{
\if{html}{\out{<div class="r">}}\preformatted{PipeOpTaskPreprocTorch$clone(deep = FALSE)}\if{html}{\out{</div>}}
}

\subsection{Arguments}{
\if{html}{\out{<div class="arguments">}}
\describe{
\item{\code{deep}}{Whether to make a deep clone.}
}
\if{html}{\out{</div>}}
}
}
}
