% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/lulu.R
\name{lulu}
\alias{lulu}
\title{Post-Clustering Curation of Amplicon Data}
\usage{
lulu(
  otu_table,
  matchlist,
  minimum_ratio_type = "min",
  minimum_ratio = 1,
  minimum_match = 84,
  minimum_relative_cooccurence = 0.95,
  progress_bar = TRUE,
  log_conserved = FALSE
)
}
\arguments{
\item{otu_table}{a data.frame containing an OTU table with sites/samples as
columns, OTUs (unique OTU ids) as rows, and observations as read
counts.}

\item{matchlist}{a data.frame containing three columns: (1) OTU id of
potential child, (2) OTU id of potential parent, (3) match - \% identity
between the sequences of the potential parent and potential child OTUs.
\strong{NB: The matchlist is the product of a mapping of OTU sequences against each other. This is
currently carried out by an external tool such as BLASTN or VSEARCH, prior to running lulu!}}

\item{minimum_ratio_type}{sets whether a potential error must have lower
abundance than the parent in all samples (\code{min}, the default), or only
lower abundance on average (\code{avg}). Choosing lower
abundance on average over globally lower abundance will greatly increase
the number of designated errors. This option was introduced to make it
possible to account for insufficiently clustered intraspecific variation,
but is not generally recommended, as it also increases the risk of
merging well-separated, but co-occurring, sequence-similar species.}

\item{minimum_ratio}{sets the minimum abundance ratio between a potential error
and a potential parent for the former to be identified as an error. If \code{minimum_ratio_type} is
set to \code{min} (default), \code{minimum_ratio} applies to the lowest observed
ratio across the samples. If \code{minimum_ratio_type} is
set to \code{avg}, \code{minimum_ratio} applies to the mean of the observed
ratios across the samples (default is 1).}

\item{minimum_match}{minimum threshold of sequence similarity
for considering any OTU as an error of another (default 84\%).}

\item{minimum_relative_cooccurence}{minimum co-occurrence rate, i.e. the
lowest proportion of the potential error's occurrences that must be explained
by co-occurrence with the potential parent for it to be considered an error
(default 0.95).}

\item{progress_bar}{(Logical, default TRUE) print progress during the calculation or not.}

\item{log_conserved}{(Logical, default FALSE) if TRUE, the log files written to disk are conserved.}
}
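\section{Worked illustration of the ratio and co-occurrence criteria}{
The following is a minimal sketch, not the package's internal code. It assumes
that the abundance ratio is the parent's read count divided by the daughter's
read count, taken over the samples where the daughter occurs. The vectors
\code{parent} and \code{daughter} are hypothetical.
\preformatted{
# Hypothetical read counts of one parent and one daughter OTU in five samples
parent   <- c(100, 80, 0, 60, 40)
daughter <- c( 10, 90, 0,  6,  0)

# Samples in which the daughter occurs
present <- daughter > 0

# Share of daughter occurrences that co-occur with the parent
# (the quantity compared against minimum_relative_cooccurence)
cooccurrence <- sum(parent[present] > 0) / sum(present)

# Assumed definition of the per-sample abundance ratio
ratio <- parent[present] / daughter[present]

# "min": the lowest ratio must exceed minimum_ratio (here 1)
min(ratio) > 1
# "avg": only the mean ratio must exceed minimum_ratio (here 1)
mean(ratio) > 1
}
Under these assumptions, the daughter is more abundant than the parent in one
sample, so the \code{min} criterion fails while the \code{avg} criterion is
met. This is why \code{avg} designates more OTUs as errors.
}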
\value{
Function \code{lulu} returns a list of results based on the input OTU
table and match list.
\itemize{
\item \code{curated_table} - a curated
OTU table with daughters merged with their matching parents.
\item \code{curated_count} - number of curated (parent) OTUs.
\item \code{curated_otus} - ids of the OTUs that were accepted as valid OTUs.
\item \code{discarded_count} - number of discarded (merged with parent) OTUs.
\item \code{discarded_otus} - ids of the OTUs that were identified as
errors (daughters) and merged with respective parents.
\item \code{runtime} - time used by the script.
\item \code{minimum_match} - the id threshold
(minimum match \% between parent and daughter) for evaluating co-occurrence (set
by user).
\item \code{minimum_relative_cooccurence} - minimum ratio of
daughter occurrences explained by co-occurrence with parent (set by user).
\item \code{otu_map} - information on which daughters were mapped to which
parents.
\item \code{original_table} - original OTU table.
}

The matchlist is the product of a mapping of OTU sequences against each other. This is
currently carried out by an external tool such as BLASTN or VSEARCH, prior to running \code{lulu}!
Producing the match list requires a file with all the OTU sequences (centroids), e.g. \code{OTUcentroids.fasta}.

In \code{VSEARCH}, a matchlist can be produced e.g. with the following command:
\code{vsearch --usearch_global OTUcentroids.fasta --db OTUcentroids.fasta --strand plus --self --id .80 --iddef 1 --userout matchlist.txt --userfields query+target+id --maxaccepts 0 --query_cov .9 --maxhits 10}.

In \code{BLASTN}, a matchlist can be produced e.g. with the following commands. First, build a BLAST database from the fasta file:
\code{makeblastdb -in OTUcentroids.fasta -parse_seqids -dbtype nucl}.
Then match the centroids against that database:
\code{blastn -db OTUcentroids.fasta -num_threads 10 -outfmt '6 qseqid sseqid pident' -out matchlist.txt -qcov_hsp_perc 90 -perc_identity 84 -query OTUcentroids.fasta}.
}
\description{
\if{html}{\out{
<a href="https://adrientaudiere.github.io/MiscMetabar/articles/Rules.html#lifecycle">
<img src="https://img.shields.io/badge/lifecycle-stable-green" alt="lifecycle-stable"></a>
}}


The original function and documentation were written by Tobias Guldberg Frøslev
in the \href{https://github.com/tobiasgf/lulu}{lulu} package.

The \code{lulu} algorithm takes an OTU table and a matchlist, and
evaluates co-occurrence of 'daughters' (potential analytical artefacts) and
their 'parents' (roughly, real biological species/OTUs). The OTU table
(species/site matrix) can be
made with various R packages (e.g. \code{DADA2}) or
external pipelines (\code{VSEARCH, USEARCH, QIIME}, etc.), and the
match list can be made with external bioinformatic tools like
\code{VSEARCH, USEARCH, BLASTN} or another algorithm
for pair-wise sequence matching.
}
\details{
Please cite the original lulu paper: \url{https://www.nature.com/articles/s41467-017-01312-x}
}
\author{
Tobias Guldberg Frøslev (orcid: \href{https://orcid.org/0000-0002-3530-013X}{0000-0002-3530-013X}),
modified by Adrien Taudière
}
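\examples{
# A minimal sketch of the expected inputs. The OTU ids, sample names and
# read counts below are hypothetical and do not come from real data.
# Assumption: OTU ids are supplied as the row names of the OTU table.
otu_table <- data.frame(
  sample_1 = c(120, 10, 50),
  sample_2 = c(300, 25, 0),
  sample_3 = c(80, 6, 40),
  row.names = c("OTU_1", "OTU_2", "OTU_3")
)

# Match list with three columns: potential child, potential parent,
# and percent identity between their sequences (hypothetical values).
matchlist <- data.frame(
  child = c("OTU_2", "OTU_1"),
  parent = c("OTU_1", "OTU_2"),
  match = c(98.5, 98.5),
  stringsAsFactors = FALSE
)

\dontrun{
# A match list written by an external tool could instead be read with, e.g.:
# matchlist <- read.table("matchlist.txt", header = FALSE, as.is = TRUE)
res <- lulu(otu_table, matchlist)
res$curated_table
res$otu_map
}
}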
