% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/grabMzmlFunctions.R
\name{grabMzmlData}
\alias{grabMzmlData}
\title{Get mass-spectrometry data from an mzML file}
\usage{
grabMzmlData(
  filename,
  grab_what,
  verbosity = 0,
  mz = NULL,
  ppm = NULL,
  rtrange = NULL,
  prefilter = -1
)
}
\arguments{
\item{filename}{A single filename to read into R's memory. Both absolute and
relative paths are acceptable.}

\item{grab_what}{What data should be read from the file? Options include
"MS1" for full-scan data from the first stage of mass spectrometry, "MS2"
for fragmentation data, "BPC" for rapid access to the base peak
chromatogram, "TIC" for rapid access to the total ion chromatogram, "DAD"
for DAD (UV) data, and "chroms" for precompiled chromatogram data. "chroms"
is especially useful for MRM data, but in other files it often contains the
BPC/TIC. Metadata can be accessed with "metadata", which provides
information about the instrument and the time at which the file was run.
These options can be combined (e.g. `grab_what = c("MS1", "MS2", "BPC")`),
or this argument can be set to "everything", which extracts MS1, MS2, BPC,
TIC, and metadata. Options "EIC" and "EIC_MS2" are useful when working with
files whose total size exceeds working memory. They first extract all
relevant MS1 and MS2 data, respectively, and then discard data outside of
the mass range(s) calculated from the provided mz and ppm.}

\item{verbosity}{Three levels of processing output to the R console are
available, with higher integers giving more verbose output. A verbosity of
0 produces no output, which is useful when this function is wrapped within
larger functions. A verbosity of 1 produces a progress bar using base R's
txtProgressBar function. A verbosity of 2 or higher produces timing output
for each individual file read in.}

\item{mz}{A numeric vector of mass-to-charge ratios for the compounds of
interest. Multiple masses can be provided. Only used when combined with
`grab_what = "EIC"` or `grab_what = "EIC_MS2"` (see above).}

\item{ppm}{A single number giving the mass accuracy (in parts per million)
of the instrument on which the data were collected. Only used when combined
with `grab_what = "EIC"` or `grab_what = "EIC_MS2"` (see above).}

\item{rtrange}{A vector of length 2 giving the lower and upper bounds on the
retention times of interest. Providing a range here can speed up load times
(although not enormously, as the entire file must still be read) and reduce
the size of the final object.}

\item{prefilter}{A single number giving the minimum intensity of interest in
the MS1 data. Data points with intensities below this threshold are
silently dropped, which can dramatically reduce the size of the final
object. The default of -1 drops nothing, because intensities are never
negative. Currently this applies only to MS1 data, though it could be
extended to other data types.}
}
\value{
A list of `data.table`s, each named after the options requested in
  grab_what. E.g. $MS1 contains MS1 information, $MS2 contains fragmentation
  information, etc. MS1 data have four columns: retention time (rt),
  mass-to-charge ratio (mz), intensity (int), and filename. MS2 data have
  six: retention time (rt), precursor m/z (premz), fragment m/z (fragmz),
  fragment intensity (int), collision energy (voltage), and filename. The
  data.tables extracted from each of the individual files are collected into
  one large table using data.table's `rbindlist`. $metadata is less regular,
  because the metadata do not fit neatly into a tidy format, but its fields
  are named descriptively. $chroms was added in v1.3 and contains 7 columns,
  including chromatogram type (usually TIC, BPC, or SRM information),
  chromatogram index, target m/z, product m/z, retention time (rt), and
  intensity (int). $DAD was also added in v1.3 and contains three columns:
  retention time (rt), wavelength (lambda), and intensity (int). Data
  requested that do not exist in the provided files (such as MS2 data
  requested from MS1-only files) are returned as an empty (zero-row)
  data.table.
}
\description{
This function handles the mzML side of things, reading in files that are
written in the mzML format. Much of the code is similar to that for the
mzXML format, but the XPath expressions are different, and the m/z and
intensity arrays are encoded as two separate entries rather than as a single
interleaved array. This function has been exposed to the user in case
per-file optimization (such as peak picking or additional filtering) is
desired before the full data object is returned.
}
\examples{
sample_file <- system.file("extdata", "LB12HL_AB.mzML.gz", package = "RaMS")
file_data <- grabMzmlData(sample_file, grab_what="MS1")
\dontrun{
# Extract MS1 data and a base peak chromatogram
file_data <- grabMzmlData(sample_file, grab_what=c("MS1", "BPC"))
# Extract data from a retention time subset
file_data <- grabMzmlData(sample_file, grab_what=c("MS1", "BPC"),
                          rtrange=c(5, 7))
# Extract EIC for a specific mass
file_data <- grabMzmlData(sample_file, grab_what="EIC", mz=118.0865, ppm=5)
# Extract EIC for several masses simultaneously
file_data <- grabMzmlData(sample_file, grab_what="EIC", ppm=5,
                          mz=c(118.0865, 146.118104, 189.123918))
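# Sketch of the m/z window that an EIC request covers. This assumes a
# symmetric window of mz * ppm / 1e6 on either side of each mass. The
# package's internal calculation may differ (e.g. for low masses), so treat
# this as an illustration rather than the exact filter grabMzmlData applies.
target_mz <- 118.0865
target_ppm <- 5
mz_window <- target_mz * (1 + c(-1, 1) * target_ppm / 1e6)
mz_window  # lower and upper m/z bounds of the assumed window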

# Extract MS2 data
sample_file <- system.file("extdata", "DDApos_2.mzML.gz", package = "RaMS")
MS2_data <- grabMzmlData(sample_file, grab_what="MS2")
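
# Drop low-intensity MS1 points while reading. The threshold of 1e5 is an
# arbitrary value chosen for illustration, not a recommended setting.
sample_file <- system.file("extdata", "LB12HL_AB.mzML.gz", package = "RaMS")
filtered_data <- grabMzmlData(sample_file, grab_what="MS1", prefilter=1e5)
# Read the instrument and run metadata
meta_data <- grabMzmlData(sample_file, grab_what="metadata")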
}
}
