Type: | Package |
Title: | Variant Calling in Targeted Analysis Sequencing Data |
Version: | 0.0.2 |
Date: | 2020-11-03 |
Description: | Multi-caller variant analysis pipeline for targeted analysis sequencing (TAS) data. Features a modular, automated workflow that can start with raw reads and produces a user-friendly PDF summary and a spreadsheet containing consensus variant information. |
SystemRequirements: | perl, bedtools (>=2.27.1), bwa |
License: | GPL-2 |
Suggests: | testthat, knitr, rmarkdown, futile.logger |
Imports: | stringr, dplyr, yaml, openxlsx, VennDiagram, assertthat, magrittr, tools, utils, tidyr, doParallel, foreach |
RoxygenNote: | 6.1.1 |
Encoding: | UTF-8 |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2020-11-03 14:25:11 UTC; amills |
Author: | Adam Mills [aut, cre], Erle Holgersen [aut], Ros Cutts [aut], Syed Haider [aut] |
Maintainer: | Adam Mills <Adam.Mills@icr.ac.uk> |
Repository: | CRAN |
Date/Publication: | 2020-11-14 00:30:03 UTC |
add.option
Description
Add an option to a nested list of options. Applied recursively.
Usage
add.option(name, value, old.options, nesting.character = "\\.")
Arguments
name |
Option name. Nesting is indicated by character specified in nesting.character. |
value |
New value of option |
old.options |
Nested list the option should be added to |
nesting.character |
String giving Regex pattern of nesting indication string. Defaults to '\.' |
Value
Nested list with updated options
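The recursive update described above can be sketched in base R. This is an illustrative sketch only, not the package's own code: the helper name set.nested.option and the example option names and values are assumptions.

```r
# Hypothetical helper (not part of VariTAS): set a value in a nested list,
# where the levels of the option name are separated by a regex pattern.
set.nested.option <- function(name, value, old.options, nesting.character = "\\.") {
    # Split the option name into its nesting levels
    parts <- strsplit(name, split = nesting.character)[[1]]

    set.parts <- function(parts, opts) {
        if (length(parts) == 1) {
            # Base case: assign at the current level
            opts[[parts]] <- value
            return(opts)
        }
        # Recursive case: descend into (or create) the next level
        sub.opts <- opts[[parts[1]]]
        if (is.null(sub.opts)) sub.opts <- list()
        opts[[parts[1]]] <- set.parts(parts[-1], sub.opts)
        opts
    }

    set.parts(parts, old.options)
}

# Made-up options for illustration
old.options <- list(
    reference_build = "hg19",
    filters = list(default = list(min_depth = 10))
)
updated <- set.nested.option("filters.mutect.min_depth", 20, old.options)
str(updated)
```

Existing entries at other levels are left untouched; only the addressed leaf is added or replaced.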
alternate.gene.sort
Description
Given a data frame containing coverage statistics and gene information, returns that frame with the rows sorted by alternating gene size (for plotting)
Usage
alternate.gene.sort(coverage.statistics)
Arguments
coverage.statistics |
Data frame of coverage statistics |
Details
Genes have varying numbers of associated amplicons and when plotting coverage statistics, if two genes with very low numbers of amplicons are next to each other, the labels will overlap. This function sorts the coverage statistics data frame in a way that places the genes with the most amplicons (largest) next to those with the least (smallest).
Value
Coverage statistics data frame sorted by alternating gene size
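One way to obtain such an alternating order is sketched below. It is a sketch under stated assumptions rather than the package's implementation: the helper name interleave.genes.by.size and the column name gene are hypothetical, and gene size is taken to be the number of rows (amplicons) per gene.

```r
# Hypothetical helper: order rows so that genes alternate between
# the largest and smallest remaining gene (by number of amplicons).
interleave.genes.by.size <- function(coverage.statistics, gene.column = "gene") {
    genes.in.data <- as.character(coverage.statistics[[gene.column]])

    # Genes ordered from most to fewest amplicons
    gene.sizes <- sort(table(genes.in.data), decreasing = TRUE)
    genes <- names(gene.sizes)

    # Walk inwards from both ends: largest, smallest, 2nd largest, ...
    idx <- integer(0)
    lo <- 1
    hi <- length(genes)
    while (lo <= hi) {
        idx <- c(idx, lo)
        if (hi > lo) idx <- c(idx, hi)
        lo <- lo + 1
        hi <- hi - 1
    }
    gene.order <- genes[idx]

    # Reorder rows by the position of their gene in the alternating order
    coverage.statistics[order(match(genes.in.data, gene.order)), , drop = FALSE]
}

# Toy data: TP53 has 4 amplicons, EGFR 3, BRAF 2, KRAS 1
coverage.statistics <- data.frame(
    gene = rep(c("TP53", "KRAS", "EGFR", "BRAF"), times = c(4, 1, 3, 2)),
    amplicon = paste0("AMPL", 1:10),
    stringsAsFactors = FALSE
)
sorted <- interleave.genes.by.size(coverage.statistics)
unique(sorted$gene)
```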
build.variant.specification
Description
Build data frame with paths to variant files.
Usage
build.variant.specification(sample.ids, project.directory)
Arguments
sample.ids |
Vector of sample IDs. Must match subdirectories in project.directory. |
project.directory |
Path to directory where sample subdirectories are located |
Details
Parses through sample IDs in a project directory and returns paths to variant files based on (theoretical) file name patterns. Useful for testing, or for entering the pipeline at non-traditional stages.
Value
Data frame with paths to variant files.
Make Venn diagram of variant caller overlap
Description
Make Venn diagram of variant caller overlap
Usage
caller.overlap.venn.diagram(variants, file.name)
Arguments
variants |
Data frame containing variants, typically from merge.variants function |
file.name |
Name of output file |
capitalize.caller
Description
Capitalize variant caller name
Usage
capitalize.caller(caller)
capitalise.caller(caller)
Arguments
caller |
Character vector of callers to be capitalized |
Value
Vector of same length as caller where eligible callers have been capitalized
classify.variant
Description
Classify a variant as SNV, MNV, or indel based on the reference and alternative alleles
Usage
classify.variant(ref, alt)
Arguments
ref |
Vector of reference bases |
alt |
Vector of alternate bases |
Value
Character vector giving type of variant.
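A length-based classification of this kind might look like the sketch below. The rules are assumptions and may differ from what the package does: alleles of equal length 1 are treated as SNVs, alleles of equal length greater than 1 as MNVs, and everything else (including a "-" placeholder allele) as an indel. The helper name is hypothetical.

```r
# Hypothetical helper: classify variants from allele lengths alone
classify.by.allele.length <- function(ref, alt) {
    ref.length <- nchar(ref)
    alt.length <- nchar(alt)

    # Default to indel, then overwrite the equal-length cases
    variant.type <- rep("indel", length(ref))
    variant.type[ref.length == alt.length & ref.length == 1] <- "SNV"
    variant.type[ref.length == alt.length & ref.length > 1] <- "MNV"

    # A "-" placeholder allele denotes an insertion or deletion
    variant.type[ref == "-" | alt == "-"] <- "indel"

    variant.type
}

variant.types <- classify.by.allele.length(
    ref = c("A", "AT", "A", "ATG", "-"),
    alt = c("G", "GC", "ATT", "A", "T")
)
variant.types
```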
Convert output of iDES step 1 to variant call format
Description
Convert output of iDES step 1 to variant call format
Usage
convert.ides.output(filename, output = TRUE,
output.suffix = ".calls.txt", minreads = 5, mindepth = 50)
Arguments
filename |
Path to file |
output |
Logical indicating whether output should be saved to file. Defaults to true. |
output.suffix |
Suffix to be appended to input filename if saving results to file |
minreads |
Minimum number of reads |
mindepth |
Minimum depth |
Value
potential.calls Data frame of converted iDES calls
create.directories
Description
Create directories in a given path
Usage
create.directories(directory.names, path)
Arguments
directory.names |
Vector of names of directories to be created |
path |
Path where directories should be created |
date.stamp.file.name
Description
Prefix file name with a date-stamp.
Usage
date.stamp.file.name(file.name, date = Sys.Date(), separator = "_")
Arguments
file.name |
File name to be date-stamped |
date |
Date to be added. Defaults to current date. |
separator |
String that should separate the date from the file name. Defaults to a single underscore. |
Value
String giving the datestamped file name
Examples
date.stamp.file.name('plot.png');
date.stamp.file.name('yesterdays_plot.png', date = Sys.Date() - 1);
Extract sample IDs from file paths
Description
Extract sample IDs from a set of paths to files in sample-specific subfolders
Usage
extract.sample.ids(paths, from.filename = FALSE)
Arguments
paths |
vector of file paths |
from.filename |
Logical indicating whether sample ID should be extracted from filename rather than path |
Value
vector of extracted sample IDs
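For files stored in sample-specific subfolders, the idea can be sketched in base R as follows. The paths are made up, and the filename variant assumes the sample ID is the file name without its extension, which may not match the package's rule.

```r
# Made-up paths: each file sits in a folder named after its sample
paths <- c(
    "project/sample_A/sample_A.vcf",
    "project/sample_B/sample_B.vcf"
)

# Sample ID taken from the name of the containing directory
ids.from.path <- basename(dirname(paths))

# Assumed alternative: sample ID taken from the file name, minus extension
ids.from.filename <- tools::file_path_sans_ext(basename(paths))

ids.from.path
ids.from.filename
```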
Filter variants in file.
Description
Filter variants from file, and save to output. Wrapper function that opens the variant file, calls filter.variants, and saves the result to file
Usage
filter.variant.file(variant.file, output.file, config.file = NULL,
caller = c("vardict", "ides", "mutect", "pgm", "consensus"))
Arguments
variant.file |
Path to variant file |
output.file |
Path to output file |
config.file |
Path to config file to be used. If not supplied, will use the pre-existing VariTAS options. |
caller |
Name of caller used (needed to match appropriate filters from settings) |
Value
None
Filter variant calls
Description
Filter data frame of variant calls based on thresholds specified in settings.
Usage
filter.variants(variants, caller = c("vardict", "ides", "mutect", "pgm",
"consensus", "isis", "varscan", "lofreq"), config.file = NULL,
verbose = FALSE)
Arguments
variants |
Data frame of variant calls with ANNOVAR annotation, or path to variant file. |
caller |
Name of caller used (needed to match appropriate filters from settings) |
config.file |
Path to config file to be used. If not supplied, will use the pre-existing VariTAS options. |
verbose |
Logical indicating whether to output descriptions of filtering steps. Defaults to FALSE; setting it to TRUE is useful for debugging. |
Value
filtered.variants Data frame of filtered variants
fix.lofreq.af
Description
Like VarScan, LoFreq does not output allele frequencies, so this script calculates them from the DP (depth) and AD (variant allele depth) values (which are also not output in a convenient form) and adds them to the annotated VCF.
Usage
fix.lofreq.af(variant.specification)
Arguments
variant.specification |
Data frame of variant file information |
Fix variant call column names
Description
Fix headers of variant calls to prepare for merging. This mostly consists of making sure the column headers will be unique by prefixing them with the variant caller in question.
Usage
fix.names(column.names, variant.caller, sample.id = NULL)
Arguments
column.names |
Character vector of column names |
variant.caller |
String giving name of variant caller |
sample.id |
Optional sample ID. Used to fix headers. |
Value
new.column.names Vector of column names after fixing
fix.varscan.af
Description
VarScan does not output allele frequencies, so this script calculates them from the DP (depth) and AD (variant allele depth) values and adds them to the annotated vcf.
Usage
fix.varscan.af(variant.specification)
Arguments
variant.specification |
Data frame of variant file information |
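The underlying arithmetic is the variant allele depth divided by the total depth. The sketch below shows only that calculation on a toy data frame; the column names DP, AD and AF and the helper name are assumptions, and the parsing of the annotated files is not shown.

```r
# Hypothetical helper: allele frequency from depth (DP) and
# variant allele depth (AD); undefined where depth is zero or missing
calculate.allele.frequency <- function(depth, variant.depth) {
    ifelse(depth > 0, variant.depth / depth, NA_real_)
}

# Toy calls for illustration
calls <- data.frame(
    DP = c(100, 250, 0),
    AD = c(5, 50, 0)
)
calls$AF <- calculate.allele.frequency(calls$DP, calls$AD)
calls
```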
Get base substitution
Description
Get base substitution represented by pyrimidine in base pair. If more than one base in REF/ALT (i.e. MNV or indel rather than SNV), NA will be returned
Usage
get.base.substitution(ref, alt)
Arguments
ref |
Vector of reference bases |
alt |
Vector of alternate bases |
Value
base.substitutions
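Representing a substitution by its pyrimidine means that when the reference base is a purine (A or G), both alleles are replaced by their complements. A minimal sketch follows; the helper name and the "C>T" output format are assumptions.

```r
# Hypothetical helper: express an SNV relative to the pyrimidine (C or T)
pyrimidine.substitution <- function(ref, alt) {
    ref <- toupper(ref)
    alt <- toupper(alt)
    bases <- c("A", "C", "G", "T")

    # Only single-base REF and ALT count as an SNV
    is.snv <- ref %in% bases & alt %in% bases

    # Complement both alleles when the reference is a purine
    purine.ref <- ref %in% c("A", "G")
    ref.out <- ifelse(purine.ref, chartr("ACGT", "TGCA", ref), ref)
    alt.out <- ifelse(purine.ref, chartr("ACGT", "TGCA", alt), alt)

    substitution <- paste0(ref.out, ">", alt.out)
    substitution[!is.snv] <- NA
    substitution
}

substitutions <- pyrimidine.substitution(
    ref = c("C", "G", "A", "AT", "T"),
    alt = c("T", "A", "G", "A", "-")
)
substitutions
```

Here G>A is reported as C>T and A>G as T>C, while the MNV/indel rows give NA.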
get.bed.chromosomes
Description
Extract chromosomes from bed file
Usage
get.bed.chromosomes(bed)
Arguments
bed |
Path to BED file |
Value
Vector containing all chromosomes in BED file
get.buildver
Description
Get build version (hg19/hg38) based on settings.
Parses VariTAS pipeline settings to get the build version. When this function was first developed, the idea was to be able to explicitly set ANNOVAR filenames based on the build version.
Usage
get.buildver()
Value
String giving reference genome build version (hg19 or hg38)
Generate a colour scheme
Description
Generate a colour scheme
Usage
get.colours(n)
Arguments
n |
Number of colours desired |
Value
Colour.scheme generated colours
Process sample coverage per amplicon data
Description
Parse coverageBed output to get coverage by amplicon
Usage
get.coverage.by.amplicon(project.directory)
Arguments
project.directory |
Path to project directory. Each sample should have its own subdirectory |
Value
combined.data Data frame giving coverage per amplicon per sample.
References
http://bedtools.readthedocs.io/en/latest/content/tools/coverage.html
Get statistics about coverage per sample
Description
Get statistics about coverage per sample
Usage
get.coverage.by.sample.statistics(project.directory)
Arguments
project.directory |
Path to project directory. Each sample should have its own subdirectory |
Value
coverage.by.sample.statistics Data frame with coverage statistics per sample
get.fasta.chromosomes
Description
Extract chromosomes from fasta headers.
Usage
get.fasta.chromosomes(fasta)
Arguments
fasta |
Path to reference fasta |
Value
Vector containing all chromosomes in fasta file.
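Extracting sequence names from FASTA headers can be sketched as below. The toy file is created on the fly, and reading the whole file with readLines is a simplification that would be slow for a full reference genome; the package may do this differently.

```r
# Write a small toy FASTA file to a temporary location
fasta.file <- tempfile(fileext = ".fa")
writeLines(
    c(">chr1 first toy sequence", "ACGTACGT", ">chr2", "GGCC"),
    fasta.file
)

# Header lines start with ">"
fasta.lines <- readLines(fasta.file)
headers <- fasta.lines[startsWith(fasta.lines, ">")]

# Keep the name: drop ">" and anything after the first whitespace
chromosomes <- sub("^>(\\S+).*$", "\\1", headers, perl = TRUE)
chromosomes
```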
get.file.path
Description
Get absolute path to sample-specific file for one or more samples
Usage
get.file.path(sample.ids, directory, extension = NULL,
allow.multiple = FALSE, allow.none = FALSE)
Arguments
sample.ids |
Vector of sample IDs to match filename on |
directory |
Path to directory containing files |
extension |
String giving extension of file |
allow.multiple |
Boolean indicating whether to allow multiple matching files. Defaults to false, which throws an error if the query matches more than one file. |
allow.none |
Boolean indicating whether to allow no matching files. Defaults to false, which throws an error if the query does not match any files. |
Value
Paths to matched files
get.filters
Description
Determine filters per caller, given default and caller-specific values.
Usage
get.filters(filters)
Arguments
filters |
List of filter values. These will be updated to use default as the baseline, with caller-specific filters taking precedence if supplied. |
Value
A list with updated filters
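The precedence rule (defaults as the baseline, caller-specific values on top) can be sketched with utils::modifyList. The helper name, the presence of a top-level element called default, and the filter names below are all assumptions made for illustration.

```r
# Hypothetical helper: resolve per-caller filters from defaults
merge.default.filters <- function(filters) {
    default.filters <- filters[["default"]]
    if (is.null(default.filters)) default.filters <- list()

    callers <- setdiff(names(filters), "default")

    # Caller-specific values overwrite the defaults where supplied
    lapply(filters[callers], function(caller.filters) {
        utils::modifyList(default.filters, caller.filters)
    })
}

# Made-up filter settings
filters <- list(
    default = list(min_tumour_depth = 5, max_normal_allele_frequency = 0.02),
    vardict = list(min_tumour_depth = 10),
    mutect = list()
)
resolved <- merge.default.filters(filters)
str(resolved)
```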
get.gene
Description
Use guesswork to extract gene from data frame of targeted panel data. The panel designer output can change, so try to guess what the format is.
Usage
get.gene(bed.data)
Arguments
bed.data |
Data frame containing data from bed file |
Value
vector of gene names, one entry for each row of bed.data
get.miniseq.sample.files
Description
Get files for a sample in a directory, ensuring there's only a single match per sample ID.
Usage
get.miniseq.sample.files(sample.ids, directory,
file.suffix = "_S\\d{1,2}_.*")
Arguments
sample.ids |
Vector of sample ids. Should form first part of file name |
directory |
Directory where files can be found |
file.suffix |
Regular expression for end of file name. For example, file.suffix = '_S\\d{1,2}_.*_R1_.*' will match R1 files. |
Value
Character vector of file paths
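The matching logic can be illustrated on a plain vector of file names. This sketch does not touch the file system; the helper name and the file names are hypothetical, and sample IDs are assumed to contain no regex metacharacters.

```r
# Hypothetical helper: find exactly one file per sample ID, where the
# file name is the sample ID followed by a suffix pattern
match.sample.files <- function(sample.ids, file.names, file.suffix = "_S\\d{1,2}_.*") {
    vapply(sample.ids, function(sample.id) {
        pattern <- paste0("^", sample.id, file.suffix)
        matches <- grep(pattern, file.names, value = TRUE, perl = TRUE)
        if (length(matches) != 1) {
            stop(sprintf("Expected 1 file for sample %s, found %d", sample.id, length(matches)))
        }
        matches
    }, character(1))
}

# Made-up MiniSeq-style file names
file.names <- c(
    "A1_S1_L001_R1_001.fastq.gz", "A1_S1_L001_R2_001.fastq.gz",
    "A10_S2_L001_R1_001.fastq.gz", "A10_S2_L001_R2_001.fastq.gz"
)
r1.files <- match.sample.files(
    c("A1", "A10"),
    file.names,
    file.suffix = "_S\\d{1,2}_.*_R1_.*"
)
r1.files
```

Anchoring the pattern at the start of the name keeps sample A1 from also matching the A10 files.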
Helper function to recursively get a VariTAS option
Description
Helper function to recursively get a VariTAS option
Usage
get.option(name, varitas.options = NULL, nesting.character = "\\.")
Arguments
name |
Option name |
varitas.options |
Optional list of options to search in |
nesting.character |
String giving Regex pattern of nesting indication string. Defaults to '\.' |
Value
value Requested option
Summarise panel coverage by gene
Description
Summarise panel coverage by gene
Usage
get.panel.coverage.by.gene(panel.file, gene.col = 5)
Arguments
panel.file |
path to panel |
gene.col |
index of column containing gene name |
Value
panel.coverage.by.gene data frame giving the number of amplicons and their total length by gene
Get pool corresponding to each amplicon
Description
The BED files are not consistent, so it's not clear where the pool will appear. This function parses through the columns to identify where the pool is.
Usage
get.pool.from.panel.data(panel.data)
Arguments
panel.data |
data frame pool should be extracted from |
Value
pools vector of pool information
Return VariTAS settings
Description
Return VariTAS settings
Usage
get.varitas.options(option.name = NULL, nesting.character = "\\.")
Arguments
option.name |
Optional name of option. If no name is supplied, the full list of VariTAS options will be provided. |
nesting.character |
String giving Regex pattern of nesting indication string. Defaults to '\.' |
Value
varitas.options list specifying VariTAS options
Examples
reference.build <- get.varitas.options('reference_build');
mutect.filters <- get.varitas.options('filters.mutect');
get.vcf.chromosomes
Description
Extract chromosomes from a VCF file.
Usage
get.vcf.chromosomes(vcf)
Arguments
vcf |
Path to VCF file |
Value
Vector containing all chromosomes in VCF
Check if a key is in VariTAS options
Description
Check if a key is in VariTAS options
Usage
in.varitas.options(option.name = NULL, varitas.options = NULL,
nesting.character = "\\.")
Arguments
option.name |
String giving name of option (with different levels joined by |
varitas.options |
Ampliseq options as a list. If missing, they will be obtained from |
nesting.character |
String giving Regex pattern of nesting indication string. Defaults to '\.' |
Value
in.options Boolean indicating if the option name exists in the current varitas options
logical.to.character
Description
Convert a logical vector to a T/F coded character vector. Useful for preventing unwanted T->TRUE nucleotide conversions
Usage
logical.to.character(x)
Arguments
x |
Vector to be converted |
Value
Character vector after converting TRUE/FALSE
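The conversion problem arises when a column of nucleotides contains only T (or F), which base R's type conversion reads as logical. The sketch below reproduces that with read.table and converts back; the helper name is hypothetical and is not the package function.

```r
# A REF column containing only "T" is read as logical TRUE
calls <- read.table(
    text = "REF ALT\nT C\nT G",
    header = TRUE,
    stringsAsFactors = FALSE
)
is.logical(calls$REF)

# Hypothetical helper: map TRUE/FALSE back to "T"/"F", keeping NA as NA
logical.to.tf <- function(x) {
    if (!is.logical(x)) return(x)
    out <- rep(NA_character_, length(x))
    out[which(x)] <- "T"
    out[which(!x)] <- "F"
    out
}

fixed.ref <- logical.to.tf(calls$REF)
fixed.ref
```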
Make string with command line call from its individual components
Description
Make string with command line call from its individual components
Usage
make.command.line.call(main.command, options = NULL, flags = NULL,
option.prefix = "--", option.separator = " ", flag.prefix = "--")
Arguments
main.command |
String or vector of strings giving main part of command (e.g. "python test.py" or c("python", "test.py")) |
options |
Named vector or list giving options |
flags |
Vector giving flags to include. |
option.prefix |
String to preface all options. Defaults to "--" |
option.separator |
String to separate options from their values. Defaults to a single space. |
flag.prefix |
String to preface all flags. Defaults to "--" |
Value
command string giving command line call
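Assembling a call from these components can be sketched as below. This is not the package's code: the helper name build.command is hypothetical, and placing options before flags is an assumption about ordering.

```r
# Hypothetical helper: paste a command, its options and its flags together
build.command <- function(main.command, options = NULL, flags = NULL,
                          option.prefix = "--", option.separator = " ",
                          flag.prefix = "--") {
    parts <- main.command

    # Each option becomes <prefix><name><separator><value>
    if (length(options) > 0) {
        parts <- c(
            parts,
            paste0(option.prefix, names(options), option.separator, unlist(options))
        )
    }

    # Each flag becomes <prefix><flag>
    if (length(flags) > 0) {
        parts <- c(parts, paste0(flag.prefix, flags))
    }

    paste(parts, collapse = " ")
}

command <- build.command(
    c("python", "test.py"),
    options = list(input = "a.vcf", threads = 4),
    flags = "verbose"
)
command
```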
mean.field.value
Description
Get mean value of a variant annotation field
Usage
## S3 method for class 'field.value'
mean(variants, field = c("TUMOUR.DP", "NORMAL.DP",
"NORMAL.AF", "TUMOUR.AF", "QUAL"), caller = c("consensus", "vardict",
"pgm", "mutect", "isis", "varscan", "lofreq"))
Arguments
variants |
Data frame with variants |
field |
String giving field of interest. |
caller |
String giving caller to calculate values from |
Details
As part of the variant merging process, annotated variant data frames are merged into one, with the value from each caller prefixed by CALLER. For example, the VarDict normal allele frequency will have header VARDICT.NORMAL.AF. This function takes the average of all callers' values for a given field, removing NAs. If only a single caller is present in the data frame, that value is returned.
Value
Vector of mean values.
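The averaging over caller-prefixed columns can be sketched with rowMeans. The helper name and the toy values are assumptions; the column naming follows the CALLER.FIELD convention described in Details.

```r
# Hypothetical helper: average one field over all callers present,
# ignoring missing values
average.field.across.callers <- function(variants, field, callers) {
    columns <- paste(toupper(callers), field, sep = ".")
    columns <- intersect(columns, names(variants))
    if (length(columns) == 0) stop("No columns found for field ", field)

    values <- rowMeans(variants[, columns, drop = FALSE], na.rm = TRUE)

    # Rows where every caller is NA give NaN; report them as NA
    values[is.nan(values)] <- NA
    unname(values)
}

# Toy merged calls from two callers
variants <- data.frame(
    VARDICT.TUMOUR.AF = c(0.10, 0.30, NA),
    MUTECT.TUMOUR.AF = c(0.20, NA, NA)
)
mean.af <- average.field.across.callers(
    variants,
    field = "TUMOUR.AF",
    callers = c("vardict", "mutect")
)
mean.af
```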
Merge potential iDES calls with variant annotation.
Description
Merge potential iDES calls with variant annotation.
Usage
## S3 method for class 'ides.annotation'
merge(ides.filename, output = TRUE,
output.suffix = ".ann.txt",
annovar.suffix.pattern = ".annovar.hg(\\d{2})_multianno.txt")
Arguments
ides.filename |
Path to formatted iDES output (typically from convert.ides.output file) |
output |
Logical indicating whether output should be saved to file. Defaults to true. |
output.suffix |
Suffix to be appended to input filename if saving results to file |
annovar.suffix.pattern |
Suffix to match ANNOVAR file |
Details
The VarDict variant calling includes a GATK call merging the call vcf file (allele frequency information etc.) with the ANNOVAR annotation, and saving the result as a table. This function is an attempt to emulate that step for the iDES calls.
Value
annotated.calls Data frame of annotations and iDES output.
Merge variants
Description
Merge variants from multiple callers and return a data frame of merged calls. By default filtering is also applied, although this behaviour can be turned off by setting apply.filters to FALSE.
Usage
## S3 method for class 'variants'
merge(variant.specification, apply.filters = TRUE,
remove.structural.variants = TRUE,
separate.consensus.filters = FALSE, verbose = FALSE)
Arguments
variant.specification |
Data frame containing details of file paths, sample IDs, and caller. |
apply.filters |
Logical indicating whether to apply filters. Defaults to TRUE. |
remove.structural.variants |
Logical indicating whether structural variants (including CNVs) should be removed. Defaults to TRUE. |
separate.consensus.filters |
Logical indicating whether to apply different thresholds to variants called by more than one caller (specified under consensus in config file). Defaults to FALSE. |
verbose |
Logical indicating whether to print information to screen |
Value
Data frame
overwrite.varitas.options
Description
Overwrite VariTAS options with options provided in config file.
Usage
overwrite.varitas.options(config.file)
Arguments
config.file |
Path to config file that should be used to overwrite options |
Value
None
Examples
## Not run:
config <- file.path(path.package('varitas'), 'config.yaml')
overwrite.varitas.options(config)
## End(Not run)
Parse job dependencies
Description
Parse job dependencies to make the functions more robust to alternate inputs (e.g. people writing alignment instead of bwa)
Usage
parse.job.dependencies(dependencies)
Arguments
dependencies |
Job dependency strings to be parsed. |
Value
parsed.dependencies Vector of job dependencies after reformatting.
plot.amplicon.coverage.per.sample
Description
Create one scatterplot per sample, showing coverage per amplicon, and an additional plot giving the median
Usage
## S3 method for class 'amplicon.coverage.per.sample'
plot(coverage.statistics,
output.directory)
Arguments
coverage.statistics |
Data frame containing coverage per amplicon per sample, typically from |
output.directory |
Directory where per sample plots should be saved |
Value
None
Plot amplicon coverage by genome order
Description
Use values obtained by bedtools coverage to make a plot of coverage by genome order
Usage
## S3 method for class 'coverage.by.genome.order'
plot(coverage.data)
Arguments
coverage.data |
data frame with results from bedtools coverage command |
plot.coverage.by.sample
Description
Make a barplot of coverage per sample
Usage
## S3 method for class 'coverage.by.sample'
plot(coverage.sample, file.name,
statistic = c("mean", "median"))
Arguments
coverage.sample |
Data frame of coverage data, typically from |
file.name |
Name of output file |
statistic |
Statistic to be plotted (mean or median) |
Value
None
plot.ontarget.percent
Description
Make a scatterplot of ontarget percent per sample
Usage
## S3 method for class 'ontarget.percent'
plot(coverage.sample, file.name)
Arguments
coverage.sample |
Data frame of coverage data, typically from |
file.name |
Name of output file |
Value
None
plot.paired.percent
Description
Make a barplot of percent paired reads per sample
Usage
## S3 method for class 'paired.percent'
plot(coverage.sample, file.name)
Arguments
coverage.sample |
Data frame of coverage data, typically from |
file.name |
Name of output file |
Value
None
Post-processing of variants to generate outputs
Description
Post-processing of variants to generate outputs
Usage
post.processing(variant.specification, project.directory,
config.file = NULL, variant.callers = NULL,
remove.structural.variants = TRUE,
separate.consensus.filters = FALSE, sleep = FALSE, verbose = FALSE)
Arguments
variant.specification |
Data frame specifying variants to be processed, or path to data frame (useful if calling from Perl) |
project.directory |
Directory where output should be stored. Output files will be saved to a datestamped subdirectory |
config.file |
Path to config file specifying post-processing options. If not provided, the current options are used (i.e. from |
variant.callers |
Optional vector of variant callers for which filters should be included in Excel file |
remove.structural.variants |
Logical indicating whether structural variants (including CNVs) should be removed. Defaults to TRUE. |
separate.consensus.filters |
Logical indicating whether to apply different thresholds to variants called by more than one caller (specified under consensus in config file). Defaults to FALSE. |
sleep |
Logical indicating whether script should sleep for 60 seconds before starting. |
verbose |
Logical indicating whether to print verbose output |
Value
None
Prepare BAM specification data frame to standardized format for downstream analyses.
Description
This function prepares a data frame that can be used to run variant callers. For matched normal variant calling, this data frame will contain three columns with names: sample.id, tumour.bam, normal.bam. For unpaired variant calling, the data frame will contain two columns with names: sample.id, tumour.bam.
Usage
prepare.bam.specification(sample.details, paired = TRUE,
sample.id.column = 1, tumour.bam.column = 2, normal.bam.column = 3)
Arguments
sample.details |
Data frame where each row represents a sample to be run. Must contain sample ID, path to tumour BAM, and path to normal BAM. |
paired |
Logical indicating whether the sample specification is for a paired analysis. |
sample.id.column |
Index or string giving column of sample.details that contains the sample ID |
tumour.bam.column |
Index or string giving column of sample.details that contains the path to the tumour BAM |
normal.bam.column |
Index or string giving column of sample.details that contains the path to the normal BAM |
Value
bam.specification Data frame with one row per sample to be run
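The standardization amounts to picking the relevant columns and giving them fixed names. The sketch below shows that step only; the helper name and the input column names are hypothetical, and any validation the package performs is omitted.

```r
# Hypothetical helper: select and rename columns into the standard layout
standardise.bam.specification <- function(sample.details, paired = TRUE,
                                          sample.id.column = 1,
                                          tumour.bam.column = 2,
                                          normal.bam.column = 3) {
    bam.specification <- data.frame(
        sample.id = sample.details[[sample.id.column]],
        tumour.bam = sample.details[[tumour.bam.column]],
        stringsAsFactors = FALSE
    )

    # The normal BAM column is only present for paired analyses
    if (paired) {
        bam.specification$normal.bam <- sample.details[[normal.bam.column]]
    }

    bam.specification
}

# Made-up sample sheet with arbitrary column names
sample.details <- data.frame(
    id = c("S1", "S2"),
    tumour = c("S1_T.bam", "S2_T.bam"),
    normal = c("S1_N.bam", "S2_N.bam"),
    stringsAsFactors = FALSE
)
paired.specification <- standardise.bam.specification(sample.details)
unpaired.specification <- standardise.bam.specification(sample.details, paired = FALSE)
names(paired.specification)
names(unpaired.specification)
```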
prepare.fastq.specification
Description
Prepare FASTQ specification data frame to standardized format for downstream analyses.
Usage
prepare.fastq.specification(sample.details, sample.id.column = 1,
fastq.columns = c(2, 3), patient.id.column = NA,
tissue.column = NA)
Arguments
sample.details |
Data frame where each row represents a sample to be run. Must contain sample ID and path(s) to the FASTQ file(s). |
sample.id.column |
Index or string giving column of |
fastq.columns |
Index or string giving column(s) of |
patient.id.column |
Index or string giving column of |
tissue.column |
Index or string giving column of |
Details
This function prepares a data frame that can be used to run alignment. For paired-end reads, this data frame will contain three columns with names: sample.id, reads, mates. For single-end reads, the data frame will contain two columns with names: sample.id, reads.
Value
Data frame with one row per sample to be run
prepare.miniseq.specifications
Description
Process a MiniSeq directory and sample sheet to get specification data frames that can be used to run the VariTAS pipeline.
Note: This assumes normal samples are not available.
Usage
prepare.miniseq.specifications(sample.sheet, miniseq.directory)
Arguments
sample.sheet |
Data frame containing sample information, or path to a MiniSeq sample sheet |
miniseq.directory |
Path to directory with MiniSeq files |
Value
A list with specification data frames 'fastq', 'bam', and 'vcf' (as applicable)
Examples
miniseq.sheet <- file.path(path.package('varitas'), 'extdata/miniseq/Example_template.csv')
miniseq.directory <- file.path(path.package('varitas'), 'extdata/miniseq')
miniseq.info <- prepare.miniseq.specifications(miniseq.sheet, miniseq.directory)
prepare.vcf.specification
Description
Prepare VCF specification data frame for annotation
Usage
prepare.vcf.specification(vcf.details, sample.id.column = 1,
vcf.column = 2, job.dependency.column = NA, caller.column = NA)
Arguments
vcf.details |
Data frame containing details of VCF files |
sample.id.column |
Identifier of column in |
vcf.column |
Identifier of column in |
job.dependency.column |
Identifier of column in |
caller.column |
Identifier of column in |
Value
Properly formatted VCF details
Process coverageBed reports
Description
Process the coverage reports generated by bedtools coverage tool.
Usage
process.coverage.reports(project.directory)
Arguments
project.directory |
Path to project directory. Each sample should have its own subdirectory |
Value
final.statistics data frame of coverage statistics generated by parsing through coverage reports
Process sample contamination checks
Description
Takes *selfSM reports generated by VerifyBamID during alignment, and returns a vector of freemix scores. The freemix score is a sequence-only estimate of sample contamination that ranges from 0 to 1.
Note: Targeted panels are often too small for this step to work properly.
Usage
process.sample.contamination.checks(project.directory)
Arguments
project.directory |
Path to project directory. Each sample should have its own subdirectory |
Value
freemix.scores Data frame giving sample contamination (column freemix) score per sample.
References
https://genome.sph.umich.edu/wiki/VerifyBamID
Process total coverage statistics
Description
Process reports generated by flagstat. Assumes reports for before and after off-target filtering have been written to the same file, with separating headers
Usage
process.total.coverage.statistics(project.directory)
Arguments
project.directory |
Path to project directory. Each sample should have its own subdirectory |
Value
data frame with extracted statistics
read.all.calls
Description
Read all calls made with a certain caller
Usage
read.all.calls(sample.ids, caller = c("vardict", "mutect", "pgm"),
project.directory, patient.ids = NULL, apply.filters = TRUE,
variant.file.pattern = NULL)
Arguments
sample.ids |
Vector giving sample IDs to process |
caller |
String indicating which caller was used |
project.directory |
Path to project directory |
patient.ids |
Optional vector giving patient ID (or other group) corresponding to each sample |
apply.filters |
Logical indicating whether filters specified in VariTAS options should be applied. Defaults to TRUE. |
variant.file.pattern |
Pattern indicating where the variant file can be found. Sample ID should be indicated by SAMPLE_ID |
Value
combined.variant.calls Data frame with variant calls from all patients
Read iDES output
Description
Read output from iDES_step1.pl and return data frame
Usage
read.ides.file(filename)
Arguments
filename |
path to file |
Value
ides.data data frame read from iDES output
Read variant calls from file and format for ease of downstream analyses.
Description
Read variant calls from file and format for ease of downstream analyses.
Usage
read.variant.calls(variant.file, variant.caller)
Arguments
variant.file |
Path to variant file. |
variant.caller |
String indicating which variant caller was used. Needed to format the headers. |
Value
variant.calls Data frame of variant calls
read.yaml
Description
Read a yaml file
Usage
read.yaml(file.name)
Arguments
file.name |
Path to yaml file |
Value
list containing contents of yaml file
Examples
read.yaml(file.path(path.package('varitas'), 'config.yaml'))
Run alignment
Description
Run alignment
Usage
run.alignment(fastq.specification, output.directory, paired.end = FALSE,
sample.directories = TRUE, output.subdirectory = FALSE,
job.name.prefix = NULL, job.group = "alignment", quiet = FALSE,
verify.options = !quiet)
Arguments
fastq.specification |
Data frame detailing FASTQ files to be processed, typically from prepare.fastq.specification |
output.directory |
Path to project directory |
paired.end |
Logical indicating whether paired-end sequencing was performed |
sample.directories |
Logical indicating whether all sample files should be saved to sample-specific subdirectories (will be created) |
output.subdirectory |
If further nesting is required, name of subdirectory. If no further nesting, set to FALSE |
job.name.prefix |
Prefix for job names on the cluster |
job.group |
Group job should be associated with on cluster |
quiet |
Logical indicating whether to print commands to screen rather than submit them |
verify.options |
Logical indicating whether to run verify.varitas.options |
Details
Runs alignment (and related processing steps) on each sample.
Value
None
Examples
run.alignment(
fastq.specification = data.frame(
sample.id = c('1', '2'),
reads = c('1-R1.fastq.gz', '2-R1.fastq.gz'),
mates = c('1-R2.fastq.gz', '2-R2.fastq.gz'),
patient.id = c('P1', 'P1'),
tissue = c('tumour', 'normal')
),
output.directory = '.',
quiet = TRUE,
paired.end = TRUE
)
Run alignment for a single sample
Description
Run alignment for a single sample
Usage
run.alignment.sample(fastq.files, sample.id, output.directory = NULL,
output.filename = NULL, code.directory = NULL,
log.directory = NULL, config.file = NULL, job.dependencies = NULL,
job.name = NULL, job.group = NULL, quiet = FALSE,
verify.options = !quiet)
Arguments
fastq.files |
Paths to FASTQ files (one file if single-end reads, two files if paired-end) |
sample.id |
Sample ID for labelling |
output.directory |
Path to output directory |
output.filename |
Name of resulting VCF file (defaults to SAMPLE_ID.vcf) |
code.directory |
Path to directory where code should be stored |
log.directory |
Path to directory where log files should be stored |
config.file |
Path to config file |
job.dependencies |
Vector with names of job dependencies |
job.name |
Name of job to be submitted |
job.group |
Group job should belong to |
quiet |
Logical indicating whether to print command to screen rather than submit it to the system. Defaults to FALSE; setting it to TRUE is useful for debugging. |
verify.options |
Logical indicating whether to run verify.varitas.options |
Run all the generated bash scripts without HPC commands
Description
Run all the scripts generated by previous parts of the pipeline, without using HPC commands
Usage
run.all.scripts(output.directory, stages.to.run = c("alignment", "qc",
"calling", "annotation", "merging"), variant.callers = NULL,
quiet = FALSE)
Arguments
output.directory |
Main directory where all files should be saved |
stages.to.run |
A character vector of all stages that need running |
variant.callers |
A character vector of variant callers to run |
quiet |
Logical indicating whether to print commands to screen rather than submit jobs. Defaults to FALSE, can be useful to set to TRUE for testing. |
Value
None
Run annotation on a set of VCF files
Description
Takes a data frame with paths to VCF files, and runs ANNOVAR annotation on each file. To allow for smooth connections with downstream pipeline steps, the function returns a variant specification data frame that can be used as input to merging steps.
Usage
run.annotation(vcf.specification, output.directory = NULL,
job.name.prefix = NULL, job.group = NULL, quiet = FALSE,
verify.options = !quiet)
Arguments
vcf.specification |
Data frame detailing VCF files to be processed, from |
output.directory |
Path to folder where code and log files should be stored in their respective subdirectories. If not supplied, code and log files will be stored in the directory with each VCF file. |
job.name.prefix |
Prefix to be added before VCF name in job name. Defaults to 'annotate', but should be changed if running multiple callers to avoid |
job.group |
Group job should be associated with on cluster |
quiet |
Logical indicating whether to print commands to screen rather than submit them |
verify.options |
Logical indicating whether to run verify.varitas.options |
Value
Data frame with details of variant files
Examples
run.annotation(
data.frame(
sample.id = c('a', 'b'),
vcf = c('a.vcf', 'b.vcf'),
caller = c('mutect', 'mutect')
),
output.directory = '.',
quiet = TRUE
)
Run ANNOVAR on a VCF file
Description
Run ANNOVAR on a VCF file
Usage
run.annovar.vcf(vcf.file, output.directory = NULL,
output.filename = NULL, code.directory = NULL,
log.directory = NULL, config.file = NULL, job.dependencies = NULL,
job.group = NULL, job.name = NULL, isis = FALSE, quiet = FALSE,
verify.options = !quiet)
Arguments
vcf.file |
Path to VCF file |
output.directory |
Path to output directory |
output.filename |
Name of resulting VCF file (defaults to SAMPLE_ID.vcf) |
code.directory |
Path to directory where code should be stored |
log.directory |
Path to directory where log files should be stored |
config.file |
Path to config file |
job.dependencies |
Vector with names of job dependencies |
job.group |
Group job should belong to |
job.name |
Name of job to be submitted |
isis |
Logical indicating whether VCF files are from the isis (MiniSeq) variant caller |
quiet |
Logical indicating whether to print the command to screen rather than submit it to the system. Defaults to FALSE; setting it to TRUE is useful for debugging. |
verify.options |
Logical indicating whether to run verify.varitas.options |
Value
None
Run filtering on an ANNOVAR-annotated txt file
Description
Run filtering on an ANNOVAR-annotated txt file
Usage
run.filtering.txt(variant.file, caller = c("consensus", "vardict",
"ides", "mutect"), output.directory = NULL, output.filename = NULL,
code.directory = NULL, log.directory = NULL, config.file = NULL,
job.dependencies = NULL, job.group = NULL, quiet = FALSE)
Arguments
variant.file |
Path to variant file |
caller |
String giving the variant caller that was used (affects which filters are applied). |
output.directory |
Path to output directory |
output.filename |
Name of resulting VCF file (defaults to SAMPLE_ID.vcf) |
code.directory |
Path to directory where code should be stored |
log.directory |
Path to directory where log files should be stored |
config.file |
Path to config file |
job.dependencies |
Vector with names of job dependencies |
job.group |
Group job should belong to |
quiet |
Logical indicating whether to print the command to screen rather than submit it to the system. Defaults to FALSE; setting it to TRUE is useful for debugging. |
Run iDES
Description
Run iDES
Usage
run.ides(project.directory, sample.id.pattern = "._S\\d+$",
sample.ids = NULL, job.dependencies = NULL)
Arguments
project.directory |
Directory containing files |
sample.id.pattern |
Regex pattern to match sample IDs |
sample.ids |
Vector of sample IDs |
job.dependencies |
Vector of job dependencies |
Details
Run iDES step 1 on each sample to tally up calls by strand. Files are output to the sample subdirectory.
Value
None
Note
Deprecated function for running iDES. It follows the previous development version of the package and does not use specification data frames.
References
https://cappseq.stanford.edu/ides/
Run LoFreq for a sample
Description
Run LoFreq for a sample
Usage
run.lofreq.sample(tumour.bam, sample.id, paired, normal.bam = NULL,
output.directory = NULL, output.filename = NULL,
code.directory = NULL, log.directory = NULL, config.file = NULL,
job.dependencies = NULL, quiet = FALSE, job.name = NULL,
verify.options = !quiet, job.group = NULL)
Arguments
tumour.bam |
Path to tumour sample BAM file. |
sample.id |
Sample ID for labelling |
paired |
Logical indicating whether to do variant calling with a matched normal. |
normal.bam |
Path to normal BAM file if |
output.directory |
Path to output directory |
output.filename |
Name of resulting VCF file (defaults to SAMPLE_ID.vcf) |
code.directory |
Path to directory where code should be stored |
log.directory |
Path to directory where log files should be stored |
config.file |
Path to config file |
job.dependencies |
Vector with names of job dependencies |
quiet |
Logical indicating whether to print the command to screen rather than submit it to the system. Defaults to FALSE; setting it to TRUE is useful for debugging. |
job.name |
Name of job to be submitted |
verify.options |
Logical indicating whether to run verify.varitas.options |
job.group |
Group job should belong to |
Run MuSE for a sample
Description
Run MuSE for a sample
Usage
run.muse.sample(tumour.bam, sample.id, paired, normal.bam = NULL,
output.directory = NULL, output.filename = NULL,
code.directory = NULL, log.directory = NULL, config.file = NULL,
job.dependencies = NULL, quiet = FALSE, job.name = NULL,
verify.options = !quiet, job.group = NULL)
Arguments
tumour.bam |
Path to tumour sample BAM file. |
sample.id |
Sample ID for labelling |
paired |
Logical indicating whether to do variant calling with a matched normal. |
normal.bam |
Path to normal BAM file if |
output.directory |
Path to output directory |
output.filename |
Name of resulting VCF file (defaults to SAMPLE_ID.vcf) |
code.directory |
Path to directory where code should be stored |
log.directory |
Path to directory where log files should be stored |
config.file |
Path to config file |
job.dependencies |
Vector with names of job dependencies |
quiet |
Logical indicating whether to print the command to screen rather than submit it to the system. Defaults to FALSE; setting it to TRUE is useful for debugging. |
job.name |
Name of job to be submitted |
verify.options |
Logical indicating whether to run verify.varitas.options |
job.group |
Group job should belong to |
Run MuTect for a sample
Description
Run MuTect for a sample
Usage
run.mutect.sample(tumour.bam, sample.id, paired, normal.bam = NULL,
output.directory = NULL, output.filename = NULL,
code.directory = NULL, log.directory = NULL, config.file = NULL,
job.dependencies = NULL, quiet = FALSE, job.name = NULL,
verify.options = !quiet, job.group = NULL)
Arguments
tumour.bam |
Path to tumour sample BAM file. |
sample.id |
Sample ID for labelling |
paired |
Logical indicating whether to do variant calling with a matched normal. |
normal.bam |
Path to normal BAM file if |
output.directory |
Path to output directory |
output.filename |
Name of resulting VCF file (defaults to SAMPLE_ID.vcf) |
code.directory |
Path to directory where code should be stored |
log.directory |
Path to directory where log files should be stored |
config.file |
Path to config file |
job.dependencies |
Vector with names of job dependencies |
quiet |
Logical indicating whether to print the command to screen rather than submit it to the system. Defaults to FALSE; setting it to TRUE is useful for debugging. |
job.name |
Name of job to be submitted |
verify.options |
Logical indicating whether to run verify.varitas.options |
job.group |
Group job should belong to |
run.post.processing
Description
Submit post-processing job to the cluster with appropriate job dependencies
Usage
run.post.processing(variant.specification, output.directory,
code.directory = NULL, log.directory = NULL, config.file = NULL,
job.name.prefix = NULL, quiet = FALSE, email = NULL,
verify.options = !quiet)
Arguments
variant.specification |
Data frame specifying files to be processed |
output.directory |
Path to directory where output should be saved |
code.directory |
Directory where code should be saved |
log.directory |
Directory where log files should be saved |
config.file |
Path to config file |
job.name.prefix |
Prefix for job names on the cluster |
quiet |
Logical indicating whether to print commands to screen rather than submit the job |
email |
Email address that should be notified when job finishes. If NULL or FALSE, no email is sent |
verify.options |
Logical indicating whether |
Value
None
Examples
run.post.processing(
variant.specification = data.frame(
sample.id = c('a', 'b'),
vcf = c('a.vcf', 'b.vcf'),
caller = c('mutect', 'mutect'),
job.dependency = c('example1', 'example2')
),
output.directory = '.',
quiet = TRUE
)
Perform sample QC by looking at target coverage.
Description
Perform sample QC by looking at target coverage.
Usage
run.target.qc(bam.specification, project.directory,
sample.directories = TRUE, paired = FALSE,
output.subdirectory = FALSE, quiet = FALSE, job.name.prefix = NULL,
verify.options = FALSE, job.group = "target_qc")
Arguments
bam.specification |
Data frame containing details of BAM files to be processed, typically from |
project.directory |
Path to project directory where code and log files should be saved |
sample.directories |
Logical indicating whether output for each sample should be put in its own directory (within output.directory) |
paired |
Logical indicating whether the analysis is paired. This does not affect QC directly, but means normal samples get nested |
output.subdirectory |
If further nesting is required, name of subdirectory. If no further nesting, set to FALSE |
quiet |
Logical indicating whether to print commands to screen rather than submit the job |
job.name.prefix |
Prefix for job names on the cluster |
verify.options |
Logical indicating whether to run verify.varitas.options |
job.group |
Group job should be associated with on cluster |
Get ontarget reads and run coverage quality control
Description
Get ontarget reads and run coverage quality control
Usage
run.target.qc.sample(bam.file, sample.id, output.directory = NULL,
code.directory = NULL, log.directory = NULL, config.file = NULL,
job.dependencies = NULL, job.name = NULL, job.group = NULL,
quiet = FALSE)
Arguments
bam.file |
Path to BAM file |
sample.id |
Sample ID for labelling |
output.directory |
Path to output directory |
code.directory |
Path to directory where code should be stored |
log.directory |
Path to directory where log files should be stored |
config.file |
Path to config file |
job.dependencies |
Vector with names of job dependencies |
job.name |
Name of job to be submitted |
job.group |
Group job should belong to |
quiet |
Logical indicating whether to print the command to screen rather than submit it to the system. Defaults to FALSE; setting it to TRUE is useful for debugging. |
run.vardict.sample
Description
Run VarDict on a sample. This is intended as a low-level function that simply submits the job to Perl once the BAM paths have been found and the output paths have been decided upon.
Usage
run.vardict.sample(tumour.bam, sample.id, paired, proton = FALSE,
normal.bam = NULL, output.directory = NULL, output.filename = NULL,
code.directory = NULL, log.directory = NULL, config.file = NULL,
job.dependencies = NULL, job.name = NULL, job.group = NULL,
quiet = FALSE, verify.options = !quiet)
Arguments
tumour.bam |
Path to tumour sample BAM file. |
sample.id |
Sample ID for labelling |
paired |
Logical indicating whether to do variant calling with a matched normal. |
proton |
Logical indicating whether the data was generated by proton sequencing. Defaults to FALSE (i.e. Illumina). |
normal.bam |
Path to normal BAM file if |
output.directory |
Path to output directory |
output.filename |
Name of resulting VCF file (defaults to SAMPLE_ID.vcf) |
code.directory |
Path to directory where code should be stored |
log.directory |
Path to directory where log files should be stored |
config.file |
Path to config file |
job.dependencies |
Vector with names of job dependencies |
job.name |
Name of job to be submitted |
job.group |
Group job should belong to |
quiet |
Logical indicating whether to print the command to screen rather than submit it to the system. Defaults to FALSE; setting it to TRUE is useful for debugging. |
verify.options |
Logical indicating whether to run verify.varitas.options |
run.variant.calling
Description
Run variant calling for all samples
Usage
run.variant.calling(bam.specification, output.directory,
variant.callers = c("vardict", "mutect", "varscan", "lofreq", "muse"),
paired = TRUE, proton = FALSE, sample.directories = TRUE,
job.name.prefix = NULL, quiet = FALSE, verify.options = !quiet)
Arguments
bam.specification |
Data frame containing details of BAM files to be processed, typically from |
output.directory |
Path to directory where output should be saved |
variant.callers |
Character vector of variant callers to be used |
paired |
Logical indicating whether to do variant calling with a matched normal |
proton |
Logical indicating whether data was generated by proton sequencing (ignored if running MuTect) |
sample.directories |
Logical indicating whether output for each sample should be put in its own directory (within output.directory) |
job.name.prefix |
Prefix for job names on the cluster |
quiet |
Logical indicating whether to print commands to screen rather than submit the job |
verify.options |
Logical indicating whether to run verify.varitas.options |
Details
Run VarDict on each sample, and annotate the results with ANNOVAR. Files are output to a vardict/ subdirectory within each sample directory.
Value
None
Examples
run.variant.calling(
data.frame(sample.id = c('Z', 'Y'), tumour.bam = c('Z.bam', 'Y.bam')),
output.directory = '.',
variant.callers = c('lofreq', 'mutect'),
quiet = TRUE,
paired = FALSE
)
Run VariTAS pipeline in full.
Description
Run all steps in VariTAS processing pipeline, with appropriate dependencies.
Usage
run.varitas.pipeline(file.details, output.directory, run.name = NULL,
start.stage = c("alignment", "qc", "calling", "annotation", "merging"),
variant.callers = NULL, proton = FALSE, quiet = FALSE,
email = NULL, verify.options = !quiet,
save.specification.files = !quiet)
Arguments
file.details |
Data frame containing details of files to be used during first processing step. Depending on what you want to be the first step in the pipeline, this can either be FASTQ files, BAM files, VCF files, or variant (txt) files. |
output.directory |
Main directory where all files should be saved |
run.name |
Name of pipeline run. Will be added as a prefix to all LSF jobs. |
start.stage |
String indicating which stage pipeline should start at. If starting at a later stage of the pipeline, appropriate input files must be provided. For example, if starting with annotation, VCF files with variant calls must be provided. |
variant.callers |
Vector specifying which variant callers should be run. |
proton |
Logical indicating if data was generated by proton sequencing. Used to set base quality thresholds in variant calling steps. |
quiet |
Logical indicating whether to print commands to screen rather than submit jobs. Defaults to FALSE, can be useful to set to TRUE for testing. |
email |
Email address that should be notified when pipeline finishes. If NULL or FALSE, no email is sent. |
verify.options |
Logical indicating whether to run verify.varitas.options |
save.specification.files |
Logical indicating if specification files should be saved to project directory |
Value
None
Examples
run.varitas.pipeline(
file.details = data.frame(
sample.id = c('1', '2'),
reads = c('1-R1.fastq.gz', '2-R1.fastq.gz'),
mates = c('1-R2.fastq.gz', '2-R2.fastq.gz'),
patient.id = c('P1', 'P1'),
tissue = c('tumour', 'normal')
),
output.directory = '.',
quiet = TRUE,
run.name = "Test",
variant.callers = c('mutect', 'varscan')
)
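The effect of start.stage can be pictured with a small sketch. It assumes that the pipeline runs the chosen stage and every later stage in the order listed in the Usage section; the helper name stages_from_sketch is hypothetical and is not part of the package.

```r
# Hypothetical helper, not the package's code: list the stages that would
# run for a given start.stage, assuming stages run in the documented order.
stages_from_sketch <- function(
    start.stage = c("alignment", "qc", "calling", "annotation", "merging")
) {
    all.stages <- c("alignment", "qc", "calling", "annotation", "merging")
    # match.arg() picks the first stage when no argument is supplied
    start.stage <- match.arg(start.stage)
    all.stages[match(start.stage, all.stages):length(all.stages)]
}

stages_from_sketch("calling")
```

Under this assumption, starting at "calling" would skip alignment and QC, which is why BAM files rather than FASTQ files would need to be supplied in file.details.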
run.varitas.pipeline.hybrid
Description
Run the VariTAS pipeline starting from both VCF files and BAM/FASTQ files. Useful for processing data from the Ion PGM or MiniSeq where variant calling has been done on the machine, but you are interested in running more variant callers.
Usage
run.varitas.pipeline.hybrid(vcf.specification, output.directory,
run.name = NULL, fastq.specification = NULL,
bam.specification = NULL, variant.callers = c("mutect", "vardict",
"varscan", "lofreq", "muse"), proton = FALSE, quiet = FALSE,
email = NULL, verify.options = !quiet,
save.specification.files = !quiet)
Arguments
vcf.specification |
Data frame containing details of vcf files to be processed. Must contain columns sample.id, vcf, and caller |
output.directory |
Main directory where all files should be saved |
run.name |
Name of pipeline run. Will be added as a prefix to all LSF jobs. |
fastq.specification |
Data frame containing details of FASTQ files to be processed |
bam.specification |
Data frame containing details of BAM files to be processed |
variant.callers |
Vector specifying which variant callers should be run. |
proton |
Logical indicating if data was generated by proton sequencing. Used to set base quality thresholds in variant calling steps. |
quiet |
Logical indicating whether to print commands to screen rather than submit jobs. Defaults to FALSE, can be useful to set to TRUE for testing. |
email |
Email address that should be notified when pipeline finishes. If NULL or FALSE, no email is sent. |
verify.options |
Logical indicating whether to run verify.varitas.options |
save.specification.files |
Logical indicating if specification files should be saved to project directory |
Value
None
Examples
run.varitas.pipeline.hybrid(
bam.specification = data.frame(sample.id = c('Z', 'Y'), tumour.bam = c('Z.bam', 'Y.bam')),
vcf.specification = data.frame(
sample.id = c('a', 'b'),
vcf = c('a.vcf', 'b.vcf'),
caller = c('pgm', 'pgm')
),
output.directory = '.',
quiet = TRUE,
run.name = "Test",
variant.callers = c('mutect', 'varscan')
)
Run VarScan for a sample
Description
Run VarScan for a sample
Usage
run.varscan.sample(tumour.bam, sample.id, paired, normal.bam = NULL,
output.directory = NULL, output.filename = NULL,
code.directory = NULL, log.directory = NULL, config.file = NULL,
job.dependencies = NULL, quiet = FALSE, job.name = NULL,
verify.options = !quiet, job.group = NULL)
Arguments
tumour.bam |
Path to tumour sample BAM file. |
sample.id |
Sample ID for labelling |
paired |
Logical indicating whether to do variant calling with a matched normal. |
normal.bam |
Path to normal BAM file if |
output.directory |
Path to output directory |
output.filename |
Name of resulting VCF file (defaults to SAMPLE_ID.vcf) |
code.directory |
Path to directory where code should be stored |
log.directory |
Path to directory where log files should be stored |
config.file |
Path to config file |
job.dependencies |
Vector with names of job dependencies |
quiet |
Logical indicating whether to print the command to screen rather than submit it to the system. Defaults to FALSE; setting it to TRUE is useful for debugging. |
job.name |
Name of job to be submitted |
verify.options |
Logical indicating whether to run verify.varitas.options |
job.group |
Group job should belong to |
save.config
Description
Save current varitas config options to a temporary file, and return filename.
Usage
save.config(output.file = NULL)
Arguments
output.file |
Path to output file. If NULL (default), the config file will be saved as a temporary file. |
Value
Path to config file
Save coverage statistics to multi-worksheet Excel file.
Description
Save coverage statistics to multi-worksheet Excel file.
Usage
save.coverage.excel(project.directory, file.name, overwrite = TRUE)
Arguments
project.directory |
Path to project directory |
file.name |
Name of output file |
overwrite |
Logical indicating whether to overwrite existing file if it exists. |
Value
None
Save variants to Excel.
Description
Makes an Excel workbook with variant calls. If filters are provided, these will be saved to an additional worksheet within the same file.
Usage
save.variants.excel(variants, file.name, filters = NULL,
overwrite = TRUE)
Arguments
variants |
Data frame containing variants |
file.name |
Name of output file |
filters |
Optional list of filters to be saved |
overwrite |
Logical indicating whether to overwrite existing file if it exists. Defaults to TRUE for consistency with other R functions. |
Set options for varitas pipeline.
Description
Set or overwrite options for the VariTAS pipeline. Nested options should be separated by a dot. For example, to update the reference genome for grch38, set the option reference_genome.grch38.
Usage
set.varitas.options(...)
Arguments
... |
options to set |
Value
None
Examples
## Not run:
set.varitas.options(reference_build = 'grch38');
set.varitas.options(
filters.mutect.min_normal_depth = 10,
filters.vardict.min_normal_depth = 10
);
## End(Not run)
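To illustrate how a dot-separated name can address a nested option, here is a minimal sketch in base R. It only mimics the idea on a plain nested list; set_nested_option is a hypothetical helper, and the package's own option storage may work differently.

```r
# Hypothetical helper, not the package's code: set a value in a nested list
# using a dot-separated name such as "filters.mutect.min_normal_depth".
set_nested_option <- function(options, name, value) {
    keys <- strsplit(name, ".", fixed = TRUE)[[1]]
    set_keys <- function(opts, keys) {
        if (length(keys) == 1) {
            # Last key: store the value at this level
            opts[[keys]] <- value
        } else {
            # Descend one level, creating the sub-list if it is missing
            child <- opts[[keys[1]]]
            if (is.null(child)) child <- list()
            opts[[keys[1]]] <- set_keys(child, keys[-1])
        }
        opts
    }
    set_keys(options, keys)
}

opts <- list(
    reference_build = "grch37",
    filters = list(mutect = list(min_normal_depth = 5))
)
opts <- set_nested_option(opts, "filters.mutect.min_normal_depth", 10)
opts <- set_nested_option(opts, "filters.vardict.min_normal_depth", 10)
str(opts)
```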
split.on.column
Description
Split data frame on a concatenated column.
Usage
## S3 method for class 'on.column'
split(dat, column, split.character)
Arguments
dat |
Data frame to be processed |
column |
Name of column to split on |
split.character |
Pattern giving character to split column on |
Value
Data frame after splitting on column
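A minimal sketch of this kind of split is shown below. It assumes that each row is repeated once per piece of the concatenated column; the function name split_on_column_sketch and the example data are hypothetical, and this is not the package's implementation.

```r
# Hypothetical helper, not the package's code.
split_on_column_sketch <- function(dat, column, split.character) {
    # Split each entry of the column on the given pattern
    pieces <- strsplit(as.character(dat[[column]]), split.character)
    # Repeat each row once per piece, then overwrite the column with the pieces
    out <- dat[rep(seq_len(nrow(dat)), lengths(pieces)), , drop = FALSE]
    out[[column]] <- unlist(pieces)
    rownames(out) <- NULL
    out
}

calls <- data.frame(
    variant = c("chr1:100", "chr2:200"),
    caller = c("mutect:vardict", "lofreq"),
    stringsAsFactors = FALSE
)
split_on_column_sketch(calls, "caller", ":")
```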
sum.dp4
Description
Simply calculates the depth of coverage of the variant allele given a string of DP4 values
Usage
## S3 method for class 'dp4'
sum(dp4.str)
Arguments
dp4.str |
String of DP4 values in the form "1234,1234,1234,1234" |
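As an illustration of the calculation, a minimal sketch follows. It assumes the usual DP4 ordering (reference forward, reference reverse, alternate forward, alternate reverse) and that the variant allele depth is the sum of the last two fields. The name sum_dp4_sketch is hypothetical; this is not the package's code.

```r
# Hypothetical helper, not the package's code.
sum_dp4_sketch <- function(dp4.str) {
    # Split "ref_fwd,ref_rev,alt_fwd,alt_rev" into four integers
    counts <- as.integer(strsplit(dp4.str, ",", fixed = TRUE)[[1]])
    stopifnot(length(counts) == 4, !anyNA(counts))
    # Variant allele depth = alternate forward + alternate reverse
    sum(counts[3:4])
}

sum_dp4_sketch("10,12,3,4")
```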
Run ls command
Description
Runs ls command on system. This is a workaround since list.files can not match patterns based on subdirectory structure.
Usage
system.ls(pattern = "", directory = "", error = FALSE)
Arguments
pattern |
pattern to match files |
directory |
base directory command should be run from |
error |
logical indicating whether to throw an error if no matching files are found. Defaults to FALSE. |
Value
paths returned by ls command
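For comparison, the sketch below shows the kind of subdirectory-aware matching described above, using base R's Sys.glob instead of shelling out to ls. This is a swapped-in technique for illustration only: glob_ls_sketch is a hypothetical name, and unlike a command run from the base directory it returns paths that include the directory.

```r
# Hypothetical helper, not the package's code: wildcard matching across
# subdirectories with Sys.glob rather than the system ls command.
glob_ls_sketch <- function(pattern = "", directory = "", error = FALSE) {
    full.pattern <- if (nzchar(directory)) file.path(directory, pattern) else pattern
    paths <- Sys.glob(full.pattern)
    if (error && length(paths) == 0) {
        stop("No files matching ", full.pattern)
    }
    paths
}

# Example: match VCF files one level below a project directory
# glob_ls_sketch("*/*.vcf", directory = "project")
```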
tabular.mean
Description
Calculate the mean of data in tabular format
Usage
tabular.mean(values, frequencies, ...)
Arguments
values |
vector of values |
frequencies |
frequency corresponding to each value |
... |
Additional parameters passed to |
Value
calculated mean
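The mean of tabulated data is a frequency-weighted mean. A minimal sketch, with the hypothetical name tabular_mean_sketch and without the extra arguments of the real function:

```r
# Hypothetical helper, not the package's code.
tabular_mean_sketch <- function(values, frequencies) {
    stopifnot(length(values) == length(frequencies))
    # Each value counts as many times as its frequency
    sum(values * frequencies) / sum(frequencies)
}

# Depth 10 observed twice, depths 20 and 30 observed once each
tabular_mean_sketch(c(10, 20, 30), c(2, 1, 1))
```

The result equals mean(rep(values, frequencies)) but avoids expanding the table, which matters when frequencies are large, as with per-base coverage counts.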
tabular.median
Description
Calculate the median of data in tabular format
Usage
tabular.median(values, frequencies, ...)
Arguments
values |
Vector of values |
frequencies |
Frequency corresponding to each value |
... |
Additional parameters passed to |
Value
calculated median
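One way to compute a median from a frequency table without expanding it is to walk the cumulative frequencies. The sketch below assumes non-negative integer frequencies and averages the two middle observations when the total count is even; tabular_median_sketch is a hypothetical name and not the package's code.

```r
# Hypothetical helper, not the package's code.
tabular_median_sketch <- function(values, frequencies) {
    stopifnot(length(values) == length(frequencies), sum(frequencies) > 0)
    ord <- order(values)
    values <- as.numeric(values[ord])
    cum <- cumsum(frequencies[ord])
    n <- cum[length(cum)]
    # Positions of the middle observation(s) in the sorted, expanded data
    mid <- c(floor((n + 1) / 2), ceiling((n + 1) / 2))
    # The value at position p is the first one whose cumulative count reaches p
    mean(vapply(mid, function(p) values[which(cum >= p)[1]], numeric(1)))
}

tabular_median_sketch(c(3, 1, 2), c(1, 2, 1))
```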
Make barplot of trinucleotide substitutions
Description
Make barplot of trinucleotide substitutions
Usage
trinucleotide.barplot(variants, file.name)
Arguments
variants |
Data frame with variants |
file.name |
Name of output file |
Value
None
Make barplot of variants per caller
Description
Make barplot of variants per caller
Usage
variant.recurrence.barplot(variants, file.name)
Arguments
variants |
Data frame with variants |
file.name |
Name of output file |
Value
None
Make barplot of variants per caller
Description
Make barplot of variants per caller
Usage
variants.caller.barplot(variants, file.name, group.by = NULL)
Arguments
variants |
Data frame with variants |
file.name |
Name of output file |
group.by |
Optional grouping variable for barplot |
Value
None
Make barplot of variants per sample
Description
Make barplot of variants per sample
Usage
variants.sample.barplot(variants, file.name)
Arguments
variants |
Data frame with variants |
file.name |
Name of output file |
Value
None
Check that sample specification data frame matches expected format, and that all files exist
Description
Check that sample specification data frame matches expected format, and that all files exist
Usage
verify.bam.specification(bam.specification)
Arguments
bam.specification |
Data frame containing columns sample.id and tumour.bam, and optionally a column normal.bam. |
Value
None
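The checks described above can be sketched as follows. The required and optional column names come from the argument description; the function name, the check.files argument, and the error messages are hypothetical, and the package may perform additional checks.

```r
# Hypothetical helper, not the package's code.
verify_bam_specification_sketch <- function(bam.specification, check.files = TRUE) {
    stopifnot(is.data.frame(bam.specification))
    # sample.id and tumour.bam are required; normal.bam is optional
    required <- c("sample.id", "tumour.bam")
    missing.columns <- setdiff(required, names(bam.specification))
    if (length(missing.columns) > 0) {
        stop("Missing columns: ", paste(missing.columns, collapse = ", "))
    }
    if (check.files) {
        bam.columns <- intersect(c("tumour.bam", "normal.bam"), names(bam.specification))
        paths <- unlist(lapply(bam.specification[bam.columns], as.character))
        missing.files <- paths[!is.na(paths) & !file.exists(paths)]
        if (length(missing.files) > 0) {
            stop("Files not found: ", paste(missing.files, collapse = ", "))
        }
    }
    invisible(NULL)
}

# Passes the format check; file existence is skipped here
verify_bam_specification_sketch(
    data.frame(sample.id = c("Z", "Y"), tumour.bam = c("Z.bam", "Y.bam")),
    check.files = FALSE
)
```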
verify.bwa.index
Description
Verify that bwa index files exist for a fasta file
Usage
verify.bwa.index(fasta.file, error = FALSE)
Arguments
fasta.file |
Fasta file to check |
error |
Logical indicating whether to throw an (informative) error if verification fails |
Value
index.files.exist Logical indicating if bwa index files were found (only returned if error set to FALSE)
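A check of this kind can be sketched as below, assuming the index was built with the classic bwa index command, which writes files with the suffixes .amb, .ann, .bwt, .pac and .sa next to the fasta file. The function name and error message are hypothetical, and the package's own check may differ.

```r
# Hypothetical helper, not the package's code.
verify_bwa_index_sketch <- function(fasta.file, error = FALSE) {
    # Suffixes written by "bwa index" alongside the fasta file
    index.files <- paste0(fasta.file, c(".amb", ".ann", ".bwt", ".pac", ".sa"))
    index.files.exist <- all(file.exists(index.files))
    if (error) {
        if (!index.files.exist) {
            stop("bwa index files not found for ", fasta.file)
        }
        return(invisible(NULL))
    }
    index.files.exist
}

verify_bwa_index_sketch("reference.fa")
```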
verify.fasta.index
Description
Verify that fasta index files exist for a given fasta file.
Usage
verify.fasta.index(fasta.file, error = FALSE)
Arguments
fasta.file |
Fasta file to check |
error |
Logical indicating whether to throw an (informative) error if verification fails |
Value
faidx.exists Logical indicating if fasta index files were found (only returned if error set to FALSE)
Check that FASTQ specification data frame matches expected format, and that all files exist
Description
Check that FASTQ specification data frame matches expected format, and that all files exist
Usage
verify.fastq.specification(fastq.specification, paired.end = FALSE,
files.ready = FALSE)
Arguments
fastq.specification |
Data frame containing columns sample.id and reads, and optionally a column mates |
paired.end |
Logical indicating whether paired end reads are used |
files.ready |
Logical indicating if the files already exist on disk. If there are job dependencies, this should be set to FALSE. |
Value
None
verify.sequence.dictionary
Description
Verify that sequence dictionary exists for a fasta file.
Usage
verify.sequence.dictionary(fasta.file, error = FALSE)
Arguments
fasta.file |
Fasta file to check |
error |
Logical indicating whether to throw an (informative) error if verification fails |
Value
dict.exists Logical indicating if sequence dictionary files were found (only returned if error set to FALSE)
Check against common errors in the VariTAS options.
Description
Check against common errors in the VariTAS options before launching into pipeline
Usage
verify.varitas.options(stages.to.run = c("alignment", "qc", "calling",
"annotation", "merging"), variant.callers = c("mutect", "vardict",
"ides", "varscan", "lofreq", "muse"), varitas.options = NULL)
Arguments
stages.to.run |
Vector indicating which stages should be run. Defaults to all possible stages. If only running a subset of stages, only checks corresponding to the desired stages are run |
variant.callers |
Vector indicating which variant callers to run. Only used if calling is in |
varitas.options |
Optional file path or list of VariTAS options. |
Value
None
verify.vcf.specification
Description
Verify that VCF specification data frame fits expected format
Usage
verify.vcf.specification(vcf.specification)
Arguments
vcf.specification |
VCF specification data frame |
Value
None