Type: | Package |
Title: | Subclone Multiplicity Allocation and Somatic Heterogeneity |
Version: | 1.0.0 |
Date: | 2025-02-07 |
Description: | Cluster user-supplied somatic read counts with corresponding allele-specific copy number and tumor purity to infer feasible underlying intra-tumor heterogeneity in terms of number of subclones, multiplicity, and allocation (Little et al. (2019) <doi:10.1186/s13073-019-0643-9>). |
License: | GPL (≥ 3) |
Imports: | Rcpp, stats, smarter, reshape2, ggplot2 |
LinkingTo: | Rcpp, RcppArmadillo |
Encoding: | UTF-8 |
LazyData: | true |
Depends: | R (≥ 2.10) |
RoxygenNote: | 7.2.3 |
Suggests: | knitr, devtools |
VignetteBuilder: | knitr |
NeedsCompilation: | yes |
Packaged: | 2025-02-26 05:10:10 UTC; Admin |
Author: | Paul Little [aut, cre] |
Maintainer: | Paul Little <pllittle321@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-02-27 16:40:06 UTC |
ITH_optim
Description
Performs EM algorithm for a given configuration matrix
Usage
ITH_optim(
my_data,
my_purity,
init_eS,
pi_eps0 = NULL,
my_unc_q = NULL,
max_iter = 4000,
my_epsilon = 1e-06
)
Arguments
my_data |
A R dataframe containing the following columns:
|
my_purity |
A single numeric value of known/estimated purity |
init_eS |
A subclone configuration matrix pre-defined in R
list |
pi_eps0 |
A user-specified parameter denoting the proportion
of loci not explained by the combinations of purity, copy number,
multiplicity, and allocation. If |
my_unc_q |
An optimal initial vector for the unconstrained
|
max_iter |
Positive integer, preferably 1000 or more, setting the maximum number of iterations |
my_epsilon |
Convergence criterion threshold for changes in the log likelihood, preferably 1e-6 or smaller |
Value
If the EM algorithm converges, the output will be a list containing
iter
number of iterations
converge
convergence status
unc_q0
initial unconstrained subclone proportions parameter
unc_q
unconstrained estimate of
q
q
estimated subclone proportions among cancer cells
CN_MA_pi
estimated mixture probabilities of multiplicities and allocations given copy number states
eta
estimated subclone proportion among tumor cells
purity
user-inputted tumor purity
entropy
estimated entropy
infer
A R dataframe containing inferred variant allocations (
infer_A
), multiplicities (infer_M
), cellular prevalences (infer_CP
).ms
model size, number of parameters within parameter space
LL
The observed log likelihood evaluated at maximum likelihood estimates.
AIC = 2 * LL - 2 * ms
Negative AIC, used for model selection
BIC = 2 * LL - ms * log(LOCI)
Negative BIC, used for model selection
LOCI
The number of inputted somatic variants.
A collection of pre-defined subclone configurations.
Description
A R list containing subclone configurations in matrix form for 1 to 5 subclones. For each matrix, each column corresponds to a subclone and each row corresponds to a variant's allocation across all subclones. For example, the first row of each matrix is a vector of 1's to represent clonal variants, variants present in all subclones.
Usage
eS
Format
An object of class list
of length 5.
gen_ITH_RD
Description
Simulates observed alternate and reference read counts
Usage
gen_ITH_RD(DATA, RD)
Arguments
DATA |
The output data.frame from |
RD |
A positive integer for the mean read depth generated from the negative binomial distribution |
Value
A matrix of simulated alternate and reference read counts.
gen_subj_truth
Description
Simulates copy number states, multiplicities, allocations
Usage
gen_subj_truth(mat_eS, maxLOCI, nCN = NULL)
Arguments
mat_eS |
A subclone configuration matrix pre-defined in R list |
maxLOCI |
A positive integer number of simulated somatic variant calls |
nCN |
A positive integer for the number of allelic copy number pairings
to sample from. If |
Value
A list containing the following components:
subj_truth
dataframe of each variant's simulated minor (
CN_1
) and major (CN_2
) copy number states, total copy number (tCN
), subclone allocation (true_A
), multiplicity (true_M
), mutant allele frequency (true_MAF
), and cellular prevalence (true_CP
)purity
tumor purity
eta
the product of tumor purity and subclone proportions
q
vector of subclone proportions
grid_ITH_optim
Description
This function performs a grid search over enumerated
configurations within the pre-defined list eS
Usage
grid_ITH_optim(
my_data,
my_purity,
list_eS,
pi_eps0 = NULL,
trials = 20,
max_iter = 4000,
my_epsilon = 1e-06
)
Arguments
my_data |
A R dataframe containing the following columns:
|
my_purity |
A single numeric value of known/estimated purity |
list_eS |
A nested list of subclone configuration matrices |
pi_eps0 |
A user-specified parameter denoting the proportion
of loci not explained by the combinations of purity, copy number,
multiplicity, and allocation. If |
trials |
Positive integer, number of random initializations of subclone proportions |
max_iter |
Positive integer, preferably 1000 or more, setting the maximum number of iterations |
my_epsilon |
Convergence criterion threshold for changes in the log likelihood, preferably 1e-6 or smaller |
Value
A R list containing two objects. GRID
is a
dataframe where each row denotes a feasible subclone configuration
with corresponding subclone proportion estimates q
and
somatic variant allocations alloc
. INFER
is a list
where INFER[[i]]
corresponds to the i
-th row or
model of GRID
.
vis_GRID
Description
A simple visualization of SMASH's grid of solutions
Usage
vis_GRID(GRID)
Arguments
GRID |
The |
Value
A ggplot object for data visualization