Type: | Package |
Title: | Test Similarity Between Binary Data using Jaccard/Tanimoto Coefficients |
Version: | 0.1.0 |
Date: | 2018-06-06 |
Author: | Neo Christopher Chung <nchchung@gmail.com>, Błażej Miasojedow <bmiasojedow@gmail.com>, Michał Startek <M.Startek@mimuw.edu.pl>, Anna Gambin <aniag@mimuw.edu.pl> |
Maintainer: | Neo Christopher Chung <nchchung@gmail.com> |
Description: | Calculate statistical significance of Jaccard/Tanimoto similarity coefficients for binary data. |
License: | GPL-2 |
Encoding: | UTF-8 |
LazyData: | true |
Imports: | Rcpp (≥ 0.12.6), qvalue, dplyr, magrittr |
LinkingTo: | Rcpp |
NeedsCompilation: | yes |
SystemRequirements: | C++11 |
RoxygenNote: | 6.0.1 |
Packaged: | 2018-06-10 01:52:22 UTC; nc |
Repository: | CRAN |
Date/Publication: | 2018-06-14 17:53:00 UTC |
Compute a Jaccard/Tanimoto similarity coefficient
Description
Compute a Jaccard/Tanimoto similarity coefficient
Usage
jaccard(x, y, center = FALSE, px = NULL, py = NULL)
Arguments
x |
a binary vector (e.g., fingerprint) |
y |
a binary vector (e.g., fingerprint) |
center |
whether to center the Jaccard/Tanimoto coefficient by its expectation |
px |
probability of successes in x |
py |
probability of successes in y |
Value
jaccard
returns a Jaccard/Tanimoto coefficient.
Examples
set.seed(1234)
x = rbinom(100,1,.5)
y = rbinom(100,1,.5)
jaccard(x,y)
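For orientation, the uncentered coefficient can be reproduced in base R as the number of positions where both vectors are 1, divided by the number of positions where at least one is 1. This is a minimal sketch of the standard definition, not the package's implementation. The helper name jaccard_sketch is hypothetical, and returning NA for two all-zero vectors is an assumption.

```r
# Hypothetical helper: plain-definition Jaccard/Tanimoto coefficient.
jaccard_sketch <- function(x, y) {
  stopifnot(length(x) == length(y))
  both   <- sum(x == 1 & y == 1)     # positions where x and y are both 1
  either <- sum(x == 1 | y == 1)     # positions where at least one is 1
  if (either == 0) return(NA_real_)  # assumption: undefined for two all-zero vectors
  both / either
}

x <- c(1, 1, 0, 0, 1)
y <- c(1, 0, 0, 1, 1)
jaccard_sketch(x, y)  # 2 shared ones out of 4 positions with any one: 0.5
```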
Compute an expected Jaccard/Tanimoto similarity coefficient under independence
Description
Compute an expected Jaccard/Tanimoto similarity coefficient under independence
Usage
jaccard.ev(x, y, px = NULL, py = NULL)
Arguments
x |
a binary vector (e.g., fingerprint) |
y |
a binary vector (e.g., fingerprint) |
px |
probability of successes in x |
py |
probability of successes in y |
Value
jaccard.ev
returns an expected value.
Examples
set.seed(1234)
x = rbinom(100,1,.5)
y = rbinom(100,1,.5)
jaccard.ev(x,y)
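Under independence, a natural large-sample value for the coefficient is the probability that both entries are 1 divided by the probability that at least one is 1, that is px*py / (px + py - px*py). The base R sketch below computes that ratio. Whether jaccard.ev uses exactly this expression is an assumption, as is estimating px and py by the sample means when they are NULL. The name expected_jaccard_sketch is hypothetical.

```r
# Hypothetical helper: expected coefficient when x and y are independent.
expected_jaccard_sketch <- function(x = NULL, y = NULL, px = NULL, py = NULL) {
  # Assumption: fall back to the observed proportions of ones.
  if (is.null(px)) px <- mean(x)
  if (is.null(py)) py <- mean(y)
  p_both   <- px * py            # P(x = 1 and y = 1) under independence
  p_either <- px + py - px * py  # P(x = 1 or y = 1) under independence
  p_both / p_either
}

expected_jaccard_sketch(px = 0.5, py = 0.5)  # 0.25 / 0.75 = 1/3
```

With the simulated vectors from the example above, expected_jaccard_sketch(x, y) would use mean(x) and mean(y) in place of the true success probabilities.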
Compute p-value using an extreme value distribution
Description
Rahman et al. (2014) propose a method to compute a p-value of a Jaccard/Tanimoto coefficient using an extreme value distribution. Their paper provides the following description: The mean (mu) and s.d. (sigma) of the similarity scores are used to define the z score, z = (T_w - mu)/sigma. For the purpose of calculating the P value, only hits with T_w > 0 are considered. The P value is derived from the z score using an extreme value distribution, P = 1 - exp(-e^(-z*pi/sqrt(6) - G'(1))), where the Euler-Mascheroni constant G'(1) = 0.577215665.
Usage
jaccard.rahman(j)
Arguments
j |
a numeric vector of observed Jaccard coefficients (uncentered) |
Value
jaccard.rahman
returns a numeric vector of p-values
References
Rahman, Cuesta, Furnham, Holliday, and Thornton (2014) EC-BLAST: a tool to automatically search and compare enzyme reactions. Nature Methods, 11(2) http://www.nature.com/nmeth/journal/v11/n2/full/nmeth.2803.html
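The formula quoted in the Description can be written out directly in base R. The sketch below is only an illustration of that formula, not the code behind jaccard.rahman: the helper name rahman_pvalue_sketch is hypothetical, and dropping non-positive scores before computing the mean and s.d. is an assumption about how the "only hits with T > 0" rule is applied.

```r
# Hypothetical helper following the formula quoted in the Description.
rahman_pvalue_sketch <- function(j) {
  j <- j[j > 0]               # assumption: keep only hits with a positive score
  z <- (j - mean(j)) / sd(j)  # z score from the mean and s.d. of the scores
  euler <- 0.577215665        # constant as quoted from the paper
  1 - exp(-exp(-z * pi / sqrt(6) - euler))
}

scores <- c(0.10, 0.15, 0.20, 0.25, 0.90)
rahman_pvalue_sketch(scores)  # larger scores receive smaller p-values
```

Because the z scores are computed from the whole vector, the p-value of any one coefficient depends on the other coefficients supplied with it.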
Test for Jaccard/Tanimoto similarity coefficients
Description
Compute statistical significance of Jaccard/Tanimoto similarity coefficients between binary vectors, using four different methods.
Usage
jaccard.test(x, y, method = "mca", px = NULL, py = NULL, verbose = TRUE,
...)
Arguments
x |
a binary vector (e.g., fingerprint) |
y |
a binary vector (e.g., fingerprint) |
method |
a method to compute a p-value ("mca", "bootstrap", "asymptotic", or "exact") |
px |
probability of successes in x |
py |
probability of successes in y |
verbose |
whether to print progress messages |
... |
optional arguments for specific computational methods |
Details
There exist four methods to compute p-values of Jaccard/Tanimoto similarity coefficients: mca, bootstrap, asymptotic, and exact. This function is simply a wrapper for the four corresponding functions in this package: jaccard.test.mca, jaccard.test.bootstrap, jaccard.test.asymptotic, and jaccard.test.exact.
We recommend using either the mca or the bootstrap method, since the exact solution is slow for a moderately large vector and the asymptotic approximation may be inaccurate depending on the input vector size.
The bootstrap method resamples the binary vectors with replacement to compute a p-value (see optional arguments).
The mca method uses the measure concentration algorithm, which estimates the multinomial distribution with a known error bound (specified by the optional argument accuracy).
Value
jaccard.test
returns a list mainly consisting of
statistics |
centered Jaccard/Tanimoto similarity coefficient |
pvalue |
p-value |
expectation |
expectation |
Optional arguments for method="bootstrap"
- fix
whether to fix (i.e., not resample) x and/or y
- B
the total number of bootstrap iterations
- seed
a seed for the random number generator
Optional arguments for method="mca"
- accuracy
an error bound on approximating a multinomial distribution
- error.type
an error type on approximating a multinomial distribution ("average", "upper", "lower")
- seed
a seed for the random number generator
See Also
jaccard.test.bootstrap, jaccard.test.mca, jaccard.test.exact, jaccard.test.asymptotic
Examples
set.seed(1234)
x = rbinom(100,1,.5)
y = rbinom(100,1,.5)
jaccard.test(x,y,method="bootstrap")
jaccard.test(x,y,method="mca")
jaccard.test(x,y,method="exact")
jaccard.test(x,y,method="asymptotic")
Compute p-value using an asymptotic approximation
Description
Compute statistical significance of Jaccard/Tanimoto similarity coefficients.
Usage
jaccard.test.asymptotic(x, y, px = NULL, py = NULL, verbose = TRUE)
Arguments
x |
a binary vector (e.g., fingerprint) |
y |
a binary vector (e.g., fingerprint) |
px |
probability of successes in x |
py |
probability of successes in y |
verbose |
whether to print progress messages |
Value
jaccard.test.asymptotic
returns a list consisting of
statistics |
centered Jaccard/Tanimoto similarity coefficient |
pvalue |
p-value |
expectation |
expectation |
Examples
set.seed(1234)
x = rbinom(100,1,.5)
y = rbinom(100,1,.5)
jaccard.test.asymptotic(x,y)
Compute p-value using the bootstrap procedure
Description
Compute statistical significance of Jaccard/Tanimoto similarity coefficients.
Usage
jaccard.test.bootstrap(x, y, px = NULL, py = NULL, verbose = TRUE,
fix = "x", B = 1000, seed = NULL)
Arguments
x |
a binary vector (e.g., fingerprint) |
y |
a binary vector (e.g., fingerprint) |
px |
probability of successes in x |
py |
probability of successes in y |
verbose |
whether to print progress messages |
fix |
whether to fix (i.e., not resample) x and/or y |
B |
the total number of bootstrap iterations |
seed |
a seed for a random number generator |
Value
jaccard.test.bootstrap
returns a list consisting of
statistics |
centered Jaccard/Tanimoto similarity coefficient |
pvalue |
p-value |
expectation |
expectation |
Examples
set.seed(1234)
x = rbinom(100,1,.5)
y = rbinom(100,1,.5)
jaccard.test.bootstrap(x,y,B=500)
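To make the idea of a resampling-based p-value concrete, here is a minimal base R sketch. It keeps x fixed, resamples y with replacement to break the pairing between the two vectors, and reports the fraction of resampled centered coefficients at least as extreme as the observed one. This is an illustration under stated assumptions, not the procedure implemented in jaccard.test.bootstrap: the name bootstrap_pvalue_sketch, the two-sided comparison, the centering by px*py / (px + py - px*py), and the treatment of empty unions are all assumptions.

```r
# Hypothetical helper: resampling-based p-value for a centered coefficient.
bootstrap_pvalue_sketch <- function(x, y, B = 1000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  px <- mean(x)
  py <- mean(y)
  expected <- px * py / (px + py - px * py)  # assumed expectation under independence

  centered_jaccard <- function(a, b) {
    either <- sum(a == 1 | b == 1)
    if (either == 0) return(0)               # assumption: no signal when both are empty
    sum(a == 1 & b == 1) / either - expected
  }

  observed <- centered_jaccard(x, y)
  # Keep x fixed and resample y with replacement, which breaks the pairing.
  null_stats <- replicate(B, centered_jaccard(x, sample(y, replace = TRUE)))
  list(statistics  = observed,
       pvalue      = mean(abs(null_stats) >= abs(observed)),
       expectation = expected)
}

set.seed(1234)
x <- rbinom(100, 1, .5)
y <- rbinom(100, 1, .5)
bootstrap_pvalue_sketch(x, y, B = 500, seed = 1)
```

A p-value computed this way is a multiple of 1/B, so B bounds its resolution and a value of 0 only means that no resample was as extreme as the observation.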
Compute p-value using the exact solution
Description
Compute statistical significance of Jaccard/Tanimoto similarity coefficients.
Usage
jaccard.test.exact(x, y, px = NULL, py = NULL, verbose = TRUE)
Arguments
x |
a binary vector (e.g., fingerprint) |
y |
a binary vector (e.g., fingerprint) |
px |
probability of successes in x |
py |
probability of successes in y |
verbose |
whether to print progress messages |
Value
jaccard.test.exact
returns a list consisting of
statistics |
centered Jaccard/Tanimoto similarity coefficient |
pvalue |
p-value |
expectation |
expectation |
Examples
set.seed(1234)
x = rbinom(100,1,.5)
y = rbinom(100,1,.5)
jaccard.test.exact(x,y)
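One way to picture an exact computation is to enumerate every possible 2x2 table of counts for two independent binary vectors and add up the multinomial probabilities of the tables whose centered coefficient is at least as extreme as the observed one. The base R sketch below does this by brute force for short vectors. It is not the algorithm used by jaccard.test.exact: the name exact_pvalue_sketch, the two-sided comparison, and the handling of tables with no ones are assumptions.

```r
# Hypothetical helper: brute-force exact p-value for short binary vectors.
exact_pvalue_sketch <- function(x, y, px = mean(x), py = mean(y)) {
  n <- length(x)
  # Cell probabilities under independence: (1,1), (1,0), (0,1), (0,0).
  probs <- c(px * py, px * (1 - py), (1 - px) * py, (1 - px) * (1 - py))
  expected <- px * py / (px + py - px * py)
  centered <- function(n11, n_either) {
    if (n_either == 0) 0 else n11 / n_either - expected  # assumption for empty tables
  }
  observed <- centered(sum(x == 1 & y == 1), sum(x == 1 | y == 1))

  pvalue <- 0
  for (n11 in 0:n) {
    for (n10 in 0:(n - n11)) {
      for (n01 in 0:(n - n11 - n10)) {
        n00 <- n - n11 - n10 - n01
        stat <- centered(n11, n11 + n10 + n01)
        # Small tolerance guards against floating-point ties.
        if (abs(stat) >= abs(observed) - 1e-12) {
          pvalue <- pvalue + dmultinom(c(n11, n10, n01, n00), prob = probs)
        }
      }
    }
  }
  pvalue
}

x <- c(1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0)
exact_pvalue_sketch(x, x)  # identical vectors: a very small p-value
```

The number of tables grows roughly with the cube of the vector length, which illustrates why an exact approach becomes slow for moderately large vectors.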
Compute p-value using the Measure Concentration Algorithm
Description
Compute statistical significance of Jaccard/Tanimoto similarity coefficients.
Usage
jaccard.test.mca(x, y, px = NULL, py = NULL, accuracy = 1e-05,
error.type = "average", verbose = TRUE)
Arguments
x |
a binary vector (e.g., fingerprint) |
y |
a binary vector (e.g., fingerprint) |
px |
probability of successes in x |
py |
probability of successes in y |
accuracy |
an error bound on approximating a multinomial distribution |
error.type |
an error type on approximating a multinomial distribution ("average", "upper", "lower") |
verbose |
whether to print progress messages |
Value
jaccard.test.mca
returns a list consisting of
statistics |
centered Jaccard/Tanimoto similarity coefficient |
pvalue |
p-value |
expectation |
expectation |
Examples
set.seed(1234)
x = rbinom(100,1,.5)
y = rbinom(100,1,.5)
jaccard.test.mca(x,y,accuracy = 1e-05)
Pair-wise tests for Jaccard/Tanimoto similarity coefficients
Description
Given a data matrix, compute pair-wise Jaccard/Tanimoto similarity coefficients and p-values among rows (variables). For finer control, use jaccard.test.
Usage
jaccard.test.pairwise(dat, method = "mca", verbose = TRUE,
compute.qvalue = TRUE, ...)
Arguments
dat |
a data matrix |
method |
a method to compute a p-value ("mca", "bootstrap", "asymptotic", or "exact") |
verbose |
whether to print progress messages |
compute.qvalue |
whether to compute q-values |
... |
optional arguments for specific computational methods |
Value
jaccard.test.pairwise
returns a list of matrices
statistics |
Jaccard/Tanimoto similarity coefficients |
pvalues |
p-values |
qvalues |
q-values |
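As a rough picture of the pair-wise layout, the uncentered coefficients for all pairs of rows can be obtained in base R with one matrix product. The sketch below covers only the similarity matrix, not the p-values or q-values, and it is not the package's implementation. The helper name pairwise_jaccard_sketch and the small matrix are hypothetical.

```r
# Hypothetical helper: uncentered coefficients for every pair of rows.
pairwise_jaccard_sketch <- function(dat) {
  dat    <- (as.matrix(dat) == 1) * 1        # coerce to a 0/1 numeric matrix
  both   <- dat %*% t(dat)                   # shared ones for every pair of rows
  ones   <- rowSums(dat)                     # number of ones in each row
  either <- outer(ones, ones, "+") - both    # size of the union for every pair
  both / either                              # NaN where both rows are all zeros
}

dat <- rbind(a = c(1, 1, 0, 0),
             b = c(1, 0, 0, 1),
             c = c(1, 1, 0, 0))
pairwise_jaccard_sketch(dat)
```

In this toy matrix, rows a and c are identical and so have coefficient 1, while row b shares one of three occupied positions with each of them, giving 1/3.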