Title: Calculate Pairwise Distances
Version: 0.0.5
Description: A common framework for calculating distance matrices.
Depends: R (≥ 3.2.2)
License: GPL-2 | GPL-3 [expanded from: GPL]
URL: https://github.com/blasern/rdist
BugReports: https://github.com/blasern/rdist/issues
Encoding: UTF-8
LazyData: true
LinkingTo: Rcpp, RcppArmadillo
Imports: Rcpp, methods
RoxygenNote: 7.1.0
Suggests: testthat
NeedsCompilation: yes
Packaged: 2020-05-04 12:51:18 UTC; nbl003
Author: Nello Blaser [aut, cre]
Maintainer: Nello Blaser <nello.blaser@uib.no>
Repository: CRAN
Date/Publication: 2020-05-04 16:00:02 UTC
Farthest point sampling
Description
Farthest point sampling returns a reordering of the metric space P = {p_1, ..., p_k} such that each p_i is the point farthest from the set of the first i-1 points.
Usage
farthest_point_sampling(
mat,
metric = "precomputed",
k = nrow(mat),
initial_point_index = 1L,
return_clusters = FALSE
)
Arguments
mat | Original distance matrix
metric | Distance metric to use (either "precomputed" or a metric from
k | Number of points to sample
initial_point_index | Index of p_1
return_clusters | Should the indices of the closest farthest points be returned?
Examples
# generate data
df <- matrix(runif(200), ncol = 2)
dist_mat <- pdist(df)
# farthest point sampling
fps <- farthest_point_sampling(dist_mat)
fps2 <- farthest_point_sampling(df, metric = "euclidean")
all.equal(fps, fps2)
# have a look at the fps distance matrix
rdist(df[fps[1:5], ])
dist_mat[fps, fps][1:5, 1:5]
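The greedy procedure behind farthest point sampling can be sketched in base R. This is a minimal illustration, not the package's implementation: `fps_sketch` is a hypothetical helper, it assumes a precomputed square distance matrix, and its tie-breaking (first maximum wins) may differ from `farthest_point_sampling`.

```r
# Hypothetical helper (not part of rdist): greedy farthest point sampling
# on a precomputed, square distance matrix.
fps_sketch <- function(dist_mat, k = nrow(dist_mat), initial_point_index = 1L) {
  stopifnot(is.matrix(dist_mat), nrow(dist_mat) == ncol(dist_mat),
            k >= 1, k <= nrow(dist_mat))
  selected <- integer(k)
  selected[1] <- initial_point_index
  # min_dist[j]: distance from point j to its nearest already-selected point
  min_dist <- dist_mat[, initial_point_index]
  min_dist[initial_point_index] <- -Inf   # never pick a point twice
  for (i in seq_len(k)[-1]) {
    nxt <- which.max(min_dist)            # farthest from the selected set
    selected[i] <- nxt
    min_dist <- pmin(min_dist, dist_mat[, nxt])
    min_dist[nxt] <- -Inf
  }
  selected
}

set.seed(1)
pts <- matrix(runif(40), ncol = 2)
d <- as.matrix(dist(pts))   # base R Euclidean distances
ord <- fps_sketch(d)        # a permutation of 1:20 starting at point 1
```

Keeping a running vector of distances to the nearest selected point means each step only needs one column of the distance matrix, which is why the sketch avoids recomputing minima over the whole selected set.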
Metric and triangle inequality
Description
Does the distance matrix come from a metric?
Usage
is_distance_matrix(mat, tolerance = .Machine$double.eps^0.5)
triangle_inequality(mat, tolerance = .Machine$double.eps^0.5)
Arguments
mat | The matrix to evaluate
tolerance | Differences smaller than tolerance are not reported.
Examples
data <- matrix(rnorm(20), ncol = 2)
dm <- pdist(data)
is_distance_matrix(dm)
triangle_inequality(dm)
dm[1, 2] <- 1.1 * dm[1, 2]
is_distance_matrix(dm)
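To make the triangle inequality check concrete, here is a base R sketch under stated assumptions: `violates_triangle` is a hypothetical helper, not the package's `triangle_inequality`, and it only reports whether any triple (i, j, k) has d[i, j] > d[i, k] + d[k, j] beyond the tolerance.

```r
# Hypothetical helper (not part of rdist): TRUE if some triple (i, j, k)
# has d[i, j] > d[i, k] + d[k, j] + tolerance.
violates_triangle <- function(d, tolerance = .Machine$double.eps^0.5) {
  for (k in seq_len(nrow(d))) {
    # via_k[i, j] = d[i, k] + d[k, j], the length of the detour through k
    via_k <- outer(d[, k], d[k, ], "+")
    if (any(d > via_k + tolerance)) return(TRUE)
  }
  FALSE
}

set.seed(1)
pts <- matrix(rnorm(20), ncol = 2)
d <- as.matrix(dist(pts))
ok <- violates_triangle(d)        # FALSE: Euclidean distances are a metric

# Inflate one pair symmetrically so that every detour is shorter.
d_bad <- d
d_bad[1, 2] <- d_bad[2, 1] <- 10 * max(d)
bad <- violates_triangle(d_bad)   # TRUE
```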
Product metric
Description
Returns the p-product metric of two metric spaces. Works for output of 'rdist', 'pdist' or 'cdist'.
Usage
product_metric(..., p = 2)
Arguments
... | Distance matrices or dist objects
p | The power of the Minkowski distance
Examples
# generate data
df <- matrix(runif(200), ncol = 2)
# distance matrices
dist_mat <- pdist(df)
dist_1 <- pdist(df[, 1])
dist_2 <- pdist(df[, 2])
# product distance matrix
dist_prod <- product_metric(dist_1, dist_2)
# check equality
all.equal(dist_mat, dist_prod)
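For finite p, the p-product of distances d_1 and d_2 is (d_1^p + d_2^p)^(1/p), which is why the Euclidean distance on two columns equals the 2-product of the per-column distances. A minimal base R sketch, where `p_product` is a hypothetical helper and not the package's `product_metric`:

```r
# Hypothetical helper (not part of rdist): p-product of distance matrices
# for finite p, computed entrywise as (d_1^p + d_2^p + ...)^(1/p).
p_product <- function(..., p = 2) {
  mats <- lapply(list(...), as.matrix)
  Reduce(`+`, lapply(mats, function(m) m^p))^(1 / p)
}

set.seed(1)
pts <- matrix(runif(40), ncol = 2)
d_full <- as.matrix(dist(pts))        # Euclidean distance on both columns
d_x <- as.matrix(dist(pts[, 1]))      # distance on the first column only
d_y <- as.matrix(dist(pts[, 2]))      # distance on the second column only
same <- all.equal(d_full, p_product(d_x, d_y))
```

With p = 1 the same construction yields the Manhattan distance on the combined columns.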
rdist: an R package for distances
Description
rdist provides a common framework to calculate distances. There are three main functions:
- rdist computes the pairwise distances between observations in one matrix and returns a dist object,
- pdist computes the pairwise distances between observations in one matrix and returns a matrix, and
- cdist computes the distances between observations in two matrices and returns a matrix.
In particular, the cdist function is often missing in other distance functions. All calculations involving NA values will consistently return NA.
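As an illustration of what a two-matrix distance computation produces, the following base R sketch computes Euclidean distances between the rows of two matrices. `cross_dist` is a hypothetical helper written for this example; it is not how cdist is implemented, and it covers the Euclidean case only.

```r
# Hypothetical helper (not part of rdist): Euclidean distances between
# every row of X and every row of Y, returned as an nrow(X) x nrow(Y) matrix.
cross_dist <- function(X, Y) {
  stopifnot(ncol(X) == ncol(Y))
  out <- matrix(NA_real_, nrow(X), nrow(Y))
  for (i in seq_len(nrow(X))) {
    diffs <- sweep(Y, 2, X[i, ])          # each row of Y minus row i of X
    out[i, ] <- sqrt(rowSums(diffs^2))
  }
  out
}

set.seed(1)
X <- matrix(runif(10), ncol = 2)   # 5 observations
Y <- matrix(runif(6), ncol = 2)    # 3 observations
D <- cross_dist(X, Y)              # 5 x 3 matrix

# A missing value in an observation makes all of its distances NA.
X_na <- X
X_na[1, 1] <- NA
D_na <- cross_dist(X_na, Y)
```

The result is rectangular rather than symmetric, which is the main difference from the one-matrix case.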
Usage
rdist(X, metric = "euclidean", p = 2L)
pdist(X, metric = "euclidean", p = 2)
cdist(X, Y, metric = "euclidean", p = 2)
Arguments
X, Y | A matrix
metric | The distance metric to use
p | The power of the Minkowski distance
Details
Available distance measures are (written for two vectors v and w):
- "euclidean": \sqrt{\sum_i(v_i - w_i)^2}
- "minkowski": (\sum_i|v_i - w_i|^p)^{1/p}
- "manhattan": \sum_i(|v_i-w_i|)
- "maximum" or "chebyshev": \max_i(|v_i-w_i|)
- "canberra": \sum_i(\frac{|v_i-w_i|}{|v_i|+|w_i|})
- "angular": \cos^{-1}(cor(v, w))
- "correlation": \sqrt{\frac{1-cor(v, w)}{2}}
- "absolute_correlation": \sqrt{1-|cor(v, w)|^2}
- "hamming": (\sum_i v_i \neq w_i) / \sum_i 1
- "jaccard": (\sum_i v_i \neq w_i) / \sum_i 1_{v_i \neq 0 \cup w_i \neq 0}
- Any function that defines a distance between two vectors.
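Some of the formulas in this list can be transcribed directly into base R for two vectors v and w. The function names below are hypothetical and chosen for this example; the code is a literal reading of the formulas, not the package's compiled implementation.

```r
# Hypothetical helpers (not part of rdist): literal transcriptions of
# four of the listed formulas for two numeric vectors v and w.
minkowski_d <- function(v, w, p = 2) sum(abs(v - w)^p)^(1 / p)
canberra_d  <- function(v, w) sum(abs(v - w) / (abs(v) + abs(w)))
hamming_d   <- function(v, w) sum(v != w) / length(v)
jaccard_d   <- function(v, w) sum(v != w) / sum(v != 0 | w != 0)

minkowski_d(c(0, 0), c(3, 4))          # p = 2 is the Euclidean case: 5
minkowski_d(c(0, 0), c(3, 4), p = 1)   # p = 1 is the Manhattan case: 7

v <- c(1, 0, 2, 0)
w <- c(1, 3, 0, 0)
hamming_d(v, w)   # 2 of 4 positions differ: 0.5
jaccard_d(v, w)   # 2 differ among the 3 positions with a nonzero entry: 2/3

# Canberra is undefined (0/0) where both entries are zero, so use
# strictly positive vectors here.
canberra_d(c(1, 2, 3), c(2, 2, 1))     # 1/3 + 0 + 2/4
```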