Title: | Similarity and Distance Quantification Between Probability Functions |
---|---|
Description: | Computes 46 optimized distance and similarity measures for comparing probability functions (Drost (2018) <doi:10.21105/joss.00765>). These comparisons between probability functions have their foundations in a broad range of scientific disciplines from mathematics to ecology. The aim of this package is to provide a core framework for clustering, classification, statistical inference, goodness-of-fit, non-parametric statistics, information theory, and machine learning tasks that are based on comparing univariate or multivariate probability functions. |
Authors: | Hajk-Georg Drost [aut, cre] , Jakub Nowosad [ctb] |
Maintainer: | Hajk-Georg Drost <[email protected]> |
License: | GPL-2 |
Version: | 0.9.0 |
Built: | 2024-11-12 21:23:04 UTC |
Source: | https://github.com/drostlab/philentropy |
The lowlevel function for computing the additive_symm_chi_sq distance.
additive_symm_chi_sq(P, Q, testNA)
additive_symm_chi_sq(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
additive_symm_chi_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
additive_symm_chi_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the avg distance.
avg(P, Q, testNA)
avg(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
avg(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
avg(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the bhattacharyya distance.
bhattacharyya(P, Q, testNA, unit, epsilon)
bhattacharyya(P, Q, testNA, unit, epsilon)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of |
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
Hajk-Georg Drost
bhattacharyya(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2", epsilon = 0.00001)
bhattacharyya(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2", epsilon = 0.00001)
This function implements an interface to the kernel density estimation functions provided by the KernSmooth package.
binned.kernel.est( data, kernel = "normal", bandwidth = NULL, canonical = FALSE, scalest = "minim", level = 2L, gridsize = 401L, range.data = range(data), truncate = TRUE )
binned.kernel.est( data, kernel = "normal", bandwidth = NULL, canonical = FALSE, scalest = "minim", level = 2L, gridsize = 401L, range.data = range(data), truncate = TRUE )
data |
a numeric vector containing the sample on which the kernel density estimate is to be constructed. |
kernel |
character string specifying the smoothing kernel |
bandwidth |
the kernel bandwidth smoothing parameter. |
canonical |
a logical value indicating whether canonically scaled kernels should be used |
scalest |
estimate of scale.
|
level |
number of levels of functional estimation used in the plug-in rule. |
gridsize |
the number of equally-spaced points over which binning is performed to obtain kernel functional approximation. |
range.data |
vector containing the minimum and maximum values of |
truncate |
logical value indicating whether data with x values outside the range specified by |
Hajk-Georg Drost
Matt Wand (2015). KernSmooth: Functions for Kernel Smoothing Supporting Wand & Jones (1995). R package version 2.23-14.
Henry Deng and Hadley Wickham (2011). Density estimation in R. http://vita.had.co.nz/papers/density-estimation.pdf.
The lowlevel function for computing the canberra distance.
canberra(P, Q, testNA)
canberra(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
canberra(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
canberra(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Compute Shannon's Conditional-Entropy based on the chain rule based on a given joint-probability vector
and
probability vector
.
CE(xy, y, unit = "log2")
CE(xy, y, unit = "log2")
xy |
a numeric joint-probability vector |
y |
a numeric probability vector |
unit |
a character string specifying the logarithm unit that shall be used to compute distances that depend on log computations. |
This function might be useful to fastly compute Shannon's Conditional-Entropy for any given joint-probability vector and probability vector.
Shannon's Conditional-Entropy in bit.
Note that the probability vector P(Y) must be the probability
distribution of random variable Y ( P(Y) for which H(Y) is computed ) and
furthermore used for the chain rule computation of .
Hajk-Georg Drost
Shannon, Claude E. 1948. "A Mathematical Theory of Communication". Bell System Technical Journal 27 (3): 379-423.
CE(1:10/sum(1:10),1:10/sum(1:10))
CE(1:10/sum(1:10),1:10/sum(1:10))
The lowlevel function for computing the chebyshev distance.
chebyshev(P, Q, testNA)
chebyshev(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
chebyshev(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
chebyshev(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the clark_sq distance.
clark_sq(P, Q, testNA)
clark_sq(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
clark_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
clark_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the cosine_dist distance.
cosine_dist(P, Q, testNA)
cosine_dist(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
cosine_dist(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
cosine_dist(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the czekanowski distance.
czekanowski(P, Q, testNA)
czekanowski(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
czekanowski(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
czekanowski(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the dice_dist distance.
dice_dist(P, Q, testNA)
dice_dist(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
dice_dist(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
dice_dist(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
This functions computes the distance/dissimilarity between two sets of probability density functions.
dist_many_many( dists1, dists2, method, p = NA_real_, testNA = TRUE, unit = "log", epsilon = 1e-05 )
dist_many_many( dists1, dists2, method, p = NA_real_, testNA = TRUE, unit = "log", epsilon = 1e-05 )
dists1 |
a numeric matrix storing distributions in its rows. |
dists2 |
a numeric matrix storing distributions in its rows. |
method |
a character string indicating whether the distance measure that should be computed. |
p |
power of the Minkowski distance. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
A matrix of distance values
set.seed(2020-08-20) M1 <- t(replicate(10, sample(1:10, size = 10) / 55)) M2 <- t(replicate(10, sample(1:10, size = 10) / 55)) result <- dist_many_many(M1, M2, method = "euclidean", testNA = FALSE)
set.seed(2020-08-20) M1 <- t(replicate(10, sample(1:10, size = 10) / 55)) M2 <- t(replicate(10, sample(1:10, size = 10) / 55)) result <- dist_many_many(M1, M2, method = "euclidean", testNA = FALSE)
This functions computes the distance/dissimilarity between one probability density functions and a set of probability density functions.
dist_one_many( P, dists, method, p = NA_real_, testNA = TRUE, unit = "log", epsilon = 1e-05 )
dist_one_many( P, dists, method, p = NA_real_, testNA = TRUE, unit = "log", epsilon = 1e-05 )
P |
a numeric vector storing the first distribution. |
dists |
a numeric matrix storing distributions in its rows. |
method |
a character string indicating whether the distance measure that should be computed. |
p |
power of the Minkowski distance. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
A vector of distance values
set.seed(2020-08-20) P <- 1:10 / sum(1:10) M <- t(replicate(100, sample(1:10, size = 10) / 55)) dist_one_many(P, M, method = "euclidean", testNA = FALSE)
set.seed(2020-08-20) P <- 1:10 / sum(1:10) M <- t(replicate(100, sample(1:10, size = 10) / 55)) dist_one_many(P, M, method = "euclidean", testNA = FALSE)
This functions computes the distance/dissimilarity between two probability density functions.
dist_one_one( P, Q, method, p = NA_real_, testNA = TRUE, unit = "log", epsilon = 1e-05 )
dist_one_one( P, Q, method, p = NA_real_, testNA = TRUE, unit = "log", epsilon = 1e-05 )
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
method |
a character string indicating whether the distance measure that should be computed. |
p |
power of the Minkowski distance. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
A single distance value
P <- 1:10 / sum(1:10) Q <- 20:29 / sum(20:29) dist_one_one(P, Q, method = "euclidean", testNA = FALSE)
P <- 1:10 / sum(1:10) Q <- 20:29 / sum(20:29) dist_one_one(P, Q, method = "euclidean", testNA = FALSE)
This function computes all distance values between two probability density functions that are available in getDistMethods
and returns a vector storing the corresponding distance measures. This vector is named distance diversity vector.
dist.diversity(x, p, test.na = FALSE, unit = "log2")
dist.diversity(x, p, test.na = FALSE, unit = "log2")
x |
a numeric |
p |
power of the Minkowski distance. |
test.na |
a boolean value indicating whether input vectors should be tested for NA values. Faster computations if |
unit |
a character string specifying the logarithm unit that should be used to compute distances that depend on log computations. Options are:
|
Hajk-Georg Drost
dist.diversity(rbind(1:10/sum(1:10), 20:29/sum(20:29)), p = 2, unit = "log2")
dist.diversity(rbind(1:10/sum(1:10), 20:29/sum(20:29)), p = 2, unit = "log2")
This functions computes the distance/dissimilarity between two probability density functions.
distance( x, method = "euclidean", p = NULL, test.na = TRUE, unit = "log", epsilon = 1e-05, est.prob = NULL, use.row.names = FALSE, as.dist.obj = FALSE, diag = FALSE, upper = FALSE, mute.message = FALSE )
distance( x, method = "euclidean", p = NULL, test.na = TRUE, unit = "log", epsilon = 1e-05, est.prob = NULL, use.row.names = FALSE, as.dist.obj = FALSE, diag = FALSE, upper = FALSE, mute.message = FALSE )
x |
a numeric |
method |
a character string indicating whether the distance measure that should be computed. |
p |
power of the Minkowski distance. |
test.na |
a boolean value indicating whether input vectors should be tested for |
unit |
a character string specifying the logarithm unit that should be used to compute distances that depend on log computations. |
epsilon |
a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
est.prob |
method to estimate probabilities from input count vectors such as non-probability vectors. Default:
|
use.row.names |
a logical value indicating whether or not row names from
the input matrix shall be used as rownames and colnames of the output distance matrix. Default value is |
as.dist.obj |
shall the return value or matrix be an object of class |
diag |
if |
upper |
if |
mute.message |
a logical value indicating whether or not messages printed by |
Here a distance is defined as a quantitative degree of how far two mathematical objects are apart from eachother (Cha, 2007).
This function implements the following distance/similarity measures to quantify the distance between probability density functions:
L_p Minkowski family
Euclidean :
Manhattan :
Minkowski :
Chebyshev :
L_1 family
Sorensen :
Gower :
Soergel :
Kulczynski d :
Canberra :
Lorentzian :
Intersection family
Intersection :
Non-Intersection :
Wave Hedges :
Czekanowski :
Motyka :
Kulczynski s :
Tanimoto : ; equivalent to Soergel
Ruzicka : ; equivalent to 1 - Tanimoto = 1 - Soergel
Inner Product family
Inner Product :
Harmonic mean :
Cosine :
Kumar-Hassebrook (PCE) :
Jaccard : ; equivalent to 1 - Kumar-Hassebrook
Dice :
Squared-chord family
Fidelity :
Bhattacharyya :
Hellinger :
Matusita :
Squared-chord :
Squared L_2 family (^2 squared family)
Squared Euclidean :
Pearson ^2 :
Neyman ^2 :
Squared ^2 :
Probabilistic Symmetric ^2 :
Divergence : ^2 :
Clark :
Additive Symmetric ^2 :
Shannon's entropy family
Kullback-Leibler :
Jeffreys :
K divergence :
Topsoe :
Jensen-Shannon :
Jensen difference :
Combinations
Taneja :
Kumar-Johnson :
Avg(L_1, L_n) :
In cases where x
specifies a count matrix, the argument est.prob
can be selected to first estimate probability vectors
from input count vectors and second compute the corresponding distance measure based on the estimated probability vectors.
The following probability estimation methods are implemented in this function:
est.prob = "empirical"
: relative frequencies of counts.
The following results are returned depending on the dimension of x
:
in case nrow(x)
= 2 : a single distance value.
in case nrow(x)
> 2 : a distance matrix
storing distance values for all pairwise probability vector comparisons.
According to the reference in some distance measure computations invalid computations can occur when dealing with 0 probabilities.
In these cases the convention is treated as follows:
division by zero - case 0/0
: when the divisor and dividend become zero, 0/0
is treated as 0
.
division by zero - case n/0
: when only the divisor becomes 0
, the corresponsning 0
is replaced by a small .
log of zero - case 0 * log(0)
: is treated as 0
.
log of zero - case log(0)
: zero is replaced by a small .
Hajk-Georg Drost
Sung-Hyuk Cha. (2007). Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences 4: 1.
getDistMethods
, estimate.probability
, dist.diversity
# Simple Examples # receive a list of implemented probability distance measures getDistMethods() ## compute the euclidean distance between two probability vectors distance(rbind(1:10/sum(1:10), 20:29/sum(20:29)), method = "euclidean") ## compute the euclidean distance between all pairwise comparisons of probability vectors ProbMatrix <- rbind(1:10/sum(1:10), 20:29/sum(20:29),30:39/sum(30:39)) distance(ProbMatrix, method = "euclidean") # compute distance matrix without testing for NA values in the input matrix distance(ProbMatrix, method = "euclidean", test.na = FALSE) # alternatively use the colnames of the input data for the rownames and colnames # of the output distance matrix ProbMatrix <- rbind(1:10/sum(1:10), 20:29/sum(20:29),30:39/sum(30:39)) rownames(ProbMatrix) <- paste0("Example", 1:3) distance(ProbMatrix, method = "euclidean", use.row.names = TRUE) # Specialized Examples CountMatrix <- rbind(1:10, 20:29, 30:39) ## estimate probabilities from a count matrix distance(CountMatrix, method = "euclidean", est.prob = "empirical") ## compute the euclidean distance for count data ## NOTE: some distance measures are only defined for probability values, distance(CountMatrix, method = "euclidean") ## compute the Kullback-Leibler Divergence with different logarithm bases: ### case: unit = log (Default) distance(ProbMatrix, method = "kullback-leibler", unit = "log") ### case: unit = log2 distance(ProbMatrix, method = "kullback-leibler", unit = "log2") ### case: unit = log10 distance(ProbMatrix, method = "kullback-leibler", unit = "log10")
# Simple Examples # receive a list of implemented probability distance measures getDistMethods() ## compute the euclidean distance between two probability vectors distance(rbind(1:10/sum(1:10), 20:29/sum(20:29)), method = "euclidean") ## compute the euclidean distance between all pairwise comparisons of probability vectors ProbMatrix <- rbind(1:10/sum(1:10), 20:29/sum(20:29),30:39/sum(30:39)) distance(ProbMatrix, method = "euclidean") # compute distance matrix without testing for NA values in the input matrix distance(ProbMatrix, method = "euclidean", test.na = FALSE) # alternatively use the colnames of the input data for the rownames and colnames # of the output distance matrix ProbMatrix <- rbind(1:10/sum(1:10), 20:29/sum(20:29),30:39/sum(30:39)) rownames(ProbMatrix) <- paste0("Example", 1:3) distance(ProbMatrix, method = "euclidean", use.row.names = TRUE) # Specialized Examples CountMatrix <- rbind(1:10, 20:29, 30:39) ## estimate probabilities from a count matrix distance(CountMatrix, method = "euclidean", est.prob = "empirical") ## compute the euclidean distance for count data ## NOTE: some distance measures are only defined for probability values, distance(CountMatrix, method = "euclidean") ## compute the Kullback-Leibler Divergence with different logarithm bases: ### case: unit = log (Default) distance(ProbMatrix, method = "kullback-leibler", unit = "log") ### case: unit = log2 distance(ProbMatrix, method = "kullback-leibler", unit = "log2") ### case: unit = log10 distance(ProbMatrix, method = "kullback-leibler", unit = "log10")
The lowlevel function for computing the divergence_sq distance.
divergence_sq(P, Q, testNA)
divergence_sq(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
divergence_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
divergence_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
This function takes a numeric count vector and returns estimated probabilities of the corresponding counts.
The following probability estimation methods are implemented in this function:
method = "empirical"
: generates the relative frequency of the data x/sum(x)
.
estimate.probability(x, method = "empirical")
estimate.probability(x, method = "empirical")
x |
a numeric vector storing count values. |
method |
a character string specifying the estimation method tht should be used to estimate probabilities from input counts. |
a numeric probability vector.
Hajk-Georg Drost
# generate a count vector x <- runif(100) # generate a probability vector from corresponding counts x.prob <- estimate.probability(x, method = 'empirical')
# generate a count vector x <- runif(100) # generate a probability vector from corresponding counts x.prob <- estimate.probability(x, method = 'empirical')
The lowlevel function for computing the euclidean distance.
euclidean(P, Q, testNA)
euclidean(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
euclidean(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
euclidean(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the fidelity distance.
fidelity(P, Q, testNA)
fidelity(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
fidelity(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
fidelity(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
distance
This function returns the names of the methods that can be applied
to compute distances between probability density functions using the distance
function.
getDistMethods()
getDistMethods()
Hajk-Georg Drost
getDistMethods()
getDistMethods()
This function computes the Generalized Jensen-Shannon Divergence of a probability matrix.
gJSD(x, unit = "log2", weights = NULL, est.prob = NULL)
gJSD(x, unit = "log2", weights = NULL, est.prob = NULL)
x |
a probability matrix. |
unit |
a character string specifying the logarithm unit that shall be used to compute distances that depend on log computations. |
weights |
a numeric vector specifying the weights for each distribution in |
est.prob |
method to estimate probabilities from input count vectors such as non-probability vectors. Default:
|
Function to compute the Generalized Jensen-Shannon Divergence
where denote the weights selected for the probability vectors
P_1,...,P_n
and H(P_i)
denotes the Shannon Entropy of probability vector P_i
.
The Jensen-Shannon divergence between all possible combinations of comparisons.
Hajk-Georg Drost
# define input probability matrix Prob <- rbind(1:10/sum(1:10), 20:29/sum(20:29), 30:39/sum(30:39)) # compute the Generalized JSD comparing the PS probability matrix gJSD(Prob) # Generalized Jensen-Shannon Divergence between three vectors using different log bases gJSD(Prob, unit = "log2") # Default gJSD(Prob, unit = "log") gJSD(Prob, unit = "log10") # Jensen-Shannon Divergence Divergence between count vectors P.count and Q.count P.count <- 1:10 Q.count <- 20:29 R.count <- 30:39 x.count <- rbind(P.count, Q.count, R.count) gJSD(x.count, est.prob = "empirical")
# define input probability matrix Prob <- rbind(1:10/sum(1:10), 20:29/sum(20:29), 30:39/sum(30:39)) # compute the Generalized JSD comparing the PS probability matrix gJSD(Prob) # Generalized Jensen-Shannon Divergence between three vectors using different log bases gJSD(Prob, unit = "log2") # Default gJSD(Prob, unit = "log") gJSD(Prob, unit = "log10") # Jensen-Shannon Divergence Divergence between count vectors P.count and Q.count P.count <- 1:10 Q.count <- 20:29 R.count <- 30:39 x.count <- rbind(P.count, Q.count, R.count) gJSD(x.count, est.prob = "empirical")
The lowlevel function for computing the gower distance.
gower(P, Q, testNA)
gower(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
gower(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
gower(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Compute the Shannon's Entropy based on a
given probability vector
.
H(x, unit = "log2")
H(x, unit = "log2")
x |
a numeric probability vector |
unit |
a character string specifying the logarithm unit that shall be used to compute distances that depend on log computations. |
This function might be useful to fastly compute Shannon's Entropy for any given probability vector.
a numeric value representing Shannon's Entropy in bit.
Hajk-Georg Drost
Shannon, Claude E. 1948. "A Mathematical Theory of Communication". Bell System Technical Journal 27 (3): 379-423.
H(1:10/sum(1:10))
H(1:10/sum(1:10))
The lowlevel function for computing the harmonic_mean_dist distance.
harmonic_mean_dist(P, Q, testNA)
harmonic_mean_dist(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
harmonic_mean_dist(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
harmonic_mean_dist(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the hellinger distance.
hellinger(P, Q, testNA)
hellinger(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
hellinger(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
hellinger(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the inner_product distance.
inner_product(P, Q, testNA)
inner_product(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
inner_product(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
inner_product(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the intersection_dist distance.
intersection_dist(P, Q, testNA)
intersection_dist(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
intersection_dist(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
intersection_dist(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the jaccard distance.
jaccard(P, Q, testNA)
jaccard(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
jaccard(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
jaccard(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
This funciton computes Shannon's Joint-Entropy based on a given joint-probability vector
.
JE(x, unit = "log2")
JE(x, unit = "log2")
x |
a numeric joint-probability vector |
unit |
a character string specifying the logarithm unit that shall be used to compute distances that depend on log computations. |
a numeric value representing Shannon's Joint-Entropy in bit.
Hajk-Georg Drost
Shannon, Claude E. 1948. "A Mathematical Theory of Communication". Bell System Technical Journal 27 (3): 379-423.
H
, CE
, KL
, JSD
, gJSD
, distance
JE(1:100/sum(1:100))
JE(1:100/sum(1:100))
The lowlevel function for computing the jeffreys distance.
jeffreys(P, Q, testNA, unit, epsilon)
jeffreys(P, Q, testNA, unit, epsilon)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
Hajk-Georg Drost
jeffreys(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2", epsilon = 0.00001)
jeffreys(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2", epsilon = 0.00001)
The lowlevel function for computing the jensen_difference distance.
jensen_difference(P, Q, testNA, unit)
jensen_difference(P, Q, testNA, unit)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
Hajk-Georg Drost
jensen_difference(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2")
jensen_difference(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2")
The lowlevel function for computing the jensen_shannon distance.
jensen_shannon(P, Q, testNA, unit)
jensen_shannon(P, Q, testNA, unit)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
Hajk-Georg Drost
jensen_shannon(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2")
jensen_shannon(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2")
This function computes a divergence matrix or divergence value based on the Jensen-Shannon Divergence with equal weights.
Please be aware that when aiming to compute the Jensen-Shannon Distance (rather than Divergence), you will need to apply the link{sqrt}
on the JSD()
output.
JSD(x, test.na = TRUE, unit = "log2", est.prob = NULL)
JSD(x, test.na = TRUE, unit = "log2", est.prob = NULL)
x |
a numeric |
test.na |
a boolean value specifying whether input vectors shall be tested for NA values. |
unit |
a character string specifying the logarithm unit that shall be used to compute distances that depend on log computations. |
est.prob |
method to estimate probabilities from input count vectors such as non-probability vectors. Default:
|
Function to compute the Jensen-Shannon Divergence JSD(P || Q) between two
probability distributions P and Q with equal weights =
=
.
The Jensen-Shannon Divergence JSD(P || Q) between two probability distributions P and Q is defined as:
where denotes the mid-point of the probability
vectors P and Q, and KL(P || R), KL(Q || R) denote the Kullback-Leibler
Divergence of P and R, as well as Q and R.
General properties of the Jensen-Shannon Divergence:
1)
JSD is non-negative.
2)
JSD is a symmetric measure JSD(P || Q) = JSD(Q || P).
3)
JSD = 0, if and only if P = Q.
a divergence value or matrix based on JSD computations.
Hajk-Georg Drost
Lin J. 1991. "Divergence Measures Based on the Shannon Entropy". IEEE Transactions on Information Theory. (33) 1: 145-151.
Endres M. and Schindelin J. E. 2003. "A new metric for probability distributions". IEEE Trans. on Info. Thy. (49) 3: 1858-1860.
# Jensen-Shannon Divergence between P and Q P <- 1:10/sum(1:10) Q <- 20:29/sum(20:29) x <- rbind(P,Q) JSD(x) # Jensen-Shannon Divergence between P and Q using different log bases JSD(x, unit = "log2") # Default JSD(x, unit = "log") JSD(x, unit = "log10") # Jensen-Shannon Divergence Divergence between count vectors P.count and Q.count P.count <- 1:10 Q.count <- 20:29 x.count <- rbind(P.count,Q.count) JSD(x.count, est.prob = "empirical") # Example: Divergence Matrix using JSD-Divergence Prob <- rbind(1:10/sum(1:10), 20:29/sum(20:29), 30:39/sum(30:39)) # compute the KL matrix of a given probability matrix JSDMatrix <- JSD(Prob) # plot a heatmap of the corresponding JSD matrix heatmap(JSDMatrix)
# Jensen-Shannon Divergence between P and Q P <- 1:10/sum(1:10) Q <- 20:29/sum(20:29) x <- rbind(P,Q) JSD(x) # Jensen-Shannon Divergence between P and Q using different log bases JSD(x, unit = "log2") # Default JSD(x, unit = "log") JSD(x, unit = "log10") # Jensen-Shannon Divergence Divergence between count vectors P.count and Q.count P.count <- 1:10 Q.count <- 20:29 x.count <- rbind(P.count,Q.count) JSD(x.count, est.prob = "empirical") # Example: Divergence Matrix using JSD-Divergence Prob <- rbind(1:10/sum(1:10), 20:29/sum(20:29), 30:39/sum(30:39)) # compute the KL matrix of a given probability matrix JSDMatrix <- JSD(Prob) # plot a heatmap of the corresponding JSD matrix heatmap(JSDMatrix)
The lowlevel function for computing the k_divergence distance.
k_divergence(P, Q, testNA, unit)
k_divergence(P, Q, testNA, unit)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
Hajk-Georg Drost
k_divergence(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2")
k_divergence(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2")
This function computes the Kullback-Leibler divergence of two probability distributions P and Q.
KL(x, test.na = TRUE, unit = "log2", est.prob = NULL, epsilon = 1e-05)
KL(x, test.na = TRUE, unit = "log2", est.prob = NULL, epsilon = 1e-05)
x |
a numeric |
test.na |
a boolean value indicating whether input vectors should be tested for NA values. |
unit |
a character string specifying the logarithm unit that shall be used to compute distances that depend on log computations. |
est.prob |
method to estimate probabilities from a count vector. Default: est.prob = NULL. |
epsilon |
a small value to address cases in the KL computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
where H(P,Q) denotes the joint entropy of the probability distributions P and Q and H(P) denotes the entropy of probability distribution P. In case P = Q then KL(P,Q) = 0 and in case P != Q then KL(P,Q) > 0.
The KL divergence is a non-symmetric measure of the directed divergence between two probability distributions P and Q. It only fulfills the positivity property of a distance metric.
Because of the relation KL(P||Q) = H(P,Q) - H(P), the Kullback-Leibler divergence of two probability distributions P and Q is also named Cross Entropy of two probability distributions P and Q.
The Kullback-Leibler divergence of probability vectors.
Hajk-Georg Drost
Cover Thomas M. and Thomas Joy A. 2006. Elements of Information Theory. John Wiley & Sons.
# Kulback-Leibler Divergence between P and Q P <- 1:10/sum(1:10) Q <- 20:29/sum(20:29) x <- rbind(P,Q) KL(x) # Kulback-Leibler Divergence between P and Q using different log bases KL(x, unit = "log2") # Default KL(x, unit = "log") KL(x, unit = "log10") # Kulback-Leibler Divergence between count vectors P.count and Q.count P.count <- 1:10 Q.count <- 20:29 x.count <- rbind(P.count,Q.count) KL(x, est.prob = "empirical") # Example: Distance Matrix using KL-Distance Prob <- rbind(1:10/sum(1:10), 20:29/sum(20:29), 30:39/sum(30:39)) # compute the KL matrix of a given probability matrix KLMatrix <- KL(Prob) # plot a heatmap of the corresponding KL matrix heatmap(KLMatrix)
# Kulback-Leibler Divergence between P and Q P <- 1:10/sum(1:10) Q <- 20:29/sum(20:29) x <- rbind(P,Q) KL(x) # Kulback-Leibler Divergence between P and Q using different log bases KL(x, unit = "log2") # Default KL(x, unit = "log") KL(x, unit = "log10") # Kulback-Leibler Divergence between count vectors P.count and Q.count P.count <- 1:10 Q.count <- 20:29 x.count <- rbind(P.count,Q.count) KL(x, est.prob = "empirical") # Example: Distance Matrix using KL-Distance Prob <- rbind(1:10/sum(1:10), 20:29/sum(20:29), 30:39/sum(30:39)) # compute the KL matrix of a given probability matrix KLMatrix <- KL(Prob) # plot a heatmap of the corresponding KL matrix heatmap(KLMatrix)
The lowlevel function for computing the kulczynski_d distance.
kulczynski_d(P, Q, testNA, epsilon)
kulczynski_d(P, Q, testNA, epsilon)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
Hajk-Georg Drost
kulczynski_d(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, epsilon = 0.00001)
kulczynski_d(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, epsilon = 0.00001)
The lowlevel function for computing the kullback_leibler_distance distance.
kullback_leibler_distance(P, Q, testNA, unit, epsilon)
kullback_leibler_distance(P, Q, testNA, unit, epsilon)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
Hajk-Georg Drost
kullback_leibler_distance(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2", epsilon = 0.00001)
kullback_leibler_distance(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2", epsilon = 0.00001)
The lowlevel function for computing the kumar_hassebrook distance.
kumar_hassebrook(P, Q, testNA)
kumar_hassebrook(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
kumar_hassebrook(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
kumar_hassebrook(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the kumar_johnson distance.
kumar_johnson(P, Q, testNA, epsilon)
kumar_johnson(P, Q, testNA, epsilon)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
Hajk-Georg Drost
kumar_johnson(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, epsilon = 0.00001)
kumar_johnson(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, epsilon = 0.00001)
This function computed the linear correlation between two vectors or a correlation matrix for an input matrix.
The following methods to compute linear correlations are implemented in this function:
lin.cor(x, y = NULL, method = "pearson", test.na = FALSE)
lin.cor(x, y = NULL, method = "pearson", test.na = FALSE)
x |
a numeric |
y |
a numeric |
method |
the method to compute the linear correlation between |
test.na |
a boolean value indicating whether input data should be checked for |
method = "pearson"
: Pearson's correlation coefficient (centred).
method = "pearson2"
: Pearson's uncentred correlation coefficient.
method = "sq_pearson"
. Squared Pearson's correlation coefficient.
method = "kendall"
: Kendall's correlation coefficient.
method = "spearman"
: Spearman's correlation coefficient.
Further Details:
Pearson's correlation coefficient (centred) :
Hajk-Georg Drost
The low-level function for computing the lorentzian distance.
lorentzian(P, Q, testNA, unit)
lorentzian(P, Q, testNA, unit)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
Hajk-Georg Drost
lorentzian(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2")
lorentzian(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2")
The lowlevel function for computing the manhattan distance.
manhattan(P, Q, testNA)
manhattan(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
manhattan(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
manhattan(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the matusita distance.
matusita(P, Q, testNA)
matusita(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
matusita(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
matusita(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
Compute Shannon's Mutual Information based on the identity based on a given joint-probability vector
and probability vectors
and
.
MI(x, y, xy, unit = "log2")
MI(x, y, xy, unit = "log2")
x |
a numeric probability vector |
y |
a numeric probability vector |
xy |
a numeric joint-probability vector |
unit |
a character string specifying the logarithm unit that shall be used to compute distances that depend on log computations. |
This function might be useful to fastly compute Shannon's Mutual Information for any given joint-probability vector and probability vectors.
Shannon's Mutual Information in bit.
Hajk-Georg Drost
Shannon, Claude E. 1948. "A Mathematical Theory of Communication". Bell System Technical Journal 27 (3): 379-423.
MI( x = 1:10/sum(1:10), y = 20:29/sum(20:29), xy = 1:10/sum(1:10) )
MI( x = 1:10/sum(1:10), y = 20:29/sum(20:29), xy = 1:10/sum(1:10) )
The lowlevel function for computing the minkowski distance.
minkowski(P, Q, n, testNA)
minkowski(P, Q, n, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
n |
index for the minkowski exponent. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
minkowski(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), n = 2, testNA = FALSE)
minkowski(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), n = 2, testNA = FALSE)
The lowlevel function for computing the motyka distance.
motyka(P, Q, testNA)
motyka(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
motyka(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
motyka(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the neyman_chi_sq distance.
neyman_chi_sq(P, Q, testNA, epsilon)
neyman_chi_sq(P, Q, testNA, epsilon)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
Hajk-Georg Drost
neyman_chi_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, epsilon = 0.00001)
neyman_chi_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, epsilon = 0.00001)
The lowlevel function for computing the pearson_chi_sq distance.
pearson_chi_sq(P, Q, testNA, epsilon)
pearson_chi_sq(P, Q, testNA, epsilon)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
Hajk-Georg Drost
pearson_chi_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, epsilon = 0.00001)
pearson_chi_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, epsilon = 0.00001)
The lowlevel function for computing the prob_symm_chi_sq distance.
prob_symm_chi_sq(P, Q, testNA)
prob_symm_chi_sq(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
prob_symm_chi_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
prob_symm_chi_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the ruzicka distance.
ruzicka(P, Q, testNA)
ruzicka(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
ruzicka(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
ruzicka(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the soergel distance.
soergel(P, Q, testNA)
soergel(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
soergel(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
soergel(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the sorensen distance.
sorensen(P, Q, testNA)
sorensen(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
sorensen(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
sorensen(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the squared_chi_sq distance.
squared_chi_sq(P, Q, testNA)
squared_chi_sq(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
squared_chi_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
squared_chi_sq(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the squared_chord distance.
squared_chord(P, Q, testNA)
squared_chord(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
squared_chord(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
squared_chord(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the squared_euclidean distance.
squared_euclidean(P, Q, testNA)
squared_euclidean(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
squared_euclidean(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
squared_euclidean(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the taneja distance.
taneja(P, Q, testNA, unit, epsilon)
taneja(P, Q, testNA, unit, epsilon)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
epsilon |
epsilon a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by |
Hajk-Georg Drost
taneja(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2", epsilon = 0.00001)
taneja(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2", epsilon = 0.00001)
The lowlevel function for computing the tanimoto distance.
tanimoto(P, Q, testNA)
tanimoto(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
tanimoto(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
tanimoto(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
The lowlevel function for computing the topsoe distance.
topsoe(P, Q, testNA, unit)
topsoe(P, Q, testNA, unit)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
unit |
type of
|
Hajk-Georg Drost
topsoe(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2")
topsoe(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE, unit = "log2")
The lowlevel function for computing the wave_hedges distance.
wave_hedges(P, Q, testNA)
wave_hedges(P, Q, testNA)
P |
a numeric vector storing the first distribution. |
Q |
a numeric vector storing the second distribution. |
testNA |
a logical value indicating whether or not distributions shall be checked for |
Hajk-Georg Drost
wave_hedges(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)
wave_hedges(P = 1:10/sum(1:10), Q = 20:29/sum(20:29), testNA = FALSE)