Title: | Evolutionary Transcriptomics |
---|---|
Description: | Investigate the evolution of biological processes by capturing evolutionary signatures in transcriptomes (Drost et al. (2018) <doi:10.1093/bioinformatics/btx835>). This package aims to provide a transcriptome analysis environment to quantify the average evolutionary age of genes contributing to a transcriptome of interest. |
Authors: | Hajk-Georg Drost [aut, cre] , Jaruwatana Sodai Lotharukpong [aut, ctb] , Nikola Kalábová [aut, ctb] , Filipa Martins Costa [aut, ctb], Kristian K Ullrich [aut, ctb] , Stefan Manolache [aut, ctb] |
Maintainer: | Hajk-Georg Drost <[email protected]> |
License: | GPL-2 |
Version: | 2.0.0 |
Built: | 2024-11-21 18:33:41 UTC |
Source: | https://github.com/drostlab/mytai |
This function performs the split-apply-combine methodology on Phylostrata or Divergence Strata stored within the input ExpressionSet.
This function is very useful to perform any phylostratum or divergence-stratum specific analysis.
age.apply(ExpressionSet, FUN, ..., as.list = FALSE)
age.apply(ExpressionSet, FUN, ..., as.list = FALSE)
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
FUN |
a function to be performed on the corresponding expression matrix of each phylostratum or divergence-stratum. |
... |
additional arguments of FUN. |
as.list |
a boolean value specifying whether the output format shall be a matrix or a list object. |
This function uses the split
function to subset the expression matrix into
phylostratum specific sub-matrices. Internally using lapply
, any function can
be performed to the sub-matrices. The return value of this function is a numeric matrix storing
the return values by FUN
for each phylostratum and each developmental stage s.
Note that the input FUN
must be an function that can be applied to a matrix (e.g., colMeans
or RE
).
In case you use an anonymous function you could use function(x) apply(x , 2 , var)
as an example to compute the variance of each phylostratum and each
developmental stage s.
Either a numeric matrix storing the return values of the applied function for each age class or a numeric list storing the return values of the applied function for each age class in a list.
Hajk-Georg Drost
split
, tapply
, lapply
, RE
, REMatrix
# source the example dataset data(PhyloExpressionSetExample) # Example 1 # get the relative expression profiles for each phylostratum age.apply(PhyloExpressionSetExample, RE) # this is analogous to REMatrix(PhyloExpressionSetExample) # Example 2 # compute the mean expression profiles for each phylostratum age.apply(PhyloExpressionSetExample, colMeans) # Example 3 # compute the variance profiles for each phylostratum age.apply(PhyloExpressionSetExample, function(x) apply(x , 2 , var)) # Example 4 # compute the range for each phylostratum # Note: in this case, the range() function returns 2 values for each phylostratum # and each developmental stage, hence one should use the argument 'as.list = TRUE' # to make sure that the results are returned properly age.apply(PhyloExpressionSetExample, function(x) apply(x , 2 , range), as.list = TRUE)
# source the example dataset data(PhyloExpressionSetExample) # Example 1 # get the relative expression profiles for each phylostratum age.apply(PhyloExpressionSetExample, RE) # this is analogous to REMatrix(PhyloExpressionSetExample) # Example 2 # compute the mean expression profiles for each phylostratum age.apply(PhyloExpressionSetExample, colMeans) # Example 3 # compute the variance profiles for each phylostratum age.apply(PhyloExpressionSetExample, function(x) apply(x , 2 , var)) # Example 4 # compute the range for each phylostratum # Note: in this case, the range() function returns 2 values for each phylostratum # and each developmental stage, hence one should use the argument 'as.list = TRUE' # to make sure that the results are returned properly age.apply(PhyloExpressionSetExample, function(x) apply(x , 2 , range), as.list = TRUE)
This function computes the TAI for a row permutated PhyloExpressionSet or DivergenceExpressionSet.
One can specify the number of permutations which corresponds to the number of TAI
or TDI
profiles
that are being returned as data matrix. The function then returns a TAI
or TDI
matrix holding
the TAI
or TDI
profiles of the permutated PhyloExpressionSets or DivergenceExpressionSets. This procedure
can be used for building test statistics based on the TAI
or TDI
profiles.
bootMatrix(ExpressionSet, permutations = 1000)
bootMatrix(ExpressionSet, permutations = 1000)
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
permutations |
a numeric value specifying the number of permutations to be performed. |
The sampled TAI
or TDI
matrix samples the phylostratum or divergence-stratum vector of
a given PhyloExpressionSet or DivergenceExpressionSet and computes the corresponding TAI
or TDI
profiles
of the randomly assigned phylostrata or divergence-strata. This sampling is then performed N times, yielding N randomly sampled TAI
or TDI
profiles.
This random TAI
or TDI
profile matrix can then be used to perform statistical tests
(such as the FlatLineTest
, ReductiveHourglassTest
, or EarlyConservationTest
) based on the significance of TAI
or TDI
patterns.
a numeric matrix representing N randomly permuted TAI
or TDI
profiles.
Hajk-Georg Drost
Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
FlatLineTest
, ReductiveHourglassTest
, EarlyConservationTest
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # example PhyloExpressionSet using 100 permutations randomTAI.Matrix <- bootMatrix(PhyloExpressionSetExample, permutations = 100) # example DivergenceExpressionSet using 100 permutations randomTDI.Matrix <- bootMatrix(DivergenceExpressionSetExample, permutations = 100)
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # example PhyloExpressionSet using 100 permutations randomTAI.Matrix <- bootMatrix(PhyloExpressionSetExample, permutations = 100) # example DivergenceExpressionSet using 100 permutations randomTDI.Matrix <- bootMatrix(DivergenceExpressionSetExample, permutations = 100)
This function computes the transcriptome evolutionary index (TEI) using permuted strata values.
bootTEI( ExpressionSet, Phylostratum = NULL, permutations = 100, split = 1e+05, showprogress = TRUE, threads = 1 )
bootTEI( ExpressionSet, Phylostratum = NULL, permutations = 100, split = 1e+05, showprogress = TRUE, threads = 1 )
ExpressionSet |
expression object with rownames as GeneID (dgCMatrix) or standard PhyloExpressionSet object. |
Phylostratum |
a named vector representing phylostratum per GeneID with names as GeneID (not used if Expression is PhyloExpressionSet). |
permutations |
a numeric value specifying the number of permutations to be performed. |
split |
specify number of columns to split |
showprogress |
boolean if progressbar should be shown |
threads |
specify number of threads |
The strata values are sampled and the global TEI
is calculated N times.
a numeric matrix storing the TEI values based on permuted strata.
Kristian K Ullrich
Domazet-Loso T. and Tautz D. (2010). A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature (468): 815-818.
Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
# reading a standard PhyloExpressionSet data(PhyloExpressionSetExample, package = "myTAI") # computing partial TEI contribution per gene bM <- bootTEI(PhyloExpressionSetExample)
# reading a standard PhyloExpressionSet data(PhyloExpressionSetExample, package = "myTAI") # computing partial TEI contribution per gene bM <- bootTEI(PhyloExpressionSetExample)
This function takes an ExpressionSet
object storing either a constant or variable number
of biological or technical replicates per stage and collapses replicate expression levels using a defined FUN
(window function).
CollapseReplicates(ExpressionSet, nrep, FUN, stage.names = NULL)
CollapseReplicates(ExpressionSet, nrep, FUN, stage.names = NULL)
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
nrep |
either a numeric value specifying the constant number of replicates per stage or a numeric vector specifying the variable number of replicates for each stage position. |
FUN |
a window function (e.g., |
stage.names |
a character vector specifying the new names of collapsed stages. |
Hajk-Georg Drost
data(PhyloExpressionSetExample) # combine the expression levels of the 2 replicates (const) per stage # using mean as window function and rename new stages: "S1","S2","S3" CollapseReplicates(ExpressionSet = PhyloExpressionSetExample[1:5,1:8], nrep = 2, FUN = mean, stage.names = c("S1","S2","S3")) # combine the expression levels of the 2 replicates (stage one), 2 replicates (stage two), # and 3 replicates (stage three) using mean as window function # and rename new stages: "S1","S2","S3" CollapseReplicates(ExpressionSet = PhyloExpressionSetExample[1:5,1:9], nrep = c(2,2,3), FUN = mean, stage.names = c("S1","S2","S3"))
data(PhyloExpressionSetExample) # combine the expression levels of the 2 replicates (const) per stage # using mean as window function and rename new stages: "S1","S2","S3" CollapseReplicates(ExpressionSet = PhyloExpressionSetExample[1:5,1:8], nrep = 2, FUN = mean, stage.names = c("S1","S2","S3")) # combine the expression levels of the 2 replicates (stage one), 2 replicates (stage two), # and 3 replicates (stage three) using mean as window function # and rename new stages: "S1","S2","S3" CollapseReplicates(ExpressionSet = PhyloExpressionSetExample[1:5,1:9], nrep = c(2,2,3), FUN = mean, stage.names = c("S1","S2","S3"))
In case a PhyloExpressionSet or DivergenceExpressionSet stores replicates for each developmental stage or experiment, this function allows to compute the p-values quantifying the statistical significance of the underlying pattern for all combinations of replicates.
CombinatorialSignificance( ExpressionSet, replicates, TestStatistic = "FlatLineTest", permutations = 1000, parallel = FALSE )
CombinatorialSignificance( ExpressionSet, replicates, TestStatistic = "FlatLineTest", permutations = 1000, parallel = FALSE )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
replicates |
a numeric vector storing the number of replicates within each developmental stage or experiment. In case replicate stores only one value, then the function assumes that each developmental stage or experiment stores the same number of replicates. |
TestStatistic |
a string defining the type of test statistics to be used to quantify the statistical significance the present phylotranscriptomics pattern.
Default is |
permutations |
a numeric value specifying the number of permutations to be performed for the |
parallel |
a boolean value specifying whether parallel processing (multicore processing) shall be performed. |
The intention of this analysis is to validate that there exists no sequence of replicates (for all possible combination of replicates) that results in a non-significant pattern, when the initial pattern with combined replicates was shown to be significant.
A small Example:
Assume PhyloExpressionSet stores 3 developmental stages with 3 replicates measured for each stage. The 9 replicates in total are denoted as: 1.1, 1.2, 1.3, 2.1, 2.2, 2.3, 3.1, 3.2, 3.3. Now the function computes the statistical significance of each pattern derived by the corresponding combination of replicates, e.g.
1.1, 2.1, 3.1 -> p-value for combination 1
1.1, 2.2, 3.1 -> p-value for combination 2
1.1, 2.3, 3.1 -> p-value for combination 3
1.2, 2.1, 3.1 -> p-value for combination 4
1.2, 2.1, 3.1 -> p-value for combination 5
1.2, 2.1, 3.1 -> p-value for combination 6
1.3, 2.1, 3.1 -> p-value for combination 7
1.3, 2.2, 3.1 -> p-value for combination 8
1.3, 2.3, 3.1 -> p-value for combination 9
This procedure yields 27 p-values for the (
) replicate combinations.
Note, that in case you have a large amount of stages/experiments and a large amount of replicates
the computation time will increase by . For 11 stages and 4 replicates, 4^11 = 4194304 p-values have to be computed.
Each p-value computation itself is based on a permutation test running with 1000 or more permutations. Be aware that this might take some time.
The p-value vector returned by this function can then be used to plot the p-values to see
whether an critical value is exeeded or not (e.g.
).
The function receives a standard PhyloExpressionSet or DivergenceExpressionSet object and a vector storing the number of replicates present in each stage or experiment. Based on these arguments the function computes all possible replicate combinations using the expand.grid
function and performs a permutation test (either a FlatLineTest
for each replicate combination. The permutation parameter of this function specifies the number of permutations that shall be performed for each permutation test. When all p-values are computed, a numeric vector storing the corresponding p-values for each replicate combination is returned.
In other words, for each replicate combination present in the PhyloExpressionSet or DivergenceExpressionSet object, the TAI or TDI pattern of the corresponding replicate combination is tested for its statistical significance based on the underlying test statistic.
This function is also able to perform all computations in parallel using multicore processing. The underlying statistical tests are written in C++ and optimized for fast computations.
a numeric vector storing the p-values returned by the underlying test statistic for all possible replicate combinations.
Hajk-Georg Drost
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
## Not run: # load a standard PhyloExpressionSet data(PhyloExpressionSetExample) # we assume that the PhyloExpressionSetExample # consists of 3 developmental stages # and 2 replicates for stage 1, 3 replicates for stage 2, # and 2 replicates for stage 3 # FOR REAL ANALYSES PLEASE USE: permutations = 1000 or 10000 # BUT NOTE THAT THIS TAKES MUCH MORE COMPUTATION TIME p.vector <- CombinatorialSignificance(ExpressionSet = PhyloExpressionSetExample, replicates = c(2,3,2), TestStatistic = "FlatLineTest", permutations = 1000, parallel = FALSE) ## End(Not run)
## Not run: # load a standard PhyloExpressionSet data(PhyloExpressionSetExample) # we assume that the PhyloExpressionSetExample # consists of 3 developmental stages # and 2 replicates for stage 1, 3 replicates for stage 2, # and 2 replicates for stage 3 # FOR REAL ANALYSES PLEASE USE: permutations = 1000 or 10000 # BUT NOTE THAT THIS TAKES MUCH MORE COMPUTATION TIME p.vector <- CombinatorialSignificance(ExpressionSet = PhyloExpressionSetExample, replicates = c(2,3,2), TestStatistic = "FlatLineTest", permutations = 1000, parallel = FALSE) ## End(Not run)
Detect differentially expressed genes (DEGs) in a standard ExpressionSet
object.
DiffGenes( ExpressionSet, nrep, method = "foldchange", p.adjust.method = NULL, comparison = NULL, alpha = NULL, filter.method = NULL, n = NULL, stage.names = NULL )
DiffGenes( ExpressionSet, nrep, method = "foldchange", p.adjust.method = NULL, comparison = NULL, alpha = NULL, filter.method = NULL, n = NULL, stage.names = NULL )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
nrep |
either a numeric value specifying the constant number of replicates per stage or a numeric vector specifying the variable number of replicates for each stage position. |
method |
method to detect differentially expressed genes. |
p.adjust.method |
p value correction method that is passed to
If |
comparison |
a character string specifying whether genes having fold-change or p-values
below, above, or below AND above (both) the |
alpha |
a numeric value specifying the cut-off value above which Genes fulfilling the corresponding fold-change, log-fold-change, or p-value should be retained and returned by |
filter.method |
a method how to |
n |
a numeric value for |
stage.names |
a character vector specifying the new names of collapsed stages. |
All methods to perform detection of differentially expressed genes assume that your input dataset has been normalized before passing it to DiffGenes. For RNA-Seq data DiffGenes assumes that the libraries have been normalized to have the same size, i.e., to have the same expected column sum under the null hypothesis.
Available methods for the detection of differentially expressed genes:
method = "foldchange"
: ratio of replicate geometric means between developmental stages.
Here, the DiffGenes functions assumes that absolute expression levels are stored in your input ExpresisonSet
.
method = "log-foldchange"
: difference of replicate arithmetic means between developmental stages. Here, the DiffGenes functions assumes that log2a
transformed expression levels are stored in your input ExpresisonSet
.
method = "t.test"
: Welch t.test between replicate expression levels of two samples.
method = "wilcox.test"
: Wilcoxon Rank Sum Test between replicate expression levels of two samples.
Exclude non differentially expressed genes from the result dataset:
When specifying the alpha
argument you furthermore, need to specify the filter.method
to decide how non differentially expressed genes should be classified in multiple sample comparisons and which genes should be retained in the final dataset returned by DiffGenes
. In other words, all genes < alpha
based on the following filter.method
are removed from the result dataset.
Following extraction criteria are implemented in this function:
const
: all genes that have at least one sample comparison that undercuts or exceeds the alpha
value cut.off
will be excluded from the ExpressionSet
. Hence, for a 7 stage ExpressionSet
genes passing the alpha
threshold in 6 stages will be retained in the ExpressionSet
.
min-set
: genes passing the alpha
value in ceiling(n/2)
stages will be retained in the ExpressionSet
, where n is the number of stages in the ExpressionSet
.
n-set
: genes passing the alpha
value in n
stages will be retained in the ExpressionSet
. Here, the argument n
needs to be specified.
In case input ExpressionSet
objects store 0 values, internally all expression levels are
shifted by +1
to allow sufficient fold-change and p-value computations. Additionally, a warning
is printed to the console in case expression levels have been automatically shifted.
Hajk-Georg Drost
data(PhyloExpressionSetExample) # Detection of DEGs using the fold-change measure DEGs <- DiffGenes(ExpressionSet = PhyloExpressionSetExample[ ,1:8], nrep = 2, comparison = "below", method = "foldchange", stage.names = c("S1","S2","S3")) head(DEGs) # Detection of DEGs using the log-fold-change measure # when choosing method = "log-foldchange" it is assumed that # your input expression matrix stores log2 expression levels log.DEGs <- DiffGenes(ExpressionSet = tf(PhyloExpressionSetExample[1:5,1:8],log2), nrep = 2, comparison = "below", method = "log-foldchange", stage.names = c("S1","S2","S3")) head(log.DEGs) # Remove fold-change values < 2 from the dataset: ## first have a look at the range of fold-change values of all genes apply(DEGs[ , 3:8],2,range) # now remove genes undercutting the alpha = 2 threshold # hence, remove genes having p-values <= 0.05 in at # least one sample comparison DEGs.alpha <- DiffGenes(ExpressionSet = PhyloExpressionSetExample[1:250 ,1:8], nrep = 2, method = "t.test", alpha = 0.05, comparison = "above", filter.method = "n-set", n = 1, stage.names = c("S1","S2","S3")) # now again have a look at the range and find # that fold-change values of 2 are the min value apply(DEGs.alpha[ , 3:5],2,range) # now check whether each example has at least one stage with a p-value <= 0.05 head(DEGs.alpha)
data(PhyloExpressionSetExample) # Detection of DEGs using the fold-change measure DEGs <- DiffGenes(ExpressionSet = PhyloExpressionSetExample[ ,1:8], nrep = 2, comparison = "below", method = "foldchange", stage.names = c("S1","S2","S3")) head(DEGs) # Detection of DEGs using the log-fold-change measure # when choosing method = "log-foldchange" it is assumed that # your input expression matrix stores log2 expression levels log.DEGs <- DiffGenes(ExpressionSet = tf(PhyloExpressionSetExample[1:5,1:8],log2), nrep = 2, comparison = "below", method = "log-foldchange", stage.names = c("S1","S2","S3")) head(log.DEGs) # Remove fold-change values < 2 from the dataset: ## first have a look at the range of fold-change values of all genes apply(DEGs[ , 3:8],2,range) # now remove genes undercutting the alpha = 2 threshold # hence, remove genes having p-values <= 0.05 in at # least one sample comparison DEGs.alpha <- DiffGenes(ExpressionSet = PhyloExpressionSetExample[1:250 ,1:8], nrep = 2, method = "t.test", alpha = 0.05, comparison = "above", filter.method = "n-set", n = 1, stage.names = c("S1","S2","S3")) # now again have a look at the range and find # that fold-change values of 2 are the min value apply(DEGs.alpha[ , 3:5],2,range) # now check whether each example has at least one stage with a p-value <= 0.05 head(DEGs.alpha)
A standard DivergenceExpressionSet is a data.frame
consisting of a standardized sequence of columns
to store the age information for each gene and its corresponding gene expression profile.
The standard is defined as follows:
Divergencestratum | GeneID | Expression-level 1 | ... | Expression-level N
This example DivergenceExpressionSet dataset covers 7 developmental stages of Arabidopsis thaliana embryo development. The initial gene expression dataset was published by Xiang et al., 2011 (see references section) and was then used by Quint et al., 2012 (see references section) to assign sequence divergence values (divergence strata) to each gene expression profile.
a standard DivergenceExpressionSet object.
Hajk-Georg Drost
http://www.plantphysiol.org/content/156/1/346/suppl/DC1
Quint M et al. 2012. "A transcriptomic hourglass in plant embryogenesis". Nature (490): 98-101. Supplementary Table 2 : http://www.nature.com/nature/journal/v490/n7418/full/nature11394.html
Xiang D et al. 2011. "Genome-Wide Analysis Reveals Gene Expression and Metabolic Network Dynamics during Embryo Development in Arabidopsis". Plant Physiology (156): 346-356. Supplemental Table 1 : http://www.plantphysiol.org/content/156/1/346/suppl/DC1
The Reductive Early Conservation Test aims to statistically evaluate the
existence of a monotonically increasing phylotranscriptomic pattern based on TAI
or TDI
computations.
The corresponding p-value quantifies the probability that a given TAI or TDI pattern (or any phylotranscriptomics pattern)
does not follow an early conservation like pattern. A p-value < 0.05 indicates that the corresponding phylotranscriptomics pattern does
indeed follow an early conservation (low-high-high) shape.
EarlyConservationTest( ExpressionSet, modules = NULL, permutations = 1000, lillie.test = FALSE, plotHistogram = FALSE, runs = 10, parallel = FALSE, gof.warning = FALSE, custom.perm.matrix = NULL )
EarlyConservationTest( ExpressionSet, modules = NULL, permutations = 1000, lillie.test = FALSE, plotHistogram = FALSE, runs = 10, parallel = FALSE, gof.warning = FALSE, custom.perm.matrix = NULL )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
modules |
a list storing three elements: early, mid, and late. Each element expects a numeric
vector specifying the developmental stages or experiments that correspond to each module.
For example, |
permutations |
a numeric value specifying the number of permutations to be performed for the |
lillie.test |
a boolean value specifying whether the Lilliefors Kolmogorov-Smirnov Test shall be performed to quantify the goodness of fit. |
plotHistogram |
a boolean value specifying whether a Lillifor's Kolmogorov-Smirnov-Test shall be performed to test the goodness of fit of the approximated distribution, as well as additional plots quantifying the significance of the observed phylotranscriptomic pattern. |
runs |
specify the number of runs to be performed for goodness of fit computations, in case |
parallel |
performing |
gof.warning |
a logical value indicating whether non significant goodness of fit results should be printed as warning. Default is |
custom.perm.matrix |
a custom |
The reductive early conservation test is a permutation test based on the following test statistic.
(1) A set of developmental stages is partitioned into three modules - early, mid, and late - based on prior biological knowledge.
(2) The mean TAI
or TDI
value for each of the three modules T_early, T_mid, and T_late are computed.
(3) The two differences D1 = T_mid - T_early and D2 = T_late - T_early are calculated.
(4) The minimum D_min of D1 and D2 is computed as final test statistic of the reductive hourglass test.
In order to determine the statistical significance of an observed minimum difference D_min
the following permutation test was performed. Based on the bootMatrix
D_min
is calculated from each of the permuted TAI
or TDI
profiles,
approximated by a Gaussian distribution with method of moments estimated parameters returned by fitdist
,
and the corresponding p-value is computed by pnorm
given the estimated parameters of the Gaussian distribution.
The goodness of fit for the random vector D_min is statistically quantified by an Lilliefors (Kolmogorov-Smirnov) test
for normality.
In case the parameter plotHistogram = TRUE, a multi-plot is generated showing:
(1) A Cullen and Frey skewness-kurtosis plot generated by descdist
.
This plot illustrates which distributions seem plausible to fit the resulting permutation vector D_min.
In the case of the reductive early conservation test a normal distribution seemed plausible.
(2) A histogram of D_min combined with the density plot is plotted. D_min is then fitted by a normal distribution.
The corresponding parameters are estimated by moment matching estimation using the fitdist
function.
(3) A plot showing the p-values for N independent runs to verify that a specific p-value is biased by a specific permutation order.
(4) A barplot showing the number of cases in which the underlying goodness of fit (returned by Lilliefors (Kolmogorov-Smirnov) test
for normality) has shown to be significant (TRUE
) or not significant (FALSE
).
This allows to quantify the permutation bias and their implications on the goodness of fit.
a list object containing the list elements:
p.value
: the p-value quantifying the statistical significance (low-high-high pattern) of the given phylotranscriptomics pattern.
std.dev
: the standard deviation of the N sampled phylotranscriptomics patterns for each developmental stage S.
lillie.test
: a boolean value specifying whether the Lillifors KS-Test returned a p-value > 0.05,
which indicates that fitting the permuted scores with a normal distribution seems plausible.
Hajk-Georg Drost
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.
Piasecka B, Lichocki P, Moretti S, et al. (2013) The hourglass and the early conservation models co-existing patterns of developmental constraints in vertebrates. PLoS Genet. 9(4): e1003476.
ecScore
, bootMatrix
, FlatLineTest
,ReductiveHourglassTest
, ReverseHourglassTest
,
PlotSignature
, LateConservationTest
data(PhyloExpressionSetExample) # perform the early conservation test for a PhyloExpressionSet # here the prior biological knowledge is that stages 1-2 correspond to module 1 = early, # stages 3-5 to module 2 = mid (phylotypic module), and stages 6-7 correspond to # module 3 = late EarlyConservationTest(PhyloExpressionSetExample, modules = list(early = 1:2, mid = 3:5, late = 6:7), permutations = 1000) # use your own permutation matrix based on which p-values (EarlyConservationTest) # shall be computed custom_perm_matrix <- bootMatrix(PhyloExpressionSetExample,100) EarlyConservationTest(PhyloExpressionSetExample, modules = list(early = 1:2, mid = 3:5, late = 6:7), custom.perm.matrix = custom_perm_matrix)
data(PhyloExpressionSetExample) # perform the early conservation test for a PhyloExpressionSet # here the prior biological knowledge is that stages 1-2 correspond to module 1 = early, # stages 3-5 to module 2 = mid (phylotypic module), and stages 6-7 correspond to # module 3 = late EarlyConservationTest(PhyloExpressionSetExample, modules = list(early = 1:2, mid = 3:5, late = 6:7), permutations = 1000) # use your own permutation matrix based on which p-values (EarlyConservationTest) # shall be computed custom_perm_matrix <- bootMatrix(PhyloExpressionSetExample,100) EarlyConservationTest(PhyloExpressionSetExample, modules = list(early = 1:2, mid = 3:5, late = 6:7), custom.perm.matrix = custom_perm_matrix)
This function computes the EarlyConservationTest score for a given TAI
or TDI
pattern.
The reductive early conservation test is a permutation test based on the following test statistic.
- A set of developmental stages is partitioned into three modules - early, mid, and late - based on prior biological knowledge.
- The mean TAI
or TDI
value for each of the three modules T_early, T_mid, and T_late are computed.
- The two differences D1 = T_mid - T_early and D2 = T_late - T_early are calculated.
- The minimum D_min of D1 and D2 is computed as final test statistic of the reductive early conservation test.
This function ecScore computes the D_min value for a given TAI
or TDI
stored in the age_vals
argument.
ecScore(age_vals, early, mid, late, profile.warn = FALSE)
ecScore(age_vals, early, mid, late, profile.warn = FALSE)
age_vals |
a numeric vector containing |
early |
a numeric vector storing the numeric stage values that correspond to the early phase of development. |
mid |
a numeric vector storing the numeric stage values that correspond to the middle phase of development. |
late |
a numeric vector storing the numeric stage values that correspond to the late phase of development. |
profile.warn |
a boolean value indicating whether a warning is printed when a low-mid-high pattern isn't followed. |
a numeric value representing the early conservation score.
Hajk-Georg Drost
EarlyConservationTest
, TAI
, TDI
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # Example PhyloExpressionSet: # compute the TAI profile TAIs <- TAI(PhyloExpressionSetExample) # compute the early conservation score for the TAI profile ec_score <- ecScore(age_vals = TAIs,early = 1:2,mid = 3:5,late = 6:7) # Example DivergenceExpressionSet: # compute the TDI profile TDIs <- TDI(DivergenceExpressionSetExample) # compute the early conservation score for the TDI profile ec_score <- ecScore(age_vals = TDIs,early = 1:2,mid = 3:5,late = 6:7) # compute ecScore() vector from bootMatrix() apply(bootMatrix(PhyloExpressionSetExample,10),1,ecScore,early = 1:2,mid = 3:5,late = 6:7) # get warning if the expected pattern isn't followed ec_score <- ecScore(age_vals = TAIs,early = 1:2,mid = 3:5,late = 6:7,profile.warn=TRUE)
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # Example PhyloExpressionSet: # compute the TAI profile TAIs <- TAI(PhyloExpressionSetExample) # compute the early conservation score for the TAI profile ec_score <- ecScore(age_vals = TAIs,early = 1:2,mid = 3:5,late = 6:7) # Example DivergenceExpressionSet: # compute the TDI profile TDIs <- TDI(DivergenceExpressionSetExample) # compute the early conservation score for the TDI profile ec_score <- ecScore(age_vals = TDIs,early = 1:2,mid = 3:5,late = 6:7) # compute ecScore() vector from bootMatrix() apply(bootMatrix(PhyloExpressionSetExample,10),1,ecScore,early = 1:2,mid = 3:5,late = 6:7) # get warning if the expected pattern isn't followed ec_score <- ecScore(age_vals = TAIs,early = 1:2,mid = 3:5,late = 6:7,profile.warn=TRUE)
This function computes the significance of enriched (over or underrepresented) Phylostrata or Divergence Strata within an input test.set
based on the fisher.test
. Please concult PlotEnrichment
for details.
EnrichmentTest( ExpressionSet, test.set, use.only.map = FALSE, measure = "log-foldchange", complete.bg = TRUE, epsilon = 1e-05 )
EnrichmentTest( ExpressionSet, test.set, use.only.map = FALSE, measure = "log-foldchange", complete.bg = TRUE, epsilon = 1e-05 )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
test.set |
a character vector storing the gene ids for which PS/DS enrichment analyses should be performed. |
use.only.map |
a logical value indicating whether instead of a standard |
measure |
a character string specifying the measure that should be used to quantify over and under representation of PS/DS. Measures can either be |
complete.bg |
a logical value indicating whether the entire background set of the input ExpressionSet should be considered when performing Fisher's exact test ( |
epsilon |
a small value to shift values by epsilon to avoid log(0) computations. |
Hajk-Georg Drost
data(PhyloExpressionSetExample) set.seed(123) test_set <- sample(PhyloExpressionSetExample[ , 2],1000) E.Result <- EnrichmentTest(ExpressionSet = PhyloExpressionSetExample, test.set = test_set , measure = "log-foldchange") # get the log-fold change table E.Result$enrichment.matrix # get P-values for the enrichment significance for each Phylostratum E.Result$p.values
data(PhyloExpressionSetExample) set.seed(123) test_set <- sample(PhyloExpressionSetExample[ , 2],1000) E.Result <- EnrichmentTest(ExpressionSet = PhyloExpressionSetExample, test.set = test_set , measure = "log-foldchange") # get the log-fold change table E.Result$enrichment.matrix # get P-values for the enrichment significance for each Phylostratum E.Result$p.values
This function takes an ExpressionSet object and removes genes from the gene expression matrix that
have an expression level below, above, or below AND above a defined cut.off
value. Hence, this function allows to remove
genes that have been defined as not expressed or outliers and returns an ExpressionSet
retaining only expressed genes.
Expressed( ExpressionSet, cut.off, method = "const", comparison = "below", n = NULL )
Expressed( ExpressionSet, cut.off, method = "const", comparison = "below", n = NULL )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
cut.off |
a numeric value specifying the expression cut-off to define genes as not expressed ( |
method |
a method defining how to treat gene expression values in multiple stages. The corresponding method that is chosen allows to control the stage-wise fulfillment of the threshold criteria. Options are |
comparison |
a character string specifying whether genes having expression levels
below, above, or below AND above (both) the |
n |
a numeric value for |
This filter function allows users to remove genes from the ExpressionSet
object that undercut or exceed a certain expression level cut.off
.
Following extraction criteria are implemented in this function:
const
: all genes that have at least one stage that undercuts or exceeds the expression cut.off
will be excluded from the ExpressionSet
. Hence, for a 7 stage ExpressionSet
genes passing the expression level cut.off
in 6 stages will be retained in the ExpressionSet
.
min-set
: genes passing the expression level cut.off
in ceiling(n/2)
stages will be retained in the ExpressionSet
, where n is the number of stages in the ExpressionSet
.
n-set
: genes passing the expression level cut.off
in n
stages will be retained in the ExpressionSet
. Here, the argument n
needs to be specified.
Hajk-Georg Drost
data(PhyloExpressionSetExample) # remove genes that have an expression level below 8000 # in at least one developmental stage FilterConst <- Expressed(ExpressionSet = PhyloExpressionSetExample, cut.off = 8000, method = "const", comparison = "below") dim(FilterConst) # check number of retained genes # remove genes that have an expression level below 8000 # in at least 3 developmental stages # (in this case: ceiling(7/2) = 4 stages fulfilling the cut-off criteria) FilterMinSet <- Expressed(ExpressionSet = PhyloExpressionSetExample, cut.off = 8000, method = "min-set", comparison = "below") dim(FilterMinSet) # check number of retained genes # remove genes that have an expression level below 8000 # in at least 5 developmental stages (in this case: n = 2 stages fulfilling the criteria) FilterNSet <- Expressed(ExpressionSet = PhyloExpressionSetExample, cut.off = 8000, method = "n-set", comparison = "below", n = 2) dim(FilterMinSet) # check number of retained genes # remove expression levels that exceed the cut.off criteria FilterMinSet <- Expressed(ExpressionSet = PhyloExpressionSetExample, cut.off = 12000, method = "min-set", comparison = "above") dim(FilterMinSet) # check number of retained genes # remove expression levels that undercut AND exceed the cut.off criteria FilterMinSet <- Expressed(ExpressionSet = PhyloExpressionSetExample, cut.off = c(8000,12000), method = "min-set", comparison = "both") dim(FilterMinSet) # check number of retained genes
data(PhyloExpressionSetExample) # remove genes that have an expression level below 8000 # in at least one developmental stage FilterConst <- Expressed(ExpressionSet = PhyloExpressionSetExample, cut.off = 8000, method = "const", comparison = "below") dim(FilterConst) # check number of retained genes # remove genes that have an expression level below 8000 # in at least 3 developmental stages # (in this case: ceiling(7/2) = 4 stages fulfilling the cut-off criteria) FilterMinSet <- Expressed(ExpressionSet = PhyloExpressionSetExample, cut.off = 8000, method = "min-set", comparison = "below") dim(FilterMinSet) # check number of retained genes # remove genes that have an expression level below 8000 # in at least 5 developmental stages (in this case: n = 2 stages fulfilling the criteria) FilterNSet <- Expressed(ExpressionSet = PhyloExpressionSetExample, cut.off = 8000, method = "n-set", comparison = "below", n = 2) dim(FilterMinSet) # check number of retained genes # remove expression levels that exceed the cut.off criteria FilterMinSet <- Expressed(ExpressionSet = PhyloExpressionSetExample, cut.off = 12000, method = "min-set", comparison = "above") dim(FilterMinSet) # check number of retained genes # remove expression levels that undercut AND exceed the cut.off criteria FilterMinSet <- Expressed(ExpressionSet = PhyloExpressionSetExample, cut.off = c(8000,12000), method = "min-set", comparison = "both") dim(FilterMinSet) # check number of retained genes
This function quantifies the statistical significance of an observed phylotranscriptomic pattern. In detail, the Flat Line Test quantifies any significant deviation of an observed phylotranscriptomic pattern from a flat line.
FlatLineTest( ExpressionSet, permutations = 10000, plotHistogram = FALSE, runs = 10, parallel = FALSE, custom.perm.matrix = NULL )
FlatLineTest( ExpressionSet, permutations = 10000, plotHistogram = FALSE, runs = 10, parallel = FALSE, custom.perm.matrix = NULL )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
permutations |
a numeric value specifying the number of permutations that shall be performed for the FlatLineTest. |
plotHistogram |
a logical value indicating whether a detailed statistical analysis concerning the goodness of fit should be performed. |
runs |
specify the number of runs to be performed for goodness of fit computations. In most cases runs = 100 is a reasonable choice. |
parallel |
performing |
custom.perm.matrix |
a custom |
Internally the function performs N phylotranscriptomics pattern computations (TAI
or TDI
) based on sampled PhyloExpressionSets or DivergenceExpressionSets (see bootMatrix
).
The test statistics is being developed as follows:
The variance V_pattern of the S phylotranscriptomics values defines the test statistic for the FlatLineTest
.
The basic assumption is, that the variance of a flat line should be equivalent to zero for a perfect flat line.
Any deviation from a flat line can be measured with a variance value > 0.
To determine the null distribution of V_p, all PS or DS values within each developmental stage s are randomly permuted, S surrogate phylotranscriptomics values are computed from this permuted dataset, and a surrogate value of V_p is computed from these S phylotranscriptomics values. This permutation process is repeated N times, yielding a histogram of V_p.
After applying a Lilliefors Kolmogorov-Smirnov Test for gamma distribution, V_p is approximated by a gamma distribution.
The two parameters of the gamma distribution are estimated by the function fitdist
from the fitdistrplus package by moment matching estimation.
The fitted gamma distribution is considered the null distribution of V_pattern, and the p-value of the observed value of V_p
is computed from this null distribution.
In case the parameter plotHistogram = TRUE, a multi-plot is generated showing:
(1) A Cullen and Frey skewness-kurtosis plot generated by descdist
).
(2) A histogram of V_p combined with the density plot using the Method of Moments estimated parameters returned by the fitdist
function using a gamma distribution.
(3) A plot showing the p-values for N independent runs to verify that a specific p-value is biased by a specific permutation order.
The goodness of fit for the random vector V_p is quantified statistically by an adapted Lilliefors (Kolmogorov-Smirnov) test for gamma distributions.
a list object containing the list elements:
p.value
the p-value quantifying the statistical significance (deviation from a flat line) of the given phylotranscriptomics pattern.
std.dev
the standard deviation of the N sampled phylotranscriptomics patterns for each developmental stage S.
std.dev
the Kolmogorov-Smirnov test satistics for fitting a gamma distribution to the variances of the dataset with permuted phylostrata.
In case there are extreme outlier expression values stored in the dataset (PhyloExpressionSet or DivergenceExpressionSet),
the internal fitdist
function that is based on the bootMatrix
output might return a warning:
"In densfun(x, parm[1], parm[2], ...) : NaNs were produced" which indicates that permutation results caused by extreme outlier expression values
that could not be fitted accordingly. This warning will not be printed out when the corresponding outlier values are extracted from the dataset.
Hajk-Georg Drost
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.
M. L. Delignette-Muller, R. Pouillot, J.-B. Denis and C. Dutang (2014), fitdistrplus: help to fit of a parametric distribution to non-censored or censored data.
Cullen AC and Frey HC (1999) Probabilistic techniques in exposure assessment. Plenum Press, USA, pp. 81-159.
Evans M, Hastings N and Peacock B (2000) Statistical distributions. John Wiley and Sons Inc.
Sokal RR and Rohlf FJ (1995) Biometry. W.H. Freeman and Company, USA, pp. 111-115.
Juergen Gross and bug fixes by Uwe Ligges (2012). nortest: Tests for Normality. R package version 1.0-2.
http://CRAN.R-project.org/package=nortest
Dallal, G.E. and Wilkinson, L. (1986): An analytic approximation to the distribution of Lilliefors test for normality. The American Statistician, 40, 294-296.
Stephens, M.A. (1974): EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69, 730-737.
http://stackoverflow.com/questions/4290081/fitting-data-to-distributions?rq=1
http://stats.stackexchange.com/questions/45033/can-i-use-kolmogorov-smirnov-test-and-estimate-distribution-parameters
http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf
http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf
TAI
, TDI
, PlotSignature
, bootMatrix
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) # example PhyloExpressionSet using 100 permutations FlatLineTest(PhyloExpressionSetExample, permutations = 100, plotHistogram = FALSE) # use your own permutation matrix based on which p-values (FlatLineTest) # shall be computed custom_perm_matrix <- bootMatrix(PhyloExpressionSetExample,100) FlatLineTest(PhyloExpressionSetExample, custom.perm.matrix = custom_perm_matrix)
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) # example PhyloExpressionSet using 100 permutations FlatLineTest(PhyloExpressionSetExample, permutations = 100, plotHistogram = FALSE) # use your own permutation matrix based on which p-values (FlatLineTest) # shall be computed custom_perm_matrix <- bootMatrix(PhyloExpressionSetExample,100) FlatLineTest(PhyloExpressionSetExample, custom.perm.matrix = custom_perm_matrix)
This function interfaces with the GATAI software which removes the genes that create the PlotSignature
pattern.
GATAI( ExpressionSet, singlecell_data = FALSE, gatai_results_path = tempdir(), result_overwrite = FALSE, condaenv = NULL )
GATAI( ExpressionSet, singlecell_data = FALSE, gatai_results_path = tempdir(), result_overwrite = FALSE, condaenv = NULL )
ExpressionSet |
a standard PhyloExpressionSet, DivergenceExpressionSet or PolymorphismsExpressionSet object. |
singlecell_data |
a logical value indicating whether or not the input |
gatai_results_path |
where shall GATAI results be stored. Default |
result_overwrite |
Logical value indicating whether local results of previous GATAI run should be over-written and re-computed? |
condaenv |
The conda environment to use. For |
Filipa Martins Costa and Hajk-Georg Drost
This function computes the geometric mean of a numeric input vector x
.
geom.mean(x)
geom.mean(x)
x |
a numeric vector for which geometric mean computations shall be performed. |
Hajk-Georg Drost
x <- 1:10 geom.mean(x)
x <- 1:10 geom.mean(x)
This function performs a test to quantify the statistical significance between the global expression level distributions of groups of PS or DS. It therefore, allows users to investigate significant groups of PS or DS that significantly differ in their gene expression level distibution within specific developmental stages or experiments.
GroupDiffs( ExpressionSet, Groups = NULL, legendName = NULL, stat.test = "wilcox.test", gene.set = NULL, ... )
GroupDiffs( ExpressionSet, Groups = NULL, legendName = NULL, stat.test = "wilcox.test", gene.set = NULL, ... )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
Groups |
a list containing the phylostrata or divergence strata that correspond
to the same phylostratum class or divergence class.
For ex. evolutionary old phylostrata: PS1-3 (Class 1)
and evolutionary young phylostrata: PS4-12 (Class 2). In this case,
the list could be assigned as, |
legendName |
a character string specifying whether "PS" or "DS" are used to compute relative expression profiles. |
stat.test |
the statistical test to quantify PS or DS group differences. |
gene.set |
a character vector storing the gene ids for which group specific differences shall be statistically quantified. |
... |
additional plot parameters. |
The purpose of this function is to detect groups of PS or DS that significantly differ in their gene expression
level distributions on a global (transcriptome) level. Since relative expression levels (PlotRE
) or
PS or DS specific mean expression levels (PlotMeans
) are biased by highly expressed genes,
this function allows users to objectively test the significant difference of transcriptome expression between
groups of PS or DS in a specific developmental stage or experiment.
Hajk-Georg Drost
PlotGroupDiffs
, PlotMeans
, PlotRE
, PlotBarRE
, PlotCategoryExpr
data(PhyloExpressionSetExample) # perform a Wilcoxon Rank Sum test to statistically quantify the # difference between PS-Group 1 expression levels versus PS-Group 2 # expression levels GroupDiffs(ExpressionSet = PhyloExpressionSetExample, Groups = list(group_1 = 1:3,group_2 = 4:12), legendName = "PS") # quantify the significant difference of a selected set of genes set.seed(123) ExampleGeneSet <- sample(PhyloExpressionSetExample[ , 2],5000) GroupDiffs(ExpressionSet = PhyloExpressionSetExample, Groups = list(group_1 = 1:3,group_2 = 4:12), legendName = "PS", gene.set = ExampleGeneSet)
data(PhyloExpressionSetExample) # perform a Wilcoxon Rank Sum test to statistically quantify the # difference between PS-Group 1 expression levels versus PS-Group 2 # expression levels GroupDiffs(ExpressionSet = PhyloExpressionSetExample, Groups = list(group_1 = 1:3,group_2 = 4:12), legendName = "PS") # quantify the significant difference of a selected set of genes set.seed(123) ExampleGeneSet <- sample(PhyloExpressionSetExample[ , 2],5000) GroupDiffs(ExpressionSet = PhyloExpressionSetExample, Groups = list(group_1 = 1:3,group_2 = 4:12), legendName = "PS", gene.set = ExampleGeneSet)
This function computes the harmonic mean of a numeric input vector x
.
harm.mean(x)
harm.mean(x)
x |
a numeric vector for which harmonic mean computations shall be performed. |
Hajk-Georg Drost
x <- 1:10 harm.mean(x)
x <- 1:10 harm.mean(x)
Helper function to test if the user has set up GATAI correctly in their environment.
is_installed_gatai()
is_installed_gatai()
The Reductive Late Conservation Test aims to statistically evaluate the
existence of a monotonically decreasing phylotranscriptomic pattern based on TAI
or TDI
computations.
The corresponding p-value quantifies the probability that a given TAI or TDI pattern (or any phylotranscriptomics pattern)
does not follow an late conservation like pattern. A p-value < 0.05 indicates that the corresponding phylotranscriptomics pattern does
indeed follow an late conservation (high-high-low) shape.
LateConservationTest( ExpressionSet, modules = NULL, permutations = 1000, lillie.test = FALSE, plotHistogram = FALSE, runs = 10, parallel = FALSE, gof.warning = FALSE, custom.perm.matrix = NULL )
LateConservationTest( ExpressionSet, modules = NULL, permutations = 1000, lillie.test = FALSE, plotHistogram = FALSE, runs = 10, parallel = FALSE, gof.warning = FALSE, custom.perm.matrix = NULL )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
modules |
a list storing three elements: early, mid, and late. Each element expects a numeric
vector specifying the developmental stages or experiments that correspond to each module.
For example, |
permutations |
a numeric value specifying the number of permutations to be performed for the |
lillie.test |
a boolean value specifying whether the Lilliefors Kolmogorov-Smirnov Test shall be performed to quantify the goodness of fit. |
plotHistogram |
a boolean value specifying whether a Lillifor's Kolmogorov-Smirnov-Test shall be performed to test the goodness of fit of the approximated distribution, as well as additional plots quantifying the significance of the observed phylotranscriptomic pattern. |
runs |
specify the number of runs to be performed for goodness of fit computations, in case |
parallel |
performing |
gof.warning |
a logical value indicating whether non significant goodness of fit results should be printed as warning. Default is |
custom.perm.matrix |
a custom |
The reductive late conservation test is a permutation test based on the following test statistic.
(1) A set of developmental stages is partitioned into three modules - early, mid, and late - based on prior biological knowledge.
(2) The mean TAI
or TDI
value for each of the three modules T_early, T_mid, and T_late are computed.
(3) The two differences D1 = T_early - T_late and D2 = T_mid - T_late are calculated.
(4) The minimum D_min of D1 and D2 is computed as final test statistic of the reductive hourglass test.
In order to determine the statistical significance of an observed minimum difference D_min
the following permutation test was performed. Based on the bootMatrix
D_min
is calculated from each of the permuted TAI
or TDI
profiles,
approximated by a Gaussian distribution with method of moments estimated parameters returned by fitdist
,
and the corresponding p-value is computed by pnorm
given the estimated parameters of the Gaussian distribution.
The goodness of fit for the random vector D_min is statistically quantified by an Lilliefors (Kolmogorov-Smirnov) test
for normality.
In case the parameter plotHistogram = TRUE, a multi-plot is generated showing:
(1) A Cullen and Frey skewness-kurtosis plot generated by descdist
.
This plot illustrates which distributions seem plausible to fit the resulting permutation vector D_min.
In the case of the reductive late conservation test a normal distribution seemed plausible.
(2) A histogram of D_min combined with the density plot is plotted. D_min is then fitted by a normal distribution.
The corresponding parameters are estimated by moment matching estimation using the fitdist
function.
(3) A plot showing the p-values for N independent runs to verify that a specific p-value is biased by a specific permutation order.
(4) A barplot showing the number of cases in which the underlying goodness of fit (returned by Lilliefors (Kolmogorov-Smirnov) test
for normality) has shown to be significant (TRUE
) or not significant (FALSE
).
This allows to quantify the permutation bias and their implications on the goodness of fit.
a list object containing the list elements:
p.value
: the p-value quantifying the statistical significance (low-high-high pattern) of the given phylotranscriptomics pattern.
std.dev
: the standard deviation of the N sampled phylotranscriptomics patterns for each developmental stage S.
lillie.test
: a boolean value specifying whether the Lillifors KS-Test returned a p-value > 0.05,
which indicates that fitting the permuted scores with a normal distribution seems plausible.
Hajk-Georg Drost and Jaruwatana Sodai Lotharukpong
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.
Piasecka B, Lichocki P, Moretti S, et al. (2013) The hourglass and the early conservation models co-existing patterns of developmental constraints in vertebrates. PLoS Genet. 9(4): e1003476.
lcScore
, bootMatrix
, FlatLineTest
,ReductiveHourglassTest
, ReverseHourglassTest
, PlotSignature
data(PhyloExpressionSetExample) # perform the late conservation test for a PhyloExpressionSet # here the prior biological knowledge is that stages 1-2 correspond to module 1 = early, # stages 3-5 to module 2 = mid (phylotypic module), and stages 6-7 correspond to # module 3 = late LateConservationTest(PhyloExpressionSetExample, modules = list(early = 1:2, mid = 3:5, late = 6:7), permutations = 1000) # use your own permutation matrix based on which p-values (LateConservationTest) # shall be computed custom_perm_matrix <- bootMatrix(PhyloExpressionSetExample,100) LateConservationTest(PhyloExpressionSetExample, modules = list(early = 1:2, mid = 3:5, late = 6:7), custom.perm.matrix = custom_perm_matrix)
data(PhyloExpressionSetExample) # perform the late conservation test for a PhyloExpressionSet # here the prior biological knowledge is that stages 1-2 correspond to module 1 = early, # stages 3-5 to module 2 = mid (phylotypic module), and stages 6-7 correspond to # module 3 = late LateConservationTest(PhyloExpressionSetExample, modules = list(early = 1:2, mid = 3:5, late = 6:7), permutations = 1000) # use your own permutation matrix based on which p-values (LateConservationTest) # shall be computed custom_perm_matrix <- bootMatrix(PhyloExpressionSetExample,100) LateConservationTest(PhyloExpressionSetExample, modules = list(early = 1:2, mid = 3:5, late = 6:7), custom.perm.matrix = custom_perm_matrix)
This function computes the LateConservationTest score for a given TAI
or TDI
pattern.
The reductive late conservation test is a permutation test based on the following test statistic.
- A set of developmental stages is partitioned into three modules - early, mid, and late - based on prior biological knowledge.
- The mean TAI
or TDI
value for each of the three modules T_early, T_mid, and T_late are computed.
- The two differences D1 = T_early - T_late and D2 = T_mid - T_late are calculated.
- The minimum D_min of D1 and D2 is computed as final test statistic of the reductive late conservation test.
This function lcScore computes the D_min value for a given TAI
or TDI
stored in the age_vals
argument.
lcScore(age_vals, early, mid, late, profile.warn = FALSE)
lcScore(age_vals, early, mid, late, profile.warn = FALSE)
age_vals |
a numeric vector containing |
early |
a numeric vector storing the numeric stage values that correspond to the early phase of development. |
mid |
a numeric vector storing the numeric stage values that correspond to the middle phase of development. |
late |
a numeric vector storing the numeric stage values that correspond to the late phase of development. |
profile.warn |
a boolean value indicating whether a warning is printed when a high-mid-low pattern isn't followed. |
a numeric value representing the late conservation score.
Hajk-Georg Drost and Jaruwatana Sodai Lotharukpong
LateConservationTest
, TAI
, TDI
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # Example PhyloExpressionSet: # compute the TAI profile TAIs <- TAI(PhyloExpressionSetExample) # compute the late conservation score for the TAI profile lc_score <- lcScore(age_vals = TAIs,early = 1:2,mid = 3:5,late = 6:7) # Example DivergenceExpressionSet: # compute the TDI profile TDIs <- TDI(DivergenceExpressionSetExample) # compute the late conservation score for the TDI profile lc_score <- lcScore(age_vals = TDIs,early = 1:2,mid = 3:5,late = 6:7) # compute lcScore() vector from bootMatrix() apply(bootMatrix(PhyloExpressionSetExample,10),1,lcScore,early = 1:2,mid = 3:5,late = 6:7) # get warning if the expected pattern isn't followed lc_score <- lcScore(age_vals = TAIs,early = 1:2,mid = 3:5,late = 6:7,profile.warn=TRUE)
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # Example PhyloExpressionSet: # compute the TAI profile TAIs <- TAI(PhyloExpressionSetExample) # compute the late conservation score for the TAI profile lc_score <- lcScore(age_vals = TAIs,early = 1:2,mid = 3:5,late = 6:7) # Example DivergenceExpressionSet: # compute the TDI profile TDIs <- TDI(DivergenceExpressionSetExample) # compute the late conservation score for the TDI profile lc_score <- lcScore(age_vals = TDIs,early = 1:2,mid = 3:5,late = 6:7) # compute lcScore() vector from bootMatrix() apply(bootMatrix(PhyloExpressionSetExample,10),1,lcScore,early = 1:2,mid = 3:5,late = 6:7) # get warning if the expected pattern isn't followed lc_score <- lcScore(age_vals = TAIs,early = 1:2,mid = 3:5,late = 6:7,profile.warn=TRUE)
This function matches a Phylostratigraphic Map or Divergence Map only storing unique gene ids with a ExpressionMatrix also storing only unique gene ids.
MatchMap(Map, ExpressionMatrix, remove.duplicates = FALSE, accumulate = NULL)
MatchMap(Map, ExpressionMatrix, remove.duplicates = FALSE, accumulate = NULL)
Map |
a standard Phylostratigraphic Map or Divergence Map object. |
ExpressionMatrix |
a standard ExpressionMatrix object. |
remove.duplicates |
a logical value indicating whether duplicate gene ids should be removed from the data set. |
accumulate |
an accumulation function such as |
In phylotranscriptomics analyses two major techniques are performed to obtain standard Phylostratigraphic map or Divergence map objects.
To obtain a Phylostratigraphic Map, Phylostratigraphy (Domazet-Loso et al., 2007) has to be performed. To obtain a Divergence Map, orthologous gene detection, Ka/Ks computations, and decilation (Quint et al., 2012; Drost et al., 2015) have to be performed.
The resulting standard Phylostratigraphic Map or Divergence Map objects consist of 2 colums storing the phylostratum assignment or divergence stratum assignment of a given gene in column one, and the corresponding gene id of that gene on columns two.
A standard ExpressionMatrix is a common gene expression matrix storing the gene ids or probe ids in the first column, and all experiments/stages/replicates in the following columns.
The MatchMap function takes both standard datasets: Map and ExpressionMatrix to merge both data sets to obtain a standard PhyloExpressionSet or DivergenceExpressionSet object.
This procedure is analogous to merge
, but is customized to the Phylostratigraphic Map, Divergence Map, and ExpressionMatrix standards to allow a faster and more intuitive usage.
In case you work with an ExpressionMatrix that stores multiple expression levels for a unique gene id, you
can specify the accumulation
argument to accumulate these multiple expression levels to obtain
one expression level for one unique gene.
a standard PhyloExpressionSet or DivergenceExpressionSet object.
Hajk-Georg Drost
Domazet-Loso T, Brajkovic J, Tautz D (2007) A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages. Trends Genet. 23: 533-9.
Domazet-Loso T, Tautz D (2010) A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature 468: 815-8.
Quint M., Drost H.G., Gabel A., Ullrich K.K., Boenn M., Grosse I. (2012) A transcriptomic hourglass in plant embryogenesis. Nature 490: 98-101.
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
# load a standard PhyloExpressionSet data(PhyloExpressionSetExample) # in a standard PhyloExpressionSet, # column one and column two denote a standard # phylostratigraphic map PhyloMap <- PhyloExpressionSetExample[ , 1:2] # look at the phylostratigraphic map standard head(PhyloMap) # in a standard PhyloExpressionSet, column two combined # with column 3 - N denote a standard ExpressionMatrix ExpressionMatrixExample <- PhyloExpressionSetExample[ , c(2,3:9)] # these two data sets shall illustrate an example # phylostratigraphic map that is returned # by a standard phylostratigraphy run, and a expression set # that is the result of expression data analysis # (background correction, normalization, ...) # now we can use the MatchMap function to merge both data sets # to obtain a standard PhyloExpressionSet PES <- MatchMap(PhyloMap, ExpressionMatrixExample) # note that the function returns a head() # of the matched gene ids to enable # the user to find potential mis-matches # the entire procedure is analogous to merge() # with two data sets sharing the same gene ids # as column (primary key) PES_merge <- merge(PhyloMap, ExpressionMatrixExample)
# load a standard PhyloExpressionSet data(PhyloExpressionSetExample) # in a standard PhyloExpressionSet, # column one and column two denote a standard # phylostratigraphic map PhyloMap <- PhyloExpressionSetExample[ , 1:2] # look at the phylostratigraphic map standard head(PhyloMap) # in a standard PhyloExpressionSet, column two combined # with column 3 - N denote a standard ExpressionMatrix ExpressionMatrixExample <- PhyloExpressionSetExample[ , c(2,3:9)] # these two data sets shall illustrate an example # phylostratigraphic map that is returned # by a standard phylostratigraphy run, and a expression set # that is the result of expression data analysis # (background correction, normalization, ...) # now we can use the MatchMap function to merge both data sets # to obtain a standard PhyloExpressionSet PES <- MatchMap(PhyloMap, ExpressionMatrixExample) # note that the function returns a head() # of the matched gene ids to enable # the user to find potential mis-matches # the entire procedure is analogous to merge() # with two data sets sharing the same gene ids # as column (primary key) PES_merge <- merge(PhyloMap, ExpressionMatrixExample)
For each gene i, exclude the corresponding gene i from the global
PhyloExpressionSet or DivergenceExpressionSet and compute the TAI
or TDI
profile for the corresponding global PhyloExpressionSet or DivergenceExpressionSet
with excluded gene i.
This procedure results in a TAI or TDI profile Matrix storing the TAI or TDI profile for each omitted gene i.
omitMatrix(ExpressionSet)
omitMatrix(ExpressionSet)
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
a numeric matrix storing TAI or TDI profile for each omitted gene i.
Hajk-Georg Drost
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # example PhyloExpressionSet omMatrix_ps <- omitMatrix(PhyloExpressionSetExample) # example DivergenceExpressionSet omMatrix_ds <- omitMatrix(DivergenceExpressionSetExample)
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # example PhyloExpressionSet omMatrix_ps <- omitMatrix(PhyloExpressionSetExample) # example DivergenceExpressionSet omMatrix_ds <- omitMatrix(DivergenceExpressionSetExample)
This function computes the PairwiseTest score for a given TAI
or TDI
pattern.
The pair test is a permutation test based on the following test statistic.
- A PhyloExpressionSet is partitioned into contrast pairs - contrast1 and contrast2 - based on prior biological knowledge. This prior knowledge could include sexual, ecological and genetic backgrounds.
- The mean TAI
or TDI
value for each of the two contrasts contrast1 and contrast2 are computed.
- The pairwise differences D_constrast = contrast1 - contrast2 is calculated as final test statistic of the pair test,
when the altHypothesis
is specified as "greater". When the altHypothesis
is specified as "less", sign of D_constrast is reversed.
This function pairScore computes the D_contrast value for a given TAI
or TDI
stored in the age_vals
argument.
pairScore(age_vals, contrast1, contrast2, altHypothesis = NULL)
pairScore(age_vals, contrast1, contrast2, altHypothesis = NULL)
age_vals |
a numeric vector containing |
contrast1 |
a numeric vector storing the numeric stage values that correspond to the query contrast. |
contrast2 |
a numeric vector storing the numeric stage values that correspond to the subject contrast. |
altHypothesis |
a character string defining the alternative hypothesis used to quantify the statistical significance in the present phylotranscriptomics pattern. Possible values can be:
|
a numeric value representing the pair score.
Hajk-Georg Drost and Jaruwatana Sodai Lotharukpong
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # Example PhyloExpressionSet: # compute the TAI profile TAIs <- TAI(PhyloExpressionSetExample) # compute the pair score for the first two stages in the TAI profile # we test whether TAI in contrast1 is greater than contrast 2. pair_score <- pairScore(age_vals = TAIs,contrast1 = 1,contrast2 = 2, altHypothesis="greater") # Example DivergenceExpressionSet: # compute the TDI profile TDIs <- TDI(DivergenceExpressionSetExample) # compute the pair score for the first two stages in the TDI profile # we test whether TDI in contrast1 is greater than contrast 2. pair_score <- pairScore(age_vals = TDIs,contrast1 = 1,contrast2 = 2, altHypothesis="greater") # compute pairScore() vector from bootMatrix() apply(bootMatrix(PhyloExpressionSetExample,10),1, pairScore,contrast1 = 1,contrast2 = 2, altHypothesis="greater")
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # Example PhyloExpressionSet: # compute the TAI profile TAIs <- TAI(PhyloExpressionSetExample) # compute the pair score for the first two stages in the TAI profile # we test whether TAI in contrast1 is greater than contrast 2. pair_score <- pairScore(age_vals = TAIs,contrast1 = 1,contrast2 = 2, altHypothesis="greater") # Example DivergenceExpressionSet: # compute the TDI profile TDIs <- TDI(DivergenceExpressionSetExample) # compute the pair score for the first two stages in the TDI profile # we test whether TDI in contrast1 is greater than contrast 2. pair_score <- pairScore(age_vals = TDIs,contrast1 = 1,contrast2 = 2, altHypothesis="greater") # compute pairScore() vector from bootMatrix() apply(bootMatrix(PhyloExpressionSetExample,10),1, pairScore,contrast1 = 1,contrast2 = 2, altHypothesis="greater")
The Pairwise Difference Test aims to statistically evaluate the pairwise
difference in the phylotranscriptomic pattern between two contrasts based on TAI
or TDI
computations.
The corresponding p-value quantifies the probability that a given TAI or TDI pattern (or any phylotranscriptomics pattern)
does not differ from the alternative hypothesis (specifying the direction of difference).
A p-value < 0.05 indicates that the corresponding phylotranscriptomics pattern does
indeed differ.
PairwiseTest( ExpressionSet, modules = NULL, altHypothesis = NULL, permutations = 1000, lillie.test = FALSE, plotHistogram = FALSE, runs = 10, parallel = FALSE, gof.warning = FALSE, custom.perm.matrix = NULL )
PairwiseTest( ExpressionSet, modules = NULL, altHypothesis = NULL, permutations = 1000, lillie.test = FALSE, plotHistogram = FALSE, runs = 10, parallel = FALSE, gof.warning = FALSE, custom.perm.matrix = NULL )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
modules |
a list storing two elements: contrast1 and contrast2. Each element expects a numeric
vector specifying the developmental stages or experiments that correspond to each module.
For example, |
altHypothesis |
a character string defining the alternative hypothesis (i.e. direction of difference) used to quantify the statistical significance in the present phylotranscriptomics pattern. Possible values can be:
|
permutations |
a numeric value specifying the number of permutations to be performed for the |
lillie.test |
a boolean value specifying whether the Lilliefors Kolmogorov-Smirnov Test shall be performed to quantify the goodness of fit. |
plotHistogram |
a boolean value specifying whether a Lillifor's Kolmogorov-Smirnov-Test shall be performed to test the goodness of fit of the approximated distribution, as well as additional plots quantifying the significance of the observed phylotranscriptomic pattern. |
runs |
specify the number of runs to be performed for goodness of fit computations, in case |
parallel |
performing |
gof.warning |
a logical value indicating whether non significant goodness of fit results should be printed as warning. Default is |
custom.perm.matrix |
a custom |
The reductive pairwise difference test is a permutation test based on the following test statistic.
(1) A PhyloExpressionSet is partitioned into contrast pairs - contrast1 and contrast2 - based on prior biological knowledge. This prior knowledge could include sexual, ecological and genetic backgrounds.
(2) The mean TAI
or TDI
value for each of the two contrasts contrast1 and contrast2 are computed.
(3) The pairwise differences D_contrast = contrast1 - contrast2 is calculated as final test statistic of the pairwise test,
when altHypothesis
is specified as "greater". When altHypothesis
is specified as "less", sign of D_contrast is reversed.
In order to determine the statistical significance of an observed pairwise difference D_contrast
the following permutation test was performed. Based on the bootMatrix
D_contrast
is calculated from each of the permuted TAI
or TDI
profiles,
approximated by a Gaussian distribution with method of moments estimated parameters returned by fitdist
,
and the corresponding p-value is computed by pnorm
given the estimated parameters of the Gaussian distribution.
The goodness of fit for the random vector D_contrast is statistically quantified by an Lilliefors (Kolmogorov-Smirnov) test
for normality.
In case the parameter plotHistogram = TRUE, a multi-plot is generated showing:
(1) A Cullen and Frey skewness-kurtosis plot generated by descdist
.
This plot illustrates which distributions seem plausible to fit the resulting permutation vector D_contrast
In the case of the pairwise difference test a normal distribution seemed plausible.
(2) A histogram of D_contrast combined with the density plot is plotted. D_contrast is then fitted by a normal distribution.
The corresponding parameters are estimated by moment matching estimation using the fitdist
function.
(3) A plot showing the p-values for N independent runs to verify that a specific p-value is biased by a specific permutation order.
(4) A barplot showing the number of cases in which the underlying goodness of fit (returned by Lilliefors (Kolmogorov-Smirnov) test
for normality) has shown to be significant (TRUE
) or not significant (FALSE
).
This allows to quantify the permutation bias and their implications on the goodness of fit.
a list object containing the list elements:
p.value
: the p-value quantifying the statistical significance of the given phylotranscriptomics pattern.
std.dev
: the standard deviation of the N sampled phylotranscriptomics patterns for each developmental stage S.
lillie.test
: a boolean value specifying whether the Lillifors KS-Test returned a p-value > 0.05,
which indicates that fitting the permuted scores with a normal distribution seems plausible.
Hajk-Georg Drost and Jaruwatana Sodai Lotharukpong
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
Lotharukpong JS et al. (2024) (unpublished)
lcScore
, bootMatrix
, FlatLineTest
,ReductiveHourglassTest
, ReverseHourglassTest
, PlotSignature
data(PhyloExpressionSetExample) # perform the pairwise difference test for a PhyloExpressionSet # here the prior biological knowledge is that stages 1-2 correspond to contrast1, # stages 3-7 correspond to contrast 2. # We test whether TAI in contrast1 is greater than contrast 2. PairwiseTest(PhyloExpressionSetExample, modules = list(contrast1 = 1:2, contrast2 = 3:7), altHypothesis = "greater", permutations = 1000) # We can also test whether TAI in contrast1 is less than contrast 2. PairwiseTest(PhyloExpressionSetExample, modules = list(contrast1 = 1:2, contrast2 = 3:7), altHypothesis = "less", permutations = 1000) # if we only want to test whether TAI in stage 1 (contrast 1) is greater than stage 3 (contrast 2). PairwiseTest(PhyloExpressionSetExample, modules = list(contrast1 = 1, contrast2 = 2), altHypothesis = "greater", permutations = 1000) # use your own permutation matrix based on which p-values (PairwiseTest) # shall be computed custom_perm_matrix <- bootMatrix(PhyloExpressionSetExample,100) # We test whether TAI in contrast1 is greater than contrast 2. PairwiseTest(PhyloExpressionSetExample, modules = list(contrast1 = 1:2, contrast2 = 3:7), altHypothesis = "greater", custom.perm.matrix = custom_perm_matrix)
data(PhyloExpressionSetExample) # perform the pairwise difference test for a PhyloExpressionSet # here the prior biological knowledge is that stages 1-2 correspond to contrast1, # stages 3-7 correspond to contrast 2. # We test whether TAI in contrast1 is greater than contrast 2. PairwiseTest(PhyloExpressionSetExample, modules = list(contrast1 = 1:2, contrast2 = 3:7), altHypothesis = "greater", permutations = 1000) # We can also test whether TAI in contrast1 is less than contrast 2. PairwiseTest(PhyloExpressionSetExample, modules = list(contrast1 = 1:2, contrast2 = 3:7), altHypothesis = "less", permutations = 1000) # if we only want to test whether TAI in stage 1 (contrast 1) is greater than stage 3 (contrast 2). PairwiseTest(PhyloExpressionSetExample, modules = list(contrast1 = 1, contrast2 = 2), altHypothesis = "greater", permutations = 1000) # use your own permutation matrix based on which p-values (PairwiseTest) # shall be computed custom_perm_matrix <- bootMatrix(PhyloExpressionSetExample,100) # We test whether TAI in contrast1 is greater than contrast 2. PairwiseTest(PhyloExpressionSetExample, modules = list(contrast1 = 1:2, contrast2 = 3:7), altHypothesis = "greater", custom.perm.matrix = custom_perm_matrix)
A standard PhyloExpressionSet is a data.frame
consisting of a standardized sequence of columns
to store the age information for each gene and its corresponding gene expression profile.
The standard is defined as follows:
Phylostratum | GeneID | Expression-level 1 | ... | Expression-level N
This dataset covers 7 developmental stages of Arabidopsis thaliana embryo development. The initial gene expression dataset was published by Xiang et al., 2011 (see references section) and was then used by Quint et al., 2012 (see references section) to assign evolutionary ages to each gene expression profile.
a standard PhyloExpressionSet object.
Hajk-Georg Drost
http://www.plantphysiol.org/content/156/1/346/suppl/DC1
Quint M et al. 2012. "A transcriptomic hourglass in plant embryogenesis". Nature (490): 98-101. Supplementary Table 2 : http://www.nature.com/nature/journal/v490/n7418/full/nature11394.html
Xiang D et al. 2011. "Genome-Wide Analysis Reveals Gene Expression and Metabolic Network Dynamics during Embryo Development in Arabidopsis". Plant Physiology (156): 346-356. Supplemental Table 1 : http://www.plantphysiol.org/content/156/1/346/suppl/DC1
DivergenceExpressionSetExample
This function takes a PhyloExpressionSet or DivergenceExpressionSet object as input and computes
for two or more defined phylostratum (divergence stratum) classes the statistical significance of
the differences of mean relative expression of these two (or more) corresponding phylostratum (divergence stratum) classes.
As test-statistic, the function performs a nonparametric kruskal.test
based on relative expression values stored within each defined phylostratum class.
PlotBarRE( ExpressionSet, Groups = NULL, wLength = 0.1, ratio = FALSE, p.adjust.method = NULL, ... )
PlotBarRE( ExpressionSet, Groups = NULL, wLength = 0.1, ratio = FALSE, p.adjust.method = NULL, ... )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
Groups |
a list containing the phylostrata or divergence-strata that correspond to the same phylostratum class or divergence class.
For ex. evolutionary old phylostrata: PS1-3 (Class 1) and evolutionary young phylostrata: PS4-12 (Class 2).
In this case, the |
wLength |
a numeric value defining the whiskers length above the bars. In case there are numerous different phylostratum classes
a smaller |
ratio |
a boolean value specifying whether the bars in the barplot represent the
mean relative expression level of phylostrata belonging to the same phylostratum class.
In case |
p.adjust.method |
correction method to adjust p-values for multiple comparisons (see |
... |
default graphics parameters. |
In case a large number of developmental stages is included in the input ExpressionSet
,
p-values returned by PlotBarRE
should be adjusted for multiple comparisons which can be done
by specifying the p.adjust.method
argument.
A barplot comparing Phylostratum-Classes by its mean relative expression levels. Significant stages are marked by '*' referring to statistically significant differences:
(1) '*' = P-Value <= 0.05
(2) '**' = P-Value <= 0.005
(3) '***' = P-Value <= 0.0005
Hajk-Georg Drost
Quint M et al. 2012. "A transcriptomic hourglass in plant embryogenesis". Nature (490): 98-101.
Domazet-Loso T. and Tautz D. 2010. "A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns". Nature (468): 815-818.
Myles Hollander and Douglas A. Wolfe (1973), Nonparametric Statistical Methods. New York: John Wiley & Sons. Pages 115-120.
RE
, REMatrix
,PlotRE
, kruskal.test
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # example PhyloExpressionSet PlotBarRE(ExpressionSet = PhyloExpressionSetExample, Groups = list(c(1:3), c(4:12))) # example DivergenceExpressionSet PlotBarRE(ExpressionSet = DivergenceExpressionSetExample, Groups = list(c(1:5), c(6:10))) # Perform PlotBarRE() with p-value adjustment method Benjamini & Hochberg (1995) PlotBarRE(ExpressionSet = PhyloExpressionSetExample, Groups = list(c(1:3), c(4:12)), p.adjust.method = "BH") # Example: plot ratio # the ratio curve visualizes the ratio between bar 1 / bar 2 # the z - axis shows the corresponding ratio value of bar 1 / bar 2 PlotBarRE(ExpressionSet = PhyloExpressionSetExample, Groups = list(c(1:3), c(4:12)), ratio = TRUE)
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # example PhyloExpressionSet PlotBarRE(ExpressionSet = PhyloExpressionSetExample, Groups = list(c(1:3), c(4:12))) # example DivergenceExpressionSet PlotBarRE(ExpressionSet = DivergenceExpressionSetExample, Groups = list(c(1:5), c(6:10))) # Perform PlotBarRE() with p-value adjustment method Benjamini & Hochberg (1995) PlotBarRE(ExpressionSet = PhyloExpressionSetExample, Groups = list(c(1:3), c(4:12)), p.adjust.method = "BH") # Example: plot ratio # the ratio curve visualizes the ratio between bar 1 / bar 2 # the z - axis shows the corresponding ratio value of bar 1 / bar 2 PlotBarRE(ExpressionSet = PhyloExpressionSetExample, Groups = list(c(1:3), c(4:12)), ratio = TRUE)
This function visualizes the expression level distribution of each phylostratum during each time point or experiment as boxplot, dot plot, or violin plot enabling users to quantify the age (PS) or divergence (DS) category specific contribution to the corresponding transcriptome.
PlotCategoryExpr( ExpressionSet, legendName, test.stat = TRUE, type = "category-centered", distr.type = "boxplot", y.ticks = 10, log.expr = FALSE, gene.set = NULL )
PlotCategoryExpr( ExpressionSet, legendName, test.stat = TRUE, type = "category-centered", distr.type = "boxplot", y.ticks = 10, log.expr = FALSE, gene.set = NULL )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
legendName |
a character string specifying whether "PS" or "DS" are used to compute relative expression profiles. |
test.stat |
a logical value indicating whether a Benjamini-Hochberg adjusted |
type |
type of age or divergence category comparison. Specifications can be |
distr.type |
format of visualizing age or divergence category specific expression distributions. Either |
y.ticks |
number of y-axis ticks (default |
log.expr |
a logical value specifying whether or not expression levels should internally be log2-transformed before visualization. |
gene.set |
a character vector storing the gene ids for which gene expression levels shall be visualized. |
This way of visualizing the gene expression distribution of each age (PS) or divergence (DS) category during all developmental stages or experiments allows users to detect specific age or divergence categories contributing significant levels of gene expression to the underlying biological process (transcriptome).
This quantification allows users to conclude that genes originating in specific PS or DS contribute significantly more to the overall transcriptome
than other genes originating from different PS or DS categories. More specialized analyses such as PlotMeans
, PlotRE
,
PlotBarRE
, etc. will then allow to study the exact mean expression patterns of these age or divergence categories.
The statistical quantification of differences between expression levels of different age or divergence categories
is done by performing a kruskal.test
with Benjamini & Hochberg p-value adjustment for multiple comparisons.
type = "category-centered"
Here, the kruskal.test
quantifies the differences of gene expression between all combinations of age or divergence categories for each stage or experiment separately. Here, a significant p-value quantifies that there is at least one pairwise comparison for which age or divergence categories significantly differ in their gene expression distribution. This type of analysis allows users to detect stages or experiments that show high diviation between age or divergence category contributions to the overall transcriptome or no significant deviations of age or divergence categories, suggesting equal age or divergence category contributions to the overall transcriptome.
type = "stage-centered"
Here, the kruskal.test
quantifies the differences of gene expression between all stages or experiments for each age or divergence category separately. Hence, the test quantifies whether or not the gene expression distribution of a single age or divergence category significantly changes throughout development or experiments. This type of analysis allows users to detect specific age or divergence categories that significantly change their expression levels throughout development or experiments.
Argument Specifications:
Argument: type
type = "category-centered"
This specification allows users to compare the differences between all age or
divergence categories during the same stage or experiment.
type = "stage-centered"
This specification allows users to compare the differences between all age or
divergence categories between stages or experiments.
Argument: distr.type
distr.type = "boxplot"
This specification allows users to visualize the expression distribution of all PS or DS as boxplot.
distr.type = "violin"
This specification allows users to visualize the expression distribution of all PS or DS as violin plot.
distr.type = "dotplot"
This specification allows users to visualize the expression distribution of all PS or DS as dot plot.
Finally, users can specify a gene.set
(a subset of genes included in the input ExpressioSet
)
for which expression levels should be visualized as boxplot, dotplot, or violinplot.
A boxplot, violin plot, or dot plot visualizing the gene expression levels of different PS or DS categories.
Furthermore, the statistical test results returned from the kruskal.test
are printed
to the console.
(1) '*' = P-Value <= 0.05
(2) '**' = P-Value <= 0.005
(3) '***' = P-Value <= 0.0005
(4) 'n.s.' = not significant = P-Value > 0.05
Hajk-Georg Drost
PlotMeans
, PlotRE
, PlotBarRE
, age.apply
,
pTAI
, pTDI
, pStrata
, pMatrix
, TAI
, TDI
data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) ## Not run: # category-centered visualization of PS specific expression level distributions (log-scale) PlotCategoryExpr(ExpressionSet = PhyloExpressionSetExample, legendName = "PS", test.stat = TRUE, type = "category-centered", distr.type = "boxplot", log.expr = TRUE) # stage-centered visualization of PS specific expression level distributions (log-scale) PlotCategoryExpr(ExpressionSet = PhyloExpressionSetExample, legendName = "PS", test.stat = TRUE, distr.type = "boxplot", type = "stage-centered", log.expr = TRUE) # category-centered visualization of PS specific expression level distributions (log-scale) # as violoin plot PlotCategoryExpr(ExpressionSet = PhyloExpressionSetExample, legendName = "PS", test.stat = TRUE, distr.type = "violin", type = "stage-centered", log.expr = TRUE) # analogous for DivergenceExpressionSets PlotCategoryExpr(ExpressionSet = DivergenceExpressionSetExample, legendName = "DS", test.stat = TRUE, type = "category-centered", distr.type = "boxplot", log.expr = TRUE) # visualize the expression levels of 500 example genes set.seed(234) example.gene.set <- PhyloExpressionSetExample[sample(1:25260,500) , 2] PlotCategoryExpr(ExpressionSet = PhyloExpressionSetExample, legendName = "PS", test.stat = TRUE, type = "category-centered", distr.type = "boxplot", log.expr = TRUE, gene.set = example.gene.set) ## End(Not run)
data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) ## Not run: # category-centered visualization of PS specific expression level distributions (log-scale) PlotCategoryExpr(ExpressionSet = PhyloExpressionSetExample, legendName = "PS", test.stat = TRUE, type = "category-centered", distr.type = "boxplot", log.expr = TRUE) # stage-centered visualization of PS specific expression level distributions (log-scale) PlotCategoryExpr(ExpressionSet = PhyloExpressionSetExample, legendName = "PS", test.stat = TRUE, distr.type = "boxplot", type = "stage-centered", log.expr = TRUE) # category-centered visualization of PS specific expression level distributions (log-scale) # as violoin plot PlotCategoryExpr(ExpressionSet = PhyloExpressionSetExample, legendName = "PS", test.stat = TRUE, distr.type = "violin", type = "stage-centered", log.expr = TRUE) # analogous for DivergenceExpressionSets PlotCategoryExpr(ExpressionSet = DivergenceExpressionSetExample, legendName = "DS", test.stat = TRUE, type = "category-centered", distr.type = "boxplot", log.expr = TRUE) # visualize the expression levels of 500 example genes set.seed(234) example.gene.set <- PhyloExpressionSetExample[sample(1:25260,500) , 2] PlotCategoryExpr(ExpressionSet = PhyloExpressionSetExample, legendName = "PS", test.stat = TRUE, type = "category-centered", distr.type = "boxplot", log.expr = TRUE, gene.set = example.gene.set) ## End(Not run)
Function to plot and compare the confidence intervals of Transcriptome Index between transformed and non-transformed expression data
by using bootstrapping appraoches instead of permutation tests used in PlotSignature
.
PlotCIRatio(ExpressionSet, measure, nbootstraps)
PlotCIRatio(ExpressionSet, measure, nbootstraps)
ExpressionSet |
a standard PhyloExpressionSet, DivergenceExpressionSet or PolymorphismsExpressionSet object. |
measure |
type of transcriptome index that shall be computed. E.g. measure = "TAI" (Transcriptome Age Index), measure = "TDI" (Transcriptome Divergence Index), measure = "TPI" (Transcriptome Polymorphism Index). |
nbootstraps |
number of independent bootstraps. |
This function can be used to check potential outliers (e.g. a few exramly highly expressed genes) in transcriptome. Since Transcriptome Index is weigthed mean, it could be easily affectd by outliers. So, we have to check potential outliers in the transcriptome data. Because log or sqrt trandformation can alleviate the effect of outliers, if there are some outliers, we could see the confidence intervals (genetated by bootstrapping) from non-trandformed expression data are much higher and more variable than from log or sqrt trandformed expression data. In order to compare the range of confidence intervals in the same scale, we plotted the ratio of upper to lower confidence interval boundary across development.
Jialin Liu
data("PhyloExpressionSetExample") PlotCIRatio(PhyloExpressionSetExample,"TAI",5)
data("PhyloExpressionSetExample") PlotCIRatio(PhyloExpressionSetExample,"TAI",5)
This function computes the cumulative contribution of each Phylostratum or Divergence Stratum to the global TAI
or TDI
profile.
PlotContribution( ExpressionSet, legendName = NULL, xlab = "Ontogeny", ylab = "Transcriptome Index", main = "", y.ticks = 10 )
PlotContribution( ExpressionSet, legendName = NULL, xlab = "Ontogeny", ylab = "Transcriptome Index", main = "", y.ticks = 10 )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
legendName |
a character string specifying whether "PS" or "DS" are used to compute relative expression profiles. |
xlab |
label of x-axis. |
ylab |
label of y-axis. |
main |
main title. |
y.ticks |
a numeric value specifying the number of ticks to be drawn on the y-axis. |
Introduced by Domazet-Loso and Tautz (2010), this function allows users to visualize the cumulative contribution of each Phylostratum or Divergence Stratum to the global Transcriptome Age Index or Transcriptome Divergence Index profile to quantify how each Phylostratum or Divergence Stratum influences the profile of the global TAI or TDI pattern.
Hajk-Georg Drost
Domazet-Loso T. and Tautz D. (2010). A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature (468): 815-818.
pTAI
, pTDI
, TAI
, TDI
, PlotSignature
data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # visualize phylostratum contribution to global TAI PlotContribution(PhyloExpressionSetExample, legendName = "PS") # visualize divergence stratum contribution to global TDI PlotContribution(DivergenceExpressionSetExample, legendName = "DS")
data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # visualize phylostratum contribution to global TAI PlotContribution(PhyloExpressionSetExample, legendName = "PS") # visualize divergence stratum contribution to global TDI PlotContribution(DivergenceExpressionSetExample, legendName = "DS")
This function plots the correlation coefficient between phylostratum values and divergence-stratum values of a given PhyloExpressionSet and DivergenceExpressionSet.
This function can be used to test whether a given PS distribution and DS distribution are linear correlated so that the independence of PS and DS can be assumed for subsequent analyses (Quint et al., 2012).
PlotCorrelation( PhyloExpressionSet, DivergenceExpressionSet, method = "pearson", linearModel = FALSE, xlab = "Phylostratum", ylab = "Divergencestratum" )
PlotCorrelation( PhyloExpressionSet, DivergenceExpressionSet, method = "pearson", linearModel = FALSE, xlab = "Phylostratum", ylab = "Divergencestratum" )
PhyloExpressionSet |
a standard PhyloExpressionSet object. |
DivergenceExpressionSet |
a standard DivergenceExpressionSet object. |
method |
a character string specifying the correlation method to cbe used, e.g. "pearson", "kendall", "spearman". |
linearModel |
a boolean value specifying whether a linear model should be fitted to the data and furthermore, should be visualized in the corresponding plot. |
xlab |
label of x-axis. |
ylab |
label of y-axis. |
a jitter-correlation-plot of PS and DS correlation.
Hajk-Georg Drost
Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101. Drost HG et al. (2015) Evidence for Active Maintenance of Phylotranscriptomic Hourglass Patterns in Animal and Plant Embryogenesis. Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012.
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # plot the PS and DS correlation PlotCorrelation(PhyloExpressionSetExample, DivergenceExpressionSetExample, method = "pearson", linearModel = TRUE)
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # plot the PS and DS correlation PlotCorrelation(PhyloExpressionSetExample, DivergenceExpressionSetExample, method = "pearson", linearModel = TRUE)
This function plots the frequency distribution of genes within the corresponding phylostratigraphic map or divergence map and can be used to fastly visualize the PS or DS distribution of a given phylostratum vector or divergence-stratum vector.
PlotDistribution( PhyloExpressionSet, legendName = "PS", as.ratio = FALSE, use.only.map = FALSE, xlab = NULL, ylab = NULL )
PlotDistribution( PhyloExpressionSet, legendName = "PS", as.ratio = FALSE, use.only.map = FALSE, xlab = NULL, ylab = NULL )
PhyloExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
legendName |
a character string specifying whether "PS" or "DS" are visualized. |
as.ratio |
a boolean value specifying whether the relative frequencies instead of absolute frequencies shall be plotted. |
use.only.map |
logical value indicating whether or not a Phylostratigraphic Map or Divergence Map should be passed to the |
xlab |
label of the x-axis. |
ylab |
label of the y-axis. |
The frequency distribution of all genes or a subset of genes might be of interest for subsequent analyses.
For Example:
Filtering genes using gene cluster algorithms can result in different groups (classes) of genes. For each gene group the phylostratum or divergence-stratum distribution can be visualized using this function and can be compared between different groups.
This analysis allows to compare different gene expression profiles (or gene groups in general) based on their evolutionary origins or evolutionary relationships.
a barplot showing the phylostratum distribution or divergence-stratum distribution of a given numeric vector containing PS or DS values.
Hajk-Georg Drost
# load PhyloExpressionSet data(PhyloExpressionSetExample) # plot the phylostratum distribution of a PhyloExpressionSet PlotDistribution(PhyloExpressionSetExample) # plot the relative frequency distribution of a PhyloExpressionSet PlotDistribution(PhyloExpressionSetExample, as.ratio = TRUE) # a example for visualizing the PS distribution for a subset of genes PlotDistribution(PhyloExpressionSetExample[sample(20000,5000) , ], as.ratio = TRUE)
# load PhyloExpressionSet data(PhyloExpressionSetExample) # plot the phylostratum distribution of a PhyloExpressionSet PlotDistribution(PhyloExpressionSetExample) # plot the relative frequency distribution of a PhyloExpressionSet PlotDistribution(PhyloExpressionSetExample, as.ratio = TRUE) # a example for visualizing the PS distribution for a subset of genes PlotDistribution(PhyloExpressionSetExample[sample(20000,5000) , ], as.ratio = TRUE)
This function computes and visualizes the significance of enriched (over or underrepresented) Phylostrata or Divergence Strata within an input test.set
.
PlotEnrichment( ExpressionSet, test.set, use.only.map = FALSE, measure = "log-foldchange", complete.bg = TRUE, legendName = "", over.col = "steelblue", under.col = "midnightblue", epsilon = 1e-05, cex.legend = 1, cex.asterisk = 1, plot.bars = TRUE, p.adjust.method = NULL, ... )
PlotEnrichment( ExpressionSet, test.set, use.only.map = FALSE, measure = "log-foldchange", complete.bg = TRUE, legendName = "", over.col = "steelblue", under.col = "midnightblue", epsilon = 1e-05, cex.legend = 1, cex.asterisk = 1, plot.bars = TRUE, p.adjust.method = NULL, ... )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object (in case |
test.set |
a character vector storing the gene ids for which PS/DS enrichment analyses should be performed. |
use.only.map |
a logical value indicating whether instead of a standard |
measure |
a character string specifying the measure that should be used to quantify over and under representation of PS/DS. Measures can either be |
complete.bg |
a logical value indicating whether the entire background set
of the input |
legendName |
a character string specifying whether "PS" or "DS" are used to compute relative expression profiles. |
over.col |
color of the overrepresentation bars. |
under.col |
color of the underrepresentation bars. |
epsilon |
a small value to shift values by epsilon to avoid log(0) computations. |
cex.legend |
the |
cex.asterisk |
the |
plot.bars |
a logical value specifying whether or not bars should be visualized or whether only |
p.adjust.method |
correction method to adjust p-values for multiple comparisons (see |
... |
default graphics parameters. |
This Phylostratum or Divergence Stratum enrichment analysis is motivated by Sestak and Domazet-Loso (2015) who perform Phylostratum or Divergence Stratum enrichment analyses to correlate organ evolution with the origin of organ specific genes.
In detail this function takes the Phylostratum or Divergence Stratum distribution of all genes stored in the input ExpressionSet
as background set and
the Phylostratum or Divergence Stratum distribution of the test.set
and performes a fisher.test
for each Phylostratum or Divergence Stratum to quantify the statistical significance of over- or underrepresentated Phylostrata or Divergence Strata within the set of selected test.set
genes.
To visualize the odds or log-odds of over or underrepresented genes within the test.set
the following procedure is performed:
N_ij denotes the number of genes in group j and deriving from PS i, with i = 1, .. , n and where j = 1 denotes the background set and j = 2 denotes the test.set
N_i. denotes the total number of genes within PS i
N_.j denotes the total number of genes within group j
N_.. is the total number of genes within all groups j and all PS i
f_ij = N_ij / N_.. and g_ij = f_ij / f_.j denote relative frequencies between groups
f_i. denotes the between group sum of f_ij
The result is the fold-change value (odds) denoted as C = g_i2 / f_i. which is visualized above and below zero.
In case a large number of Phylostrata or Divergence Strata is included in the input
ExpressionSet
, p-values returned by PlotEnrichment
should be adjusted for
multiple comparisons which can be done by specifying the p.adjust.method
argument.
Hajk-Georg Drost
Sestak and Domazet-Loso (2015). Phylostratigraphic Profiles in Zebrafish Uncover Chordate Origins of the Vertebrate Brain. Mol. Biol. Evol. 32(2): 299-312.
data(PhyloExpressionSetExample) set.seed(123) test_set <- sample(PhyloExpressionSetExample[ , 2],10000) ## Examples with complete.bg = TRUE ## Hence: the entire background set of the input ExpressionSet is considered ## when performing Fisher's exact test # measure: log-foldchange PlotEnrichment(ExpressionSet = PhyloExpressionSetExample, test.set = test_set , legendName = "PS", measure = "log-foldchange") # measure: foldchange PlotEnrichment(ExpressionSet = PhyloExpressionSetExample, test.set = test_set , legendName = "PS", measure = "foldchange") ## Examples with complete.bg = FALSE ## Hence: the test.set genes are excluded from the background set before ## Fisher's exact test is performed # measure: log-foldchange PlotEnrichment(ExpressionSet = PhyloExpressionSetExample, test.set = test_set , complete.bg = FALSE, legendName = "PS", measure = "log-foldchange") # measure: foldchange PlotEnrichment(ExpressionSet = PhyloExpressionSetExample, test.set = test_set , complete.bg = FALSE, legendName = "PS", measure = "foldchange")
data(PhyloExpressionSetExample) set.seed(123) test_set <- sample(PhyloExpressionSetExample[ , 2],10000) ## Examples with complete.bg = TRUE ## Hence: the entire background set of the input ExpressionSet is considered ## when performing Fisher's exact test # measure: log-foldchange PlotEnrichment(ExpressionSet = PhyloExpressionSetExample, test.set = test_set , legendName = "PS", measure = "log-foldchange") # measure: foldchange PlotEnrichment(ExpressionSet = PhyloExpressionSetExample, test.set = test_set , legendName = "PS", measure = "foldchange") ## Examples with complete.bg = FALSE ## Hence: the test.set genes are excluded from the background set before ## Fisher's exact test is performed # measure: log-foldchange PlotEnrichment(ExpressionSet = PhyloExpressionSetExample, test.set = test_set , complete.bg = FALSE, legendName = "PS", measure = "log-foldchange") # measure: foldchange PlotEnrichment(ExpressionSet = PhyloExpressionSetExample, test.set = test_set , complete.bg = FALSE, legendName = "PS", measure = "foldchange")
This function simply visualizes the gene expression profiles of
a defined subset of genes stored in the input ExpressionSet
.
PlotGeneSet( ExpressionSet, gene.set, get.subset = FALSE, use.only.map = FALSE, colors = NULL, plot.legend = TRUE, y.ticks = 6, digits.ylab = 4, ... )
PlotGeneSet( ExpressionSet, gene.set, get.subset = FALSE, use.only.map = FALSE, colors = NULL, plot.legend = TRUE, y.ticks = 6, digits.ylab = 4, ... )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
gene.set |
a character vector storing the gene ids for which gene expression profiles shall be visualized. |
get.subset |
a logical value indicating whether or not an |
use.only.map |
a logical value indicating whether instead of a standard |
colors |
colors for gene expression profiles. Default: |
plot.legend |
a logical value indicating whether gene ids should be printed as legend next to the plot. |
y.ticks |
a numeric value specifying the number of ticks to be drawn on the y-axis. |
digits.ylab |
a numeric value specifying the number of digits shown for the expression levels on the y-axis. |
... |
additional parameters passed to |
This function simply visualizes or subsets the gene expression levels of a set of genes
that are stored in the input ExpressionSet
.
Hajk-Georg Drost
SelectGeneSet
, PlotEnrichment
, DiffGenes
data(PhyloExpressionSetExample) # the best parameter setting to visualize this plot: # png("test_png.png",700,400) PlotGeneSet(ExpressionSet = PhyloExpressionSetExample, gene.set = PhyloExpressionSetExample[1:5, 2], type = "l", lty = 1, lwd = 4, xlab = "Ontogeny", ylab = "Expression Level") # dev.off() # In case you would like to work with the expression levels # of selected genes you can specify the 'get.subset' argument: PlotGeneSet(ExpressionSet = PhyloExpressionSetExample, gene.set = PhyloExpressionSetExample[1:5, 2], get.subset = TRUE) # get a gene subset using only a phylostratihraphic map ExamplePSMap <- PhyloExpressionSetExample[ , 1:2] PlotGeneSet(ExpressionSet = ExamplePSMap, gene.set = PhyloExpressionSetExample[1:5, 2], get.subset = TRUE, use.only.map = TRUE)
data(PhyloExpressionSetExample) # the best parameter setting to visualize this plot: # png("test_png.png",700,400) PlotGeneSet(ExpressionSet = PhyloExpressionSetExample, gene.set = PhyloExpressionSetExample[1:5, 2], type = "l", lty = 1, lwd = 4, xlab = "Ontogeny", ylab = "Expression Level") # dev.off() # In case you would like to work with the expression levels # of selected genes you can specify the 'get.subset' argument: PlotGeneSet(ExpressionSet = PhyloExpressionSetExample, gene.set = PhyloExpressionSetExample[1:5, 2], get.subset = TRUE) # get a gene subset using only a phylostratihraphic map ExamplePSMap <- PhyloExpressionSetExample[ , 1:2] PlotGeneSet(ExpressionSet = ExamplePSMap, gene.set = PhyloExpressionSetExample[1:5, 2], get.subset = TRUE, use.only.map = TRUE)
This function performs a test to quantify the statistical significance between the global expression level distributions of groups of PS or DS. It therefore, allows users to investigate significant groups of PS or DS that significantly differ in their gene expression level distibution within specific developmental stages or experiments.
PlotGroupDiffs( ExpressionSet, Groups = NULL, legendName = NULL, stat.test = "wilcox.test", col = c("turquoise3", "magenta3"), plot.type = NULL, gene.set = NULL, ... )
PlotGroupDiffs( ExpressionSet, Groups = NULL, legendName = NULL, stat.test = "wilcox.test", col = c("turquoise3", "magenta3"), plot.type = NULL, gene.set = NULL, ... )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
Groups |
a list containing the phylostrata or divergence strata that correspond
to the same phylostratum class or divergence class.
For ex. evolutionary old phylostrata: PS1-3 (Class 1)
and evolutionary young phylostrata: PS4-12 (Class 2). In this case,
the list could be assigned as, |
legendName |
a character string specifying whether "PS" or "DS" are used to compute relative expression profiles. |
stat.test |
the statistical test to quantify PS or DS group differences. |
col |
colors for the two box plots representing the expression level distributions of selected PS/DS groups. |
plot.type |
the type of plot that shall be drawn to visualized the difference in PS/DS group specific expression . |
gene.set |
a character vector storing the gene ids for which group specific differences shall be statistically quantified. |
... |
additional plot parameters. |
The purpose of this function is to detect groups of PS or DS that significantly differ in their gene expression
level distributions on a global (transcriptome) level. Since relative expression levels (PlotRE
) or
PS or DS specific mean expression levels (PlotMeans
) are biased by highly expressed genes,
this function allows users to objectively test the significant difference of transcriptome expression between
groups of PS or DS in a specific developmental stage or experiment.
In particular, this function divides (for each developmental stage separately) the gene expression levels into two groups: Group1 = genes deriving from selected PS/DS in group 1 and
Group2 = genes deriving from selected PS/DS in group 2. Within each stage the expression level distributions between group 1 and group 2 are statistically quantified using a wilcox.test
.
Hajk-Georg Drost
PlotMeans
, PlotRE
, PlotBarRE
, PlotCategoryExpr
, GroupDiffs
data(PhyloExpressionSetExample) PlotGroupDiffs(ExpressionSet = PhyloExpressionSetExample, Groups = list(group_1 = 1:3,group_2 = 4:12), legendName = "PS", type = "b", lwd = 6, xlab = "Ontogeny") # only receive the p-values without the corresponding plot PlotGroupDiffs(ExpressionSet = PhyloExpressionSetExample, Groups = list(group_1 = 1:3,group_2 = 4:12), legendName = "PS", plot.p.vals = FALSE, type = "b", lwd = 6, xlab = "Ontogeny") # quantify the significant difference of a selected set of genes # only receive the p-values without the corresponding plot set.seed(123) ExampleGeneSet <- sample(PhyloExpressionSetExample[ , 2],5000) PlotGroupDiffs(ExpressionSet = PhyloExpressionSetExample, Groups = list(group_1 = 1:3,group_2 = 4:12), legendName = "PS", plot.p.vals = FALSE, gene.set = ExampleGeneSet) # plot differences as boxplot for each developmental stage PlotGroupDiffs(ExpressionSet = tf(PhyloExpressionSetExample,log2), Groups = list(group_1 = 1:3,group_2 = 4:12), legendName = "PS", plot.type = "boxplot")
data(PhyloExpressionSetExample) PlotGroupDiffs(ExpressionSet = PhyloExpressionSetExample, Groups = list(group_1 = 1:3,group_2 = 4:12), legendName = "PS", type = "b", lwd = 6, xlab = "Ontogeny") # only receive the p-values without the corresponding plot PlotGroupDiffs(ExpressionSet = PhyloExpressionSetExample, Groups = list(group_1 = 1:3,group_2 = 4:12), legendName = "PS", plot.p.vals = FALSE, type = "b", lwd = 6, xlab = "Ontogeny") # quantify the significant difference of a selected set of genes # only receive the p-values without the corresponding plot set.seed(123) ExampleGeneSet <- sample(PhyloExpressionSetExample[ , 2],5000) PlotGroupDiffs(ExpressionSet = PhyloExpressionSetExample, Groups = list(group_1 = 1:3,group_2 = 4:12), legendName = "PS", plot.p.vals = FALSE, gene.set = ExampleGeneSet) # plot differences as boxplot for each developmental stage PlotGroupDiffs(ExpressionSet = tf(PhyloExpressionSetExample,log2), Groups = list(group_1 = 1:3,group_2 = 4:12), legendName = "PS", plot.type = "boxplot")
This function computes for each age category the corresponding mean expression profile.
PlotMeans( ExpressionSet, Groups = NULL, modules = NULL, legendName = "age", xlab = "Ontogeny", ylab = "Mean Expression Level", main = "", y.ticks = 10, adjust.range = TRUE, alpha = 0.008, ... )
PlotMeans( ExpressionSet, Groups = NULL, modules = NULL, legendName = "age", xlab = "Ontogeny", ylab = "Mean Expression Level", main = "", y.ticks = 10, adjust.range = TRUE, alpha = 0.008, ... )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
Groups |
a list containing the age categories for which mean expression levels shall be drawn.
For ex. evolutionary users can compare old phylostrata: PS1-3 (Class 1) and evolutionary young phylostrata: PS4-12 (Class 2).
In this example, the list could be assigned as, |
modules |
a list storing three elements for specifying the modules: early, mid, and late.
Each element expects a numeric vector specifying the developmental stages
or experiments that correspond to each module. For example,
|
legendName |
a character string specifying the legend title. |
xlab |
label of x-axis. |
ylab |
label of y-axis. |
main |
main text. |
y.ticks |
number of ticks that shall be drawn on the y-axis. |
adjust.range |
logical indicating whether or not the y-axis scale shall be adjusted to the same range in case two groups are specified. Default is |
alpha |
transparency of the shaded area (between [0,1]). Default is |
... |
place holder for old version of PlotMeans that was based on base graphics instead of ggplot2. |
This plot may be useful to compare the absolute mean expression levels of each age category across stages.
In different developmental processes different phylostratum or divergence-stratum
classes might be more expressed than others, hence contributing more to the overall
phylotranscriptomics pattern (TAI
or TDI
).
This plot can help to identify the phylostratum or divergence-stratum classes
that contributes most to the overall transcriptome of the given developmental process.
a plot showing mean expression profiles of each age category.
Hajk-Georg Drost
PlotBarRE
, RE
, REMatrix
, PlotRE
### Example using a PhyloExpressionSet ### and DivergenceExpressionSet # load PhyloExpressionSet data(PhyloExpressionSetExample) # load PhyloExpressionSet data(DivergenceExpressionSetExample) # plot evolutionary old PS (PS1-3) vs evolutionary young PS (PS4-12) PlotMeans(PhyloExpressionSetExample, Groups = list(c(1:3), c(4:12)), legendName = "PS", adjust.range = TRUE) # if users wish to not adjust the y-axis scale when # 2 groups are selected they can specify: adjust.range = FALSE PlotMeans(PhyloExpressionSetExample, Groups = list(c(1:3), c(4:12)), legendName = "PS", adjust.range = FALSE) # plot conserved DS (DS1-5) vs divergent DS (PS6-10) # NOTE: DS are always defined in the range 1, 2, ... , 10. # Hence, make sure that your groups are within this range! PlotMeans(DivergenceExpressionSetExample, Groups = list(c(1:5), c(6:10)), legendName = "DS", adjust.range = TRUE)
### Example using a PhyloExpressionSet ### and DivergenceExpressionSet # load PhyloExpressionSet data(PhyloExpressionSetExample) # load PhyloExpressionSet data(DivergenceExpressionSetExample) # plot evolutionary old PS (PS1-3) vs evolutionary young PS (PS4-12) PlotMeans(PhyloExpressionSetExample, Groups = list(c(1:3), c(4:12)), legendName = "PS", adjust.range = TRUE) # if users wish to not adjust the y-axis scale when # 2 groups are selected they can specify: adjust.range = FALSE PlotMeans(PhyloExpressionSetExample, Groups = list(c(1:3), c(4:12)), legendName = "PS", adjust.range = FALSE) # plot conserved DS (DS1-5) vs divergent DS (PS6-10) # NOTE: DS are always defined in the range 1, 2, ... , 10. # Hence, make sure that your groups are within this range! PlotMeans(DivergenceExpressionSetExample, Groups = list(c(1:5), c(6:10)), legendName = "DS", adjust.range = TRUE)
This function computes for each age category the corresponding median expression profile.
PlotMedians( ExpressionSet, Groups = NULL, legendName = "age", xlab = "Ontogeny", ylab = "Median Expression Level", main = "", y.ticks = 10, adjust.range = TRUE )
PlotMedians( ExpressionSet, Groups = NULL, legendName = "age", xlab = "Ontogeny", ylab = "Median Expression Level", main = "", y.ticks = 10, adjust.range = TRUE )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
Groups |
a list containing the age categories for which median expression levels shall be drawn.
For ex. evolutionary users can compare old phylostrata: PS1-3 (Class 1) and evolutionary young phylostrata: PS4-12 (Class 2).
In this example, the list could be assigned as, |
legendName |
a character string specifying the legend title. |
xlab |
label of x-axis. |
ylab |
label of y-axis. |
main |
main text. |
y.ticks |
number of ticks that shall be drawn on the y-axis. |
adjust.range |
logical indicating whether or not the y-axis scale shall be adjusted to the same range in case two groups are specified. Default is |
This plot may be useful to compare the absolute median expression levels of each age category across stages.
In different developmental processes different phylostratum or divergence-stratum
classes might be more expressed than others, hence contributing more to the overall
phylotranscriptomics pattern (TAI
or TDI
).
This plot can help to identify the phylostratum or divergence-stratum classes
that contributes most to the overall transcriptome of the given developmental process.
a plot showing median expression profiles of each age category.
Hajk-Georg Drost
PlotBarRE
, RE
, REMatrix
, PlotRE
### Example using a PhyloExpressionSet ### and DivergenceExpressionSet # load PhyloExpressionSet data(PhyloExpressionSetExample) # load PhyloExpressionSet data(DivergenceExpressionSetExample) # plot evolutionary old PS (PS1-3) vs evolutionary young PS (PS4-12) PlotMedians(PhyloExpressionSetExample, Groups = list(c(1:3), c(4:12)), legendName = "PS", adjust.range = TRUE) # if users wish to not adjust the y-axis scale when # 2 groups are selected they can specify: adjust.range = FALSE PlotMedians(PhyloExpressionSetExample, Groups = list(c(1:3), c(4:12)), legendName = "PS", adjust.range = FALSE) # plot conserved DS (DS1-5) vs divergent DS (PS6-10) # NOTE: DS are always defined in the range 1, 2, ... , 10. # Hence, make sure that your groups are within this range! PlotMedians(DivergenceExpressionSetExample, Groups = list(c(1:5), c(6:10)), legendName = "DS", adjust.range = TRUE)
### Example using a PhyloExpressionSet ### and DivergenceExpressionSet # load PhyloExpressionSet data(PhyloExpressionSetExample) # load PhyloExpressionSet data(DivergenceExpressionSetExample) # plot evolutionary old PS (PS1-3) vs evolutionary young PS (PS4-12) PlotMedians(PhyloExpressionSetExample, Groups = list(c(1:3), c(4:12)), legendName = "PS", adjust.range = TRUE) # if users wish to not adjust the y-axis scale when # 2 groups are selected they can specify: adjust.range = FALSE PlotMedians(PhyloExpressionSetExample, Groups = list(c(1:3), c(4:12)), legendName = "PS", adjust.range = FALSE) # plot conserved DS (DS1-5) vs divergent DS (PS6-10) # NOTE: DS are always defined in the range 1, 2, ... , 10. # Hence, make sure that your groups are within this range! PlotMedians(DivergenceExpressionSetExample, Groups = list(c(1:5), c(6:10)), legendName = "DS", adjust.range = TRUE)
Function to plot the TAI or TDI of a given PhyloExpressionSet or DivergenceExpressionSet object.
This function plot the TAI
or TDI
of a given PhyloExpressionSet or DivergenceExpressionSet object.
PlotPattern( ExpressionSet, TestStatistic = "FlatLineTest", modules = NULL, permutations = 1000, lillie.test = FALSE, digits.ylab = 4, p.value = TRUE, shaded.area = FALSE, y.ticks = 4, custom.perm.matrix = NULL, ... )
PlotPattern( ExpressionSet, TestStatistic = "FlatLineTest", modules = NULL, permutations = 1000, lillie.test = FALSE, digits.ylab = 4, p.value = TRUE, shaded.area = FALSE, y.ticks = 4, custom.perm.matrix = NULL, ... )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
TestStatistic |
a string defining the type of test statistics to be used to quantify the statistical significance the present phylotranscriptomics pattern.
Possible values can be: |
modules |
a list storing three elements for the |
permutations |
a numeric value specifying the number of permutations to be performed for the |
lillie.test |
a boolean value specifying whether the Lilliefors Kolmogorov-Smirnov Test shall be performed. |
digits.ylab |
a numeric value specifying the number of digits shown for the TAI or TDI values on the y-axis. |
p.value |
a boolean value specifying whether the p-value of the test statistic shall be printed within the plot area. |
shaded.area |
a boolean value specifying whether a shaded area shall be drawn for the developmental stages defined to be the presumptive phylotypic period. |
y.ticks |
a numeric value specifying the number of ticks to be drawn on the y-axis. |
custom.perm.matrix |
a custom |
... |
default plot parameters. |
This function computes a permutation test quantifying the statistical significance of the prensent phylotranscriptomics pattern.
The user can choose between the FlatLineTest
, ReductiveHourglassTest
, or EarlyConservationTest
.
The FlatLineTest
tests for any significant deviation from a flat line.
Each period or stage that significantly deviates from a flat line, might be governed by stronger selective pressure (in terms of natural selection) compared to other stages or periods of development.
The ReductiveHourglassTest
specificly tests for the statistical significance of a molecular hourglass pattern (high-low-high pattern) with prior biological knowlegde.
The corresponding p-value that is printed within the plot (by default) specifies the statistical significance of the chosen test statistic.
The EarlyConservationTest
specificly tests for the statistical significance of a low-high-high pattern (monotonically increasing function)
with prior biological knowlegde.
The corresponding p-value that is printed within the plot (by default) specifies the statistical significance of the chosen test statistic.
The x-axis denotes the developmental series (time course / experiments / ontogeny) of the input ExpressionSet and the y-axis
denotes the corresponding mean transcriptome age value (TAI
or TDI
) of the given ExpressionSet.
Furthermore, the grey lines above and below the actual phylotranscriptomics pattern denotes the standard deviations of TAI
or TDI
values that have been estimated from the bootMatrix
.
A low mean transcriptome age value denotes an evolutionary older transcriptome being active during the corresponding periods,
whereas a high mean transcriptome age value denotes an evolutionary younger transcriptome being active during the corresponding periods.
For mean transcriptome divergence, a low mean transcriptome divergence value denotes a more conserved transcriptome
being active (between two species), whereas a high mean transcriptome divergence value denotes a more divergent transcriptome
being active (between two species) - in terms of protein-sequence substitution rates.
This function is useful to fastly plot the TAI
or TDI
profile of a given PhyloExpressionSet or DivergenceExpressionSet object and
the statistical significance of the corresponding pattern.
Internally the function calls several graphics functions,
such as plot
, axis
, and legend
.
For the ellipsis argument ...
all graphics specific arguments can be defined.
Internally the function specific arguments for e.g. plot
, axis
,
and legend
will be detected and are passed to the corresponding function.
Hence, when calling the function PlotPattern
, one can specify arguments
for plot
and axis
and legend
as ...
.
In case prior biological knowledge is present for a specific period of development,
the shaded.area
argument can be set to TRUE
and the function will use
the values stored in the mid
argument to draw a shaded area within the corresponding period of development.
a plot visualizing the phylotranscriptomic pattern of a given PhyloExpressionSet or DivergenceExpressionSet object.
The corresponding p-value of the test statistic is named as follows:
p_flt
= p-value of the corresponding FlatLineTest
p_rht
= p-value of the corresponding ReductiveHourglassTest
p_ect
= p-value of the corresponding EarlyConservationTest
Hajk-Georg Drost
Domazet-Loso T and Tautz D. (2010). A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature (468): 815-818.
Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012.
TAI
, TDI
, FlatLineTest
,
ReductiveHourglassTest
, EarlyConservationTest
,
PlotSignature
# load PhyloExpressionSet data(PhyloExpressionSetExample) # only visualize the TAI profile without any test statistics... # this is equavalent to performing: plot(TAI(PhyloExpressionSetExample), type = "l", lwd = 6) PlotPattern(ExpressionSet = PhyloExpressionSetExample, TestStatistic = NULL, type = "l", xlab = "Ontogeny", ylab = "TAI", lwd = 9) # the simplest example of plotting the TAI profile of a given PhyloExpressionSet: # In this case (default) the FlatLineTest will be performed to quantify # the statistical significance of the present TAI pattern and will be drawn as 'p = ... ' # in the plot PlotPattern(ExpressionSet = PhyloExpressionSetExample, TestStatistic = "FlatLineTest", permutations = 100, type = "l", xlab = "Ontogeny", ylab = "TAI", lwd = 9) # an example performing the ReductiveHourglassTest and printing the p-value # and shaded area of the presumptive phylotypic period into the plot # Here the 'p = ...' denotes the p-value that is returned by the ReductiveHourglassTest PlotPattern( ExpressionSet = PhyloExpressionSetExample, TestStatistic = "ReductiveHourglassTest", modules = list(early = 1:2,mid = 3:5,late = 6:7), permutations = 100, p.value = TRUE, shaded.area = TRUE, xlab = "Ontogeny", ylab = "TAI", type = "l", lwd = 9) # testing for early conservation model PlotPattern( ExpressionSet = PhyloExpressionSetExample, TestStatistic = "EarlyConservationTest", modules = list(early = 1:2,mid = 3:5,late = 6:7), permutations = 100, p.value = TRUE, shaded.area = TRUE, xlab = "Ontogeny", ylab = "TAI", type = "l", lwd = 9) # use your own permutation matrix custom_perm_matrix <- bootMatrix(PhyloExpressionSetExample,100) PlotPattern(ExpressionSet = PhyloExpressionSetExample, TestStatistic = "FlatLineTest", custom.perm.matrix = custom_perm_matrix, type = "l", xlab = "Ontogeny", ylab = "TAI", lwd = 9)
# load PhyloExpressionSet data(PhyloExpressionSetExample) # only visualize the TAI profile without any test statistics... # this is equavalent to performing: plot(TAI(PhyloExpressionSetExample), type = "l", lwd = 6) PlotPattern(ExpressionSet = PhyloExpressionSetExample, TestStatistic = NULL, type = "l", xlab = "Ontogeny", ylab = "TAI", lwd = 9) # the simplest example of plotting the TAI profile of a given PhyloExpressionSet: # In this case (default) the FlatLineTest will be performed to quantify # the statistical significance of the present TAI pattern and will be drawn as 'p = ... ' # in the plot PlotPattern(ExpressionSet = PhyloExpressionSetExample, TestStatistic = "FlatLineTest", permutations = 100, type = "l", xlab = "Ontogeny", ylab = "TAI", lwd = 9) # an example performing the ReductiveHourglassTest and printing the p-value # and shaded area of the presumptive phylotypic period into the plot # Here the 'p = ...' denotes the p-value that is returned by the ReductiveHourglassTest PlotPattern( ExpressionSet = PhyloExpressionSetExample, TestStatistic = "ReductiveHourglassTest", modules = list(early = 1:2,mid = 3:5,late = 6:7), permutations = 100, p.value = TRUE, shaded.area = TRUE, xlab = "Ontogeny", ylab = "TAI", type = "l", lwd = 9) # testing for early conservation model PlotPattern( ExpressionSet = PhyloExpressionSetExample, TestStatistic = "EarlyConservationTest", modules = list(early = 1:2,mid = 3:5,late = 6:7), permutations = 100, p.value = TRUE, shaded.area = TRUE, xlab = "Ontogeny", ylab = "TAI", type = "l", lwd = 9) # use your own permutation matrix custom_perm_matrix <- bootMatrix(PhyloExpressionSetExample,100) PlotPattern(ExpressionSet = PhyloExpressionSetExample, TestStatistic = "FlatLineTest", custom.perm.matrix = custom_perm_matrix, type = "l", xlab = "Ontogeny", ylab = "TAI", lwd = 9)
This function computes for each age category the corresponding relative expression profile.
For each age category the corresponding relative expression profile is being computed as follows:
where and
denote the minimum/maximum
mean
expression level
of phylostratum j over developmental stages s. This linear transformation corresponds to
a shift by and a subsequent shrinkage by
.
As a result, the relative expression level
of developmental stage s
with minimum
is 0, the relative expression level
of the developmental
stage s with maximum
is 1, and the relative expression levels
of
all other stages s range between 0 and 1, accordingly.
PlotRE( ExpressionSet, Groups = NULL, modules = NULL, legendName = "age", xlab = "Ontogeny", ylab = "Relative Expression Level", main = "", y.ticks = 10, adjust.range = TRUE, alpha = 0.008, ... )
PlotRE( ExpressionSet, Groups = NULL, modules = NULL, legendName = "age", xlab = "Ontogeny", ylab = "Relative Expression Level", main = "", y.ticks = 10, adjust.range = TRUE, alpha = 0.008, ... )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
Groups |
a list containing the age categories for which mean expression levels shall be drawn.
For ex. evolutionary users can compare old phylostrata: PS1-3 (Class 1) and evolutionary young phylostrata: PS4-12 (Class 2).
In this example, the list could be assigned as, |
modules |
a list storing three elements for specifying the modules: early, mid, and late.
Each element expects a numeric vector specifying the developmental stages
or experiments that correspond to each module. For example,
|
legendName |
a character string specifying the legend title. |
xlab |
label of x-axis. |
ylab |
label of y-axis. |
main |
main text. |
y.ticks |
number of ticks that shall be drawn on the y-axis. |
adjust.range |
logical indicating whether or not the y-axis scale shall be adjusted to the same range in case two groups are specified. Default is |
alpha |
transparency of the shaded area (between [0,1]). Default is |
... |
place holder for old version of PlotRE that was based on base graphics instead of ggplot2. |
Studying the relative expression profiles of each phylostratum or divergence-stratum enables the detection of common gene expression patterns shared by several phylostrata or divergence-strata.
Finding similar relative expression profiles among phylostrata or divergence-strata suggests that phylostrata or divergence-strata sharing a similar relative expression profile are regulated by similar gene regulatory elements. Hence, these common phylostrata or divergence-strata might govern similar processes in the given developmental time course.
a plot showing the relative expression profiles of phylostrata or divergence-strata belonging to the same group.
Hajk-Georg Drost
Domazet-Loso T and Tautz D. 2010. "A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns". Nature (468): 815-818.
Quint M et al. 2012. "A transcriptomic hourglass in plant embryogenesis". Nature (490): 98-101.
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # example PhyloExpressionSet PlotRE(PhyloExpressionSetExample, Groups = list(c(1:3), c(4:12)), legendName = "PS") # or you can choose any combination of groups PlotRE(PhyloExpressionSetExample, Groups = list(c(1,7,9), c(2:6,8,10:12)), legendName = "PS") # example DivergenceExpressionSet PlotRE(DivergenceExpressionSetExample, Groups = list(c(1:5), c(6:10)), legendName = "DS")
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # example PhyloExpressionSet PlotRE(PhyloExpressionSetExample, Groups = list(c(1:3), c(4:12)), legendName = "PS") # or you can choose any combination of groups PlotRE(PhyloExpressionSetExample, Groups = list(c(1,7,9), c(2:6,8,10:12)), legendName = "PS") # example DivergenceExpressionSet PlotRE(DivergenceExpressionSetExample, Groups = list(c(1:5), c(6:10)), legendName = "DS")
This function performs several quality checks to validate the biological variation between replicates and stages (experiments).
PlotReplicateQuality( ExpressionSet, nrep, FUN = function(x) log(stats::var(x)), legend.pos = "topleft", stage.names = NULL, ... )
PlotReplicateQuality( ExpressionSet, nrep, FUN = function(x) log(stats::var(x)), legend.pos = "topleft", stage.names = NULL, ... )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
nrep |
either a numeric value specifying the constant number of replicates per stage or a numeric vector specifying the variable number of replicates for each stage position. |
FUN |
a function that should be applied to quantify the variablity or quality of replicates. The default function is the log(var(x)) quantifying the variance between replicates. |
legend.pos |
the position of the legend, e.g. 'topleft', or 'topright' (see |
stage.names |
a character vector specifying the stage names. |
... |
additional graphics parameters. |
The following quality checks can be performed:
Quantification of variability between replicates as density function.
Hajk-Georg Drost
data(PhyloExpressionSetExample) # visualize log(var(x)) between replicates for each gene and developmental stage PlotReplicateQuality(ExpressionSet = PhyloExpressionSetExample[1:5000 , 1:8], nrep = 2, legend.pos = "topright", ylim = c(0,0.2), lwd = 6)
data(PhyloExpressionSetExample) # visualize log(var(x)) between replicates for each gene and developmental stage PlotReplicateQuality(ExpressionSet = PhyloExpressionSetExample[1:5000 , 1:8], nrep = 2, legend.pos = "topright", ylim = c(0,0.2), lwd = 6)
This function visualizes the PS or DS distribution of a selected set of genes as histogram.
PlotSelectedAgeDistr( ExpressionSet, gene.set, legendName = NULL, as.ratio = FALSE, use.only.map = FALSE, col = "turquoise4", xlab = NULL, ylab = NULL )
PlotSelectedAgeDistr( ExpressionSet, gene.set, legendName = NULL, as.ratio = FALSE, use.only.map = FALSE, col = "turquoise4", xlab = NULL, ylab = NULL )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
gene.set |
a character vector storing the gene ids for which gene expression profiles shall be visualized. |
legendName |
a character string specifying whether "PS" or "DS" are are visualized. |
as.ratio |
logical value indicating whether or not relative frequencies shall be visualized. |
use.only.map |
logical value indicating whether or not a Phylostratigraphic Map or Divergence Map should be passed to the |
col |
colour of the bars. |
xlab |
label of the x-axis. |
ylab |
label of the y-axis. |
Hajk-Georg Drost
data(PhyloExpressionSetExample) # generate an example gene set set.seed(123) ExGeneSet <- sample(PhyloExpressionSetExample[ , 2], 5000) # gene count example PlotSelectedAgeDistr(ExpressionSet = PhyloExpressionSetExample, gene.set = ExGeneSet, legendName = "PS", as.ratio = TRUE) # relative gene count example PlotSelectedAgeDistr(ExpressionSet = PhyloExpressionSetExample, gene.set = ExGeneSet, legendName = "PS", as.ratio = FALSE)
data(PhyloExpressionSetExample) # generate an example gene set set.seed(123) ExGeneSet <- sample(PhyloExpressionSetExample[ , 2], 5000) # gene count example PlotSelectedAgeDistr(ExpressionSet = PhyloExpressionSetExample, gene.set = ExGeneSet, legendName = "PS", as.ratio = TRUE) # relative gene count example PlotSelectedAgeDistr(ExpressionSet = PhyloExpressionSetExample, gene.set = ExGeneSet, legendName = "PS", as.ratio = FALSE)
Main function to visualize transcriptome indices.
PlotSignature( ExpressionSet, measure = "TAI", TestStatistic = "FlatLineTest", modules = NULL, permutations = 1000, lillie.test = FALSE, p.value = TRUE, shaded.area = FALSE, custom.perm.matrix = NULL, xlab = "Ontogeny", ylab = "Transcriptome Index", main = "", lwd = 4, alpha = 0.1, y.ticks = 10 )
PlotSignature( ExpressionSet, measure = "TAI", TestStatistic = "FlatLineTest", modules = NULL, permutations = 1000, lillie.test = FALSE, p.value = TRUE, shaded.area = FALSE, custom.perm.matrix = NULL, xlab = "Ontogeny", ylab = "Transcriptome Index", main = "", lwd = 4, alpha = 0.1, y.ticks = 10 )
ExpressionSet |
a standard PhyloExpressionSet, DivergenceExpressionSet or PolymorphismsExpressionSet object. |
measure |
type of transcriptome index that shall be computed. E.g.
|
TestStatistic |
a string defining the type of test statistics to be used to quantify the statistical significance the present phylotranscriptomics pattern. Possible values can be:
|
modules |
a list storing three elements for the
|
permutations |
a numeric value specifying the number of permutations to be performed for the |
lillie.test |
a boolean value specifying whether the Lilliefors Kolmogorov-Smirnov Test shall be performed. |
p.value |
a boolean value specifying whether the p-value of the test statistic shall be printed as a subtitle. |
shaded.area |
a boolean value specifying whether a shaded area shall be drawn for the developmental stages defined to be the presumptive phylotypic period. |
custom.perm.matrix |
a custom |
xlab |
label of x-axis. |
ylab |
label of y-axis. |
main |
figure title. |
lwd |
line width. |
alpha |
transparency of the shaded area (between [0,1]). Default is |
y.ticks |
number of ticks on the y-axis. Default is |
This function substitutes the functionality of the PlotPattern
function
and is based on ggplot2 insead of base R graphics.
The following transcriptome indices can be computed and visualized with this function:
Transcriptome Age Index (TAI
)
Transcriptome Divergence Index (TDI
)
Transcriptome Polymorphism Index (TPI
)
Hajk-Georg Drost
data(PhyloExpressionSetExample) # plot TAI pattern and perform flat line test PlotSignature(PhyloExpressionSetExample, measure = "TAI", permutations = 100, TestStatistic = "FlatLineTest", ylab = "Transcriptome Age Index")
data(PhyloExpressionSetExample) # plot TAI pattern and perform flat line test PlotSignature(PhyloExpressionSetExample, measure = "TAI", permutations = 100, TestStatistic = "FlatLineTest", ylab = "Transcriptome Age Index")
Main function to visualize transcriptome indices after GATAI gene removal .
PlotSignatureGATAIGeneRemoval( ExpressionSet, measure = "TAI", TestStatistic = "FlatLineTest", modules = NULL, permutations = 1000, lillie.test = FALSE, p.value = TRUE, shaded.area = FALSE, custom.perm.matrix = NULL, xlab = "Ontogeny", ylab = "Transcriptome Index", main = "", lwd = 4, alpha = 0.1, y.ticks = 10, ... )
PlotSignatureGATAIGeneRemoval( ExpressionSet, measure = "TAI", TestStatistic = "FlatLineTest", modules = NULL, permutations = 1000, lillie.test = FALSE, p.value = TRUE, shaded.area = FALSE, custom.perm.matrix = NULL, xlab = "Ontogeny", ylab = "Transcriptome Index", main = "", lwd = 4, alpha = 0.1, y.ticks = 10, ... )
ExpressionSet |
a standard PhyloExpressionSet, DivergenceExpressionSet or PolymorphismsExpressionSet object. |
measure |
type of transcriptome index that shall be computed. E.g.
|
TestStatistic |
a string defining the type of test statistics to be used to quantify the statistical significance the present phylotranscriptomics pattern. Possible values can be:
|
modules |
a list storing three elements for the
|
permutations |
a numeric value specifying the number of permutations to be performed for the |
lillie.test |
a boolean value specifying whether the Lilliefors Kolmogorov-Smirnov Test shall be performed. |
p.value |
a boolean value specifying whether the p-value of the test statistic shall be printed as a subtitle. |
shaded.area |
a boolean value specifying whether a shaded area shall be drawn for the developmental stages defined to be the presumptive phylotypic period. |
custom.perm.matrix |
a custom |
xlab |
label of x-axis. |
ylab |
label of y-axis. |
main |
figure title. |
lwd |
line width. |
alpha |
transparency of the shaded area (between [0,1]). Default is |
y.ticks |
number of ticks on the y-axis. Default is |
... |
parameters passed on to |
Filipa Martins Costa and Hajk-Georg Drost
PlotSignatureGeneQuantiles is used to investigate the robustness of a transcriptomic index signal to removing the most highly expressed genes from the given expression set. The resulting perturbed signal is plotted for different quantile probability thresholds defining the set of top level genes.
PlotSignatureGeneQuantiles( ExpressionSet, quantiles = c(1, 0.99, 0.95, 0.9, 0.8), gene.selection.criterion = "mean", measure = "TAI", TestStatistic = "FlatLineTest", modules = NULL, permutations = 1000, p.value = TRUE, shaded.area = FALSE, xlab = "Ontogeny", ylab = "Transcriptome Index", main = "", lwd = 4, alpha = 0.1, y.ticks = 3 )
PlotSignatureGeneQuantiles( ExpressionSet, quantiles = c(1, 0.99, 0.95, 0.9, 0.8), gene.selection.criterion = "mean", measure = "TAI", TestStatistic = "FlatLineTest", modules = NULL, permutations = 1000, p.value = TRUE, shaded.area = FALSE, xlab = "Ontogeny", ylab = "Transcriptome Index", main = "", lwd = 4, alpha = 0.1, y.ticks = 3 )
ExpressionSet |
a standard PhyloExpressionSet, DivergenceExpressionSet or PolymorphismsExpressionSet object. |
quantiles |
a numeric vector of quantile probabilities (between [0,1]) |
gene.selection.criterion |
a string defining the criterion by which genes should be selected Possible values can be:
|
measure |
type of transcriptome index that shall be computed. E.g.
|
TestStatistic |
a string defining the type of test statistics to be used to quantify the statistical significance the present phylotranscriptomics pattern. Possible values can be:
|
modules |
a list storing three elements for the
|
permutations |
a numeric value specifying the number of permutations to be performed for the |
p.value |
a boolean value specifying whether the p-value of the test statistic shall be printed within the legend, for each expression set. |
shaded.area |
a boolean value specifying whether a shaded area shall be drawn for the developmental stages defined to be the presumptive phylotypic period. |
xlab |
label of x-axis. |
ylab |
label of y-axis. |
main |
figure title. |
lwd |
line width. |
alpha |
transparency of the shaded area and error ribbon (between [0,1]). Default is |
y.ticks |
number of ticks on the y-axis. Default is |
a ggplot object visualising the transcriptome index of the expression set, together with its standard deviation per stage, obtained by permuting the gene ages. For each quantile probability threshold, the resulting perturbed signal is shown as a separate profile. The profiles are shown on the same axes, so that they can be readily compared. Optionally, the p-value of each profile, with respect to the choice of statistic, is shown.
Stefan Manolache
data(PhyloExpressionSetExample) # Flat line test, select top expressed genes PlotSignatureGeneQuantiles(ExpressionSet = PhyloExpressionSetExample, quantiles=c(1.0, 0.99, 0.95, 0.90, 0.80), main="Excluding top level genes by total expression using different thresholds", gene.selection.criterion="mean") # Flat line test, select top genes by variance of expression PlotSignatureGeneQuantiles(ExpressionSet = PhyloExpressionSetExample, main="Excluding top level genes by variance of expression using different thresholds", quantiles=c(1.0, 0.99, 0.95, 0.90, 0.80), gene.selection.criterion="variance")
data(PhyloExpressionSetExample) # Flat line test, select top expressed genes PlotSignatureGeneQuantiles(ExpressionSet = PhyloExpressionSetExample, quantiles=c(1.0, 0.99, 0.95, 0.90, 0.80), main="Excluding top level genes by total expression using different thresholds", gene.selection.criterion="mean") # Flat line test, select top genes by variance of expression PlotSignatureGeneQuantiles(ExpressionSet = PhyloExpressionSetExample, main="Excluding top level genes by variance of expression using different thresholds", quantiles=c(1.0, 0.99, 0.95, 0.90, 0.80), gene.selection.criterion="variance")
PlotSignatureMultiple is used to compare multiple transcriptomic index profiles (e.g. TAI) over a common process of interest (e.g. a developmental process). The main use case is visualising how removing different subsets of genes from the expression set can perturb the transcriptome index signal.
PlotSignatureMultiple( ExpressionSets, set.labels, measure = "TAI", TestStatistic = "FlatLineTest", modules = NULL, permutations = 1000, p.value = TRUE, shaded.area = FALSE, xlab = "Ontogeny", ylab = "Transcriptome Index", main = "", legend.title = "Expression Sets", lwd = 4, alpha = 0.1, y.ticks = 3 )
PlotSignatureMultiple( ExpressionSets, set.labels, measure = "TAI", TestStatistic = "FlatLineTest", modules = NULL, permutations = 1000, p.value = TRUE, shaded.area = FALSE, xlab = "Ontogeny", ylab = "Transcriptome Index", main = "", legend.title = "Expression Sets", lwd = 4, alpha = 0.1, y.ticks = 3 )
ExpressionSets |
a list of PhyloExpressionSet, DivergenceExpressionSet or PolymorphismsExpressionSet objects. |
set.labels |
a character vector of labels, one for each given expression set |
measure |
type of transcriptome index that shall be computed. E.g.
|
TestStatistic |
a string defining the type of test statistics to be used to quantify the statistical significance the present phylotranscriptomics pattern. Possible values can be:
|
modules |
a list storing three elements for the
|
permutations |
a numeric value specifying the number of permutations to be performed for the |
p.value |
a boolean value specifying whether the p-value of the test statistic shall be printed within the legend, for each expression set. |
shaded.area |
a boolean value specifying whether a shaded area shall be drawn for the developmental stages defined to be the presumptive phylotypic period. |
xlab |
label of x-axis. |
ylab |
label of y-axis. |
main |
figure title. |
legend.title |
legend title. |
lwd |
line width. |
alpha |
transparency of the shaded area and error ribbon (between [0,1]). Default is |
y.ticks |
number of ticks on the y-axis. Default is |
a ggplot object visualising the transcriptome index of each given expression set, together with its standard deviation per stage, obtained by permuting the gene ages. The profiles are shown on the same axes, so that they can be readily compared. Optionally, the p-value of each profile, with respect to the choice of statistic, is shown.
Stefan Manolache
data(PhyloExpressionSetExample) # remove top 1% expressed genes genes.top_expression <- TopExpressionGenes(PhyloExpressionSetExample, p=.99) PhyloExpressionSetExample.top_removed <- subset(PhyloExpressionSetExample, !(GeneID %in% genes.top_expression)) expression_sets = list(PhyloExpressionSetExample, PhyloExpressionSetExample.top_removed) set_labels = c("100%", "99%") # Flat line test PlotSignatureMultiple(ExpressionSets = expression_sets, set.labels = set_labels, TestStatistic="FlatLineTest", main = "A. thaliana embryogenesis", legend.title = "Top Expressed Genes Quantile") # Reductive hourglass test PlotSignatureMultiple(ExpressionSets = expression_sets, set.labels = set_labels, TestStatistic="ReductiveHourglassTest", main = "A. thaliana embryogenesis", legend.title = "Top Expressed Genes Quantile", modules=list(early=1:2, mid=3:5, late=6:7), shaded.area=TRUE)
data(PhyloExpressionSetExample) # remove top 1% expressed genes genes.top_expression <- TopExpressionGenes(PhyloExpressionSetExample, p=.99) PhyloExpressionSetExample.top_removed <- subset(PhyloExpressionSetExample, !(GeneID %in% genes.top_expression)) expression_sets = list(PhyloExpressionSetExample, PhyloExpressionSetExample.top_removed) set_labels = c("100%", "99%") # Flat line test PlotSignatureMultiple(ExpressionSets = expression_sets, set.labels = set_labels, TestStatistic="FlatLineTest", main = "A. thaliana embryogenesis", legend.title = "Top Expressed Genes Quantile") # Reductive hourglass test PlotSignatureMultiple(ExpressionSets = expression_sets, set.labels = set_labels, TestStatistic="ReductiveHourglassTest", main = "A. thaliana embryogenesis", legend.title = "Top Expressed Genes Quantile", modules=list(early=1:2, mid=3:5, late=6:7), shaded.area=TRUE)
PlotSignatureTransformed aims to statistically evaluate the
stability of ReductiveHourglassTest
, FlatLineTest
,
ReverseHourglassTest
, EarlyConservationTest
, or
LateConservationTest
(all based on TAI
or TDI
computations) against different
data transformations AND plot the resulting TAI profiles using PlotSignature
.
The corresponding p-value quantifies the probability that a given TAI or TDI pattern (or any phylotranscriptomics pattern)
does not support the chosen test. A p-value < 0.05 indicates that the corresponding phylotranscriptomics pattern does
indeed support the chosen test.
PlotSignatureTransformed( ExpressionSet, measure = "TAI", TestStatistic = "FlatLineTest", transforms = c("none", "sqrt", "log2", "rank", "squared"), modules = NULL, permutations = 1000, pseudocount = 1, p.value = TRUE, shaded.area = FALSE, xlab = "Ontogeny", ylab = "Transcriptome Index", main = "", lwd = 4, alpha = 0.1, y.ticks = 3 )
PlotSignatureTransformed( ExpressionSet, measure = "TAI", TestStatistic = "FlatLineTest", transforms = c("none", "sqrt", "log2", "rank", "squared"), modules = NULL, permutations = 1000, pseudocount = 1, p.value = TRUE, shaded.area = FALSE, xlab = "Ontogeny", ylab = "Transcriptome Index", main = "", lwd = 4, alpha = 0.1, y.ticks = 3 )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
measure |
type of transcriptome index that shall be computed. E.g.
|
TestStatistic |
a string defining the type of test statistics to be used to quantify the statistical significance the present phylotranscriptomics pattern. Possible values can be:
|
transforms |
a character vector of any valid function that transforms gene expression levels. Available options are:
|
modules |
a list storing three elements for the
|
permutations |
a numeric value specifying the number of permutations to be performed for the |
pseudocount |
any valid number to add to the expression matrix prior to log transformations. |
p.value |
a boolean value specifying whether the p-value of the test statistic shall be printed within the plot area. |
shaded.area |
a boolean value specifying whether a shaded area shall be drawn for the developmental stages defined to be the presumptive phylotypic period. |
xlab |
label of x-axis. |
ylab |
label of y-axis. |
main |
figure title. |
lwd |
line width. |
alpha |
transparency of the shaded area (between [0,1]). Default is |
y.ticks |
number of ticks on the y-axis. Default is |
Visualisation and assessment for the stability of data transforms on the permutation test of choice.
For details, please consult the main function PlotSignature
,
as well as tf
, ReductiveHourglassTest
,
FlatLineTest
, ReverseHourglassTest
,
LateConservationTest
or EarlyConservationTest
.
In most cases, users can replace PlotSignature
simply with
PlotSignatureTransformed
to obtain the multi-panel plot with different
transformations to visualise the stability of the pattern observed with PlotSignature
.
a ggplot object containing the following information
for each visualised transcriptome indices:
p.value
: the p-value quantifying the statistical significance (depending on the chosen test) of the given phylotranscriptomics pattern.
Jaruwatana Sodai Lotharukpong
Lotharukpong JS et al. (2023) (unpublished)
PlotSignature
,tfStability
,
FlatLineTest
, ReverseHourglassTest
,
EarlyConservationTest
, ReductiveHourglassTest
,
LateConservationTest
## Not run: data(PhyloExpressionSetExample) # Flat line test PlotSignatureTransformed(ExpressionSet = PhyloExpressionSetExample, TestStatistic = "FlatLineTest", transforms = c("none", "log2", "sqrt", "rank", "squared")) # Reductive hourglass test PlotSignatureTransformed(ExpressionSet = PhyloExpressionSetExample, TestStatistic = "ReductiveHourglassTest", transforms = c("none", "log2", "sqrt", "rank", "squared"), modules = list(early = 1:2, mid = 3:5, late = 6:7)) library(DESeq2) PlotSignatureTransformed(ExpressionSet = PhyloExpressionSetExample, TestStatistic = "ReductiveHourglassTest", transforms = c("none", "log2", "sqrt", "vst", "rank", "squared"), modules = list(early = 1:2, mid = 3:5, late = 6:7)) ## End(Not run)
## Not run: data(PhyloExpressionSetExample) # Flat line test PlotSignatureTransformed(ExpressionSet = PhyloExpressionSetExample, TestStatistic = "FlatLineTest", transforms = c("none", "log2", "sqrt", "rank", "squared")) # Reductive hourglass test PlotSignatureTransformed(ExpressionSet = PhyloExpressionSetExample, TestStatistic = "ReductiveHourglassTest", transforms = c("none", "log2", "sqrt", "rank", "squared"), modules = list(early = 1:2, mid = 3:5, late = 6:7)) library(DESeq2) PlotSignatureTransformed(ExpressionSet = PhyloExpressionSetExample, TestStatistic = "ReductiveHourglassTest", transforms = c("none", "log2", "sqrt", "vst", "rank", "squared"), modules = list(early = 1:2, mid = 3:5, late = 6:7)) ## End(Not run)
This function computes for each age category the corresponding variance expression profile.
PlotVars( ExpressionSet, Groups = NULL, legendName = "age", xlab = "Ontogeny", ylab = "Variance(Expression Level)", main = "", y.ticks = 10, adjust.range = TRUE )
PlotVars( ExpressionSet, Groups = NULL, legendName = "age", xlab = "Ontogeny", ylab = "Variance(Expression Level)", main = "", y.ticks = 10, adjust.range = TRUE )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
Groups |
a list containing the age categories for which variance expression levels shall be drawn.
For ex. evolutionary users can compare old phylostrata: PS1-3 (Class 1) and evolutionary young phylostrata: PS4-12 (Class 2).
In this example, the list could be assigned as, |
legendName |
a character string specifying the legend title. |
xlab |
label of x-axis. |
ylab |
label of y-axis. |
main |
main text. |
y.ticks |
number of ticks that shall be drawn on the y-axis. |
adjust.range |
logical indicating whether or not the y-axis scale shall be adjusted to the same range in case two groups are specified. Default is |
This plot may be useful to compare the absolute variance expression levels of each age category across stages.
In different developmental processes different phylostratum or divergence-stratum
classes might be more expressed than others, hence contributing more to the overall
phylotranscriptomics pattern (TAI
or TDI
).
This plot can help to identify the phylostratum or divergence-stratum classes
that contributes most to the overall transcriptome of the given developmental process.
a plot showing variance expression profiles of each age category.
Hajk-Georg Drost
PlotBarRE
, RE
, REMatrix
, PlotRE
### Example using a PhyloExpressionSet ### and DivergenceExpressionSet # load PhyloExpressionSet data(PhyloExpressionSetExample) # load PhyloExpressionSet data(DivergenceExpressionSetExample) # plot evolutionary old PS (PS1-3) vs evolutionary young PS (PS4-12) PlotVars(PhyloExpressionSetExample, Groups = list(c(1:3), c(4:12)), legendName = "PS", adjust.range = TRUE) # if users wish to not adjust the y-axis scale when # 2 groups are selected they can specify: adjust.range = FALSE PlotVars(PhyloExpressionSetExample, Groups = list(c(1:3), c(4:12)), legendName = "PS", adjust.range = FALSE) # plot conserved DS (DS1-5) vs divergent DS (PS6-10) # NOTE: DS are always defined in the range 1, 2, ... , 10. # Hence, make sure that your groups are within this range! PlotVars(DivergenceExpressionSetExample, Groups = list(c(1:5), c(6:10)), legendName = "DS", adjust.range = TRUE)
### Example using a PhyloExpressionSet ### and DivergenceExpressionSet # load PhyloExpressionSet data(PhyloExpressionSetExample) # load PhyloExpressionSet data(DivergenceExpressionSetExample) # plot evolutionary old PS (PS1-3) vs evolutionary young PS (PS4-12) PlotVars(PhyloExpressionSetExample, Groups = list(c(1:3), c(4:12)), legendName = "PS", adjust.range = TRUE) # if users wish to not adjust the y-axis scale when # 2 groups are selected they can specify: adjust.range = FALSE PlotVars(PhyloExpressionSetExample, Groups = list(c(1:3), c(4:12)), legendName = "PS", adjust.range = FALSE) # plot conserved DS (DS1-5) vs divergent DS (PS6-10) # NOTE: DS are always defined in the range 1, 2, ... , 10. # Hence, make sure that your groups are within this range! PlotVars(DivergenceExpressionSetExample, Groups = list(c(1:5), c(6:10)), legendName = "DS", adjust.range = TRUE)
This function computes the partial TAI
or TDI
values for each single gene in a PhyloExpressionSet or DivergenceExpressionSet object.
In detail, each gene gets a TAI contribution profile or TDI contribution profile.
or
where or
is the partial TAI or TDI value of gene i,
and
is the phylostratum or divergence-stratum of gene i.
pMatrix(ExpressionSet)
pMatrix(ExpressionSet)
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
The partial TAI or TDI matrix can be used to perform different cluster analyses
and also gives an overall impression of the contribution of each gene to the global TAI
or TDI
pattern.
a numeric matrix storing the partial TAI or TDI values for each gene in the corresponding PhyloExpressionSet or DivergenceExpressionSet.
Hajk-Georg Drost
Domazet-Loso T and Tautz D. 2010. "A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns". Nature (468): 815-818.
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # example PhyloExpressionSet PTM_ps <- pMatrix(PhyloExpressionSetExample) # example DivergenceExpressionSet PTM_ds <- pMatrix(DivergenceExpressionSetExample) # boxplot of the pMatrix boxplot(pMatrix(PhyloExpressionSetExample),outline = FALSE) # boxplot of the pMatrix using log2 transformed expression levels boxplot(pMatrix(tf(PhyloExpressionSetExample,log2)))
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # example PhyloExpressionSet PTM_ps <- pMatrix(PhyloExpressionSetExample) # example DivergenceExpressionSet PTM_ds <- pMatrix(DivergenceExpressionSetExample) # boxplot of the pMatrix boxplot(pMatrix(PhyloExpressionSetExample),outline = FALSE) # boxplot of the pMatrix using log2 transformed expression levels boxplot(pMatrix(tf(PhyloExpressionSetExample,log2)))
This function computes the partial transcriptome evolutionary index (TEI) values for each single gene.
In detail, each gene gets a TEI contribution profile.
where is the partial TEI value of gene i,
and
is the phylostratum of gene i.
pMatrixTEI( ExpressionSet, Phylostratum = NULL, split = 1e+05, showprogress = TRUE, threads = 1 )
pMatrixTEI( ExpressionSet, Phylostratum = NULL, split = 1e+05, showprogress = TRUE, threads = 1 )
ExpressionSet |
expression object with rownames as GeneID (dgCMatrix) or standard PhyloExpressionSet object. |
Phylostratum |
a named vector representing phylostratum per GeneID with names as GeneID (not used if Expression is PhyloExpressionSet). |
split |
specify number of columns to split |
showprogress |
boolean if progressbar should be shown |
threads |
specify number of threads |
The partial TEI matrix can be used to perform different cluster
analyses and also gives an overall impression of the contribution of each
gene to the global TEI
pattern.
a numeric sparse matrix storing the partial TEI values for each gene.
Kristian K Ullrich
Domazet-Loso T. and Tautz D. (2010). A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature (468): 815-818.
Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
# reading a standard PhyloExpressionSet data(PhyloExpressionSetExample, package = "myTAI") # computing partial TEI contribution per gene pMT <- pMatrixTEI(PhyloExpressionSetExample)
# reading a standard PhyloExpressionSet data(PhyloExpressionSetExample, package = "myTAI") # computing partial TEI contribution per gene pMT <- pMatrixTEI(PhyloExpressionSetExample)
This function computes the partial TAI
or TDI
values for all Phylostrata or Divergence Strata.
pStrata(ExpressionSet)
pStrata(ExpressionSet)
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
Hajk-Georg Drost
data(PhyloExpressionSetExample) # compute partial TAI values for each Phylostratum partialStrata <- pStrata(PhyloExpressionSetExample) # show that colSums of pStrata is equavalent to the TAI values all.equal(colSums(partialStrata),TAI(PhyloExpressionSetExample)) # show that colSums of pStrata is equavalent to colSums of pMatrix(PhyloExpressionSetExample) all.equal(colSums(partialStrata),colSums(pMatrix(PhyloExpressionSetExample)))
data(PhyloExpressionSetExample) # compute partial TAI values for each Phylostratum partialStrata <- pStrata(PhyloExpressionSetExample) # show that colSums of pStrata is equavalent to the TAI values all.equal(colSums(partialStrata),TAI(PhyloExpressionSetExample)) # show that colSums of pStrata is equavalent to colSums of pMatrix(PhyloExpressionSetExample) all.equal(colSums(partialStrata),colSums(pMatrix(PhyloExpressionSetExample)))
This function computes the partial transcriptome evolutionary index (TEI) values combined for each strata.
In detail, each gene gets a TEI contribution profile.
where is the partial TEI value of gene i,
and
is the phylostratum of gene i.
values are combined per
.
pStrataTEI( ExpressionSet, Phylostratum = NULL, split = 1e+05, showprogress = TRUE, threads = 1 )
pStrataTEI( ExpressionSet, Phylostratum = NULL, split = 1e+05, showprogress = TRUE, threads = 1 )
ExpressionSet |
expression object with rownames as GeneID (dgCMatrix) or standard PhyloExpressionSet object. |
Phylostratum |
a named vector representing phylostratum per GeneID with names as GeneID (not used if Expression is PhyloExpressionSet). |
split |
specify number of columns to split |
showprogress |
boolean if progressbar should be shown |
threads |
specify number of threads |
The partial TEI values combined per strata give an overall
impression of the contribution of each
strata to the global TEI
pattern.
a numeric matrix storing the summed partial TEI values for each strata.
Kristian K Ullrich
Domazet-Loso T. and Tautz D. (2010). A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature (468): 815-818.
Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
# reading a standard PhyloExpressionSet data(PhyloExpressionSetExample, package = "myTAI") # computing partial TEI contribution per gene pS <- pStrataTEI(PhyloExpressionSetExample)
# reading a standard PhyloExpressionSet data(PhyloExpressionSetExample, package = "myTAI") # computing partial TEI contribution per gene pS <- pStrataTEI(PhyloExpressionSetExample)
This function takes a standard ExpressionSet object and computes the partial contribution of the different phylostrata (ps) to the global Transcriptome Age Index profile.
pTAI(ExpressionSet)
pTAI(ExpressionSet)
ExpressionSet |
a standard PhyloExpressionSet object. |
This way of computing the partial contribution of the different phylostrata (ps) to the global Transcriptome Age Index profile was introduced by Domazet-Loso and Tautz, 2010. This function (pTAI
) computes the partial TAI contribution for each phylostratum and each developmental stage and returns a data matrix storing the partial TAI contribution value for each phylostratum and each developmental stage.
Hajk-Georg Drost
Domazet-Loso T. and Tautz D. (2010). A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature (468): 815-818.
pTDI
, TAI
, TDI
, PlotPattern
data(PhyloExpressionSetExample) # get the partial contribution of phylostrata to the global # TAI pattern pTAI(PhyloExpressionSetExample)
data(PhyloExpressionSetExample) # get the partial contribution of phylostrata to the global # TAI pattern pTAI(PhyloExpressionSetExample)
This function takes a standard ExpressionSet object and computes the partial contribution of the different divergence strata (ds) to the global Transcriptome Divergence Index profile.
pTDI(ExpressionSet)
pTDI(ExpressionSet)
ExpressionSet |
a standard DivergenceExpressionSet object. |
This function (pTDI
) computes the partial TDI contribution for each phylostratum and each developmental stage and returns a data matrix storing the partial TDI contribution value for each divergence and each developmental stage.
Hajk-Georg Drost
Domazet-Loso T. and Tautz D. (2010). A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature (468): 815-818.
Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.
Drost HG et al. (2015). Evidence for Active Maintenance of Phylotranscriptomic Hourglass Patterns in Animal and Plant Embryogenesis. Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012.
pTAI
, TAI
, TDI
, PlotPattern
data(DivergenceExpressionSetExample) # get the partial contribution of divergence strata to the global # TDI pattern pTAI(DivergenceExpressionSetExample)
data(DivergenceExpressionSetExample) # get the partial contribution of divergence strata to the global # TDI pattern pTAI(DivergenceExpressionSetExample)
computes the phylogenetically based transcriptome evolutionary index (TEI) shuffling the strata for permutation statistic
rcpp_boottei_parallel(expression, ps, permutations, ncores = 1L)
rcpp_boottei_parallel(expression, ps, permutations, ncores = 1L)
expression |
ExpressionSet as sparseMatrix |
ps |
named Phylostratum |
permutations |
number of permutations |
ncores |
number of cores |
sparseMatrix
Kristian K Ullrich
## load example PhyloExpressionSetExample data("PhyloExpressionSetExample", package="myTAI") ## convert into sparseMatrix - rownames GeneID spmat <- as(data.matrix(PhyloExpressionSetExample[,-c(1,2)]), "sparseMatrix") rownames(spmat) <- PhyloExpressionSetExample$GeneID ## create named Phylostratum vector ps <- setNames(PhyloExpressionSetExample$Phylostratum, PhyloExpressionSetExample$GeneID) ## get permutations rcpp_boottei_parallel(spmat, ps, 100, 1)
## load example PhyloExpressionSetExample data("PhyloExpressionSetExample", package="myTAI") ## convert into sparseMatrix - rownames GeneID spmat <- as(data.matrix(PhyloExpressionSetExample[,-c(1,2)]), "sparseMatrix") rownames(spmat) <- PhyloExpressionSetExample$GeneID ## create named Phylostratum vector ps <- setNames(PhyloExpressionSetExample$Phylostratum, PhyloExpressionSetExample$GeneID) ## get permutations rcpp_boottei_parallel(spmat, ps, 100, 1)
computes the partial transcriptome evolutionary index (TEI) values for each single gene
rcpp_pMatrix_parallel(expression, ps, ncores = 1L)
rcpp_pMatrix_parallel(expression, ps, ncores = 1L)
expression |
ExpressionSet as sparseMatrix |
ps |
named Phylostratum |
ncores |
number of cores |
sparseMatrix
Kristian K Ullrich
## load example PhyloExpressionSetExample data("PhyloExpressionSetExample", package="myTAI") ## convert into sparseMatrix - rownames GeneID spmat <- as(data.matrix(PhyloExpressionSetExample[,-c(1,2)]), "sparseMatrix") rownames(spmat) <- PhyloExpressionSetExample$GeneID ## create named Phylostratum vector ps <- setNames(PhyloExpressionSetExample$Phylostratum, PhyloExpressionSetExample$GeneID) ## get pMatrix rcpp_pMatrix_parallel(spmat, ps)
## load example PhyloExpressionSetExample data("PhyloExpressionSetExample", package="myTAI") ## convert into sparseMatrix - rownames GeneID spmat <- as(data.matrix(PhyloExpressionSetExample[,-c(1,2)]), "sparseMatrix") rownames(spmat) <- PhyloExpressionSetExample$GeneID ## create named Phylostratum vector ps <- setNames(PhyloExpressionSetExample$Phylostratum, PhyloExpressionSetExample$GeneID) ## get pMatrix rcpp_pMatrix_parallel(spmat, ps)
computes the partial transcriptome evolutionary index (TEI) values combined into strata
rcpp_pStrata_parallel(expression, ps, psgroup, ncores = 1L)
rcpp_pStrata_parallel(expression, ps, psgroup, ncores = 1L)
expression |
ExpressionSet as sparseMatrix |
ps |
named Phylostratum |
psgroup |
ordered unique Phylostratum |
ncores |
number of cores |
sparseMatrix
Kristian K Ullrich
## load example PhyloExpressionSetExample data("PhyloExpressionSetExample", package="myTAI") ## convert into sparseMatrix - rownames GeneID spmat <- as(data.matrix(PhyloExpressionSetExample[,-c(1,2)]), "sparseMatrix") rownames(spmat) <- PhyloExpressionSetExample$GeneID ## create named Phylostratum vector ps <- setNames(PhyloExpressionSetExample$Phylostratum, PhyloExpressionSetExample$GeneID) psgroup <- sort(unique(ps)) ## get pStrata rcpp_pStrata_parallel(spmat, ps, psgroup)
## load example PhyloExpressionSetExample data("PhyloExpressionSetExample", package="myTAI") ## convert into sparseMatrix - rownames GeneID spmat <- as(data.matrix(PhyloExpressionSetExample[,-c(1,2)]), "sparseMatrix") rownames(spmat) <- PhyloExpressionSetExample$GeneID ## create named Phylostratum vector ps <- setNames(PhyloExpressionSetExample$Phylostratum, PhyloExpressionSetExample$GeneID) psgroup <- sort(unique(ps)) ## get pStrata rcpp_pStrata_parallel(spmat, ps, psgroup)
computes the phylogenetically based transcriptome evolutionary index (TEI)
rcpp_tei_parallel(expression, ps, ncores = 1L)
rcpp_tei_parallel(expression, ps, ncores = 1L)
expression |
ExpressionSet as sparseMatrix |
ps |
named Phylostratum |
ncores |
number of cores |
list
Kristian K Ullrich
## load example sequence data data("PhyloExpressionSetExample", package="myTAI") spmat <- as(data.matrix(PhyloExpressionSetExample[,-c(1,2)]), "sparseMatrix") rownames(spmat) <- PhyloExpressionSetExample$GeneID ps <- setNames(PhyloExpressionSetExample$Phylostratum, PhyloExpressionSetExample$GeneID) rcpp_tei_parallel(spmat, ps)
## load example sequence data data("PhyloExpressionSetExample", package="myTAI") spmat <- as(data.matrix(PhyloExpressionSetExample[,-c(1,2)]), "sparseMatrix") rownames(spmat) <- PhyloExpressionSetExample$GeneID ps <- setNames(PhyloExpressionSetExample$Phylostratum, PhyloExpressionSetExample$GeneID) rcpp_tei_parallel(spmat, ps)
This function computes the relative expression profiles of any given gene expression set. The relative expression profile is being computed as follows:
where and
denote the minimum/maximum mean expression level
over the developmental stages s. This linear transformation corresponds to a
shift by
and a subsequent shrinkage by
.
As a result, the relative expression level
of developmental stage s
with minimum
is 0, the relative expression level
of the
developmental stage s with maximum
is 1, and the relative expression
levels
of all other stages s range between 0 and 1, accordingly.
RE(ExpressionMatrix)
RE(ExpressionMatrix)
ExpressionMatrix |
a numeric matrix representing a gene expression matrix for which the relative expression profile shall be computed. |
a vector containing the relative expression profile of the correspnding data matrix.
Hajk-Georg Drost
Domazet-Loso T and Tautz D. (2010). A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature (468): 815-818.
Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) # relative expression profile of PS1 genes RE(PhyloExpressionSetExample[ which(PhyloExpressionSetExample[ , 1] == 1), 3:9 ])
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) # relative expression profile of PS1 genes RE(PhyloExpressionSetExample[ which(PhyloExpressionSetExample[ , 1] == 1), 3:9 ])
The Reductive Hourglass Test aims to statistically evaluate the
existence of a phylotranscriptomic hourglass pattern based on TAI
or TDI
computations.
The corresponding p-value quantifies the probability that a given TAI or TDI pattern (or any phylotranscriptomics pattern)
does not follow an hourglass like shape. A p-value < 0.05 indicates that the corresponding phylotranscriptomics pattern does
indeed follow an hourglass (high-low-high) shape.
ReductiveHourglassTest( ExpressionSet, modules = NULL, permutations = 1000, lillie.test = FALSE, plotHistogram = FALSE, runs = 10, parallel = FALSE, gof.warning = FALSE, custom.perm.matrix = NULL )
ReductiveHourglassTest( ExpressionSet, modules = NULL, permutations = 1000, lillie.test = FALSE, plotHistogram = FALSE, runs = 10, parallel = FALSE, gof.warning = FALSE, custom.perm.matrix = NULL )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
modules |
a list storing three elements: early, mid, and late. Each element expects a numeric
vector specifying the developmental stages or experiments that correspond to each module.
For example, |
permutations |
a numeric value specifying the number of permutations to be performed for the |
lillie.test |
a boolean value specifying whether the Lilliefors Kolmogorov-Smirnov Test shall be performed to quantify the goodness of fit. |
plotHistogram |
a boolean value specifying whether a Lillifor's Kolmogorov-Smirnov-Test shall be performed to test the goodness of fit of the approximated distribution, as well as additional plots quantifying the significance of the observed phylotranscriptomic pattern. |
runs |
specify the number of runs to be performed for goodness of fit computations, in case |
parallel |
performing |
gof.warning |
a logical value indicating whether non significant goodness of fit results should be printed as warning. Default is |
custom.perm.matrix |
a custom |
The reductive hourglass test is a permutation test based on the following test statistic.
(1) A set of developmental stages is partitioned into three modules - early, mid, and late - based on prior biological knowledge.
(2) The mean TAI
or TDI
value for each of the three modules T_early, T_mid, and T_late are computed.
(3) The two differences D1 = T_early - T_mid and D2 = T_late - T_mid are calculated.
(4) The minimum D_min of D1 and D2 is computed as final test statistic of the reductive hourglass test.
In order to determine the statistical significance of an observed minimum difference D_min
the following permutation test was performed. Based on the bootMatrix
D_min
is calculated from each of the permuted TAI
or TDI
profiles,
approximated by a Gaussian distribution with method of moments estimated parameters returned by fitdist
,
and the corresponding p-value is computed by pnorm
given the estimated parameters of the Gaussian distribution.
The goodness of fit for the random vector D_min is statistically quantified by an Lilliefors (Kolmogorov-Smirnov) test
for normality.
In case the parameter plotHistogram = TRUE, a multi-plot is generated showing:
(1) A Cullen and Frey skewness-kurtosis plot generated by descdist
.
This plot illustrates which distributions seem plausible to fit the resulting permutation vector D_min.
In the case of the Reductive Hourglass Test a normal distribution seemed plausible.
(2) A histogram of D_min combined with the density plot is plotted. D_min is then fitted by a normal distribution.
The corresponding parameters are estimated by moment matching estimation using the fitdist
function.
(3) A plot showing the p-values for N independent runs to verify that a specific p-value is biased by a specific permutation order.
(4) A barplot showing the number of cases in which the underlying goodness of fit (returned by Lilliefors (Kolmogorov-Smirnov) test
for normality) has shown to be significant (TRUE
) or not significant (FALSE
).
This allows to quantify the permutation bias and their implications on the goodness of fit.
a list object containing the list elements:
p.value
: the p-value quantifying the statistical significance (high-low-high pattern) of the given phylotranscriptomics pattern.
std.dev
: the standard deviation of the N sampled phylotranscriptomics patterns for each developmental stage S.
lillie.test
: a boolean value specifying whether the Lillifors KS-Test returned a p-value > 0.05,
which indicates that fitting the permuted scores with a normal distribution seems plausible.
Hajk-Georg Drost
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.
M. L. Delignette-Muller, R. Pouillot, J.-B. Denis and C. Dutang (2014), fitdistrplus: help to fit of a parametric distribution to non-censored or censored data.
Cullen AC and Frey HC (1999) Probabilistic techniques in exposure assessment. Plenum Press, USA, pp. 81-159.
Evans M, Hastings N and Peacock B (2000) Statistical distributions. John Wiley and Sons Inc.
Sokal RR and Rohlf FJ (1995) Biometry. W.H. Freeman and Company, USA, pp. 111-115.
Juergen Gross and bug fixes by Uwe Ligges (2012). nortest: Tests for Normality. R package version 1.0-2.
http://CRAN.R-project.org/package=nortest
Dallal, G.E. and Wilkinson, L. (1986): An analytic approximation to the distribution of Lilliefors' test for normality. The American Statistician, 40, 294-296.
Stephens, M.A. (1974): EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69, 730-737.
http://stackoverflow.com/questions/4290081/fitting-data-to-distributions?rq=1
http://stats.stackexchange.com/questions/45033/can-i-use-kolmogorov-smirnov-test-and-estimate-distribution-parameters
http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf
http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf
rhScore
, bootMatrix
, FlatLineTest
,
ReverseHourglassTest
, EarlyConservationTest
, PlotSignature
,
LateConservationTest
data(PhyloExpressionSetExample) # perform the reductive hourglass test for a PhyloExpressionSet # here the prior biological knowledge is that stages 1-2 correspond to module 1 = early, # stages 3-5 to module 2 = mid (phylotypic module), and stages 6-7 correspond to # module 3 = late ReductiveHourglassTest(PhyloExpressionSetExample, modules = list(early = 1:2, mid = 3:5, late = 6:7), permutations = 1000) # use your own permutation matrix based on which p-values (ReductiveHourglassTest) # shall be computed custom_perm_matrix <- bootMatrix(PhyloExpressionSetExample,100) ReductiveHourglassTest(PhyloExpressionSetExample, modules = list(early = 1:2, mid = 3:5, late = 6:7), custom.perm.matrix = custom_perm_matrix)
data(PhyloExpressionSetExample) # perform the reductive hourglass test for a PhyloExpressionSet # here the prior biological knowledge is that stages 1-2 correspond to module 1 = early, # stages 3-5 to module 2 = mid (phylotypic module), and stages 6-7 correspond to # module 3 = late ReductiveHourglassTest(PhyloExpressionSetExample, modules = list(early = 1:2, mid = 3:5, late = 6:7), permutations = 1000) # use your own permutation matrix based on which p-values (ReductiveHourglassTest) # shall be computed custom_perm_matrix <- bootMatrix(PhyloExpressionSetExample,100) ReductiveHourglassTest(PhyloExpressionSetExample, modules = list(early = 1:2, mid = 3:5, late = 6:7), custom.perm.matrix = custom_perm_matrix)
This function computes the relative expression profiles of all given phylostrata or divergence-strata within a given PhyloExpressionSet or DivergenceExpressionSet.
REMatrix(ExpressionSet)
REMatrix(ExpressionSet)
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
For each phylostratum or divergence-stratum the corresponding relative expression profile is being computed as follows:
where and
denote the minimum/maximum mean expression level of
phylostratum j over the developmental stages s. This linear transformation corresponds
to a shift by
and a subsequent shrinkage by
.
As a result, the relative expression level
of developmental stage s with minimum
is 0,
the relative expression level
of the developmental stage s with maximum
is 1,
and the relative expression levels
of all other stages s range between 0 and 1, accordingly.
Hajk-Georg Drost
Domazet-Loso T and Tautz D. (2010). A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature (468): 815-818.
Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # example PhyloExpressionSet REMatrix(PhyloExpressionSetExample) # example DivergenceExpressionSet REMatrix(DivergenceExpressionSetExample)
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # example PhyloExpressionSet REMatrix(PhyloExpressionSetExample) # example DivergenceExpressionSet REMatrix(DivergenceExpressionSetExample)
This function reduces the destruction of an hourglass shaped pattern to a single score value.
Based on a given TAI
or TDI
pattern the given vector is being
divided into three developmental modules: early, mid, and late. The corrisponding TAI
or TDI
values in each developmental module are accumulated using the scoringMethod argument ("max-min" or "mean-mean").
In more detail:
(1) for a given TAI
or TDI
vector tai_profile or tdi_profile, we classify each value of tai_profile or tdi_profile into its corresponding developmental module early, mid, or late.
(2) accumulate the tai_profile or tdi_profile values in each developmental module using the arithmetic mean (mean
) in case scoringMethod = "mean-mean", or accumulate the tai_profile or tdi_profile values in each developmental module using max
for the early and late module and min
for the mid module in case scoringMethod = "max-min".
(3) then reduce the three values for each developmental module by computing the difference between: mid - early, and mid - late.
(4) the two difference values are referred to as a_early and a_late.
Each developmental module now has an accumulated representation value which is being reduced to one value using the method argument ("max", "min", or "mean").
Given the two accumulated values for each hourglass module: a_early and a_late, we reduce the two given values by:
"max":
"min":
"mean":
All together this results in a global score S. This global score S is being returned by this function.
reversehourglassScore( age_vals, early, mid, late, method, scoringMethod, profile.warn = FALSE )
reversehourglassScore( age_vals, early, mid, late, method, scoringMethod, profile.warn = FALSE )
age_vals |
a numeric vector containing |
early |
a numeric vector including the numeric stage values that correspond to the early phase of development. |
mid |
a numeric vector including the numeric stage values that correspond to the middle phase of development. |
late |
a numeric vector including the numeric stage values that correspond to the late phase of development. |
method |
to determine the two value reduction value, resulting in the global score S: "max", or "min", or "mean". |
scoringMethod |
method to determine the module accumulation value: "max-min" or "mean-mean". |
profile.warn |
a boolean value indicating whether a warning is printed when the low-high-low pattern isn't followed. |
The gpScore is a heuristic score enabling to construct a test statistic to determine the significance of a present (phylotranscriptomic) hourglass pattern.
a numeric value representing the hourglass destruction score.
Hajk-Georg Drost
Drost et al. (2015), Evidence for active maintenance of phylotranscriptomic hourglass patterns in animal and plant embryogenesis. Mol Bio Evol.
ReverseHourglassTest
, TAI
, TDI
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # example PhyloExpressionSet: # compute the TAI profile TAIs <- TAI(PhyloExpressionSetExample) # compute the global reverse hourglass destruction score # for the TAIs profile using reduction method: mean(mean-mean) reversehourglass_score <- reversehourglassScore(age_vals = TAIs,early = 1:2,mid = 3:5,late = 6:7, method = "mean",scoringMethod = "mean-mean") # example DivergenceExpressionSet: # compute the TDI profile TDIs <- TDI(DivergenceExpressionSetExample) # compute the global reverse hourglass destruction score for the TDIs profile # using reduction method: mean(mean-mean) reversehourglass_score <- reversehourglassScore(age_vals = TDIs,early = 1:2,mid = 3:5,late = 6:7, method = "mean",scoringMethod = "mean-mean") # get warning if the expected pattern isn't followed reversehourglass_score <- reversehourglassScore(age_vals = TAIs,early = 1:2,mid = 3:5,late = 6:7, method = "mean",scoringMethod = "mean-mean",profile.warn=TRUE)
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # example PhyloExpressionSet: # compute the TAI profile TAIs <- TAI(PhyloExpressionSetExample) # compute the global reverse hourglass destruction score # for the TAIs profile using reduction method: mean(mean-mean) reversehourglass_score <- reversehourglassScore(age_vals = TAIs,early = 1:2,mid = 3:5,late = 6:7, method = "mean",scoringMethod = "mean-mean") # example DivergenceExpressionSet: # compute the TDI profile TDIs <- TDI(DivergenceExpressionSetExample) # compute the global reverse hourglass destruction score for the TDIs profile # using reduction method: mean(mean-mean) reversehourglass_score <- reversehourglassScore(age_vals = TDIs,early = 1:2,mid = 3:5,late = 6:7, method = "mean",scoringMethod = "mean-mean") # get warning if the expected pattern isn't followed reversehourglass_score <- reversehourglassScore(age_vals = TAIs,early = 1:2,mid = 3:5,late = 6:7, method = "mean",scoringMethod = "mean-mean",profile.warn=TRUE)
The Reverse Hourglass Test aims to statistically evaluate the
existence of a reverse hourglass pattern based on TAI
or TDI
computations.
The corresponding p-value quantifies the probability that a given TAI or TDI pattern (or any phylotranscriptomics pattern)
does follow an hourglass like shape. A p-value < 0.05 indicates that the corresponding phylotranscriptomics pattern does
rather follow a reverse hourglass (low-high-low) shape.
ReverseHourglassTest( ExpressionSet, modules = NULL, permutations = 1000, lillie.test = FALSE, plotHistogram = FALSE, runs = 10, parallel = FALSE, gof.warning = FALSE, custom.perm.matrix = NULL )
ReverseHourglassTest( ExpressionSet, modules = NULL, permutations = 1000, lillie.test = FALSE, plotHistogram = FALSE, runs = 10, parallel = FALSE, gof.warning = FALSE, custom.perm.matrix = NULL )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
modules |
a list storing three elements: early, mid, and late. Each element expects a numeric
vector specifying the developmental stages or experiments that correspond to each module.
For example, |
permutations |
a numeric value specifying the number of permutations to be performed for the |
lillie.test |
a boolean value specifying whether the Lilliefors Kolmogorov-Smirnov Test shall be performed to quantify the goodness of fit. |
plotHistogram |
a boolean value specifying whether a Lillifor's Kolmogorov-Smirnov-Test shall be performed to test the goodness of fit of the approximated distribution, as well as additional plots quantifying the significance of the observed phylotranscriptomic pattern. |
runs |
specify the number of runs to be performed for goodness of fit computations, in case |
parallel |
performing |
gof.warning |
a logical value indicating whether non significant goodness of fit results should be printed as warning. Default is |
custom.perm.matrix |
a custom |
The reverse hourglass test is a permutation test based on the following test statistic.
(1) A set of developmental stages is partitioned into three modules - early, mid, and late - based on prior biological knowledge.
(2) The mean TAI
or TDI
value for each of the three modules T_early, T_mid, and T_late are computed.
(3) The two differences D1 = T_mid - T_early
and D2 = T_mid - T_late
are calculated.
(4) The minimum D_min
of D1
and D2
is computed as final test statistic of the reductive hourglass test.
In order to determine the statistical significance of an observed minimum difference D_min
the following permutation test was performed. Based on the bootMatrix
D_min
is calculated from each of the permuted TAI
or TDI
profiles,
approximated by a Gaussian distribution with method of moments estimated parameters returned by fitdist
,
and the corresponding p-value is computed by pnorm
given the estimated parameters of the Gaussian distribution.
The goodness of fit for the random vector D_min is statistically quantified by an Lilliefors (Kolmogorov-Smirnov) test for normality.
In case the parameter plotHistogram = TRUE, a multi-plot is generated showing:
(1) A Cullen and Frey skewness-kurtosis plot generated by descdist
.
This plot illustrates which distributions seem plausible to fit the resulting permutation vector D_min.
In the case of the Reverse Hourglass Test a normal distribution seemed plausible.
(2) A histogram of D_min
combined with the density plot is plotted. D_min is then fitted by a normal distribution.
The corresponding parameters are estimated by moment matching estimation using the fitdist
function.
(3) A plot showing the p-values for N independent runs to verify that a specific p-value is biased by a specific permutation order.
(4) A barplot showing the number of cases in which the underlying goodness of fit (returned by Lilliefors (Kolmogorov-Smirnov) test
for normality) has shown to be significant (TRUE
) or not significant (FALSE
).
This allows to quantify the permutation bias and their implications on the goodness of fit.
a list object containing the list elements:
p.value
: the p-value quantifying the statistical significance (low-high-low pattern) of the given phylotranscriptomics pattern.
std.dev
: the standard deviation of the N sampled phylotranscriptomics patterns for each developmental stage S.
lillie.test
: a boolean value specifying whether the Lillifors KS-Test returned a p-value > 0.05,
which indicates that fitting the permuted scores with a normal distribution seems plausible.
Hajk-Georg Drost
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.
M. L. Delignette-Muller, R. Pouillot, J.-B. Denis and C. Dutang (2014), fitdistrplus: help to fit of a parametric distribution to non-censored or censored data.
Cullen AC and Frey HC (1999) Probabilistic techniques in exposure assessment. Plenum Press, USA, pp. 81-159.
Evans M, Hastings N and Peacock B (2000) Statistical distributions. John Wiley and Sons Inc.
Sokal RR and Rohlf FJ (1995) Biometry. W.H. Freeman and Company, USA, pp. 111-115.
Juergen Gross and bug fixes by Uwe Ligges (2012). nortest: Tests for Normality. R package version 1.0-2.
http://CRAN.R-project.org/package=nortest
Dallal, G.E. and Wilkinson, L. (1986): An analytic approximation to the distribution of Lilliefors' test for normality. The American Statistician, 40, 294-296.
Stephens, M.A. (1974): EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69, 730-737.
http://stackoverflow.com/questions/4290081/fitting-data-to-distributions?rq=1
http://stats.stackexchange.com/questions/45033/can-i-use-kolmogorov-smirnov-test-and-estimate-distribution-parameters
http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf
http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf
reversehourglassScore
, bootMatrix
, FlatLineTest
, EarlyConservationTest
, PlotSignature
data(PhyloExpressionSetExample) # perform the reductive hourglass test for a PhyloExpressionSet # here the prior biological knowledge is that stages 1-2 correspond to module 1 = early, # stages 3-5 to module 2 = mid (phylotypic module), and stages 6-7 correspond to # module 3 = late ReverseHourglassTest(PhyloExpressionSetExample, modules = list(early = 1:2, mid = 3:5, late = 6:7), permutations = 1000) # use your own permutation matrix based on which p-values (ReverseHourglassTest) # shall be computed custom_perm_matrix <- bootMatrix(PhyloExpressionSetExample,100) ReverseHourglassTest(PhyloExpressionSetExample, modules = list(early = 1:2, mid = 3:5, late = 6:7), custom.perm.matrix = custom_perm_matrix)
data(PhyloExpressionSetExample) # perform the reductive hourglass test for a PhyloExpressionSet # here the prior biological knowledge is that stages 1-2 correspond to module 1 = early, # stages 3-5 to module 2 = mid (phylotypic module), and stages 6-7 correspond to # module 3 = late ReverseHourglassTest(PhyloExpressionSetExample, modules = list(early = 1:2, mid = 3:5, late = 6:7), permutations = 1000) # use your own permutation matrix based on which p-values (ReverseHourglassTest) # shall be computed custom_perm_matrix <- bootMatrix(PhyloExpressionSetExample,100) ReverseHourglassTest(PhyloExpressionSetExample, modules = list(early = 1:2, mid = 3:5, late = 6:7), custom.perm.matrix = custom_perm_matrix)
This function reduces the destruction of an hourglass shaped pattern to a single score value.
Based on a given TAI
or TDI
pattern the given vector is being
divided into three developmental modules: early, mid, and late. The corrisponding TAI
or TDI
values in each developmental module are accumulated using the scoringMethod argument ("max-min" or "mean-mean").
In more detail:
(1) for a given TAI
or TDI
vector tai_profile or tdi_profile, we classify each value of tai_profile or tdi_profile into its corresponding developmental module early, mid, or late.
(2) accumulate the tai_profile or tdi_profile values in each developmental module using the arithmetic mean (mean
) in case scoringMethod = "mean-mean", or accumulate the tai_profile or tdi_profile values in each developmental module using max
for the early and late module and min
for the mid module in case scoringMethod = "max-min".
(3) then reduce the three values for each developmental module by computing the difference between: early - mid, and late - mid.
(4) the two difference values are referred to as a_early and a_late.
Each developmental module now has an accumulated representation value which is being reduced to one value using the method argument ("max", "min", or "mean").
Given the two accumulated values for each hourglass module: a_early and a_late, we reduce the two given values by:
"max":
"min":
"mean":
All together this results in a global score S.
This global score S is being returned by this function rhScore
.
rhScore( age_vals, early, mid, late, method, scoringMethod, profile.warn = FALSE )
rhScore( age_vals, early, mid, late, method, scoringMethod, profile.warn = FALSE )
age_vals |
a numeric vector containing |
early |
a numeric vector including the numeric stage values that correspond to the early phase of development. |
mid |
a numeric vector including the numeric stage values that correspond to the middle phase of development. |
late |
a numeric vector including the numeric stage values that correspond to the late phase of development. |
method |
to determine the two value reduction value, resulting in the global score S: "max", or "min", or "mean". |
scoringMethod |
method to determine the module accumulation value: "max-min" or "mean-mean". |
profile.warn |
a bolean value indicating whether a warning is printed when the high-low-high pattern isn't followed. |
The gpScore is a heuristic score enabling to construct a test statistic to determine the significance of a present (phylotranscriptomic) hourglass pattern.
a numeric value representing the hourglass destruction score.
Hajk-Georg Drost
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
ReductiveHourglassTest
, TAI
, TDI
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # example PhyloExpressionSet: # compute the TAI profile TAIs <- TAI(PhyloExpressionSetExample) # compute the global hourglass destruction score # for the TAIs profile using reduction method: mean(mean-mean) rh_score <- rhScore(age_vals = TAIs,early = 1:2,mid = 3:5,late = 6:7, method = "mean",scoringMethod = "mean-mean") # example DivergenceExpressionSet: # compute the TDI profile TDIs <- TDI(DivergenceExpressionSetExample) # compute the global hourglass destruction score for the TDIs profile # using reduction method: mean(mean-mean) rh_score <- rhScore(age_vals = TDIs,early = 1:2,mid = 3:5,late = 6:7, method = "mean",scoringMethod = "mean-mean") # get warning if the expected pattern isn't followed rh_score <- rhScore(age_vals = TAIs,early = 1:2,mid = 3:5,late = 6:7, method = "mean",scoringMethod = "mean-mean",profile.warn=TRUE)
# read standard phylotranscriptomics data data(PhyloExpressionSetExample) data(DivergenceExpressionSetExample) # example PhyloExpressionSet: # compute the TAI profile TAIs <- TAI(PhyloExpressionSetExample) # compute the global hourglass destruction score # for the TAIs profile using reduction method: mean(mean-mean) rh_score <- rhScore(age_vals = TAIs,early = 1:2,mid = 3:5,late = 6:7, method = "mean",scoringMethod = "mean-mean") # example DivergenceExpressionSet: # compute the TDI profile TDIs <- TDI(DivergenceExpressionSetExample) # compute the global hourglass destruction score for the TDIs profile # using reduction method: mean(mean-mean) rh_score <- rhScore(age_vals = TDIs,early = 1:2,mid = 3:5,late = 6:7, method = "mean",scoringMethod = "mean-mean") # get warning if the expected pattern isn't followed rh_score <- rhScore(age_vals = TAIs,early = 1:2,mid = 3:5,late = 6:7, method = "mean",scoringMethod = "mean-mean",profile.warn=TRUE)
Select a subset of genes stored in the input ExpressionSet
.
SelectGeneSet(ExpressionSet, gene.set, use.only.map = FALSE)
SelectGeneSet(ExpressionSet, gene.set, use.only.map = FALSE)
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
gene.set |
a character vector storing the gene ids for which gene expression profiles shall be visualized. |
use.only.map |
a logical value indicating whether instead of a standard |
This function selects a subset of genes specified in gene.set
stored in the input ExpressionSet
and returns a subset ExpressionSet
.
This function is useful for studying the evolutionary properties of a subset of genes
stored in the ExpressionSet
.
Hajk-Georg Drost
PlotGeneSet
, PlotEnrichment
, DiffGenes
data(PhyloExpressionSetExample) # receive a subset ExpressionSet for the fist 5 genes stored in # the PhyloExpressionSetExample SelectGeneSet(ExpressionSet = PhyloExpressionSetExample, gene.set = PhyloExpressionSetExample[1:5, 2]) # get a gene subset using only a phylostratihraphic map ExamplePSMap <- PhyloExpressionSetExample[ , 1:2] SelectGeneSet(ExpressionSet = ExamplePSMap, gene.set = PhyloExpressionSetExample[1:5, 2], use.only.map = TRUE)
data(PhyloExpressionSetExample) # receive a subset ExpressionSet for the fist 5 genes stored in # the PhyloExpressionSetExample SelectGeneSet(ExpressionSet = PhyloExpressionSetExample, gene.set = PhyloExpressionSetExample[1:5, 2]) # get a gene subset using only a phylostratihraphic map ExamplePSMap <- PhyloExpressionSetExample[ , 1:2] SelectGeneSet(ExpressionSet = ExamplePSMap, gene.set = PhyloExpressionSetExample[1:5, 2], use.only.map = TRUE)
Main function to return the removed genes after GATAI gene removal and plots transcriptome indices pattern before, after GATAI gene removal and by removing random genes (by default same number as GATAI removed).
SignatureGATAIGeneRemoval( ExpressionSet, measure = "TAI", TestStatistic = "FlatLineTest", modules = NULL, permutations = 1000, lillie.test = FALSE, p.value = TRUE, shaded.area = FALSE, custom.perm.matrix = NULL, xlab = "Ontogeny", ylab = "Transcriptome Index", main = "", lwd = 4, alpha = 0.1, y.ticks = 10, n_random_removal = "gatai", ... )
SignatureGATAIGeneRemoval( ExpressionSet, measure = "TAI", TestStatistic = "FlatLineTest", modules = NULL, permutations = 1000, lillie.test = FALSE, p.value = TRUE, shaded.area = FALSE, custom.perm.matrix = NULL, xlab = "Ontogeny", ylab = "Transcriptome Index", main = "", lwd = 4, alpha = 0.1, y.ticks = 10, n_random_removal = "gatai", ... )
ExpressionSet |
a standard PhyloExpressionSet, DivergenceExpressionSet or PolymorphismsExpressionSet object. |
measure |
type of transcriptome index that shall be computed. E.g.
|
TestStatistic |
a string defining the type of test statistics to be used to quantify the statistical significance the present phylotranscriptomics pattern. Possible values can be:
|
modules |
a list storing three elements for the
|
permutations |
a numeric value specifying the number of permutations to be performed for the |
lillie.test |
a boolean value specifying whether the Lilliefors Kolmogorov-Smirnov Test shall be performed. |
p.value |
a boolean value specifying whether the p-value of the test statistic shall be printed as a subtitle. |
shaded.area |
a boolean value specifying whether a shaded area shall be drawn for the developmental stages defined to be the presumptive phylotypic period. |
custom.perm.matrix |
a custom |
xlab |
label of x-axis. |
ylab |
label of y-axis. |
main |
figure title. |
lwd |
line width. |
alpha |
transparency of the shaded area (between [0,1]). Default is |
y.ticks |
number of ticks on the y-axis. Default is |
n_random_removal |
number of randomly removed genes for the third plot. Default is |
... |
parameters passed on to |
Filipa Martins Costa and Hajk-Georg Drost
This function computes the phylogenetically based transcriptome age index (TAI) introduced by Domazet-Loso & Tautz, 2010.
TAI(PhyloExpressionSet)
TAI(PhyloExpressionSet)
PhyloExpressionSet |
a standard PhyloExpressionSet object. |
The TAI measure represents the weighted arithmetic mean (expression levels as weights for the phylostratum value) over all evolutionary age categories denoted as phylostra.
where TAI_s denotes the TAI value in developmental stage s,
e_is denotes the gene expression level of gene i in stage s,
and ps_i denotes the corresponding phylostratum of gene i, and N = total number of genes.
Internally the function calls the C++ function cpp_TAI
to speed up TAI computations.
a numeric vector containing the TAI values for all given developmental stages.
Hajk-Georg Drost
Domazet-Loso T. and Tautz D. (2010). A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature (468): 815-818.
Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
TDI
, PlotPattern
, FlatLineTest
, ReductiveHourglassTest
# reading a standard PhyloExpressionSet data(PhyloExpressionSetExample) # computing the TAI profile of a given PhyloExpressionSet object TAIs <- TAI(PhyloExpressionSetExample)
# reading a standard PhyloExpressionSet data(PhyloExpressionSetExample) # computing the TAI profile of a given PhyloExpressionSet object TAIs <- TAI(PhyloExpressionSetExample)
This function retrieves category information from NCBI Taxonomy and is able to filter kingdom specific taxids.
taxid(db.path, download = FALSE, update = FALSE, filter = NULL)
taxid(db.path, download = FALSE, update = FALSE, filter = NULL)
db.path |
path to download and store the NCBI Taxonomy |
download |
a logical value specifying whether or not the |
update |
should the local file be updated? Please specify the |
filter |
a character string specifying the kingdom of life for which taxids shall be returned. Options are
|
Hajk-Georg Drost
## Not run: # download categories.dmp file to current working directory # and filter for 'Archea' taxids Archea.taxids <- taxid(db.path = getwd(), filter = "Archea", download = TRUE) # Once the NCBI Taxonomy 'categories.dmp' file is downloaded to your machine ('download = TRUE') # the 'taxid()' function can be proceed on the local 'categories.dmp' file # e.g. filter for Virus taxids Virus.taxids <- taxid(db.path = getwd(), filter = "Viruses") ## End(Not run)
## Not run: # download categories.dmp file to current working directory # and filter for 'Archea' taxids Archea.taxids <- taxid(db.path = getwd(), filter = "Archea", download = TRUE) # Once the NCBI Taxonomy 'categories.dmp' file is downloaded to your machine ('download = TRUE') # the 'taxid()' function can be proceed on the local 'categories.dmp' file # e.g. filter for Virus taxids Virus.taxids <- taxid(db.path = getwd(), filter = "Viruses") ## End(Not run)
This function computes the sequence distance based transcriptome divergence index (TDI) introduced by Quint et al., 2012.
TDI(DivergenceExpressionSet)
TDI(DivergenceExpressionSet)
DivergenceExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
The TDI measure represents the weighted arithmetic mean (expression levels as weights for the divergence-stratum value) over all gene divergence categories denoted as divergence-strata.
where TDI_s denotes the TDI value in developmental stage s, e_is denotes the gene expression level of gene i in stage s, and ds_i denotes the corresponding divergence-stratum of gene i, and N = total number of genes.
Internally the function is written in C++ to speed up TDI computations.
a numeric vector containing the TDI values for all given developmental stages.
Hajk-Georg Drost
Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
TAI
, PlotPattern
, FlatLineTest
, ReductiveHourglassTest
# reading a standard DivergenceExpressionSet data(DivergenceExpressionSetExample) # computing the TDI profile of a given DivergenceExpressionSet object TDIs <- TDI(DivergenceExpressionSetExample)
# reading a standard DivergenceExpressionSet data(DivergenceExpressionSetExample) # computing the TDI profile of a given DivergenceExpressionSet object TDIs <- TDI(DivergenceExpressionSetExample)
This function computes the phylogenetically based transcriptome evolutionary index (TEI) similar to Domazet-Loso & Tautz, 2010.
TEI( ExpressionSet, Phylostratum = NULL, split = 1e+05, showprogress = TRUE, threads = 1 )
TEI( ExpressionSet, Phylostratum = NULL, split = 1e+05, showprogress = TRUE, threads = 1 )
ExpressionSet |
expression object with rownames as GeneID (dgCMatrix) or standard PhyloExpressionSet object. |
Phylostratum |
a named vector representing phylostratum per GeneID with names as GeneID (not used if Expression is PhyloExpressionSet). |
split |
specify number of columns to split |
showprogress |
boolean if progressbar should be shown |
threads |
specify number of threads |
The TEI measure represents the weighted arithmetic mean (expression levels as weights for the phylostratum value) over all evolutionary age categories denoted as phylostra.
where TEI_s denotes the TEI value in developmental stage s,
e_is denotes the gene expression level of gene i in stage s,
and ps_i denotes the corresponding phylostratum of gene i,
and N = total number of genes.
a numeric vector containing the TEI values for all given cells or developmental stages.
Kristian K Ullrich
Domazet-Loso T. and Tautz D. (2010). A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature (468): 815-818.
Quint M et al. (2012). A transcriptomic hourglass in plant embryogenesis. Nature (490): 98-101.
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
# reading a standard PhyloExpressionSet data(PhyloExpressionSetExample, package = "myTAI") # computing the TEI profile of a given PhyloExpressionSet object TEI <- TEI(PhyloExpressionSetExample)
# reading a standard PhyloExpressionSet data(PhyloExpressionSetExample, package = "myTAI") # computing the TEI profile of a given PhyloExpressionSet object TEI <- TEI(PhyloExpressionSetExample)
This function transforms the gene expression set stored in an input PhloExpressionSet or DivergenceExpressionSet object and returns a PhloExpressionSet or DivergenceExpressionSet object with transformed expression levels. The resulting transformed PhloExpressionSet or DivergenceExpressionSet object can then be used for subsequent analyses based on transformed expression levels.
tf(ExpressionSet, FUN, pseudocount = 0, integerise = FALSE)
tf(ExpressionSet, FUN, pseudocount = 0, integerise = FALSE)
ExpressionSet |
a standard PhloExpressionSet or DivergenceExpressionSet object. |
FUN |
any valid function that transforms gene expression levels. |
pseudocount |
a numeric value to be added to the expression matrix prior to transformation. |
integerise |
a boolean specifying whether the expression data should be rounded to the nearest integer. |
Motivated by the dicussion raised by Piasecka et al., 2013, the influence of
gene expression transformation on the global phylotranscriptomics pattern does not seem negligible.
Hence, different transformations can result in qualitatively different TAI
or TDI
patterns.
Initially, the TAI
and TDI
formulas were defined for absolute expression levels.
So using the initial TAI
and TDI
formulas with transformed expression levels
might turn out in qualitatively different patterns when compared with non-transformed expression levels,
but might also belong to a different class of models, since different valid expression level transformation functions result in different patterns.
The purpose of this function is to allow the user to study the qualitative impact of different transformation functions on
the global TAI
and TDI
pattern, or on any subsequent phylotranscriptomics analysis.
The examples using the PhyloExpressionSetExample data set show that using common gene expression
transformation functions: log2
(Quackenbush, 2001 and 2002), sqrt
(Yeung et al., 2001),
boxcox
, or inverse hyperbolic sine transformation, each transformation results
in qualitatively different patterns.
Nevertheless, for each resulting pattern the statistical significance can be tested
using either the FlatLineTest
or ReductiveHourglassTest
(Drost et al., 2014)
to quantify the significance of interest.
a standard PhloExpressionSet or DivergenceExpressionSet object storing transformed gene expression levels.
Hajk-Georg Drost
Piasecka B, Lichocki P, Moretti S, et al. (2013) The hourglass and the early conservation models–co-existing patterns of developmental constraints in vertebrates. PLoS Genet. 9(4): e1003476.
Quint M., Drost H.G., Gabel A., Ullrich K.K., Boenn M., Grosse I. (2012) A transcriptomic hourglass in plant embryogenesis. Nature 490: 98-101.
Domazet-Loso T., Tautz D. (2010) A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature 468: 815-8.
Drost HG et al. (2015) Mol Biol Evol. 32 (5): 1221-1231 doi:10.1093/molbev/msv012
K.Y. Yeung et al.: Model-based clustering and data transformations for gene expression data. Bioinformatics 2001, 17:977-987
K.Y. Yeung et al.: Supplement to Model-based clustering and data transformations for gene expression data - Data Transformations and the Gaussian mixture assumption. Bioinformatics 2001, 17:977-987
P.A.C. Hoen et al.: Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms. Nucleic Acids Research 2008, Vol. 36, No. 21
H.H. Thygesen et al.: Comparing transformation methods for DNA microarray data. BMC Bioinformatics 2004, 5:77
John Quackenbush: Microarray data normalization and transformation. Nature Genetics 2002, 32:496-501
John Quackenbush: Computational Analysis of Microarray Data. Nature Reviews 2001, 2:418-427
R. Nadon and J. Shoemaker: Statistical issues with microarrays: processing and analysis. TRENDS in Genetics 2002, Vol. 18 No. 5:265-271
B.P. Durbin et al.: A variance-stabilizing transformation for gene-expression microarray data. Bioinformatics 2002, 18:S105-S110
J. M. Bland et al.: Transforming data. BMJ 1996, 312:770
John B. Burbidge, Lonnie Magee and A. Leslie Robb (1988) Alternative Transformations to Handle Extreme Values of the Dependent Variable. Journal of the American Statistical Association, 83(401): 123-127.
G. E. P. Box and D. R. Cox (1964) An Analysis of Transformations. Journal of the Royal Statistical Society. Series B (Methodological), 26(2): 211-252.
TAI
, TDI
, FlatLineTest
, ReductiveHourglassTest
,
tfStability
## Not run: data(PhyloExpressionSetExample) # a simple example is to transform the gene expression levels # of a given PhyloExpressionSet using a sqrt or log2 transformation PES.sqrt <- tf(PhyloExpressionSetExample, sqrt) PES.log2 <- tf(PhyloExpressionSetExample, log2) # plotting the TAI using log2 transformed expression levels # and performing the Flat Line Test to obtain the p-value PlotSignature(ExpressionSet = tf(PhyloExpressionSetExample, log2), permutations = 1000) # in case the expression matrix contains 0s, a pseudocount can be added prior # to certain transformations, e.g. log2(x+1) where 1 is the pseudocount. PhyloExpressionSetExample[4,3] = 0 PES.log2 <- tf(PhyloExpressionSetExample, log2, pseudocount = 0) # this should return -Inf at PES.log2[4,3] the issue here is that # -Inf cannot be used to compute the phylotranscriptomic profile. PES.log2 <- tf(PhyloExpressionSetExample, log2, pseudocount = 1) # log2 transformed expression levels can now be used in downstream analyses. # to perform rank transformation PES.rank <- tf(PhyloExpressionSetExample, FUN = function(x) apply(x, 2, base::rank)) # rlog and vst transformations are now also possible by loading the DESeq2 package # and transforming the data with the parameter integerise = TRUE. library(DESeq2) # make sure the DESeq2 version >= 1.29.15 for rlog PES.vst <- tf(PhyloExpressionSetExample, vst, integerise = TRUE) ## End(Not run)
## Not run: data(PhyloExpressionSetExample) # a simple example is to transform the gene expression levels # of a given PhyloExpressionSet using a sqrt or log2 transformation PES.sqrt <- tf(PhyloExpressionSetExample, sqrt) PES.log2 <- tf(PhyloExpressionSetExample, log2) # plotting the TAI using log2 transformed expression levels # and performing the Flat Line Test to obtain the p-value PlotSignature(ExpressionSet = tf(PhyloExpressionSetExample, log2), permutations = 1000) # in case the expression matrix contains 0s, a pseudocount can be added prior # to certain transformations, e.g. log2(x+1) where 1 is the pseudocount. PhyloExpressionSetExample[4,3] = 0 PES.log2 <- tf(PhyloExpressionSetExample, log2, pseudocount = 0) # this should return -Inf at PES.log2[4,3] the issue here is that # -Inf cannot be used to compute the phylotranscriptomic profile. PES.log2 <- tf(PhyloExpressionSetExample, log2, pseudocount = 1) # log2 transformed expression levels can now be used in downstream analyses. # to perform rank transformation PES.rank <- tf(PhyloExpressionSetExample, FUN = function(x) apply(x, 2, base::rank)) # rlog and vst transformations are now also possible by loading the DESeq2 package # and transforming the data with the parameter integerise = TRUE. library(DESeq2) # make sure the DESeq2 version >= 1.29.15 for rlog PES.vst <- tf(PhyloExpressionSetExample, vst, integerise = TRUE) ## End(Not run)
This function performs transformation of phylostratum values.
tfPS(ExpressionSet, transform)
tfPS(ExpressionSet, transform)
ExpressionSet |
a standard PhyloExpressionSet object. |
transform |
a character vector of any valid function that transforms PS values. Possible values can be:
|
This function transforms the phylostratum assignment.
The return value of this function is a PhyloExpressionSet object with
transformed phylostratum tfPhylostratum
as the first column, satisfying
is.ExpressionSet
. Note that the input transform
must be an
available function, currently limited to only "qr"
(or "quantilerank"
).
a standard PhloExpressionSet object storing transformed Phylostratum levels.
Jaruwatana Sodai Lotharukpong and Lukas Maischak
# source the example dataset data(PhyloExpressionSetExample) # get the relative expression profiles for each phylostratum tfPES <- tfPS(PhyloExpressionSetExample, transform = "qr") head(tfPES)
# source the example dataset data(PhyloExpressionSetExample) # get the relative expression profiles for each phylostratum tfPES <- tfPS(PhyloExpressionSetExample, transform = "qr") head(tfPES)
tfStability aims to statistically evaluate the
stability of ReductiveHourglassTest
, FlatLineTest
,
ReverseHourglassTest
, EarlyConservationTest
, or
LateConservationTest
(all based on TAI
or TDI
computations) against different
data transformations.
The corresponding p-value quantifies the probability that a given TAI or TDI pattern (or any phylotranscriptomics pattern)
does not support the chosen test. A p-value < 0.05 indicates that the corresponding phylotranscriptomics pattern does
indeed support the chosen test.
tfStability( ExpressionSet, TestStatistic = "FlatLineTest", transforms = c("none", "sqrt", "log2", "rank", "squared"), modules = NULL, permutations = 1000, pseudocount = 1 )
tfStability( ExpressionSet, TestStatistic = "FlatLineTest", transforms = c("none", "sqrt", "log2", "rank", "squared"), modules = NULL, permutations = 1000, pseudocount = 1 )
ExpressionSet |
a standard PhyloExpressionSet or DivergenceExpressionSet object. |
TestStatistic |
a string defining the type of test statistics to be used to quantify the statistical significance the present phylotranscriptomics pattern. Possible values can be:
|
transforms |
a character vector of any valid function that transforms gene expression levels. |
modules |
a list storing three elements: early, mid, and late. Each element expects a numeric
vector specifying the developmental stages or experiments that correspond to each module.
For example, |
permutations |
a numeric value specifying the number of permutations to be performed for the |
pseudocount |
any valid number to add to the expression matrix prior to log transformations. |
An assessment for the stability of data transforms on the permutation test of choice.
For details, please consult tf
, ReductiveHourglassTest
,
FlatLineTest
, ReverseHourglassTest
,
LateConservationTest
or EarlyConservationTest
a vector object containing the vector elements:
p.value
: the p-value quantifying the statistical significance (depending on the chosen test) of the given phylotranscriptomics pattern under the given data transformation(s).
Jaruwatana Sodai Lotharukpong
Lotharukpong JS et al. (2023) (unpublished)
rhScore
, bootMatrix
, FlatLineTest
,
ReverseHourglassTest
, EarlyConservationTest
,
ReductiveHourglassTest
, PlotSignature
, LateConservationTest
data(PhyloExpressionSetExample) # perform the reductive hourglass test for a PhyloExpressionSet # here the prior biological knowledge is that stages 1-2 correspond to module 1 = early, # stages 3-5 to module 2 = mid (phylotypic module), and stages 6-7 correspond to # module 3 = late tfStability(ExpressionSet = PhyloExpressionSetExample, TestStatistic = "ReductiveHourglassTest", permutations = 100, transforms = c("log2", "sqrt", "none"), modules = list(early = 1:2, mid = 3:5, late = 6:7)) # it is also possible to test the phylotranscriptomic pattern using rlog # and vst transforms from DESeq2 library(DESeq2) tfStability(ExpressionSet = PhyloExpressionSetExample, TestStatistic = "ReductiveHourglassTest", permutations = 100, transforms = c("log2", "sqrt", "none", "vst"), modules = list(early = 1:2, mid = 3:5, late = 6:7))
data(PhyloExpressionSetExample) # perform the reductive hourglass test for a PhyloExpressionSet # here the prior biological knowledge is that stages 1-2 correspond to module 1 = early, # stages 3-5 to module 2 = mid (phylotypic module), and stages 6-7 correspond to # module 3 = late tfStability(ExpressionSet = PhyloExpressionSetExample, TestStatistic = "ReductiveHourglassTest", permutations = 100, transforms = c("log2", "sqrt", "none"), modules = list(early = 1:2, mid = 3:5, late = 6:7)) # it is also possible to test the phylotranscriptomic pattern using rlog # and vst transforms from DESeq2 library(DESeq2) tfStability(ExpressionSet = PhyloExpressionSetExample, TestStatistic = "ReductiveHourglassTest", permutations = 100, transforms = c("log2", "sqrt", "none", "vst"), modules = list(early = 1:2, mid = 3:5, late = 6:7))
Select genes residing in the top quantile according to the mean of their expression across the stages
TopExpressionGenes(ExpressionSet, p = 0.99)
TopExpressionGenes(ExpressionSet, p = 0.99)
ExpressionSet |
A standard ExpressionSet |
p |
The quantile probability. Default is |
a character vector containing the gene ids residing in the top expression quantile
Stefan Manolache
# reading a standard PhyloExpressionSet data(PhyloExpressionSetExample) # select genes with highest variance (top 2%) genes.top_expression <- TopExpressionGenes(PhyloExpressionSetExample, p=.98) # remove top genes from the PhyloExpressionSet PhyloExpressionSet.top_removed <- subset(PhyloExpressionSetExample, !(GeneID %in% genes.top_expression)) # plot TAI of set with removed quantile PlotSignature(ExpressionSet = PhyloExpressionSet.top_removed, measure = "TAI", TestStatistic = "FlatLineTest", xlab = "Ontogeny", ylab = "TAI" )
# reading a standard PhyloExpressionSet data(PhyloExpressionSetExample) # select genes with highest variance (top 2%) genes.top_expression <- TopExpressionGenes(PhyloExpressionSetExample, p=.98) # remove top genes from the PhyloExpressionSet PhyloExpressionSet.top_removed <- subset(PhyloExpressionSetExample, !(GeneID %in% genes.top_expression)) # plot TAI of set with removed quantile PlotSignature(ExpressionSet = PhyloExpressionSet.top_removed, measure = "TAI", TestStatistic = "FlatLineTest", xlab = "Ontogeny", ylab = "TAI" )
Select genes residing in the top quantile according to the variance of their expression across the stages
TopVarianceGenes(ExpressionSet, p = 0.99)
TopVarianceGenes(ExpressionSet, p = 0.99)
ExpressionSet |
A standard ExpressionSet |
p |
The quantile probability. Default is |
a character vector containing the gene ids residing in the top variance expression quantile
Stefan Manolache
# reading a standard PhyloExpressionSet data(PhyloExpressionSetExample) # select genes with highest variance (top 5%) genes.top_variance <- TopVarianceGenes(PhyloExpressionSetExample, p=.95)
# reading a standard PhyloExpressionSet data(PhyloExpressionSetExample) # select genes with highest variance (top 5%) genes.top_variance <- TopVarianceGenes(PhyloExpressionSetExample, p=.95)
This function computes the Transcriptome Polymorphism Index (TPI) introduced by Gossmann et al., 2015.
TPI(PolymorphismExpressionSet)
TPI(PolymorphismExpressionSet)
PolymorphismExpressionSet |
a standard PolymorphismExpressionSet object. |
The TPI measure represents the weighted arithmetic mean (expression levels as weights) for the synonymous vs non-synonymous polymorphism ratios.
where TPI_s denotes the TPI value in developmental stage s, e_is denotes the gene expression level of gene i in stage s, n denotes the number of genes, PN and PS denote the numbers of non-synonymous and synonymous polymorphisms, and N and S are the numbers of nonsynonymous and synonymous sites, respectively.
Internally the function is written in C++ to speed up TPI computations.
a numeric vector containing the TPI values for all given developmental stages.
Hajk-Georg Drost
Gossmann et al. (2015). Transcriptomes of Plant Gametophytes Have a Higher Proportion of Rapidly Evolving and Young Genes than Sporophytes. Mol Biol Evol. 33 (7): 1669-1678.
TAI
, TDI
, PlotSignature
, PlotPattern
, FlatLineTest
, ReductiveHourglassTest
## Not run: # reading a standard PolymorphismExpressionSet data(PolymorphismExpressionSetExample) # computing the TPI profile of a given PolymorphismExpressionSet object TPIs <- TPI(PolymorphismExpressionSet) ## End(Not run)
## Not run: # reading a standard PolymorphismExpressionSet data(PolymorphismExpressionSetExample) # computing the TPI profile of a given PolymorphismExpressionSet object TPIs <- TPI(PolymorphismExpressionSet) ## End(Not run)