Alpha and Beta Diversity
Alpha diversity (diversity within samples) of the provided community data is calculated using the selected indices. A pair-wise ANOVA of the diversity measures between groups is computed, and a plot is produced for each of the selected indices, annotated with significance labels.
The diversity measure (method) options include "richness", "fisher", "simpson", "shannon" and "evenness". grouping_column is the categorical variable on which the grouping is based during the analysis. pValueCutoff specifies the p-value threshold for significance in the ANOVA; the default is 0.05. For the following examples, we use the simpson, richness and shannon indices to calculate diversity.
Grouping by Country categorical variable.
p<-plot_anova_diversity(physeq, method = c("richness","simpson", "shannon"),grouping_column = "Country",pValueCutoff=0.05)
print(p)
Grouping by Latrine categorical variable.
p<-plot_anova_diversity(physeq, method = c("richness","simpson", "shannon"),grouping_column = "Latrine",pValueCutoff=0.05)
print(p)
Grouping by Depth categorical variable.
p<-plot_anova_diversity(physeq, method = c("richness","simpson", "shannon"),grouping_column = "Depth",pValueCutoff=0.05)
print(p)
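For readers who want to see what the indices measure, the sketch below computes richness, Shannon, Simpson and evenness by hand in base R on a made-up abundance table. The table, the helper name alpha_indices and the exact definitions used (natural-log Shannon, Gini-Simpson, Pielou's evenness) are assumptions chosen for illustration; the package may compute these measures differently, and Fisher's alpha is left out because it requires an iterative fit.

```r
# Toy abundance table (made-up counts): rows are samples, columns are taxa.
abund <- matrix(c(10, 10, 10, 10,
                  37,  1,  1,  1,
                  20, 20,  0,  0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("S1", "S2", "S3"), paste0("taxon", 1:4)))

# Hypothetical helper: common textbook definitions of four alpha diversity indices.
alpha_indices <- function(counts) {
  counts   <- counts[counts > 0]        # absent taxa contribute nothing
  p        <- counts / sum(counts)      # relative abundances
  richness <- length(counts)            # number of observed taxa
  shannon  <- -sum(p * log(p))          # Shannon index, natural log
  simpson  <- 1 - sum(p^2)              # Gini-Simpson form
  evenness <- if (richness > 1) shannon / log(richness) else NA_real_  # Pielou's J
  c(richness = richness, shannon = shannon,
    simpson = simpson, evenness = evenness)
}

# One row of indices per sample.
alpha <- t(apply(abund, 1, alpha_indices))
print(round(alpha, 3))
```

Samples S1 and S3 are perfectly even, so their evenness is 1, while S2 is dominated by a single taxon and scores lower on every index except richness.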
To measure how unique a given sample is with respect to the variation in community composition, the local contribution to beta diversity (LCBD) is calculated.
In the example below, taxa abundances are normalised to relative abundances to obtain the proportion of the most abundant taxa per sample. This shows the features that are responsible for the observed LCBD value of a given sample.
The plot has points at the bottom whose diameter corresponds to the magnitude of the LCBD value of each sample. The bars correspond to the most abundant taxa, with the top taxa taking up a larger portion of the bar for each sample.
The variable to be used for grouping is specified as a character string, grouping_column. The dissimilarity coefficient to be used is specified as a character string, method; the default is "hellinger", and other options include "chord", "chisquare", "profiles", "percentdiff", "ruzicka", "divergence", "canberra", "whittaker", "wishart", "kulczynski", "jaccard", "sorensen", "ochiai", "ab.jaccard", "ab.sorensen", "ab.ochiai", "ab.simpson" and "euclidean". Supplying a filename writes a file containing the local contribution to beta diversity values and the corresponding p-value for each sample.
physeq <- normalise_data(physeq, norm.method = "relative")
p <- plot_taxa(physeq,grouping_column="Country",method="hellinger",number.taxa=21,filename=NULL)
print(p)
A file containing details of the local contribution to beta diversity can be generated by supplying a filename. It contains the LCBD values, associated p-values and group for each of the samples.
      Sample        LCBD p.LCBD Country
T_2_1  T_2_1 0.011716826  0.499       T
T_2_2  T_2_2 0.012600929  0.431       T
T_2_3  T_2_3 0.012908548  0.454       T
T_2_6  T_2_6 0.013977118  0.389       T
T_2_7  T_2_7 0.017756062  0.211       T
V_3_1  V_3_1 0.018815943  0.167       V
V_3_2  V_3_2 0.021974763  0.097       V
V_4_1  V_4_1 0.003868666  0.937       V
V_4_2  V_4_2 0.005543190  0.836       V
V_5_1  V_5_1 0.005736017  0.833       V
V_5_3  V_5_3 0.008149469  0.680       V
V_6_1  V_6_1 0.015033956  0.276       V
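The LCBD column above can be understood through a short base R sketch. It assumes the Hellinger coefficient and the standard sum-of-squares definition of LCBD: transform the abundances, centre each taxon, and express each sample's squared deviation as a share of the total. The abundance table is made up, and the permutation test that yields p.LCBD is not reproduced here, so this is an illustration of the idea rather than the package's own code.

```r
# Toy abundance table (made-up counts): rows are samples, columns are taxa.
abund <- matrix(c(10, 10, 10, 10,
                  12,  9, 11,  8,
                  40,  0,  0,  0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("S1", "S2", "S3"), paste0("taxon", 1:4)))

# Hellinger transformation: square root of the relative abundances per sample.
hell <- sqrt(abund / rowSums(abund))

# Centre each taxon (column) on its mean across samples.
centred <- scale(hell, center = TRUE, scale = FALSE)

# Each sample's share of the total sum of squares is its LCBD value.
ss_sample <- rowSums(centred^2)
lcbd <- ss_sample / sum(ss_sample)
print(round(lcbd, 3))
```

The LCBD values sum to 1, and S3, whose composition differs most from the others, receives the largest value.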
Ordination: This is the procedure of arranging samples to detect those that are more like each other in the dataset. We implement non-metric multidimensional scaling (NMDS), which is a rank-based approach, and PCoA, also known as metric/classical multidimensional scaling, which uses a similarity or dissimilarity measure to group samples and provide a representation of the original dataset in a lower dimension.
Beta-dispersion: This measures the variance in abundance within a group of samples by computing the average distance of individual samples to their group centroid. These distances are subjected to ANOVA to test whether they differ between groups. The most significantly dispersed groups are annotated on the plot with the corresponding significance labels.
We also implement permutational multivariate analysis of variance (PERMANOVA); the corresponding R-squared and p-values are annotated on the NMDS plot. Beta dispersion between all possible pairwise combinations of levels in the grouping variable is calculated, and the results are presented as specified by the provided parameters.
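The beta-dispersion idea can be sketched in base R as follows. The example assumes plain Euclidean coordinates, made-up data and two hypothetical groups "A" and "B"; for non-Euclidean dissimilarities such as Bray-Curtis the distances to centroids are normally computed in principal coordinate space instead, which this sketch does not attempt.

```r
set.seed(1)

# Made-up 2-D coordinates: group A is tightly clustered, group B is spread out.
group  <- factor(rep(c("A", "B"), each = 6))
coords <- rbind(matrix(rnorm(12, sd = 0.1), ncol = 2),
                matrix(rnorm(12, sd = 1.0), ncol = 2))

# Group centroids: column sums per group divided by the group sizes.
centroids <- rowsum(coords, group) / as.vector(table(group))

# Euclidean distance of every sample to the centroid of its own group.
dist_to_centroid <- sqrt(rowSums((coords - centroids[as.character(group), ])^2))

# ANOVA on the distances tests whether dispersion differs between groups.
fit <- aov(dist_to_centroid ~ group)
print(tapply(dist_to_centroid, group, mean))
print(summary(fit))
```

A small p-value for the group term suggests that the groups differ in how widely their samples are spread around the centroid.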
The arguments of the ordination and plot_ordination functions include: physeq, which is a required phyloseq object. distance is a dissimilarity measure with options "bray" (default), "wunifrac" and "unifrac". grouping_column is a character string specifying a variable whose levels are the groups in the data. pvalue.cutoff is the p-value threshold for beta dispersion significance (default is 0.05). show.pvalues is a logical variable for whether to show p-values in the beta dispersion results; setting it to FALSE shows only the significance labels. num.signi.groups (optional) is an integer for the number of significant beta dispersion results to report; this may be necessary when the grouping_column variable has many levels, to avoid overcrowding the plot area. method is a character string for the ordination method; "NMDS" is the only available method so far.
To produce an ordination of the data:
ord.res <- ordination(physeq,distance="bray",method="NMDS",grouping_column="Depth",pvalue.cutoff=0.05)
To plot the ordination results, the plot_ordination function is used. It takes the result of the ordination function.
p <- plot_ordination(ord.res, method="NMDS", pvalue.cutoff=0.05, show.pvalues=TRUE, num.signi.groups=NULL)
print(p)
An example of a plot produced by the NMDS ordination method with Depth as the grouping variable.
Selecting PCoA as the ordination method reports the variance in the original dataset explained by the first and second dimensions on the axis labels as percentages.
p <- plot_ordination(ord.res, method="PCoA", pvalue.cutoff=0.05, show.pvalues=TRUE, num.signi.groups=NULL)
print(p)
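To illustrate where the percentages on PCoA axis labels come from, the sketch below runs classical multidimensional scaling with cmdscale from base R on a made-up abundance table. The Bray-Curtis helper bray, the data and the choice to divide by the sum of positive eigenvalues are assumptions for illustration; the package may compute the explained variance differently.

```r
# Toy abundance table (made-up counts): rows are samples, columns are taxa.
abund <- matrix(c(10, 10, 10, 10,
                  12,  9, 11,  8,
                  40,  0,  0,  0,
                  35,  3,  1,  1,
                   0,  0, 20, 20),
                nrow = 5, byrow = TRUE,
                dimnames = list(paste0("S", 1:5), paste0("taxon", 1:4)))

# Hypothetical helper: Bray-Curtis dissimilarity between two count vectors.
bray <- function(a, b) sum(abs(a - b)) / sum(a + b)

# Pairwise dissimilarity matrix between samples.
n <- nrow(abund)
d <- matrix(0, n, n, dimnames = list(rownames(abund), rownames(abund)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- bray(abund[i, ], abund[j, ])
  }
}

# Classical (metric) multidimensional scaling, keeping the eigenvalues.
pcoa <- cmdscale(as.dist(d), k = 2, eig = TRUE)

# Share of variation on the first two axes, relative to the positive eigenvalues.
pos_eig     <- pcoa$eig[pcoa$eig > 0]
explained   <- 100 * pcoa$eig[1:2] / sum(pos_eig)
axis_labels <- sprintf("PCoA%d (%.1f%%)", 1:2, explained)
print(axis_labels)
```

The sample coordinates are in pcoa$points, and axis_labels holds strings of the kind that would be placed on the two plot axes.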