
Alpha and Beta Diversity

Alfred Ssekagiri edited this page Jan 20, 2018 · 3 revisions

Alpha diversity with ANOVA

Alpha diversity (diversity within samples) of the provided community data is calculated using the selected indices (methods). A pair-wise ANOVA of the diversity measures between groups is computed, and a plot annotated with significance labels is produced for each selected index.

The diversity measure (method) options are "richness", "fisher", "simpson", "shannon" and "evenness". grouping_column is the categorical variable on which the grouping is based during the analysis. pValueCutoff specifies the p-value threshold for significance in ANOVA; the default is 0.05. The following examples use the simpson, richness and shannon indices to calculate diversity.
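To make the indices concrete, here is a minimal base R sketch of how richness, Shannon, Simpson and evenness are commonly defined. It is not the package's own implementation: the toy count table and the helper name alpha_indices are made up for illustration, Simpson is shown in its Gini-Simpson form (1 minus the sum of squared proportions), evenness is assumed to be Pielou's J, and Fisher's alpha is left out.

```r
# Toy count table: rows are samples, columns are taxa (hypothetical data).
counts <- matrix(
  c(10, 10, 10, 10,
    37,  1,  1,  1,
    20, 20,  0,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("S1", "S2", "S3"), paste0("taxon", 1:4))
)

# Hypothetical helper, not part of any package.
alpha_indices <- function(x) {
  p <- x / rowSums(x)                      # relative abundances per sample
  richness <- rowSums(x > 0)               # number of observed taxa
  # Shannon: -sum(p * log(p)), treating 0 * log(0) as 0
  shannon <- -rowSums(ifelse(p > 0, p * log(p), 0))
  simpson <- 1 - rowSums(p^2)              # Gini-Simpson form (assumption)
  evenness <- shannon / log(richness)      # Pielou's J (assumption)
  data.frame(richness, shannon, simpson, evenness)
}

alpha <- alpha_indices(counts)
print(round(alpha, 3))
```

A perfectly even sample such as S1 or S3 has an evenness of 1, while a sample dominated by one taxon, such as S2, scores low on both Shannon and Simpson.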

Grouping by the Country categorical variable.

p <- plot_anova_diversity(physeq, method = c("richness", "simpson", "shannon"), grouping_column = "Country", pValueCutoff = 0.05)
print(p)

Grouping by the Latrine categorical variable.

p <- plot_anova_diversity(physeq, method = c("richness", "simpson", "shannon"), grouping_column = "Latrine", pValueCutoff = 0.05)
print(p)

Grouping by the Depth categorical variable.

p <- plot_anova_diversity(physeq, method = c("richness", "simpson", "shannon"), grouping_column = "Depth", pValueCutoff = 0.05)
print(p)
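The pair-wise ANOVA step behind the significance labels can be sketched in base R as follows. This is an illustration only, assuming one ANOVA per pair of group levels; the data frame div and its values are simulated, and how the package corrects or labels the p-values is not shown here.

```r
set.seed(1)
# Hypothetical Shannon diversity values for three groups of five samples each.
div <- data.frame(
  shannon = c(rnorm(5, mean = 2.0, sd = 0.1),
              rnorm(5, mean = 2.1, sd = 0.1),
              rnorm(5, mean = 3.0, sd = 0.1)),
  group = rep(c("A", "B", "C"), each = 5)
)

pValueCutoff <- 0.05
pairs <- combn(unique(div$group), 2)  # one column per pair of group levels

# Run a one-way ANOVA restricted to each pair of groups.
res <- lapply(seq_len(ncol(pairs)), function(i) {
  g <- pairs[, i]
  sub <- div[div$group %in% g, ]
  fit <- aov(shannon ~ group, data = sub)
  p <- summary(fit)[[1]][["Pr(>F)"]][1]
  data.frame(group1 = g[1], group2 = g[2], p.value = p,
             significant = p < pValueCutoff)
})
res <- do.call(rbind, res)
print(res)
```

Pairs whose p-value falls below pValueCutoff are the ones that would receive a significance label on the plot.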

Beta diversity

Local Contribution to Beta diversity

The local contribution to beta diversity (LCBD) measures the degree of uniqueness of a given sample, that is, how much the sample contributes to the variation in community composition.

In the example provided below, we normalise taxa abundance to relative abundance to obtain the proportion of the most abundant taxa per sample. This shows the features that are responsible for the observed LCBD value of a given sample.

The plot produced has points at the bottom whose diameter corresponds to the magnitude of the LCBD value of a particular sample. The bars correspond to the most abundant taxa, with the top taxa taking up a larger portion of the bar for each sample.

The variable to be used for grouping is specified as a character string, grouping_column. The dissimilarity coefficient to be used is specified as a character string, method. The default is "hellinger"; the other options are "chord", "chisquare", "profiles", "percentdiff", "ruzicka", "divergence", "canberra", "whittaker", "wishart", "kulczynski", "jaccard", "sorensen", "ochiai", "ab.jaccard", "ab.sorensen", "ab.ochiai", "ab.simpson" and "euclidean". Supplying a filename writes a file containing the local contribution to beta diversity values and the corresponding p-value for each sample.
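For the default "hellinger" method, the LCBD values can be sketched in base R as each sample's share of the total sum of squares of the Hellinger-transformed abundances. This assumes the usual sum-of-squares definition of LCBD for transformation-based methods; the toy count table is made up, and the permutation test that yields the p-values is not shown.

```r
# Toy count table: rows are samples, columns are taxa (hypothetical data).
counts <- matrix(
  c(10, 10, 10, 10,
    12,  9, 11,  8,
     9, 11, 10, 10,
    40,  0,  0,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("S", 1:4), paste0("taxon", 1:4))
)

# Hellinger transformation: square root of relative abundances.
hel <- sqrt(counts / rowSums(counts))

# Squared deviations from the column (taxon) means.
centred <- scale(hel, center = TRUE, scale = FALSE)
sq <- centred^2

ss_total <- sum(sq)               # total sum of squares
lcbd <- rowSums(sq) / ss_total    # each sample's share of the total
print(round(lcbd, 3))
```

The LCBD values sum to 1, and the sample with the most unusual composition (S4 here, dominated by a single taxon) has the largest value.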

physeq <- normalise_data(physeq, norm.method = "relative")
p <- plot_taxa(physeq,grouping_column="Country",method="hellinger",number.taxa=21,filename=NULL)
print(p)

A file containing details of the local contribution to beta diversity can be generated by supplying a filename. It contains the LCBD value, the associated p-value and the group for each sample.

      Sample        LCBD p.LCBD Country
T_2_1   T_2_1 0.011716826  0.499       T
T_2_2   T_2_2 0.012600929  0.431       T
T_2_3   T_2_3 0.012908548  0.454       T
T_2_6   T_2_6 0.013977118  0.389       T
T_2_7   T_2_7 0.017756062  0.211       T
V_3_1   V_3_1 0.018815943  0.167       V
V_3_2   V_3_2 0.021974763  0.097       V
V_4_1   V_4_1 0.003868666  0.937       V
V_4_2   V_4_2 0.005543190  0.836       V
V_5_1   V_5_1 0.005736017  0.833       V
V_5_3   V_5_3 0.008149469  0.680       V
V_6_1   V_6_1 0.015033956  0.276       V

Ordination and beta dispersion

Ordination: This arranges samples in a low-dimensional space so that samples that are more like each other in the dataset lie closer together. We implement non-metric multidimensional scaling (NMDS), which is a rank-based approach, and PCoA, also known as metric/classical multidimensional scaling, which uses a similarity or dissimilarity measure to group samples and provide a representation of the original dataset in fewer dimensions.

Beta-dispersion: This measures the variance in abundance within a group of samples by computing the average distance of the individual samples to their group centroid. These distances are subjected to ANOVA to test whether they differ between groups. The most significantly dispersed groups are annotated on the plot with the corresponding significance labels.
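The idea behind beta dispersion can be sketched in base R as follows. This is a simplification rather than the package's method: it measures Euclidean distances to group centroids on Hellinger-transformed data, whereas dedicated implementations typically compute centroids in principal coordinate space for the chosen dissimilarity. The simulated counts and the helper make_sample are hypothetical.

```r
set.seed(42)
# Hypothetical data: 6 tight samples (group "A") and 6 variable samples (group "B").
base <- c(30, 25, 20, 15, 10)
make_sample <- function(noise) pmax(round(base + rnorm(5, sd = noise)), 0)
counts <- rbind(t(replicate(6, make_sample(1))),
                t(replicate(6, make_sample(8))))
group <- factor(rep(c("A", "B"), each = 6))

hel <- sqrt(counts / rowSums(counts))   # Hellinger transformation

# Distance of every sample to the centroid of its own group.
dist_to_centroid <- numeric(nrow(hel))
for (g in levels(group)) {
  idx <- which(group == g)
  centroid <- colMeans(hel[idx, , drop = FALSE])
  deviations <- sweep(hel[idx, , drop = FALSE], 2, centroid)
  dist_to_centroid[idx] <- sqrt(rowSums(deviations^2))
}

# Average dispersion per group, then ANOVA on the distances.
mean_dispersion <- tapply(dist_to_centroid, group, mean)
print(mean_dispersion)
fit <- aov(dist_to_centroid ~ group)
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]
print(p_value)
```

A small p-value here indicates that the groups differ in how spread out their samples are, which is a separate question from whether the group centroids differ.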

We also implement permutational multivariate analysis of variance (PERMANOVA); the corresponding R-squared and p-values are annotated on the NMDS plot. Beta dispersion between all possible pairwise combinations of levels in the grouping variable is calculated, and the results are presented as specified by the parameters provided.
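As a rough illustration of where the PERMANOVA R-squared and p-value come from, the base R sketch below computes a pseudo-F statistic from a Bray-Curtis distance matrix and compares it with the values obtained after shuffling the group labels. It is a minimal one-factor version under simplifying assumptions, not the package's implementation; the simulated counts and the helpers bray and pseudo_f are hypothetical.

```r
set.seed(7)
# Hypothetical communities: two groups of 6 samples with different dominant taxa.
counts <- rbind(
  matrix(rpois(6 * 5, lambda = rep(c(30, 20, 10, 5, 5), each = 6)), nrow = 6),
  matrix(rpois(6 * 5, lambda = rep(c(5, 5, 10, 20, 30), each = 6)), nrow = 6)
)
group <- rep(c("A", "B"), each = 6)

# Bray-Curtis dissimilarity between all pairs of samples.
bray <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
  }
  d
}

# Pseudo-F and R-squared from sums of squared distances.
pseudo_f <- function(d, g) {
  n <- nrow(d)
  a <- length(unique(g))
  d2 <- d^2
  ss_total <- sum(d2[upper.tri(d2)]) / n
  ss_within <- sum(sapply(unique(g), function(k) {
    idx <- which(g == k)
    sub <- d2[idx, idx, drop = FALSE]
    sum(sub[upper.tri(sub)]) / length(idx)
  }))
  ss_among <- ss_total - ss_within
  c(pseudoF = (ss_among / (a - 1)) / (ss_within / (n - a)),
    R2 = ss_among / ss_total)
}

d <- bray(counts)
obs <- pseudo_f(d, group)

# Permutation test: shuffle group labels and recompute the statistic.
n_perm <- 999
perm_f <- replicate(n_perm, pseudo_f(d, sample(group))[["pseudoF"]])
p_value <- (sum(perm_f >= obs[["pseudoF"]]) + 1) / (n_perm + 1)

print(round(obs, 3))
print(p_value)
```

R2 is the fraction of the total variation in the distance matrix attributed to the grouping, and the p-value is the proportion of permutations with a pseudo-F at least as large as the observed one.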

The arguments include: physeq, which is a required phyloseq object, and distance, which is a dissimilarity distance measure with options of "bray" (default), "wunifrac" and "unifrac". grouping_column is a character string specifying a variable whose levels are the groups in the data. pvalue.cutoff is the threshold p-value for beta dispersion significance (default is 0.05). show.pvalues is a logical variable for whether to show p-values in the beta dispersion results; setting it to FALSE shows only the significance labels. num.signi.groups (optional) is an integer for the number of significant beta dispersion results to report; this can be necessary when the grouping_column variable has many levels, to avoid overcrowding the plot area. method is a character string for the ordination method; "NMDS" is the only available method so far.

To produce an ordination of the data:

ord.res <- ordination(physeq,distance="bray",method="NMDS",grouping_column="Depth",pvalue.cutoff=0.05)

To plot the ordination results, the plot_ordination function is used. It takes the result of the ordination function.

p <- plot_ordination(ord.res, method="NMDS", pvalue.cutoff=0.05, show.pvalues=TRUE, num.signi.groups=NULL)
print(p)

An example of a plot produced by the NMDS ordination method with Depth as the grouping variable.

Selecting PCoA as the ordination method reports the variance in the original dataset explained by the first and second dimensions as percentages on the axis labels.

p <- plot_ordination(ord.res, method="PCoA", pvalue.cutoff=0.05, show.pvalues=TRUE, num.signi.groups=NULL)
print(p)
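The percentages on the PCoA axis labels can be understood through the eigenvalues of a classical multidimensional scaling. The base R sketch below uses cmdscale from the stats package on a made-up count table; to stay within base R it uses Euclidean distances on Hellinger-transformed data instead of the "bray" default, and the label format is only an assumption.

```r
# Toy count table: rows are samples, columns are taxa (hypothetical data).
counts <- matrix(
  c(30, 20, 10,  5,
    28, 22,  9,  6,
    10, 12, 25, 20,
     8, 11, 27, 22,
    20, 20, 20, 20),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("S", 1:5), paste0("taxon", 1:4))
)

# Euclidean distance on Hellinger-transformed abundances (assumption).
d <- dist(sqrt(counts / rowSums(counts)))

# Classical (metric) multidimensional scaling, keeping the eigenvalues.
pcoa <- cmdscale(d, k = 2, eig = TRUE)
eig <- pcoa$eig

# Share of the variation captured by the first two axes, in percent.
explained <- 100 * eig[1:2] / sum(eig[eig > 0])
axis_labels <- sprintf("PCoA%d (%.1f%%)", 1:2, explained)
print(axis_labels)
```

The sample coordinates are in pcoa$points, so the first two columns can be plotted against each other with these labels on the axes.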