Co-occurrence Pattern Analysis
Co-occurrence pattern analysis identifies co-occurring features/taxa in community data under specified environmental conditions. Co-occurrence is measured as positive correlation, with threshold(s) set as described in the arguments below. Among these features, pairwise co-occurrences that stand out within sub communities are detected. The p-values generated by the pairwise correlation tests are adjusted for multiple comparisons using the false discovery rate. The network statistics used to assign importance to taxa/features include betweenness, closeness and eigenvector centrality.
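The thresholding and p-value adjustment described above can be sketched in base R. This is only an illustration on simulated data, not the internals of co_occurence_network: the abundance table, the taxon names, the Pearson method and the 0.05 significance cut-off are all assumptions made for the example.

```r
set.seed(1)

# Hypothetical abundance table: 20 samples (rows) by 4 taxa (columns).
abund <- matrix(rpois(80, lambda = 20), nrow = 20,
                dimnames = list(NULL, paste0("taxon", 1:4)))
# Make taxon2 track taxon1 so that at least one pair co-occurs.
abund[, "taxon2"] <- abund[, "taxon1"] + rpois(20, lambda = 2)

# All unordered pairs of taxa.
pairs <- t(combn(colnames(abund), 2))
res <- data.frame(taxon_a = pairs[, 1], taxon_b = pairs[, 2])

# Correlation estimate and raw p-value for each pair.
tests <- apply(pairs, 1, function(p) {
  ct <- cor.test(abund[, p[1]], abund[, p[2]], method = "pearson")
  c(rho = unname(ct$estimate), p = ct$p.value)
})
res$rho   <- tests["rho", ]
res$p_raw <- tests["p", ]

# False discovery rate (Benjamini-Hochberg) adjustment across all pairs.
res$p_adj <- p.adjust(res$p_raw, method = "fdr")

# Keep pairs above a positive correlation threshold that remain significant
# after adjustment (0.05 is an assumed cut-off).
rho_threshold <- 0.5
res$co_occurring <- res$rho >= rho_threshold & res$p_adj < 0.05
print(res)
```

Adjusting the p-values over all pairs at once matters because the number of tests grows quadratically with the number of taxa.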
To generate a network of taxa under the different conditions specified by the grouping variable, the function co_occurence_network is used. In addition to a phyloseq object and a grouping variable character string, other arguments include:
rhos, which is a vector of threshold correlations. The default is c(0.5, -0.5, 0.75, -0.75).
select.condition is an optional list of conditions, which should be among the levels of the grouping column.
scale.vertex.size and scale.edge.width are numbers that adjust the size of vertices and the width of edges respectively.
method is a character string that specifies the correlation method used to compute correlations between taxa. cor is the default, with bicor as an option.
... is for other arguments passed on to the network plot, for example layout and label size.
A plot showing the relationship between eigenvector centrality and betweenness centrality is obtained by setting plotBetweennessEeigenvalue=T.
We illustrate this at the Genus taxonomic level with a threshold correlation of 0.35 for Vietnam, as specified in the select.condition argument.
physeq <- taxa_level(physeq, which_level = "Genus")
co_occr <- co_occurence_network(physeq, grouping_column = "Country", rhos = 0.35, select.condition = "V", scale.vertex.size=3, scale.edge.width=15)
Note: Nodes are colored according to their sub community. The size of each node is proportional to its total degree. The width of each edge is proportional to the correlation between the two nodes it connects. Positive and negative correlations between taxa (nodes) are indicated by blue and red edges respectively.
For the purpose of visualisation, we use the visNetwork package to create a dynamic representation of the network. Zoom in and out to explore the network.
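library(visNetwork)
# Take the graph from the co-occurrence result and convert it to visNetwork's node/edge tables
g <- co_occr$net$graph
data <- toVisNetworkData(g)
# Interactive network: highlight neighbours on selection and allow selecting nodes by id
visNetwork(nodes = data$nodes, edges = data$edges, width = 900) %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)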
Setting plotBetweennessEeigenvalue=T produces plot(s) of betweenness versus eigenvector centrality at each of the specified correlations and conditions. As noted earlier, these are measures of the importance of taxa in the network. The betweenness of a taxon is a measure of its control over the network: high betweenness centrality implies that the corresponding node has more influence in the network, and vice versa. Eigenvector centrality measures a taxon's linkage to others in the network, taking into account how connected those others are. Therefore, a taxon with high eigenvector centrality is linked to highly linked taxa.
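To make the difference between the two measures concrete, here is a small sketch using the igraph package on a hypothetical seven-node graph (two triangles joined through a bridging node D). The graph and node names are invented for illustration and are not taken from the data set used in this tutorial.

```r
library(igraph)

# Hypothetical network: triangle A-B-C and triangle E-F-G, joined through D.
edges <- data.frame(
  from = c("A", "A", "B", "C", "D", "E", "E", "F"),
  to   = c("B", "C", "C", "D", "E", "F", "G", "G")
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Three centrality measures for every node.
centrality <- data.frame(
  betweenness = betweenness(g),
  closeness   = closeness(g),
  eigenvector = eigen_centrality(g)$vector
)

# Nodes ordered from highest to lowest betweenness.
print(centrality[order(-centrality$betweenness), ])
```

In this toy graph the bridging node D has the highest betweenness, since every shortest path between the two triangles passes through it, while C and E have higher eigenvector centrality because they are attached to the densely connected triangles.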
Features identified in sub communities are assigned roles in the network. The metrics used are the within-module degree, which measures how well a particular feature is connected to others in the same sub community (module), and the among-module connectivity, which measures how a feature is linked to other modules in the network. Features are classified as ultra peripherals, peripherals, provincial, connectors, kinless, module hubs, or non hubs.
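As a rough illustration of the two metrics, the sketch below computes a within-module degree z-score and a participation coefficient on a hypothetical toy network in base R. It assumes the commonly used definitions, z = (k - mean(k)) / sd(k) within a module and p = 1 - sum((k_s / k)^2) over modules s, with R's sample standard deviation. module.roles may differ in detail, so treat this as a sketch and not as the package's implementation.

```r
# Hypothetical toy network with two modules joined by the edge d - e.
nodes <- letters[1:7]
adj <- matrix(0, 7, 7, dimnames = list(nodes, nodes))
edges <- rbind(c("a", "b"), c("a", "c"), c("a", "d"), c("b", "c"),
               c("e", "f"), c("e", "g"),
               c("d", "e"))
for (i in seq_len(nrow(edges))) {
  adj[edges[i, 1], edges[i, 2]] <- 1
  adj[edges[i, 2], edges[i, 1]] <- 1
}

# Assumed module membership of each node.
membership <- c(a = 1, b = 1, c = 1, d = 1, e = 2, f = 2, g = 2)
modules <- sort(unique(membership))

# Number of links from each node (rows) into each module (columns).
links_to_module <- sapply(modules, function(m) {
  rowSums(adj[, membership == m, drop = FALSE])
})
degree <- rowSums(adj)

# Within-module degree: links a node has inside its own module,
# standardised against the other nodes of that module.
k_within <- links_to_module[cbind(seq_along(membership),
                                  match(membership, modules))]
z <- ave(k_within, membership, FUN = function(k) (k - mean(k)) / sd(k))
names(z) <- nodes

# Among-module connectivity (participation coefficient): 0 when all links
# stay in one module, larger when links are spread over several modules.
p <- 1 - rowSums((links_to_module / degree)^2)

print(data.frame(z = z, p = p))
```

Only d and e, the two ends of the bridging edge, obtain a non-zero p here; every other node keeps all its links inside its own module.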
The function module.roles takes a graph object returned by the co_occurence_network function as an argument and assigns roles to each of the features in the network.
We illustrate this using the graph obtained above.
taxa.roles <- module.roles(co_occr$net$graph)
head(taxa.roles)
                                        z         p          roles
Dietzia                        -0.9114654 0.0000000 provincial hub
Streptomyces                   -0.9114654 0.0000000 provincial hub
Brumimicrobium                 -0.9114654 0.0000000 provincial hub
Nitrosomonas                    0.6076436 0.0000000 provincial hub
Sphingobacterium                0.6076436 0.0000000 provincial hub
unclassified_Flavobacteriaceae -0.1519109 0.4444444  connector hub
To produce a visualisation of the results, use the function plot_roles, which takes the result of module.roles.
p <- plot_roles(taxa.roles)
print(p)
To explore how the sub communities respond to environmental traits, we consider the correlation between the taxon with maximum betweenness within a sub community and the environmental variables. This taxon is chosen because it is a good representation of the sub community. This is implemented in accordance with [@mod_env_cor], where correlations between module-based eigengenes and environmental factors are used to detect the modules' response to environmental change.
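The idea can be sketched in base R on simulated data. Everything here is hypothetical: the taxon names, module assignments, betweenness scores, environmental variables, the Pearson method and the Benjamini-Hochberg adjustment are assumptions chosen for illustration, and the sketch does not reproduce the internals of module_env_correlation.

```r
set.seed(42)
n <- 15

# Hypothetical environmental variables for 15 samples.
env <- data.frame(pH = rnorm(n, 7, 0.5), temp = rnorm(n, 25, 3))

# Hypothetical abundances: taxonA follows pH, taxonC decreases with temperature.
abund <- cbind(
  taxonA = 50 + 20 * (env$pH - 7) + rnorm(n, sd = 2),
  taxonB = rnorm(n, 30, 5),
  taxonC = 40 - 2 * (env$temp - 25) + rnorm(n, sd = 2),
  taxonD = rnorm(n, 10, 2)
)

# Assumed module membership and betweenness score of each taxon.
node_info <- data.frame(taxon = colnames(abund),
                        module = c(1, 1, 2, 2),
                        betweenness = c(5, 1, 4, 0))

# Representative of each module: the taxon with maximum betweenness.
reps <- sapply(split(node_info, node_info$module),
               function(d) d$taxon[which.max(d$betweenness)])

# Correlate every module representative with every environmental variable.
result <- expand.grid(module = names(reps), variable = names(env),
                      stringsAsFactors = FALSE)
result$taxon <- unname(reps[result$module])
stats <- mapply(function(tx, v) {
  ct <- cor.test(abund[, tx], env[[v]], method = "pearson")
  c(estimate = unname(ct$estimate), p = ct$p.value)
}, result$taxon, result$variable)
result$cor   <- unname(stats["estimate", ])
result$p_adj <- p.adjust(unname(stats["p", ]), method = "BH")
print(result)
```

Using a single representative taxon per module keeps the number of tests small, at the cost of ignoring the other members of the module.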
The function to perform this, module_env_correlation, takes a result from co_occurence_network. Other arguments include select.variables, method, padjust.method and adjustment, as they are available to the function env_taxa_correlation (see that function for an explanation of the arguments). It returns a data frame with correlations between each sub community (module) and the environmental variables for all conditions. The example below illustrates this using the co-occurrence network obtained above.
mod.env.cor <- module_env_correlation(co_occr)
A visualisation is produced using the function plot_taxa_env, as illustrated below. Significant correlations are annotated with significance labels.
p <- plot_taxa_env(mod.env.cor)
print(p)
The plot shows correlation between module number (indicated by #number) and environmental traits.