
Co-occurrence Pattern Analysis

Alfred Ssekagiri edited this page Jan 20, 2018 · 1 revision

Co-occurrence pattern analysis

Co-occurrence pattern analysis identifies co-occurring features/taxa in community data under specified environmental conditions. Co-occurrence is measured as a positive correlation whose threshold(s) can be set as described in the arguments below. Among these features, pairwise co-occurrences that stand out within sub communities are detected. The p-values generated by the pairwise correlation tests are adjusted for multiple comparisons by controlling the false discovery rate. The network statistics used to rank the importance of taxa/features are betweenness, closeness and eigenvector centrality.
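The idea of thresholded, FDR-adjusted pairwise correlations can be sketched in base R. This is only an illustration on simulated data, not the package's internal code: the taxon names, the 0.5 threshold, the 0.05 significance cut-off and the choice of the Benjamini-Hochberg procedure are all assumptions made for the example.

```r
# Simulated abundance table: 30 samples (rows) x 5 hypothetical taxa (columns).
set.seed(1)
n <- 30
base <- rnorm(n)
abund <- data.frame(
  taxonA = base + rnorm(n, sd = 0.3),
  taxonB = base + rnorm(n, sd = 0.3),   # built to co-occur with taxonA
  taxonC = rnorm(n),
  taxonD = rnorm(n),
  taxonE = -base + rnorm(n, sd = 0.3)   # built to be negatively correlated with taxonA
)

# All unordered pairs of taxa, one pair per row.
pairs <- t(combn(names(abund), 2))
res <- data.frame(
  taxon1 = pairs[, 1],
  taxon2 = pairs[, 2],
  rho = NA_real_,
  p = NA_real_,
  stringsAsFactors = FALSE
)

# One correlation test per pair.
for (i in seq_len(nrow(res))) {
  ct <- cor.test(abund[[res$taxon1[i]]], abund[[res$taxon2[i]]], method = "pearson")
  res$rho[i] <- unname(ct$estimate)
  res$p[i] <- ct$p.value
}

# Adjust the p-values for multiple comparisons (assumed here: Benjamini-Hochberg FDR).
res$p.adj <- p.adjust(res$p, method = "BH")

# Keep pairs whose correlation reaches the (assumed) positive threshold
# and remains significant after adjustment.
rho_threshold <- 0.5
co_occurring <- res[res$rho >= rho_threshold & res$p.adj < 0.05, ]
print(co_occurring)
```

Adjusting after all tests have been run matters because the number of pairs grows quadratically with the number of taxa, so unadjusted p-values would flag many spurious pairs.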

To generate a network of taxa under the different conditions specified by the grouping variable, use the function co_occurence_network. In addition to a phyloseq object and the grouping variable (a character string), it takes the following arguments. rhos is a set of threshold correlations; the default is c(0.5, -0.5, 0.75, -0.75). select.condition is an optional list of conditions, each of which must be one of the levels of the grouping column. scale.vertex.size and scale.edge.width are numbers that scale the size of vertices and the width of edges respectively. method is a character string specifying the correlation method used between taxa; cor is the default, with bicor as an alternative. ... passes further arguments on to the network plot, for example layout and label size. Setting plotBetweennessEeigenvalue=T produces a plot of the relationship between eigenvector centrality and betweenness centrality.

We illustrate this at the Genus taxonomic level with a threshold correlation of 0.35 for Vietnam, selected through the select.condition argument.

# Aggregate the data to Genus level, then build the network for condition "V" only.
physeq <- taxa_level(physeq, which_level = "Genus")
co_occr <- co_occurence_network(physeq, grouping_column = "Country", rhos = 0.35, select.condition = "V", scale.vertex.size = 3, scale.edge.width = 15)

Note: Nodes are colored according to their sub community. The size of each node is proportional to its total degree. The width of each edge is proportional to the correlation between the two nodes it connects. Positive and negative correlations between taxa (nodes) are indicated by blue and red edges respectively.

For visualisation, we use the visNetwork package to create a dynamic representation of the network. Zoom in and out to explore the network.

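library(visNetwork)

# Convert the igraph object returned by co_occurence_network into visNetwork data.
g <- co_occr$net$graph
data <- toVisNetworkData(g)
# Interactive network: highlight neighbours on click and allow selecting a node by id.
visNetwork(nodes = data$nodes, edges = data$edges, width = 900) %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)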

Setting plotBetweennessEeigenvalue=T produces plot(s) of betweenness versus eigenvector centrality at each of the specified correlations and conditions. As noted earlier, these are measures of the importance of taxa in the network. The betweenness of a taxon measures its control over the network: a node with high betweenness centrality has more influence in the network, and vice versa. Eigenvector centrality measures a taxon's linkage to others in the network, taking into account how connected those others are. A taxon with high eigenvector centrality is therefore linked to highly linked taxa.
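To make the eigenvector centrality idea concrete, here is a minimal base R sketch on a small hypothetical network (the node names and edges are invented for illustration, and this is not how the package computes its values). Nodes A and D have the same degree, but A sits next to better connected neighbours and therefore receives the higher score. Betweenness is not computed here; in practice both measures are available in the igraph package as betweenness() and eigen_centrality().

```r
# Hypothetical undirected network: a triangle H-A-B with a tail H-C-D-E.
taxa <- c("H", "A", "B", "C", "D", "E")
edges <- rbind(
  c("H", "A"), c("H", "B"), c("H", "C"),
  c("A", "B"), c("C", "D"), c("D", "E")
)

# Symmetric adjacency matrix, filled by name in both directions.
adj <- matrix(0, length(taxa), length(taxa), dimnames = list(taxa, taxa))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# Degree: the number of direct links of each node.
degree <- rowSums(adj)

# Eigenvector centrality: the leading eigenvector of the adjacency matrix,
# rescaled so that the most central node has a score of 1.
eig <- eigen(adj, symmetric = TRUE)
ec <- abs(eig$vectors[, 1])
ec <- ec / max(ec)
names(ec) <- taxa

print(round(cbind(degree, eigenvector = ec), 3))
```

Comparing rows A and D in the printed table shows why degree alone is not enough: the two nodes have the same number of links but different eigenvector centrality.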

Roles of taxa

Features identified in sub communities are assigned roles in the network. Two metrics are used: within-module degree, which measures how well a feature is connected to others in the same sub community (module), and among-module connectivity, which measures how a feature is linked to other modules in the network. Features are classified as ultra peripherals, peripherals, provincial, connectors, kinless, module hubs, or non hubs.
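The two metrics can be sketched in base R under the common definitions from the network literature: the within-module degree z-score, z = (k_within - mean) / sd computed per module, and the participation coefficient, P = 1 - sum over modules of (k_to_module / k_total)^2. Whether module.roles uses exactly these formulas (for instance the sample standard deviation) is an assumption, and the network, feature names and module assignment below are hypothetical.

```r
# Hypothetical undirected network of 7 features split into two modules.
features <- c("a", "b", "c", "d", "e", "f", "g")
module <- c(a = 1, b = 1, c = 1, d = 1, e = 2, f = 2, g = 2)
edges <- rbind(
  c("a", "b"), c("a", "c"), c("a", "d"), c("b", "c"),  # links inside module 1
  c("e", "f"), c("e", "g"),                            # links inside module 2
  c("d", "e")                                          # link between the modules
)
adj <- matrix(0, 7, 7, dimnames = list(features, features))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# Total degree, and number of links from each feature to each module.
k_total <- rowSums(adj)
mods <- sort(unique(module))
k_to_module <- sapply(mods, function(m) rowSums(adj[, module == m, drop = FALSE]))
colnames(k_to_module) <- paste0("module", mods)

# Links from each feature to members of its own module.
k_within <- k_to_module[cbind(seq_along(features), match(module, mods))]

# Within-module degree z-score (set to 0 where a module has no variation).
m <- ave(k_within, module)
s <- ave(k_within, module, FUN = sd)
z <- ifelse(s > 0, (k_within - m) / s, 0)
names(z) <- features

# Among-module connectivity (participation coefficient):
# 0 when all links stay inside one module, larger when links are spread out.
P <- 1 - rowSums((k_to_module / k_total)^2)

print(data.frame(module = module, z = round(z, 3), p = round(P, 3)))
```

In this toy network, feature a has the highest z in module 1 because it holds most of the within-module links, while d and e have non-zero p because they carry the only link between the two modules.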

The function module.roles takes the graph object returned by co_occurence_network as an argument and assigns a role to each feature in the network.

We illustrate this using the graph obtained above.

taxa.roles <- module.roles(co_occr$net$graph)
head(taxa.roles)

                                        z         p          roles
Dietzia                        -0.9114654 0.0000000 provincial hub
Streptomyces                   -0.9114654 0.0000000 provincial hub
Brumimicrobium                 -0.9114654 0.0000000 provincial hub
Nitrosomonas                    0.6076436 0.0000000 provincial hub
Sphingobacterium                0.6076436 0.0000000 provincial hub
unclassified_Flavobacteriaceae -0.1519109 0.4444444  connector hub

To visualise the results, use the function plot_roles, which takes the result of module.roles.

p <- plot_roles(taxa.roles)

print(p)

Correlations between subcommunities and environmental variables

To explore how the sub communities respond to environmental traits, we consider the correlation between the taxon with maximum betweenness within a sub community and the environmental variables. This taxon is chosen because it is a good representative of the sub community. The approach follows [@mod_env_cor], where correlations between module-based eigengenes and environmental factors are used to detect the modules' response to environmental change.
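As a rough illustration of this step, the base R sketch below correlates one representative abundance profile per module with each environmental variable, adjusts the p-values and attaches significance labels. Everything in it is assumed for the example: the data are simulated, the representative taxa are taken as already chosen, and the correlation method, the Benjamini-Hochberg adjustment and the star cut-offs are illustrative rather than the package's settings.

```r
# Simulated environmental data for 25 samples (hypothetical variables).
set.seed(42)
n <- 25
env <- data.frame(pH = rnorm(n, 7, 0.5), Temp = rnorm(n, 20, 3))

# Hypothetical abundance of the representative taxon of each module;
# module "#1" is built to track pH, module "#2" is unrelated noise.
rep_abund <- data.frame(
  m1 = 2 * env$pH + rnorm(n, sd = 0.3),
  m2 = rnorm(n)
)
names(rep_abund) <- c("#1", "#2")

# One row per module / environmental variable combination.
res <- expand.grid(
  module = names(rep_abund),
  variable = names(env),
  stringsAsFactors = FALSE
)
res$cor <- NA_real_
res$p <- NA_real_
for (i in seq_len(nrow(res))) {
  ct <- cor.test(rep_abund[[res$module[i]]], env[[res$variable[i]]], method = "pearson")
  res$cor[i] <- unname(ct$estimate)
  res$p[i] <- ct$p.value
}

# Adjust for multiple comparisons and convert adjusted p-values to labels.
res$p.adj <- p.adjust(res$p, method = "BH")
res$signif <- cut(
  res$p.adj,
  breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
  labels = c("***", "**", "*", "")
)
print(res)
```

The resulting long-format table, one row per module and variable, is the kind of structure that is convenient to pass to a heatmap-style plot.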

The function module_env_correlation performs this analysis; it takes the result of co_occurence_network. Its other arguments, select.variables, method, padjust.method and adjustment, are the same as those of the function env_taxa_correlation (see that function for an explanation of the arguments). It returns a data frame with the correlations between each sub community (module) and the environmental variables for all conditions. The example below illustrates this using the co-occurrence network obtained above.

mod.env.cor <- module_env_correlation(co_occr)

A visualisation is produced using the function plot_taxa_env, as illustrated below. Significant correlations are annotated with significance labels.

p <- plot_taxa_env(mod.env.cor)
print(p)

The plot shows correlation between module number (indicated by #number) and environmental traits.